228 points by lpmay
15 days ago
Whenever sites like this pop up, the really valuable thing to me is finding a book that's a synthesis of two topics. I can find many specific books with code examples here for things like financial maths, portfolio management, etc.
See: A Data-Centric Introduction to Computing (https://dcic-world.org)
It's written by Shriram Krishnamurthi, a really interesting researcher who cares a lot about CS education.
He also wrote a great intro to generating finite automata using Scheme macros. It carries you through step by step and motivates the work well.
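If I remember that tutorial ("Automata via Macros") correctly, its running example is an automaton that accepts Lisp accessor names like cadr and cdddr, i.e. strings matching c(a|d)*r. Here is a rough Python analogue written as a plain table-driven loop. The state names and table layout are my own assumptions, and because Python has no macros, the compilation the paper does at expansion time simply becomes interpretation at run time here.

```python
# Transition table for an automaton accepting c(a|d)*r: "cr", "car", "cadr", "cdddr", ...
# State names are illustrative (assumed); a missing entry means "no transition", i.e. reject.
TRANSITIONS = {
    "init": {"c": "more"},
    "more": {"a": "more", "d": "more", "r": "end"},
    "end": {},
}
ACCEPTING = {"end"}


def accepts(text, transitions=TRANSITIONS, accepting=ACCEPTING, start="init"):
    """Run the automaton over text; accept only if it ends in an accepting state."""
    state = start
    for ch in text:
        state = transitions[state].get(ch)
        if state is None:  # no edge for this character: reject immediately
            return False
    return state in accepting


if __name__ == "__main__":
    for word in ["cadr", "cdddr", "cr", "cad", "cadra"]:
        print(word, accepts(word))
```

As I understand it, the macro version instead turns each row of this table into its own procedure, with transitions as tail calls, so no table lookup is left at run time. That is what makes the tutorial a nice motivation for macros.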
If this is just a link to a bunch of PDFs, then presumably PDF malware is an issue here. I would have thought computer books would be a fine vector to choose if you were trying to compromise websites and business systems.
Use an in-browser PDF renderer, such as mozilla/pdf.js, which Firefox uses, and the tried and tested browser sandbox will most likely be very helpful.
Obviously, not helpful against three-letter agencies who like to sit on browser zero-days until they need to use them, but against most threats you'll be fine.
PDF readers are a far more diverse bunch than browsers. It's much less likely that someone has an exploit that works simultaneously against Adobe Reader/Foxit/Okular/Evince/xpdf/MuPDF/etc. than one that covers merely two browser engines (Chrome and Firefox).
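A PDF can carry scripts or actions that fire when it is opened (names like /JavaScript, /OpenAction, /Launch). A crude triage before opening one in any of these readers, in the spirit of Didier Stevens' pdfid tool, is to count such names in the raw bytes. This is a minimal sketch: the keyword list is my assumption rather than pdfid's exact set, and it cannot see anything inside compressed object streams, so a clean result proves nothing.

```python
import re
import sys

# Names that commonly signal active or auto-run content (assumed list, not pdfid's exact set).
SUSPICIOUS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch",
              b"/EmbeddedFile", b"/RichMedia", b"/XFA"]


def normalize_names(data):
    """Decode #xx hex escapes (e.g. /J#61vaScript -> /JavaScript), a trick used to
    dodge naive keyword matching. Applied to the whole file, so it may over-decode."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)


def scan(data):
    """Count whole-name occurrences of each suspicious keyword in the raw bytes."""
    text = normalize_names(data)
    counts = {}
    for name in SUSPICIOUS:
        # Negative lookahead so /JS does not match /JSomething and /AA does not match /AAx.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, text))
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hits = {k: v for k, v in scan(f.read()).items() if v}
        print(path, hits or "no obvious active-content markers")
```

Treat any hit as a reason to open the file only in a sandboxed viewer, not as proof of malware; plenty of legitimate PDFs use /OpenAction.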
Whenever I download untrusted PDFs I always upload them to VirusTotal. Does that help at all, and if so, is it sufficient?
It'll be sufficient for known malware. But if malware hasn't been identified yet, VirusTotal won't pick it up.
Is it possible that you could download malware and VirusTotal not pick it up? There's a small chance. But, in my estimation, no one is really going to burn a novel strain of malware on free ebooks. It's not targeted, and that site isn't a good watering hole to deploy novel malware.
Novel strains of malware are usually reserved for specific targets. Unless you're a high-profile target of an authoritarian government or known to have a high net worth, I don't really think you're going to get hit with the novel stuff.
TL;DR: VirusTotal will probably be enough for the average user. But no one can guarantee safety when you're downloading random files on the internet.
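Related: VirusTotal also lets you search by a file's SHA-256 hash, which tells you whether it already has a report for that exact file without you uploading anything. An unknown hash tells you nothing either way, per the point about novel malware above. A minimal Python sketch for computing the hash, reading in chunks so large PDFs don't have to fit in memory:

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print "<hash>  <path>" for each file given on the command line.
    for p in sys.argv[1:]:
        print(sha256_of_file(p), p)
```

The resulting hex string can be pasted into VirusTotal's search box.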
Are PDFs dangerous? I guess on Linux, Canonical distributes the latest Evince, free of known vulnerabilities, but what about Windows? Must one upload every file?
There has been a confirmed code execution vulnerability in Evince in the past, in 2017:
As well as possible code execution vulnerabilities in Evince, in 2019 and 2011:
They have also had a command injection vulnerability, in 2017:
These and other reported security vulnerabilities for Evince are listed here:
I think in the case of all software it is safest to assume that opening a file that you downloaded from the internet has the potential to do harm, regardless of whether you are using Linux, macOS, Windows, or some other operating system, and regardless of what software you use to read the file.
The best mitigation would be to keep a separate device that you use purely for unauthenticated internet browsing and for opening files from the internet, and never to access any personal data on that device. In reality, almost all of us use the same devices for our personal files and data and for browsing the internet and opening random files that we downloaded.
It is interesting to note that the statistics for known security vulnerabilities in Evince pale in comparison to the statistics for known security vulnerabilities in Adobe Acrobat Reader:
I wonder whether this indicates that Evince is that much more secure than Acrobat Reader, or whether Evince simply hasn't been subject to the level of scrutiny that Acrobat Reader has. If it's the latter, there might be more unknown security vulnerabilities lurking under the surface of Evince than in Acrobat Reader.
Check out Dangerzone. It converts a .pdf (and other formats) to image data, then converts that back to a .pdf, optionally adding an OCR'd text layer, so that any potential executable code hidden within is lost. For further security, all operations run sandboxed.
Thank you. Saved this comment.
One possibility is that Acrobat Reader is more forgiving of poorly-formed PDFs, as I’ve generally heard is the case, and that by allowing documents that don’t meet the (huge, probably also poorly-formed) standard they open themselves up to more security risks.
See also: https://en.wikibooks.org/wiki/Subject:Computer_science
Is there any comparable resource that also (explicitly) includes copyrighted material? I believe a curated list containing the "cream of the crop" of a specific domain would be an excellent resource for anyone starting out.
You might want the O'Reilly library subscription. It's $50 per month, and all the books are high quality.
> In other words, this site DOES NOT contain any actual content of the books or lecture notes listed in this site.
> Therefore, this website is as legal as search results of any search engines like Google, Yahoo, etc.
It's worth pointing out that this site may link to copyrighted material.
Worth mentioning that you usually don't end up in legal trouble for consuming copyrighted material but for reproducing/sharing it, and often not until you do so at scale.
"Usually don't" and "often not" are comforting words but regardless there are always risks involved when committing illegal acts.
Is there actually? In Germany I have heard of people getting served with fines for downloading torrents, as sharing a torrent identifies you through your IP. I've never heard of anyone getting into trouble for streaming or downloading a file, and I'm not sure how it could realistically be detected.
It's not about detection, it's about legality.
BitTorrent is legally clear-cut, because the sharing part is a clear copyright infringement (you are distributing the work without authorization).
Streaming (typically) isn't that clear, because consuming the work is technically not forbidden by copyright law in most countries. It's a bit like reading a book in a bookshop without buying it.
Detection is fairly easy in both cases; they just cannot go after consumers.
> It's not about detection, it's about legality.
I suppose I'm wondering about the realistic ability to detect and ability to prosecute. I'm not really concerned with legality for legality's sake.
The legality gives people power to knock on the doors of ISPs, who can (and often do) know everything. Even without that, authorities can (and occasionally do) run honeypots after they find and disband the main site operators.
There is no technical solution to the anti-piracy war, it's entirely a legal problem.
There is growing legal pressure not to treat an IP address as evidence that the "owner" of that IP address is the one doing the activity.
For example, my IP address is paid for by me, through a run-of-the-mill ISP subscription. Does that make me legally liable for all the activity of the other person that lives with me and uses "my" network for all their private internet traffic?
I guess there are laws about facilitating piracy, and whatnot, but you can't reasonably expect me to screen all my fiance's activity on the network. Most of it is encrypted anyway. I can't be on the hook for that.
I'm privileged in that I have an ISP that feels the same way as I do about this. They've fought for the privacy of their subscribers before, and will likely keep doing so in the future, because an IP address does not identify any individual.
The idea of respondeat superior (vicarious liability) has been around a really long time. That the legal system would try to apply the concepts to the internet is not really unexpected.
I don't think you should be held responsible for the actions of the other people on your network if they can be held responsible.
What do you propose should happen if your network is in fact used to facilitate criminal or tortious activity?
A couple of years ago, the European Court of Justice actually decided that streaming itself is already illegal, that is, if it's reasonable to assume that you knew it was an illegal stream. I haven't heard of a case of it ever being prosecuted, though.
I'm not sure I should point this out, but there are organisations (Microsoft, Apple, maybe Google) that have obvious access to this information. It could be subpoenaed from them.
I used those qualifier words because HN tends to be pretty pedantic, and I happen to know it is illegal in some countries. I'm pretty sure someone from one of those countries would pass by and say "Actually, in X it's illegal no matter what".
People “usually don’t” get arrested for possession of a small amount of cocaine; but I sure wouldn’t want to be one of those obscure cases. I agree ‘usually’ is almost never good enough when personal risk is involved.
> often not until you do so at scale
That’s the funny thing about downloading copyrighted works with BitTorrent in Germany, it automatically counts as "at scale" for the courts because you’ll have many peers that all download a bit from you.
>This site neither supports copyright infringement, nor links to web sites that trade copyrighted material. If you find any questionable links on this web site, please contact the webmaster (see the contact info at the bottom).
There's also freetechbooks.com, a database of links to genuinely freely available / Open Access books, which includes the license information.
What a cool website. Quite surreal to see fresh content presented in a mid-2000s style!
See this for 24TB of books: http://pilimi.org/zlib-downloads.html
Libgen is currently 32TB and needs some seeding help.
Libgen is great too. I wonder if the 32TB includes most of what is in the 24TB, though.
I prefer libgen for free computer books :)
I checked the OP link. Clicking on any category pops up an ad that needs to be explicitly "closed". It happens again and again as you navigate through the site. Annoying like hell.
1. You browse the web without an adblocker in $currentYear?
2. I turned off my adblocker and checked, it's not fair to call it a "pop-up", it's more like it slides up from the bottom of the screen whenever you load a new page.
Content littered with ads cannot be considered free.
The difference from traditionally paid content is that customers pay with their attention and virtual purchasing power instead of spending money directly.
Nonetheless, ads obviously allow access for all.