nneonneo 2 years ago

Circa 2009 or so, my absolute favorite app for the iPod Touch was Patrick Collison's Offline Wikipedia (yes, that Patrick Collison: https://web.archive.org/web/20100419194443/http://collison.i...). You could download various wikis that had been pre-processed to fit in a very small space - as I recall, the entire English Wikipedia was a mere 2 GB in size. It was simply magical that I could have access to all of Wikipedia anytime, anywhere offline - especially since the iPod Touch could only connect to the Internet via WiFi. It was particularly useful while travelling, since I could load up articles and just read them on the plane.

As I recall, there were several clever things that the app did to reduce the size of the dump; many stub/redirect articles were removed, the formatting was pared down to the bare minimum, and it was all compressed quite efficiently to fit in such a small space. Patrick gives more technical detail on an earlier version of the app's homepage: https://web.archive.org/web/20080523222440/http://collison.i...

  • NegativeLatency 2 years ago

    Yeah that was awesome, so useful/fun being able to go deep on a wikipedia reading session when there wasn't good mobile coverage and data was expensive.

    In retrospect I do kinda miss _not_ having cell reception on vacations, as it was easier to disconnect from stuff.

    • airstrike 2 years ago

      > In retrospect I do kinda miss _not_ having cell reception on vacations, as it was easier to disconnect from stuff.

      In retrospect I do miss _not_ having the internet or cell service at all. It's part of why I like to watch shows and movies from the 80s or 90s. It's funny because I love technology as much as anyone else on HN, but at the same time my idea of the perfect retired life is one that is almost entirely offline.

      • jkepler 2 years ago

        I don't know if it's still the case, but about ten years ago I took ViaRail Canada from Vancouver to Toronto. It was a four-day train ride (with a brief stopover in Banff) and the vast majority of it was remote Canada crossing the Rockies, with no mobile network coverage.

        I asked the train staff about WiFi onboard, and they said that they didn't have it and they preferred it that way. People take their train, not to get from A to B, but to disconnect and meet other people, read books, or watch the beautiful landscape going by. If people had Internet access, they'd be glued to their devices and wouldn't meet fellow travelers, and that's the magic of this line.

        (They even said that their corporate management wanted WiFi in the trains so that the staff could digitize a lot of their paperwork, but the crew was resisting it because they believed it was exactly the lack of connectivity that kept people taking the train for vacation.)

      • NegativeLatency 2 years ago

        The tea ceremony of: sit at computer desk, boot up computer, dial up the internet connection etc was pretty fun. Also the relative novelty.

        (As I check HN from the bathroom, which would’ve been a childhood dream)

        • zeristor 2 years ago

          Tea ceremony, nice way of terming it. Playing a vinyl LP was a more intricate tea ceremony.

      • prox 2 years ago

        I usually ‘lie’ to everyone that where I am going I will have no connection. Even if I go to London City or some urban hub :)

    • Denvercoder9 2 years ago

      > In retrospect I do kinda miss _not_ having cell reception on vacations, as it was easier to disconnect from stuff.

      What's preventing you from not taking a phone with you on vacation?

      • Aardwolf 2 years ago

        The expectations of other people, society, and, more and more, infrastructure (taking public transport, ...)

        • devilbunny 2 years ago

          I was in Park City, Utah, not during ski season, just looking around, and we decided to stop and walk around. Parking meters theoretically took coins or cards, but all were nonfunctional. The only way to pay for parking was by installing their app. Promptly uninstalled it when I left, but still… with no smartphone, you literally could not legally park on the street.

  • thom 2 years ago

    I have an app called Kiwix that does this now. I also like Organic Maps for offline maps.

    • jimmySixDOF 2 years ago

      I worked on a slide deck where we proposed SD cards & thumb drives preloaded with Kiwix and an almost full, mostly localized version of Khan Academy (KA Lite at the time), which could be mass-distributed in active conflict zones where schools and education were some of the first casualties. I don't think it ever went too far, which was a shame; compared to how other monies were spent at the time, these could have really made a difference.

    • mat_epice 2 years ago

      Yeah, Kiwix is great! I have that on my phone to browse when I'm on airplanes.

    • FuriouslyAdrift 2 years ago

      HERE WeGo also has offline maps (although you have to download them prior to being offline).

      Useful for when I have been traveling and need GPS nav and there isn't cell service.

  • mattl 2 years ago

    I got a bunch of super cheap WikiReader devices and those were great for offline Wikipedia. I think I paid less than $5 per device.

    • CompuHacker 2 years ago

      Check those prices now. And you know about the test utilities included with them? Incredible devices.

      • mattl 2 years ago

        Bunch of neat Forth stuff in the firmware? Sadly I gave them away, hopefully not sitting in a drawer somewhere

  • nullwarp 2 years ago

    Not directly related, but way back in the day I had a Handspring Visor and would use something (I can't remember now what it was called) to download websites to it when I synced, so they would be available on the device.

    At the time I was super into fan stories about Ultima Online, like PK Ghost and things, and would download those, and we would all pass the Handspring around to read them.

    Dang, thinking about this actually brought up some good memories for me.

    • dbarlett 2 years ago

      Was it AvantGo?

      • thakoppno 2 years ago

        PocketPC with AvantGo, I remember that era.

        In fact I invested in AvantGo right before the crash. I like to think it was the $500 or so I invested that truly precipitated the original year-2000 dot-com bubble crash.

        • karpour 2 years ago

          I fondly remember AvantGo! Surprisingly many sites supported it, it was really handy.

    • amatecha 2 years ago

      that's awesome! I remember totally wanting a Handspring Visor (and soo badly wanting to play UO)! Jealous, hahah :)

  • askvictor 2 years ago

    Had a similar thing on my iRiver H340, running Rockbox, device had no network at all. Those were the days.

armagon 2 years ago

FYI, the internet archive hosts a ZIM archive that has dumps of wikipedia and many other works. https://archive.org/details/zimarchive

I wish it was a little more obvious how to search it, or what all the variations mean, but it looks like a valuable resource.

It is worth noting that Kiwix works on multiple OSes and on phones and has a WiFi hotspot version (that you might run on a Raspberry Pi, for example). Internet-in-a-Box similarly works as a WiFi hotspot for ZIM archives.

Lastly, it is worth mentioning that there are tools for creating your own ZIM files; it looks like the most straightforward way is to take a static website and use a utility to convert it into one self-contained file.
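
For example, the openZIM project's zim-tools package has a `zimwriterfs` utility that packs a directory of static HTML into a single ZIM file. Here is a rough, untested sketch of driving it from Python; the paths and metadata are placeholders, and flag names have changed between zim-tools versions, so check `zimwriterfs --help` before relying on them.

  # Untested sketch: wrap the zim-tools `zimwriterfs` CLI to turn a static
  # site directory into one self-contained .zim file. All paths/metadata
  # here are placeholders; flag names differ across zim-tools versions.
  import subprocess

  SITE_DIR = "my-static-site"        # directory containing index.html, assets, ...
  OUTPUT = "my-site.zim"

  cmd = [
      "zimwriterfs",
      "--welcome=index.html",        # entry page, relative to SITE_DIR
      "--favicon=favicon.png",       # may be called --illustration in newer versions
      "--language=eng",
      "--title=My Site",
      "--description=Offline copy of my static site",
      "--creator=me",
      "--publisher=me",
      SITE_DIR,
      OUTPUT,
  ]
  subprocess.run(cmd, check=True)    # raises if zimwriterfs exits non-zero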

  • Abishek_Muthian 2 years ago

    Thanks for sharing. Can you explain a bit more about creating our own ZIM files, or doing so for websites archived on the Internet Archive?

    I'm looking for a way to archive all the websites from my browser bookmarks and then download them for offline use.

    • _a9 2 years ago

      Not related to the OP topic or zim but I was looking into archiving my bookmarks and other content like documentation sites and wikis. I'll list some of the things I ended up using.

      ArchiveBox[1]: Pretty much a self-hosted wayback machine. It can save websites as plain html, screenshot, text, and some other formats. I have my bookmarks archived in it and have a bookmarklet to easily add new websites to it. If you use the docker-compose you can enable a full-text search backend for an easy search setup.

      WebRecorder[2]: A browser extension that creates WACZ archives directly in the browser capturing exactly what content you load. I use it on sites with annoying dynamic content that sites like wayback and ArchiveBox wouldn't be able to copy.

      ReplayWeb[3]: An interface to browse archive types like WARC, WACZ, and HAR. The interface is just like browsing through your browser. It can be self-hosted as well for the full offline experience.

      browsertrix-crawler[4]: A CLI tool to scrape websites and output to WACZ. It's super easy to run with Docker and I use it to scrape entire blogs and docs for offline use. It uses Chrome to load webpages and has some extra features like custom browser profiles, interactive login, and autoscroll/autoplay. I use the `--generateWACZ` parameter so I can use ReplayWeb to easily browse through the final output.

      For bookmark and misc webpage archiving, ArchiveBox should be more than enough. Check out this repo for an amazing list of tools and resources: https://github.com/iipc/awesome-web-archiving

      [1] https://github.com/ArchiveBox/ArchiveBox [2] https://webrecorder.net [3] https://replayweb.page [4] https://github.com/webrecorder/browsertrix-crawler

      • Abishek_Muthian 2 years ago

        Excellent! Thank you for the detailed answer.

        I'm going to explore all the solutions and start building my setup soon.

bombcar 2 years ago

Kiwix is great - I have a collection of various things from their library https://library.kiwix.org/?lang=eng downloaded for when I'm on a plane or the internet is otherwise unavailable.

That and the TeXlive PDF manuals can get me through anything.

  • daneel_w 2 years ago

    I second Kiwix. I found out about it not too long ago on the topic of portable Wikipedia readers. It really stands out as the best software part of such a solution.

    • vmilner 2 years ago

      I used Kiwix Wikipedia for a Polish friend in the UK who couldn’t afford reliable internet access and was using public library computers. I found the English edition with images was too large for him, but the Polish edition was fine. Ideally I’d have liked a simple update system (Git-like?) which he could have run at the library occasionally.

    • 23B1 2 years ago

      I third Kiwix. Immensely useful when I was deployed without internet.

      • kragen 2 years ago

        That sounds interesting, what was the context?

        • 23B1 2 years ago

          Hi, sorry for the delayed reply. Yes, I was military, and in the early days of Iraq/Afghanistan we didn't have much access to the internet, so I brought it with me.

        • ivan_ah 2 years ago

          Library or school in a remote village. There are computers (usually old computers), there might even be a LAN of some sort, but no internet (or very slow internet).

          In those cases having local access to Wikipedia (and not necessarily just en; Kiwix has archives for all the languages) can be a great learning resource and reference.

          • kragen 2 years ago

            Do you know 23B1? If not, probably you posted a reply to the wrong comment.

  • WesternWind 2 years ago

    Yep, you can download StackOverflow for offline use too

  • lmm 2 years ago

    Does it actually work? I installed the app and tried to download wikipedia two or three times, each time it just failed. Eventually I gave up.

    • makeworld 2 years ago

      Personally I downloaded the larger files (>2GB) from a torrent file using my torrent manager. Much more reliable than over HTTP. You have checksums, it's resumable, etc.

    • 77pt77 2 years ago

      Yes it does.

      I've downloaded the entirety of wikivoyage for example.

    • bombcar 2 years ago

      I downloaded the files directly from the library if I recall correctly.

  • zekrioca 2 years ago

    I wish one could create new articles in Kiwix’s zim files. Right now, Kiwix is basically a Wikipedia reader. Editing features would be very nice for local wikis to develop, and later on — maybe — to have such local article editions merged into the main Wikipedia, perhaps similar to how git works.

    • int_19h 2 years ago

      The .zim file format is heavily optimized for compactness and ease of serving. For starters, it doesn't even store the original MediaWiki markup, but rather pre-rendered basic HTML. Images only have the thumbnail version (the one that's shown inline when reading the article), there's no full-size to zoom in. And, of course, no edit history. Multiple articles then get bundled into clusters of ~1 MB each, and each cluster compressed using ZSTD.

      https://wiki.openzim.org/wiki/ZIM_file_format

      This all lets you squeeze English Wikipedia into 90 GB. But it also makes it much more difficult to edit in-place, and, of course, no MediaWiki means that it cannot possibly work like git pull requests.
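
      The cluster trick matters because compressing many small articles together gives the compressor far more context than compressing each article on its own. A toy illustration (not the actual ZIM code; it assumes the third-party `zstandard` Python package and a local directory of small HTML files):

        # Toy comparison: per-article compression vs. bundling articles into
        # ~1 MB clusters before compressing, roughly what ZIM does with ZSTD.
        # Assumes a local "articles" directory of HTML files; not the real ZIM layout.
        from pathlib import Path
        import zstandard as zstd

        articles = [p.read_bytes() for p in Path("articles").glob("*.html")]
        cctx = zstd.ZstdCompressor(level=19)

        per_article = sum(len(cctx.compress(a)) for a in articles)

        clustered, cluster = 0, b""
        for a in articles:
            cluster += a
            if len(cluster) >= 1_000_000:          # flush a ~1 MB cluster
                clustered += len(cctx.compress(cluster))
                cluster = b""
        if cluster:
            clustered += len(cctx.compress(cluster))

        print(f"per-article: {per_article} bytes, clustered: {clustered} bytes")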

      • zekrioca 2 years ago

        I totally understand the reason why it is made for read-only consumption. However, we live in a moment where storage is significantly cheaper, and so is processing. There could be a compromise, though I do not see any indication of one. SQLite could very well be used here.

ernst_mulder 2 years ago

Some time ago I dreamt that I was in an alien space ship for some reason. Still carrying my phone and laptop bag. They were a friendly lot and asked whether or not I would like to charge my laptop. "Do you have 220V sockets?" I asked. They didn't know what that was. So I needed measurements and definitions. An approximate meter, an approximate second. Coulomb was difficult. I woke up and downloaded Wikipedia the next day. Deleted it again later for lack of hard disk space...

But next time this happens I will have a USB stick with all the necessary knowledge. The definitions for voltage, current and frequency should, however, be printed out in case my laptop battery charge is insufficient for accessing the USB stick.

  • zeristor 2 years ago

    Usually alien abduction thoughts revolve around an intrusive test to see if you have a chess cheat device inserted, or should that be upserted?

thakoppno 2 years ago

Somewhere around the original ipad era, I believe there was a curated subset of wikipedia articles that may have been called something like Educator’s Edition.

It worked offline and had images and I traveled to Peru with it and learned so much. Does anyone remember this sort of thing?

I’ve tried wix formatted copies and they do work but the experience on an offline ipad was simply better. Thanks in advance.

  • Rediscover 2 years ago

    Yes, I remember - I had a copy on an SD card on my OLPC.

    I believe it morphed into "Wikipedia for Schools" ^0 - possibly this ^1 is a comment about it?

    0: https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_for_Scho...

    1: https://www.speedofcreativity.org/2008/11/11/wikipedia-to-go...

    • thehours 2 years ago

      Tangent - I’ve noticed a lot more comments like this using the “^0” syntax for citations vs the traditional “[0]” one I’ve become accustomed to seeing on HN. Is there a real shift happening here and, if so, why?

    • thakoppno 2 years ago

      Thank you very much. That page brings me back. It even has Technorati tags.

      By the way, do you still have an OLPC? I never got to use one but remember seeing them. My one weird piece of similar-era tech is a CR-48, the early Chromebook Google gave away. I remember the form for requesting them asked what you would do with it. I responded “install Linux on it” and they gave me one.

      • Rediscover 2 years ago

        Yes, I still have mine from the Give-One-Get-One program. It's still my favorite screen for sunny day use. It still works; I've been using a power supply from an X30 ThinkPad as I have no idea where the original went.

        My neighbor years ago used to always chuckle at me using it with a Happy Hacking Pro keyboard because of the price difference between the two.

      • ComputerGuru 2 years ago

        I said “develop/add olpc support to various bootloaders to help spur development, adoption, and utility” and they didn’t give me one.

r3trohack3r 2 years ago

Kiwix is an amazing project.

I used a similar approach for https://wikiscroll.blankenship.io

1. kiwix dump

2. unpack to HTML

3. process with cheerio to create json files

4. Create git repo and push to github pages

Works well for infinitely scrolling content, it's just Math.random on top of static files.

https://github.com/retrohacker/wikiscroll
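
If you want to replicate step 3 without Node, here's a rough Python stand-in for what the cheerio pass does (the real code is in the repo above; this sketch assumes BeautifulSoup and uses placeholder paths):

  # Rough Python equivalent of step 3: walk unpacked HTML articles and emit
  # one small JSON file per article (title + plain text). The real project
  # does this with cheerio in Node; paths here are placeholders.
  import json
  from pathlib import Path
  from bs4 import BeautifulSoup

  src = Path("unpacked_html")       # step 2 output
  dst = Path("json")
  dst.mkdir(exist_ok=True)

  for page in src.glob("*.html"):
      soup = BeautifulSoup(page.read_text(encoding="utf-8"), "html.parser")
      title = soup.find("h1")
      record = {
          "title": title.get_text(strip=True) if title else page.stem,
          "text": soup.get_text(" ", strip=True),
      }
      (dst / f"{page.stem}.json").write_text(json.dumps(record))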

  • Karrot_Kream 2 years ago

    What a cool project, thanks! Sounds like something I'd love to waste time on lol.

  • tsol 2 years ago

    This is really cool. Thanks for posting that

orliesaurus 2 years ago

Oh wow, I thought this was gonna be a REALLY large file, but it's only 95GB. Not bad, some worthless videogames are larger, haha

  • aendruk 2 years ago

    Circa 2003 I carried around a pared down copy on a Pocket PC. Dropping a few chosen categories (who needs Sports?) allowed it to barely fit on a 1-GB SD card.

    • FeistySkink 2 years ago

      People going back in time need sports. An almanac of some kind.

      • II2II 2 years ago

        While handy, it would be a bit too conspicuous. At least one could claim that an almanac is a novelty print.

  • bscphil 2 years ago

    I was curious how they achieve this. It looks like the underlying file format uses LZMA, or optionally Zstd, compression. Both achieve pretty high compression ratios against plain text and markup.

    > Its file compression uses LZMA2, as implemented by the xz-utils library, and, more recently, Zstandard. The openZIM project is sponsored by Wikimedia CH, and supported by the Wikimedia Foundation.

    https://en.wikipedia.org/wiki/ZIM_(file_format)

    • kragen 2 years ago

      The more important thing is that they aggressively downsize the images and omit the history and talk pages. Even if they were using LZW it would probably only triple the filesize.

  • 988747 2 years ago

    BTW: what's the difference between the 95.2 GB file and the 45 GB one? There is no info on the download page.

    • lavezza 2 years ago

      95.2 GB is the "maxi" file. 49.48 GB is the "nopic" file. 13.39 GB is the "mini".

      From https://www.kiwix.org/en/documentation/

      File size is always an issue when downloading such big content, so we always produce each Wikipedia file in three flavours:

      Mini: only the introduction of each article, plus the infobox. Saves about 95% of space vs. the full version.

      nopic: full articles, but no images. About 75% smaller than the full version.

      Maxi: the default full version.

stewbrew 2 years ago

How can someone use so many words to say "use kiwix".

jabbany 2 years ago

I recall doing such an offline dump with Wikitaxi (https://www.yunqa.de/delphi/apps/wikitaxi/index) back when WP was getting banned in China a decade or so ago.

IIRC the articles were rather easy to download and convert even on my early 2000s netbook. The media (pictures, video, audio), though, were painful to deal with, and it didn't take long to find out that Wikipedia without diagrams and figures was not a great experience.

  • kragen 2 years ago

    Kiwix's maxi-all Wikipedia zimfiles have pretty much all the pictures that are used in articles, but not the video and audio. And the pictures are too small; often you can't read the text in them.

jokoon 2 years ago

So can it remove things like movies and tv shows and other noise?

I remember there was some work done to categorize articles like with the Dewey system, but so far, you can't really reduce the size of those exports.

Of course it would require a lot of work. Maybe it's already possible to categorize articles by whether they belong to a "portal".

But yeah, it doesn't seem the Wikimedia Foundation really cares about those kinds of problems. To be fair, they lack money.

  • a3w 2 years ago

    Uno card: Reverse!

    Is TV Tropes available as a single file ZIM download?

sqrt_1 2 years ago

The article says to format to exFAT as NTFS has a 4GB limit - I don't think that is true.

  • Wingman4l7 2 years ago

    It's not -- FAT32 is the one with the 4GB limit. NTFS has much less native support on Macs than exFAT, though.

    • ComputerGuru 2 years ago

      FAT32 can even be used with larger sizes if you just format with a larger cluster size. Since each bundle/shard is 1MB minimum that is not a problem here.

      • dmitrygr 2 years ago

        File size is still limited to 0xffffffff (the dir entry only has 32 bits to store it). Some broken implementations even treat it as signed, and files over 2GB become problematic
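
          Quick back-of-the-envelope in Python to make the limits concrete:

            # FAT32 stores the file size in a 32-bit field in the directory entry.
            unsigned_limit = 0xFFFFFFFF            # 4 GiB minus 1 byte
            signed_limit = 0x7FFFFFFF              # what buggy signed implementations allow
            print(unsigned_limit / 2**30)          # ~4.0 GiB
            print(signed_limit / 2**30)            # ~2.0 GiB
            print(95 * 10**9 > unsigned_limit)     # True: a 95 GB ZIM can't be one FAT32 file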

        • ComputerGuru 2 years ago

          Ah sorry, you're right. That allows FAT32 to be used for larger partition sizes, but the file size limit remains in place.

yieldcrv 2 years ago

protip: you need to download wikipedia in other languages as well

they are not translations, they are completely different articles under the name brand and platform of Wikipedia

an entry that may be just a blurb in English may be one of the most comprehensive and fully fleshed out and researched entries on the site in German, for example

pupppet 2 years ago

Can anyone recommend a hardy device for viewing the content? As nutty as it sounds, in some post-apocalyptic world it would sure be nice to have. I'd keep it under the bed just in case..

  • SahAssar 2 years ago

    If you follow the logic that anything is at about half its life, that would probably be an older ThinkPad laptop, like an X61 or X200. If you are willing to spend the money on something newer, perhaps a Toughbook. I have a modded Kobo ebook reader (I upgraded mine to 256GB storage and have Project Gutenberg, Wikipedia and a few other things on it) with a good solar powerbank.

    • bscphil 2 years ago

      > If you follow the logic that anything is at about half its life

      I don't think that makes any sense. By that logic any currently working device should be assumed to last another $currentlifetime. My 20 year old car is not gonna last another 20 years. My 10 year old laptop won't last another 10. If my car somehow did last another 20 years, it would not then make sense to assume it would still be running in another 40.

      Makes more sense to look at all objects of the same class. If 75% of laptops are dead in 10 years and 95% are dead in 15, and your laptop is 10 years old, you can infer that 5 out of 25 surviving laptops will make it another 5 years, or 20%. (These numbers completely made up, just an example.)

      • ScottEvtuch 2 years ago

        I think the idea of "everything is about half its life" is to account for survivorship bias in longevity. The only units that make it to the 95th percentile lifetimes clearly got luckier with parts and can reasonably be expected to last longer.

        • sgerenser 2 years ago

          Reliability of most complicated devices (cars, electronics) is usually thought to follow a “bathtub curve.” Some early mortality due to defective parts or manufacturing defects, a long trough of reliability from say, 1-10 years, then a rapid rise in failures due to aging. “Everything at half life” is a pretty bad approximation of this.

          • titoCA321 2 years ago

            Not just electronics: look at the print quality of some of your paper receipts from three years ago and see if you can make heads or tails of where you purchased the item. Ever looked at photos in photo albums from long ago?

        • bscphil 2 years ago

          Right, correcting for survivorship bias is very important. If an object lasts one year, its expected life isn't now $average_use_life - 1; that's too low an estimate.

          The problem with the "half life" rule is that it corrects for this in the dumbest possible way, not only providing an inaccurate estimate for most of the object's life, but even getting the first derivative wrong for most objects. Usually, lasting longer does not make the expected remaining years of service go up, but the rule implies it does!

          Take people for example. At birth, a woman in the United States has a life expectancy of 81. If she makes it to 60, she can now expect to make it to ... 85. Not a big change! Every year she lived (even her first), her remaining life expectancy went down, not up. See this chart I made comparing the life expectancy of people versus a theoretical "half-lifer": https://0x0.st/otZ_.png
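
          A quick simulation makes the difference obvious (toy numbers, nothing empirical): for a wear-out distribution like a Weibull with shape > 1, the expected remaining life falls as the object ages, whereas the "half-life" rule would have it rise.

            # Toy Monte Carlo: expected remaining lifetime given survival to age t,
            # for a wear-out (Weibull, shape 3) population. Purely illustrative numbers.
            import numpy as np

            rng = np.random.default_rng(0)
            lifetimes = 12.0 * rng.weibull(3.0, size=1_000_000)   # mean ~10.7 "years"

            for t in (0, 5, 10, 15):
                survivors = lifetimes[lifetimes > t]
                remaining = (survivors - t).mean()
                print(f"age {t:>2}: expected remaining ~ {remaining:.1f}  (half-life rule says {t})")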

    • MandieD 2 years ago

      What kind of mods did you make, aside from either inserting a 256 GB card, or swapping out the built-in storage? Which model?

      • SahAssar 2 years ago

        Swapping the built in storage, changing the init scripts to start the kiwix webserver, and installing some homebrew apps. It's a Kobo Clara HD.

  • int_19h 2 years ago

    It doesn't sound all that nutty given the world politics today. And pretty much any ruggedized Android device will do, so long as it has enough storage - best get something with an SD slot.

    You might want a device like that to have offline maps as well, especially as those are more likely to be immediately useful. The easiest way to get there is the OsmAnd app - like Kiwix, it does a number of tricks to compress things, so it's quite feasible to have a complete offline road and topographic map of the US in your pocket.

    (Note that Google Play Store availability on the device is immaterial, since Kiwix and OsmAnd are also available as downloadable .apks, and are also listed in the F-Droid store.)

  • bombcar 2 years ago

    Honestly a generic PC would probably be best: it may be a bit harder to find power, etc., but you will have infinite amounts of replacement parts.

  • c7b 2 years ago

    Have you looked at e-Ink readers?

barbs 2 years ago

Is there a portable version of Kiwix? Would be cool if you could plug the USB into any computer and start reading Wikipedia without having to install anything.

  • tehnicaorg 2 years ago

    Yes. You download a zip archive, unpack it (121MB compressed, 263MB unpacked), and start the exe (assuming you're using Windows).

peter_d_sherman 2 years ago

>"After reading this article, you’ll be able to save all ~6 million pages of Wikipedia so you can access the sum of human knowledge regardless of internet connection!"

[...]

>"The current Wikipedia file dump in English is around 95 GB in size. This means you’ll need something like a 128 GB flash drive to accommodate the large file size."

Great article!

Also, on a related note, there's an interesting philosophical question related to this:

Given the task of preserving the most important human knowledge from the Internet and given a certain limited amount of computer storage -- what specific content (which could include text, pictures, web pages, PDFs, videos, technical drawings, etc.) from what sources do you select, and why?

So first with 100GB (All of Wikipedia is a great choice, btw!) -- but then with only 10GB, then 1GB, then 100MB, then 10MB, then 1MB, etc. -- all the way down to 64K! (about what an early microcomputer could hold on a floppy disk...)

What information do you select for each storage amount, and why?

(Perhaps I should make this a future interview question at my future company!)

Anyway, great article!

smukherjee19 2 years ago

Wow, this is so cool! 95 GB and I can browse the entire Wikipedia offline!? Thanks so much!

https://library.kiwix.org/?lang=eng

I was looking at what other sites are available, and seems there are quite a few. Are there any specific ones apart from Wikipedia that HN readers would recommend?

  • nephanth 2 years ago

    If you end up programming offline, I remember they had a dump of stackoverflow

sjducb 2 years ago

There is a ZIM file that contains all of stack overflow. Super useful if you have to program without access to the internet.

icod 2 years ago

But but if all of Wikipedia fits on a USB drive, what do they need the millions and millions of Dollars for? /s

squarefoot 2 years ago

How does this scale with the need to update data over time, corrections, etc.? Having to download everything again doesn't seem that elegant. I think this would benefit a lot from some form of incremental backup support, that is, downloading only what has changed since last time. A possible implementation of that could be a BitTorrent-distributed git-like mirror, so that everyone could maintain their own synced local copy and create a snapshot of it on removable media on the fly.

  • lxgr 2 years ago

    Given that the ZIM format is highly compressed, I'd assume that any "diff" approach would be computationally quite intensive [1] – on both sides, unless you require clients to apply all patches, which would allow hosting static patch files on the server side.

    Bandwidth is getting cheaper and cheaper, and arguably if you can afford to get that initial 100 GB Wikipedia dump, you can afford downloading it more than once (and vice versa, if you can download multi-gigabyte differential updates periodically, you can afford the occasional full re-download).
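
    To sketch what a zsync-style approach could look like in principle (purely hypothetical, not something Kiwix offers): the server publishes block hashes of the new file, the client compares them against its local copy, and only the blocks that changed get fetched. The catch is that recompression shifts bytes around, so naive fixed-size blocks over a ZIM file tend to mismatch almost everywhere, which is why the "diff" approach is less attractive than it sounds.

      # Hypothetical block-level diff over two local files, rsync/zsync style.
      # Not how Kiwix distributes updates; it just shows why diffing a
      # recompressed archive is hard (most blocks change even for small edits).
      import hashlib

      def block_hashes(path, block=1 << 20):           # 1 MiB blocks
          hashes = []
          with open(path, "rb") as f:
              while chunk := f.read(block):
                  hashes.append(hashlib.sha256(chunk).hexdigest())
          return hashes

      old = block_hashes("wikipedia_old.zim")
      new = block_hashes("wikipedia_new.zim")
      changed = sum(1 for a, b in zip(old, new) if a != b) + abs(len(new) - len(old))
      print(f"{changed}/{len(new)} blocks would need re-downloading")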

    One application where I could see it making sense is a related project [2] which streams Wikipedia over satellite: initial downloads at this point probably take several days of uninterrupted reception.

    [1] Google once implemented a custom binary diff optimized for Chrome updates, but I'm not sure if it still exists. [2] https://en.wikipedia.org/wiki/Othernet

e-clinton 2 years ago

Do you think Apple would approve an app that just offlines Wikipedia?

  • paulmd 2 years ago

    Yes, being a "content library app" (dictionary, for example) seems perfectly fine. You just need to be more than a frame for a website... but accessing device-local reference material is fine.

  • m348e912 2 years ago

    I have Minipedia installed on my iPhone and it does just that.

bArray 2 years ago

I think there are better ways to open ZIM files. I've had massive trouble with Kiwix. The old version seems broken beyond repair and the new version is too heavy.

ZIMply on branch `version2` has worked pretty well for me [1]. The search works a lot better and it's really nicely formatted.

[1] https://github.com/kimbauters/ZIMply/tree/version2

bori5 2 years ago

Apropos of nothing, I stumbled upon Encyclopaedia Britannica the other day. Anyone know what’s up with that and if there are any pros to it vs Wikipedia?

  • davidwritesbugs 2 years ago

    I used Britannica while in prison due to the obvious "No Internet". It works well enough: the articles are OK and from authoritative authors, unlike many Wikipedia pages, but I found them a bit lacking in full detail; the main problem is that the range of topics is much, much smaller, to the point where it was far less useful for detailed research. For prison use as a basic reference it was probably perfectly OK, but for more demanding research it's not adequate.

sixhobbits 2 years ago

The Library page has three identical-looking entries (100GB, 50GB, and 15GB) without any explanation of what is or isn't included in each.

CGamesPlay 2 years ago

Can anyone explain to me how the Kiwix library site works? There are 3 Wikipedia listings that all have the same name, description, language, and author, but seem to have different content. This pattern repeats for the “Wikipedia 0.8” and “Wikipedia 100” sets. One of the latter says that the top 100 pages on Wikipedia require 889 MB? What’s going on here?

londons_explore 2 years ago

Note that it's possible to make wikipedia substantially smaller if you're happy to use more aggressive compression algorithms.

Kiwix divides the data into chunks and adds various indexes and stuff to allow searching data and fast access, even on slow CPU devices. But if you can live with slow loading, you can probably halve the storage space required, or maybe more.

  • kragen 2 years ago

    What compression algorithms would help? It's already using lzma for the text (in the form of .xz).

    • londons_explore 2 years ago

      The hutter prize is a competition for compressing Wikipedia:

      http://prize.hutter1.net/

      So the best algorithm to use from there is starlit, with a compression factor of 8.67, compared to lzma in 2MB chunks which can only achieve about 4:1 compression.
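
      You can sanity-check the roughly 4:1 figure for chunked LZMA yourself with the standard library (assuming you have a plain-text Wikipedia extract lying around; the path below is a placeholder):

        # Measure LZMA compression ratio over independent 2 MB chunks of a text
        # dump, mimicking how a seekable archive has to compress in blocks.
        # The input path is a placeholder for any large plain-text Wikipedia extract.
        import lzma

        CHUNK = 2 * 1024 * 1024
        raw = compressed = 0

        with open("enwiki_extract.txt", "rb") as f:
            while chunk := f.read(CHUNK):
                raw += len(chunk)
                compressed += len(lzma.compress(chunk, preset=9))

        print(f"ratio: {raw / compressed:.2f}:1")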

      • londons_explore 2 years ago

        Oh, and if you are happy to wait days or weeks for your compressed data, Fabrice Bellard's nncp manages even higher ratios (but isn't eligible for the prize because it's too slow)

        • planede 2 years ago

          Submissions for the Hutter prize also include the size of the compressor in the "total size". So I assume that's hard to beat if you use huge neural networks on the compression side, even if decompression is fast enough.

          • londons_explore 2 years ago

            nncp uses neural networks, but 'trains itself' as it goes, so there is no big binary blob involved in the compressor.

            The only reason it isn't eligible are compute constraints (and I don't think the hutter prize allows a GPU, which nncp needs for any reasonable performance).

            • planede 2 years ago

              Ah, OK, fair enough.

  • int_19h 2 years ago

    They embed full-text indices into the .zim file these days, but they used to be separate originally. IIRC at that time the index for the English wiki took up around 12 GB, with the actual data in the ballpark of 65 GB.

kloch 2 years ago

I wonder if there is an offline backup of Wikipedia on ISS? There should be. And on every manned space mission.

  • bagels 2 years ago

    Why should there be?

    • mhh__ 2 years ago

      The next Apollo 13 will probably be a software problem; it doesn't hurt if they can read up about it

      • bagels 2 years ago

        You're proposing that if something goes wrong on the ISS, the crew will need wikipedia to solve it? Not... talking to Houston or just taking the Soyuz back?

        • mhh__ 2 years ago

          I wrote a crap SDR basically just from wikipedia, maybe their radio broke.

          Presumably you have seen a science fiction film before, use your imagination.

      • tablespoon 2 years ago

        > The next Apollo 13 will probably be a software problem , doesn't hurt if they can read up about it

        What good would an "offline backup of Wikipedia" do in that situation?

        Wikipedia is good for one thing, and one thing only: getting some cursory knowledge on a topic you're unfamiliar with. It's the tourist map to the "sum of all human knowledge." If you expect to use it for anything else, you're asking too much of it.

        • mjcohen 2 years ago

          I have found a lot of the math articles to be quite good.

    • kragen 2 years ago

      So its contents aren't lost if Earth's surface gets depopulated.

      • samatman 2 years ago

        Putting it on ISS wouldn't help with that, although I'm sure this comes as no surprise to you, given that its orbit is a decaying one.

        I like the idea of periodic Wikipedia moonshots, although the storage format is kind of an open question; I've wondered for a while if a DVD made from e.g. quartz, platinum, and titanium might be up to the job.

        A full backup would fit on 12 double-layer, single-sided disks; I'm being conservative and not using Blu-Ray numbers, since density and longevity are always somewhat in tension. Probably more expensive to put them safely on the moon than to manufacture in the first place.

        • kragen 2 years ago

          Agreed. I think even bare nickel or iron would probably be fine. Holographic glass laser damage can in theory handle higher recording densities and, like your DVD, isn't vulnerable to surface damage.

          In space you probably don't have to worry as much about minor surface scratches and oxidation, though. You just have to worry about outgassing and meteoroid impacts. Some of them you can stop, and some you can't. On the bright side, they're very rare.

          I think common media formats like DVDs are designed with a lot of emphasis on speed, both of reading and of duplication. This compromises both density and longevity. If you, instead, allow yourself the luxury of FIB milling to write and an electron microscope to read, you can manufacture in any vacuum-stable material at all, and you can engrave your archival message with, say, 50-nanometer resolution. At one bit per 50 nanometers square, you get five gigabytes per square centimeter.

          I think that with e-beam photoresist cross-linking followed by etching you get about 500 kilobits per second, and I think FIB milling is a little slower, so it might take a few weeks to make the copy — obviously unacceptable for a consumer CD burner but fine for periodic Wikipedia moonshots.
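
          The arithmetic behind the 5 GB/cm² figure and the disk count above, for anyone checking (quick Python):

            # Areal density at a 50 nm feature pitch, plus the disk-count figure above.
            import math

            bits_per_cm2 = (1e-2 / 50e-9) ** 2            # (1 cm / 50 nm)^2 = 4e10 bits
            print(bits_per_cm2 / 8 / 1e9)                 # 5.0 GB per square centimetre

            dump_gb, dvd_dl_gb = 95, 8.5                  # double-layer single-sided DVD
            print(math.ceil(dump_gb / dvd_dl_gb))         # 12 disks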

      • titoCA321 2 years ago

        And what's the point of it in space? Knowledge doesn't disappear when it's not on wikipedia. If humans are still around they will continue contributing to knowledge. Just because it's not printed or recorded doesn't mean that information or knowledge doesn't exist.

        • mynameisvlad 2 years ago

          > if Earth's surface gets depopulated

          There are 14 people on the ISS. If they were the only ones left, they would certainly not have the breadth of knowledge of a Wikipedia dump.

          • titoCA321 2 years ago

            And how would these 14 survive if they are the only ones left? Do you know that there's a whole support team to support them in space? It's not just 14 people. There are hundreds on the ground supporting them.

            • kragen 2 years ago

              Yeah, ISS itself is not that realistic.

  • Dig1t 2 years ago

    Why not just every space mission, period?

    • Rebelgecko 2 years ago

      How much would the science capabilities of a telescope like JWST be reduced if 1/3 of its SSD was repurposed for storing the latest wikipedia dump (that 1/3 number is assuming it's only English, compressed, and without images)? To me that seems like an easy cost/benefit analysis.

      • autoexec 2 years ago

        How much would the science capabilities of a telescope like JWST be reduced if we left its SSD alone and just taped a USB drive to the side of it somewhere that contained wikipieda?

        • Rebelgecko 2 years ago

          Would duct tape pass the pre-launch vibe check? You'd have to do some engineering work to make sure it's sturdy, doesn't have any impact wrt oscillations, won't create FOD (debris) etc.

          Once you've done all that work, I'm not sure what you've actually accomplished. By the time any sentient being gets around to visiting JWST, I wouldn't be surprised if an unshielded commercial drive would be rendered totally unusable by radiation.

    • vorpalhex 2 years ago

      Well the robots don't read too well..

quickthrower2 2 years ago

I wonder if they snapshot Wikipedia for this, or if they stagger it per article to avoid very recent unreviewed edits getting into such a download (edits that would, say, disappear off the site if they turned out to be bad edits or vandalism).

  • sanroot99 2 years ago

    They have snapshots; there are also official Wikipedia torrent links for the dumps.

jscipione 2 years ago

Do not store 96GB of anything on exFAT; use ext4 or APFS or ZFS or some journaled file system. Does NTFS really have a 4GB file size limit? Its structures should match exFAT's, so that part seems suspect to me.

  • BizarroLand 2 years ago

    >> Does NTFS really have a 4GB file size limit?

    No, but FAT32 does. exFAT, on the other hand, has a file size limit of 16 exbibytes. That, combined with exFAT's cross-platform mounting (NTFS has a lot of limitations in this regard), makes it a superior format for flash-based offline file transfer.

    On a network? Use zfs+ or something.

  • int_19h 2 years ago

    This is the kind of thing that you download once and then never write anything on that media until you decide to refresh the content. In fact, you might as well mount it read-only. A journaling FS wouldn't do anything useful here.

  • DrSiemer 2 years ago

    Afaik NTFS max filesize is 256TB

dangrie158 2 years ago

In the old days :tm: I remember doing this as well with a 1GB drive (and room to spare for some mobile apps).

Would be interesting to see a graph of easily available USB drive sizes vs. Wikipedia dump size.

  • Rediscover 2 years ago

    That would be very interesting. Thanks, I now have another entry on my To Do list.

breck 2 years ago

Love it! Imagine if USB Flash drive manufacturers just loaded up new drives with content like this. I mean, why not right? I think the physics means it would even be lighter ;)

  • sml156 2 years ago

    I for one would not be happy.

    When I buy a storage device I usually have an intended purpose for it, and I wouldn't want to have to delete a bunch of files some manufacturer thought would be useful just to make room for what I want.

    • breck 2 years ago

      Sorry I wasn't clear. I 100% agree with you; I would generally want a blank one, but it would be fun to have some options.

  • BizarroLand 2 years ago

    Especially if you didn't know which one you were going to get. Plug it in for a big surprise! (From a verifiable manufacturer who has their customers' happiness and enjoyment at heart)

kibwen 2 years ago

Now I'm curious: if, hypothetically, wikipedia was just backed by a single git repo and every edit was a commit, how big would it be and how long would it take to clone?

PaulDavisThe1st 2 years ago

Can someone explain what the role of Kiwix is in all this, please?

  • kragen 2 years ago

    It provides access to the content of the zimfile and an interface for downloading zimfiles.

    • PaulDavisThe1st 2 years ago

      Thanks. I had not understood that the download is the actual "raw" wiki files, not the pages as they would be delivered to a browser.

      • kragen 2 years ago

        It's actually a compressed archive, but I think the contents are in fact HTML and other browser-accessible media types.

ryanmercer 2 years ago

But why? If civilization collapses I'm not going to think "oh, let me consult Wikipedia" I'm going to think "man, this sucks".

SargeDebian 2 years ago

This has to be one of the most poorly structured pieces of writing I've seen in a while. It's way too verbose, and on the one hand there are separate sections like:

* Getting a flash drive

* Formatting a flash drive (which includes a subsection on not formatting it but buying one that's already formatted instead, while there was a separate section just before this one on buying a drive)

* Waiting for a file to download

At the same time downloading both Wikipedia and Kiwix are in the same section. Then, installing Kiwix is in a section called "You're done" which isn't next to the section on downloading Kiwix.

milkshakes 2 years ago

I want to like Kiwix -- I downloaded Wikipedia AND StackOverflow -- but it keeps crashing every time I try to search for anything on this M1 macbook.

garfield322 2 years ago

This would be useful to drop into North Korea?

  • bubblethink 2 years ago

    That's exactly what I was thinking of as well. I remember listening to an episode of Darknet Diaries with this theme, where DVDs and USB drives are a common way to smuggle things into North Korea.

iamwil 2 years ago

I think I'd rather have stack overflow offline, before I'd want wikipedia offline, though.

_int3_ 2 years ago

Anyone doing a ZIM of news.ycombinator.com? A once-a-week package would be fine. How would one make one?

haolez 2 years ago

Could I use something like this to train my own GPT that's obsessed with Wikipedia? :)

jhatemyjob 2 years ago

Is there something like this that downloads the full edit history as well?

mellowhype 2 years ago

Would be cool if Kiwix came with an auto-update feature, but given the database size, I believe it's difficult to implement.

porbelm 2 years ago

95 GB? I remember when it was like 2 GB haha

sprash 2 years ago

Is there something similar for Stack Overflow?

gbraad 2 years ago

Still using a WikiReader?

yCloser 2 years ago

And now donate to Wikipedia, because you just caused them to pay for 95GB of (useless) traffic.

  • snvzz 2 years ago

    That really isn't how it works.