Winners of the $10k ISBN visualization bounty

601 points by rzk a year ago

The winning submission [0] was discussed on HN recently [1]. It's highly impressive from both a technical and a graphic design viewpoint; it somehow elegantly visualizes 2 billion books (in a way that resembles a bookcase, no less).

[0]: https://phiresky.github.io/blog/2025/visualizing-all-books-i...

[1]: https://news.ycombinator.com/item?id=42897120

c-fe a year ago

I'm slightly surprised mine won 3rd place; I believe they liked its simplicity and visualisation. Hosted at https://isbnviz.pages.dev

But honestly, I find both of these better:
- https://bwv-1011.github.io/isbn-viewer/
- https://anna.candyland.page/map-sample.html

In particular, the one from bwv is technically similar but just all-around better than mine; it is what I would want mine to be.

abetusk a year ago

I'm also surprised that I got 3rd place.
But comparing yours to bwv's, I don't agree that bwv's is technically superior in every way. It lacks comparison, ISBN selection and link creation. bwv's main focus looks to be the single feature of highlighting rare books, without trying to meet the other requirements that AA wanted.
- c-fe a year ago
  
  Congrats to you too! Indeed, I think they could have improved the visual and comparison part; it's a bit dark and not too interesting to look at. But I am envious of how smooth their tiling is. My tiles are 4096x4096, which allows me to satisfy both the 20,000-file limit and the 20 MB maximum file size imposed by Cloudflare. I had some issues with smaller tiles, and wanting to host it on Cloudflare restricted me from doing 512x512 tiles, iirc. Also, I really like that they extracted the publisher information and put that in as a pmtile vector; that's something I attempted but ultimately ran out of time with.
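
To make the file-limit trade-off above concrete, here is a rough sketch that counts the tiles in a quadtree image pyramid. The 65536x65536 canvas is purely an assumption for illustration (the thread does not state the real image dimensions), and `pyramid_tile_count` is a hypothetical helper, not code from any entry.

```python
def pyramid_tile_count(side_px: int, tile_px: int) -> int:
    """Count tiles in a square quadtree pyramid, from full resolution down to a single tile."""
    total = 0
    tiles_per_side = side_px // tile_px
    while tiles_per_side >= 1:
        total += tiles_per_side ** 2   # tiles at this zoom level
        tiles_per_side //= 2           # next level halves the resolution
    return total


SIDE = 65536          # ASSUMPTION: hypothetical canvas, 65536^2 = 2^32 pixels
FILE_LIMIT = 20_000   # file-count limit quoted in the comment above

for tile in (512, 4096):
    count = pyramid_tile_count(SIDE, tile)
    status = "over the limit" if count > FILE_LIMIT else "within the limit"
    print(f"{tile}x{tile} tiles: {count} files, {status}")
```

Under that assumed canvas size, 512x512 tiles give 21,845 files and 4096x4096 tiles give 341, which is at least consistent with the constraint described above.
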
matsemann a year ago

What is it that makes yours and bwv's have a floating island with Spain/Italy/++ in addition to them being represented in the main blob?
- c-fe a year ago
  
  It's due to how those ISBN ranges were handed out. I think they probably gave a block like 978-53 (for example) to those countries, meaning the right to distribute ISBNs 978-530-000-000 to 978-539-999-999. Later they ran out, or had all sub-blocks distributed to publishers, and then they got a new block further away (so not 978-54 in my example). Those blocks are therefore not numerically close to each other, and so they also form separate "islands" in the Hilbert space.
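
The block arithmetic in that explanation can be sketched as follows. This is a minimal illustration that treats an ISBN as 12 digits without the check digit, as the example above does; `prefix_range` is a hypothetical helper and `978-60` is a made-up "later" block, not a real allocation.

```python
def prefix_range(prefix: str, total_digits: int = 12) -> tuple[int, int]:
    """Return the lowest and highest ISBN (check digit omitted) under a prefix."""
    digits = prefix.replace("-", "")
    if not digits.isdigit() or len(digits) > total_digits:
        raise ValueError("prefix must contain at most total_digits digits")
    free = total_digits - len(digits)   # digits left for publisher and title
    low = int(digits) * 10 ** free
    return low, low + 10 ** free - 1


first = prefix_range("978-53")      # the example block from the comment
neighbour = prefix_range("978-54")  # the block that would be numerically adjacent
later = prefix_range("978-60")      # HYPOTHETICAL block handed out later

print(first)                     # (978530000000, 978539999999)
print(neighbour[0] - first[1])   # 1: adjacent blocks touch
print(later[0] - first[1])       # 60000001: a gap of tens of millions of ISBNs
```

A second block that is not numerically adjacent starts tens of millions of positions further along the number line, which is why it can end up far from the first block once the numbers are laid out in 2D.
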
  
  matsemann a year ago
  
  I see, thanks for explaining. Cool that your visualization then shows these idiosyncrasies!
  
  c-fe a year ago
  
  Thanks! That is indeed all thanks to using the Hilbert curve fractal, which has the property that it maps numbers which are close together onto 2D (or higher-dimensional) coordinates which are close together. It's a very cool property, and it's used in lots of contexts for that reason.
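
The locality property described above can be seen in a small sketch. This is the standard textbook index-to-coordinate conversion for a Hilbert curve, not code taken from any of the entries; the function name `hilbert_d2xy` and the choice of a 16x16 grid are assumptions for illustration.

```python
def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map index d on a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # flip the quadrant built so far
                x = s - 1 - x
                y = s - 1 - y
            # rotate by swapping axes
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


ORDER = 4  # 16x16 grid, 256 cells
points = [hilbert_d2xy(ORDER, d) for d in range(4 ** ORDER)]

# Consecutive indices always land on neighbouring cells.
steps = [abs(x1 - x0) + abs(y1 - y0)
         for (x0, y0), (x1, y1) in zip(points, points[1:])]
print(max(steps))   # 1
print(points[:4])   # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

The converse does not hold in general: two cells that are neighbours on the grid can be far apart along the curve, so the curve preserves locality in one direction only.
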
highcountess a year ago

I’m glad you said that, because I was also surprised that bwv-1011's entry only made it to honorable mention even though its technical focus was on visualizing the rarity of books, which ostensibly was the primary objective of the whole effort.
- gknoy a year ago
  
  I really like that your page talks about _why_ a Hilbert curve is good. I don't remember ever learning about those before, and now hopefully if I'm ever trying to visualize 1D data, I might remember that :)

rahimnathwani a year ago

This is amazing.

One thing I found odd.

I searched for 'Stubborn Attachments' which worked.

On the same bookshelf there are several other Stripe Press books.

One of them is called Zero to One Hundred, by Stephanie Friedman.

When you search that book on Amazon, it has a different title, which I guess is reasonable as the book hasn't been published yet and they may not have finalized the decision: https://a.co/d/bQX5CNf

Here's where it gets weird:

- if you search for the book 'Zero to one hundred' (the title shown on the 'shelf') it doesn't come up

- if you search for the book by its ISBN, it does come up, but the name displayed in the search results is yet another alternate title. And the bookshelf displays that title. So the same part of the bookshelf looks different depending on what you searched for.

I haven't yet read the blog post about how this impressive visualization works, so I don't have an idea of why this is the case.

spondyl a year ago

I don't think it's the tool that's the issue, I think it's the book itself?
If you search the ISBN on the web, you'll get "Zero to One Hundred" with the cover of "Built to Grow" and vice versa.
There's also "Experiment, Build, Scale" which is the book that the visualisation shows, also with the same ISBN attributed to the previous two.
Experiment, Build, Scale seems to be the only book of Stephanie's that is in Google Books while Worldcat has "Zero to One Hundred" with the cover art for "Built to Grow".
Most of the online bookstore pages have this mess so I wouldn't blame the tool for what seems like an upstream data quality issue.
- closewith a year ago
  
  > Most of the online bookstore pages have this mess so I wouldn't blame the tool for what seems like an upstream data quality issue.
  I think that's an uncharitable read of the GP's comment. I read it as curiosity about how the upstream data issues present in the tool, which also interests the part of my brain that likes to solve minor mysteries.
- rahimnathwani a year ago
  
  Sorry I didn't mean to make it seem like I think the tool is at fault.
  I just think it's interesting that the book title shows differently on the shelf depending on whether you reach it via an ISBN search, vs. if you discover it by panning from a nearby book.

TomK32 a year ago

Fascinating. It allows for some interesting observations when you, as I did, zoom in on this one (sadly no direct links to coords/zoom level): https://archive.anarchy.cool/maps/isbn.html You can find publishers like Hueber Verlag[1] in the eastern part of the German language section. They spread their ISBN numbers in a pattern with something like 1360000 between them (I know, ISBN having a checksum leads to gaps in the numbering), which generates a repetitive pattern with plenty of empty space. It is so wasteful of this huge chunk they have.

Are there no rules on how publishers have to assign their numbers? Just so they could hand back an unused block if they don't need it any longer.

[1] I can see how publishing learning material in 30 languages can give people "ideas" when assigning ISBN numbers https://de.wikipedia.org/wiki/Hueber_Verlag
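
For reference, the ISBN-13 checksum mentioned above is a single trailing digit computed from the first 12 digits with alternating weights of 1 and 3. A minimal sketch follows; the function names are illustrative, and the first two ISBNs checked are ones quoted elsewhere in this thread.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, digits and check digit; hyphens and spaces are ignored."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("9786500718836"))   # True
print(is_valid_isbn13("9786501276830"))   # True
print(is_valid_isbn13("9786500718837"))   # False (wrong check digit)
```

Because the last digit is fully determined by the first 12, it carries no extra information of its own.
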

bawolff a year ago

I feel like visualizations of large datasets which are viewer-directed (i.e. they want you to "explore" the data instead of trying to tell you something specific about it or communicate a narrative) are often "pretty" but never particularly enlightening. I feel like that holds true for these in particular.

pphysch a year ago

That's my issue with attempts to 3D-ify viz. Unless you are actually modeling a 3D volume, like medical imaging or CAD, the added "forced exploration" of 3D simply hides insights.
WillAdams a year ago

The thing is, ISBNs map to:
- publisher
- assigned title
- (roughly) order of publication
That's all that they communicate --- there is no hierarchy here to aid in discovery or to organize the content (and further complicating things, the same text may appear multiple times in a different binding --- a differentiation which is immaterial to an e-book).
The elephant in the room of course is the matter that "Anna's Archive" is not a legitimate book repository, but a piracy site, so what they are showcasing is how compleat (and brazen) their theft (and attendant lack of compensation) is.
This would be far more interesting if it were based on an hierarchical system such as LoC, and instead afforded an interface for accessing legitimately available books as are available from https://www.gutenberg.org/ or listed at: http://onlinebooks.library.upenn.edu/ or worked on at: https://www.wikibooks.org/
- bawolff a year ago
  
  > The thing is, ISBNs map to:
  > - publisher
  > - assigned title
  > - (roughly) order of publication
  I assume the task isn't just to visualize ISBNs literally. Presumably you are allowed to cross-reference with other data.
  > The elephant in the room of course is the matter that "Anna's Archive" is not a legitimate book repository, but a piracy site,
  I think it's pretty clear that the target audience doesn't care. I don't think the target audience holding differing political views is really a valid criticism of the project. It should be evaluated in the context of the audience it was created for.
  
  WillAdams a year ago
  
  This is not a political stance, but one of basic questions of authorship and what compensation authors should receive and what control they should have over their work.
  See arguments by Alexander Pope in Pope _V._ Curll.
  
  mistrial9 a year ago
  
  When China decided to ignore Western copyright wholesale in the digital age, the equation changed, IMHO.
  
  WillAdams a year ago
  
  Yes, but dealing with that politically would be made easier by having the moral high ground.
  
  mannyv a year ago
  
  Not really, because it depends on the basis of morality. In fact, this 'morality' problem is shown in the existence of libraries in the US.
  Is a book a collective good? Or property? In the US the answer is 'both' in an awkward way. But the US does know that having books behind a paywall is not in society's best interest.
  And in reality 99% of the books will never be read, which makes their 'value' as property suspect.
  
  WillAdams a year ago
  
  If so few books are to be read, then why is it so difficult to pay for those which are?
  
  bawolff a year ago
  
  > This is not a political stance, but one of basic questions of authorship and what compensation authors should receive and what control they should have over their work.
  Questions of compensation and ownership are one of the most political questions of all.
  What exactly do you think communist revolutions were revolting over?
  
  WillAdams a year ago
  
  How to employ and feed and compensate the masses.
- zozbot234 a year ago
  
  > This would be far more interesting if it were based on an hierarchical system such as LoC, and instead afforded an interface for accessing legitimately available books
  Isn't this exactly what Open Library does?
  
  WillAdams a year ago
  
  Given that "Textbooks" are separated out, that "Animals", "Childrens' Books" and "Health & Wellness" are top-level categories, and that it mixes in books which are not available for download: not really.
  The UI is not all that great either.
  I would like to see:
  - an hierarchical list with a hierarchy which actually makes sense and truly organizes knowledge
  - of legitimately available downloadable books
  - which has a nice UI
  but it's far more important that LLMs have training data without consideration of recompense than any other consideration.

dylan604 a year ago

I had a Pavlovian response to reach for the defrag program at first sight of the top image.

2Gkashmiri a year ago

Win 98 had the best animation. Pity everything beyond that was dogshit.

robingchan a year ago

This was great fun to enter nevertheless, congrats all involved.

My entry is still live for now for anyone curious:

https://d199hl4t3ts6d9.cloudfront.net/

ofou a year ago

My most sincere love to all shadow libraries out there, you're doing god's work.

xtracto a year ago

They do half of the work (which is a helluva lot)... the other half is done by the volunteers that digitize books.
I was looking at my country's "shelf" and it's so sad to see so many missing titles. I almost wanted to go to my local library and digitize some of them. The old ones that are out of print and impossible to acquire right now...
So much knowledge lost.
- FabHK a year ago
  
  To be fair, the authors of the books also contribute quite a bit.

franciscop a year ago

I'm curious why there's no clear "Spanish" in these ISBN visualizations; there's 2 slots for English, one for France, Germany, Japan, Soviet Union, China, etc. but no big one for Spain. Do we really have so few books in Spanish? Or is this a predominantly English distribution?

I say this as someone who grew up in Spanish libraries and book shops, surrounded and immersed in Spanish books, so it feels a bit strange to see the tiny bit we occupy in the world map here.

rsecora a year ago

The dataset consists of books from Anna's Archive, each identified by an ISBN. The ISBNs and titles are extracted from datasets [1], which include magazines and books primarily in Chinese, English, and French.
Example: Germany publishes five times more books than the Netherlands [2], and Spain publishes twice as many books as the Netherlands. However, in the visualizations, Germany appears similar to the Netherlands, while Spain and Mexico do not align with the high-level labels [3].
[1] https://annas-archive.li/datasets
[2] https://internationalpublishers.org/wp-content/uploads/2023/...
[3] https://software.annas-archive.li/AnnaArchivist/annas-archiv...
glenstein a year ago

>I'm curious why there's no clear "Spanish" in these ISBN visualizations
I had the exact same question, and I do have a completely unsupported theory. There's one large block that appears to be Argentina, or possibly Peru, although their titles are on the fringes of the large block. The block is otherwise unlabeled, with no name sitting at the center of the block like you see with the other major ones. I would be slightly surprised if it were entirely Argentina, but it would make a lot of sense if that block were Spanish.

bondant a year ago

The winning submission kind of reminds me of the Eagle Mode file manager, where you can zoom into a directory to see the files in it and keep zooming to access subdirectories.

https://eaglemode.sourceforge.net/emvideo.html

soneca a year ago

Where is the database from? How, and how often, is it updated?

I have two self-published books with ISBNs. Neither of them has its details in the 1st place submission (I assume they won’t be in any of the others either?).

One was published on Feb 23 and the other on Dec 24. I had hoped at least the older one would be there. Does anyone know why they are not?

The ISBNs:

- 9786500718836

- 9786501276830

ziddoap a year ago

From https://annas-archive.org/blog/all-isbns.html :
>We started mapping ISBNs two years ago with our scrape of ISBNdb. Since then, we have scraped many more metadata sources, such as Worldcat, Google Books, Goodreads, Libby, and more. A full list can be found on the “Datasets” and “Torrents” pages on Anna’s Archive. We now have by far the largest fully open, easily downloadable collection of book metadata (and thus ISBNs) in the world.
So, your books would need to be present in one of the databases that Anna's Archive scraped, at the time they scraped it.

layer8 a year ago

Does Anna’s Archive track and account for duplicate ISBNs?

https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...

rishikeshs a year ago

Noob here, but can someone explain like I'm five why this is important? It looks beautiful nevertheless.

_mitterpach a year ago

I'll start off by quoting the winning submission.
> Libraries have been trying to collect humanity’s knowledge almost since the invention of writing. In the digital age, it might actually be possible to create a comprehensive collection of all human writing that meets certain criteria. That’s what shadow libraries do - collect and share as many books as possible.
> One shadow library, Anna’s Archive (which I will not link here directly due to copyright concerns), recently posed a question: How could we effectively visualize 100,000,000 books or more at once? There’s lots of data to view: Titles, authors, which countries the books come from, which publishers, how old they are, how many libraries hold them, whether they are available digitally, etc.
> - https://phiresky.github.io/blog/2025/visualizing-all-books-i...
Basically, legally gray online book repositories such as Anna's Archive, which created this bounty, are trying to collect a lot of books. The question quickly arises: how many books are there?
The best way to track books is by ISBN (International Standard Book Number), basically the personal ID of any given book, assigned through an international agency. Once you know which books exist, you can check which books your repository already has and which ones are missing.
But the ISBN space covers about 2 billion possible books. That's a lot. So, Anna's Archive created a contest to display this space in the cleanest way possible. The winning submission is very nicely done, and in my view very well deserving of the $6,000 bounty.
- tokai a year ago
  
  I like Anna's Archive, but it's definitely not legally gray.
  
  _mitterpach a year ago
  
  There are multiple ways to look at this, but for example, my middle European country's laws explicitly state that breaking copyright is okay, if the material is used for teaching purposes. Downloading for personal use is also allowed.
  Are they breaking the laws of the country where they host their own data? I can't really say.
  In honesty, I don't believe copyright laws will survive this decade, much less this century. With models being trained on copyrighted material and no cases setting the precedent that this is not okay, I feel like the new reality is that you can steal anything, as long as you 'launder' it through an AI model.
  Maybe that may be the next big startup, re-creating copyrighted books through AI models, just different enough to skirt the laws. Who wouldn't like to read 'Owner of Numerous Pieces of Jewelery' instead of 'Lord of the Rings'?
  
  spudlyo a year ago
  
  There are places that have a minimal or no formal recognition of IP rights. Not counting stateless or breakaway regions like Transnistria and Sealand, countries like Somalia and South Sudan either do not have a government-run IP system, or in the case of South Sudan are not part of the Berne Convention. I doubt that Anna's Archive operates in one of these places, but there are still safe harbors for their mission.
- rishikeshs a year ago
  
  OK, so from what I understood, this visualisation displays all the ISBNs that are assigned, grouped into countries and then by publisher. Books that are not highlighted are the ones that are not present on Anna's Archive? Is that so?
  Also what do you mean by unassigned?
  
  c-fe a year ago
  
  Anna's Archive has books in their archive, but they also have other datasets that connect a book's ISBN to its metadata (title, author, publisher, ...).
  In my visualisation https://isbnviz.pages.dev you can see which books they actually have the files of (blue) and which ones they know exist because they have the metadata from some other source (like Google Books, ...) (red). Finally, there are also ISBNs not contained in any of the sets that Anna's Archive has, and these are either assigned or not assigned. A lot of the 979-prefixed ISBNs are not assigned, which means no country/publisher has the right to assign them to a book. Other ISBNs are assigned to a publisher, but the publisher just hasn't published a book with that ISBN yet. Or they may have published a book, but Anna's Archive doesn't know about it because it's not in their dataset or in any of the ones they scraped.

vessenes a year ago

Public request: anybody here who hates Anna’s and wants to make a principled complaint about it? I love it and the idea of it so much, but I imagine some feel differently and I’d like to hear your best takedown shot.

WillAdams a year ago

Well, I made a comment at:
https://news.ycombinator.com/item?id=43193432
Does that count?
The thing is, if we're going to have GPL software, then we need copyright.
Yes, the terms/lengths need to be adjusted, but one can't do that by fiat/unilaterally.
- vessenes a year ago
  
  To me it's not a substantive critique. I agree that copyright is useful. But I don't think it is so useful that much of the world's knowledge creation stands to be lost, which is the alternative.
  And, the copyright exists because it's a societal benefit -- it is considered to encourage the creation of knowledge, art and so on -- it puts significant limitations on the rest of society. So, when we're evaluating, we need to evaluate both whether we're getting the benefit of copyright that we "pay for" and whether increasing or decreasing what society pays would be a net good.
  I'd propose that in the case of books, approximately zero books don't get written because of Anna's. In fact, some pretty thoughtful authors like Cory Doctorow feel that piracy increases their net writing and impact.
  So, increasing enforcement seems like it would have no net benefit on adding to our world's social capital. On the other hand, it is clear that when data (on paper or digitally) is held by only a small number of hands, it is often lost, full stop. Television archives from the first fifty years of broadcast TV, whether US or England, much less I imagine, say Poland or East Germany, are largely gone. And in places like England that had criminal penalties for illegal TV watching, I'd guess that more is gone. You can search for a recent story about this regarding Dr. Who -- the BBC had dumped a bunch of Dr. Who tape they didn't want in the (60s?). An unnamed employee saved it. Which was (and is) illegal in England. The BBC has asked for it back, but can't/won't promise there are no legal repercussions for the person who saved the data.
  Enforcement like that is a net negative in my opinion; the question would be moot if we had a full archive on Anna's of every Dr. Who episode.
  
  WillAdams a year ago
  
  Why are books likely to be lost?
  Any decent library has a "last copy" policy where the last copy of a given book/edition is kept in a vault for reference at need.
  >Libraries will get you through times of no money better than money will get you through times of no libraries.
  >--- Anne Herbert
  Piracy directly interferes with the actual business of publishing books --- I missed out on buying _Traditional Archery from Six Continents_, and the price quickly climbed to 4 digits after it went out of print, so I made arrangements to get the rights and re-print it --- shortly before I picked up the books at the printer someone released a PDF scan of the book --- it took me over 5 years to sell out the print run and get my living room closet back.
  Similarly, I'd like to arrange for re-printing J.R.R. Tolkien's _The Old English Exodus_: https://tolkiengateway.net/wiki/The_Old_English_Exodus but have settled instead for binding a photocopy I was sent the second time I requested it on Interlibrary Loan because a certain archive site is handing it out for free as a poor quality scan.
  If folks want books to be free, why not focus on either public domain texts, or authoring new works which have copyleft and similar licenses? Why deny authors their right to control how their work is distributed _and_ what compensation they receive?
  I've done a fair bit at wikibooks.org and have put up:
  https://willadams.gitbook.io/design-into-3d
  and my articles in _TUGboat_ are freely downloadable --- that doesn't give me the right to take texbook.tex, edit it, typeset it to a PDF and then pass out that PDF.
  Dr. Who episodes are not books. Don't move the goal posts.
kaladin-jasnah a year ago

Their download wait time is upsetting to me because I'm impatient and cheap (else you have to pay). At least they have the libgen links now.
I don't hate Anna's Archive though.
- ksynwa a year ago
  
  The "external links" section is the only thing making the website usable for non-subscribers.
oguz-ismail a year ago

it's not libgen

jonplackett a year ago

Is there anywhere that lists/publicises/collates competitions like this?

I would like to have had a go at this but you often only find out about these things when winners are announced.

no-reply a year ago

I don't see any Arabic literature. Curious whether that's due to a lack of actual digital/OCR text or a lack of availability of the books in PDF/EPUB formats.

boznz a year ago

no wonder nobody can find my book :-)

divbzero a year ago

These ISBN visualizations remind me of the maps of IPv4 address space.

https://xkcd.com/195/

https://ant.isi.edu/address/

https://www.caida.org/archive/id-consumption/census-map/

ChrisMarshallNY a year ago

Love the Trantor reference!

ivolimmen a year ago

I have no idea what's on the site, as my provider blocks it because of European sanctions against Russia, as this is one of the RussiaToday sites.

chungus a year ago

Judging by your profile location being in the Netherlands, I think you are being confused by the generic Ziggo ISP blocked page [1], which lists Russia Today and Sputnik News and then, in another post, ThePirateBay.
In this case, the ISP blocked it because the website is Anna's Archive [2], which was blocked around a year ago, but they have not made a post about that.
If you put "pcm." in front of the link it will work (for now)
You should probably edit your post, so as not to misinform. But I have to admit this confusion stems from bad decisions at the ISP.
[1] https://www.ziggo.nl/website-geblokkeerd
[2] https://en.wikipedia.org/wiki/Anna%27s_Archive#Netherlands
- ivolimmen a year ago
  
  Seems that editing is not possible due to the negative points I gathered, which is weird as I just reported that I cannot view it. People seem to view everything through a political lens now. But thank you for your information; I saw the post.
  
  tokai a year ago
  
  No, you were downvoted because you claimed something false.
  
  svdr a year ago
  
  It is false, but I would not blame the parent; the ISP blocked page is unclear and suggests the block is linked to Russia.
notpushkin a year ago

Do you have any evidence?