bawolff 17 hours ago

> I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70. I don’t really understand why anybody would do that.

Guess it wasn't so endless after all.

The author is assuming malice, but honestly, bots clicking links is just what happens to every public site on the internet. Not to mention that going down the link-clicking rabbit hole is common among Wikipedia readers.

All that said, I don't really see the point. Wikipedia's human controls are what make it exciting.

  • haileys 17 hours ago

    It’s a poetic end, considering that the very same kind of scraping, with no regard for the cost to site operators, is how these models are trained in the first place.

  • leobg an hour ago

    Would have been ironic if it was the crawler from OpenAI… :)

  • kristianp 12 hours ago

    New page generation has been re-enabled, with a rate limit and "using openai/gpt-oss-120b instead of Kimi-K2".

  • dpark 16 hours ago

    > but honestly bots clicking links is just what happens to every public site on the internet.

    As a CS student ~20 years ago I wrote a small website to manage my todo list and hosted it on my desktop in the department. One day I found my items disappearing before my eyes. At first I assumed someone was intentionally messing with my app but logs indicated it was just a scraping bot someone was running.

    It was a low-stakes lesson on why GET should not mutate meaningful state. I knew when I built it that anyone could click the links, and I didn't bother with auth since it was only accessible from within the department network. But I didn't plan for the bots.
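
    For the curious, the anti-pattern and the usual fix look roughly like this. It's only a sketch; Flask and the route names are illustrative choices, not the original code:

      # Sketch of the GET-mutates-state anti-pattern and its fix (Flask is illustrative, not the original app).
      from flask import Flask, redirect

      app = Flask(__name__)
      todos = {1: "finish problem set", 2: "email advisor"}

      @app.route("/")
      def index():
          return "<br>".join(f'{item} <a href="/delete/{i}">[x]</a>' for i, item in todos.items())

      # Unsafe: routes default to GET, so any crawler that fetches this URL deletes the item.
      @app.route("/delete/<int:item_id>")
      def delete_unsafe(item_id):
          todos.pop(item_id, None)
          return redirect("/")

      # Safer: only mutate state on POST; crawlers don't submit forms.
      @app.route("/items/<int:item_id>/delete", methods=["POST"])
      def delete_safe(item_id):
          todos.pop(item_id, None)
          return redirect("/")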

    • vunderba 16 hours ago

      Reminds me of the Spider of Doom, a similar issue where delete links (triggered by plain GET requests) were merely hidden by some simple JavaScript that checked whether the user was logged in. All of a sudden, pages and content on the website began to mysteriously vanish.

      You know what doesn't care about JavaScript and tries to click every link on your page? A search engine's web crawler.

      https://thedailywtf.com/articles/The_Spider_of_Doom

  • userbinator 16 hours ago

    Google and all the other search engines will crawl any public site too.

  • blourvim 17 hours ago

    More clicks mean a bigger wiki, which I guess should be the point, unless the generated articles lead to nonsensical strings, which would suck but should be reasonably easy to prevent.

  • UltraSane 13 hours ago

    You should always have per-IP rate limiting.
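
    Something like a token bucket keyed by client IP goes a long way for a hobby project. A minimal sketch, assuming an in-memory store and placeholder limits (a real deployment would more likely do this at the reverse proxy):

      # Rough sketch of per-IP rate limiting with an in-memory token bucket.
      # CAPACITY and REFILL_RATE are placeholder values, not recommendations.
      import time
      from collections import defaultdict

      CAPACITY = 10        # max burst of requests per IP
      REFILL_RATE = 1.0    # requests "regained" per second

      _buckets = defaultdict(lambda: {"tokens": CAPACITY, "last": time.monotonic()})

      def allow(ip: str) -> bool:
          bucket = _buckets[ip]
          now = time.monotonic()
          # Refill in proportion to the time since the last request, capped at CAPACITY.
          bucket["tokens"] = min(CAPACITY, bucket["tokens"] + (now - bucket["last"]) * REFILL_RATE)
          bucket["last"] = now
          if bucket["tokens"] >= 1:
              bucket["tokens"] -= 1
              return True
          return False  # over the limit; the caller should respond with HTTP 429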

000ooo000 17 hours ago

>I’m not worried about one power user costing me a lot of money in inference

>edit: I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70.

AaronAPU 5 hours ago

This was literally the first idea I had at the initial GPT release. Prototyped it in about 30 minutes and then thought “bots will obviously just destroy this” and discarded it.

kiriberty 16 hours ago

This is a slippery slope to hallucinated hell

  • visarga 16 hours ago

    I would use Deep Research mode outputs. Sometimes I run multiple of these in parallel on different models, then compare them to catch hallucinations (a rough sketch of that comparison is at the end of this comment). If I wanted to publish that, I would also double-check each citation link.

    I think the idea is sound; the potential is to have a much larger AI Wikipedia than the human one. Can it cover all known entities, events, concepts, and places? All scientific publications? It could get 1000x larger than Wikipedia and be a good pre-training source of text.

    When covering a topic, I would not make the AI agent try to find the "Truth" but just analyze the distribution of information out there: what are the opinions, and who holds them? I would also test a host of models in closed-book mode and include an analysis of how the AI covers the topic on its own; that is useful information to have.

    This method has the potential to create much higher-quality text than the usual internet scrape, in large quantities. It would be comparative-analysis text connecting many sources, which would be better for the model than training on separate pieces of text. Information needs to circulate to be understood better.
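
    A rough sketch of that cross-model comparison, assuming a hypothetical ask() wrapper around whatever inference API is in use; the sentence-level matching is deliberately crude:

      # Sketch only: flag claims that appear in just one model's answer.
      # ask() is a hypothetical stand-in for the actual inference call.
      import re
      from collections import Counter

      def ask(model: str, prompt: str) -> str:
          raise NotImplementedError("call your inference API here")

      def claims(text: str) -> set[str]:
          # Crude "claim" extraction: lowercased, sentence-split.
          return {s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()}

      def lone_claims(models: list[str], prompt: str) -> dict[str, set[str]]:
          answers = {m: claims(ask(m, prompt)) for m in models}
          counts = Counter(c for cs in answers.values() for c in cs)
          # Anything asserted by only one model is a candidate hallucination to double-check.
          return {m: {c for c in cs if counts[c] == 1} for m, cs in answers.items()}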

dcreater 10 hours ago

So will this end up being part of the training dataset for future LLMs?

j_juggernaut 12 hours ago

Solved the Neon Genesis Evangelion challenge using ChatGPT Agents; take a look.

blourvim 17 hours ago

I wonder if the first-link chain here would also lead to "Philosophy".

indigodaddy 14 hours ago

I'm trying to link to Philip Glass. This could take a while. Kinda fun and a bit reminiscent of Googlewhacking, or maybe the LLM equivalent of Six Degrees of Kevin Bacon, but it's gonna be way more than six to get to Philip Glass.

Edit: well, shit, looks like there is a Minimalism page, but it didn't make any names clickable. Sean, looks like you need to tweak the code a bit?

https://www.endlesswiki.com/wiki/minimalism

_def 16 hours ago

Huh, I found a dead end: a 404.

tehjoker 16 hours ago

Interesting idea, but while it is sold as a way to interact with the knowledge in a model, I suspect the rabbit-hole effect and the most tantalizing information in it will be subtly hallucinated. An efficient delivery vehicle for “computer madness”.

oidar 17 hours ago

hugged