mark_l_watson 2 days ago

Interesting paper, and something I have been thinking about. I am retired, so I am a lighter user of AI than most people here, but I still have Gemini and ChatGPT run a half dozen deep research studies for me a week. It is sobering to see how many web sites are speculatively searched. I mostly find the results useful, and I prefer this new process to manual web search. After a deep research run, asking for 'the best' reference link usually produces something else worth reading in addition to the research report.

Someone else here recommended sites maintaining their own CLAUDE.md file, a good idea but too vendor specific. Ten months ago someone online was recommending the name llms.txt as a generic markdown file for agent use, and I added one: https://markwatson.com/llms.txt I stopped collecting web page visit statistics, however, so I have no idea how often that file is discovered.
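For reference, the llms.txt convention (as proposed at llmstxt.org) is only loosely specified: an H1 title, a blockquote summary, then markdown sections of annotated links. A minimal made-up example (all names and URLs below are illustrative, not the actual file):

```markdown
# Example Personal Site

> One-paragraph summary of what this site is about, so an agent can
> decide whether it is worth reading further.

## Docs

- [About](https://example.com/about.md): who runs this site and why
- [Articles](https://example.com/articles.md): index of long-form posts
```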

cs702 2 days ago

The paper's title is too clickbaitish for my taste, but its subject is important:

How should we rethink query interfaces, query processing techniques, long-term data stores, and short-term data stores so they can handle the much greater volume of agentic queries we will likely see in coming years, whether we want it or not, if people and organizations continue to adopt AI systems for more and more tasks?

The authors study the characteristics of agentic queries they identify (scale, heterogeneity, redundancy, and steerability) and outline several new research opportunities for a new agent-first data systems architecture, ranging from new query interfaces, to new query processing techniques, to new agentic memory stores.

  • andai 2 days ago

    The issue we have is that websites (including small websites) are getting hammered by bots. Apparently ChatGPT makes 2000 http requests per web search.

    I think the real problem here is how to answer the question: there's currently no way to intelligently get information out of the internet. (I assume Google is building one, but it apparently hasn't shipped it yet, and even if it did, it's not what OpenAI would use.)

    Hammering every WP site with infinite queries every time someone asks a question seems like the wrong solution to the problem. I'm not sure what the right solution looks like.

    I got an 80% solution in like ten lines of Python by doing "just Google it, then look at the top 10 search results" (i.e. dump them into GPT). That works surprisingly well, although the top n results are increasingly AI generated.
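The "search, then dump the top results into the model" flow can be sketched in a few lines. Note that `fetch_top_results` and `ask_llm` here are hypothetical stand-ins for whatever search API and LLM client you actually use; only the prompt assembly is concrete:

```python
def build_prompt(question, pages, max_chars=2000):
    """Assemble one prompt from the question plus (url, text) result pairs."""
    parts = [f"Answer using only the sources below.\n\nQuestion: {question}\n"]
    for i, (url, text) in enumerate(pages, start=1):
        # Cap each page so ten results still fit in the context window.
        parts.append(f"Source {i} ({url}):\n{text[:max_chars]}\n")
    return "\n".join(parts)

def fetch_top_results(query, n):
    """Hypothetical: return [(url, page_text)] for the top n search hits."""
    raise NotImplementedError("plug in a real search client here")

def ask_llm(prompt):
    """Hypothetical: send the prompt to your LLM of choice."""
    raise NotImplementedError("plug in a real LLM client here")

def answer(question, n=10):
    return ask_llm(build_prompt(question, fetch_top_results(question, n)))
```

The quality ceiling is exactly the one noted above: the pipeline is only as good as the top n results it is fed.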

    I had a funny experience when Bard first came out (the original name for Gemini). I asked it a question, it gave me the precise opposite of the truth (the truth but negated). It even cited sources. The sources were both AI blogspam. That still makes me laugh.

    • yunohn 2 days ago

      > Apparently ChatGPT makes 2000 http requests per web search.

      Can you source that claim? It sounds absolutely ridiculous and costly/wasteful. It would be nigh impossible to ingest 1000s of webpages into a single chat.

      • andai 2 days ago

        It turned out I remembered the number incorrectly. It was actually 5000 http requests!

        https://news.ycombinator.com/item?id=42726827

        However, upon further investigation, this is a special case triggered by a security researcher, and not the normal mode of operation.

        • yunohn 2 days ago

          If one reads the security advisory - the security researcher’s claim is that a particular API endpoint would accept URLs without deduping, so they were able to send 5000 URLs to it - nothing more sophisticated.

  • croes 2 days ago

    Isn’t it bad to tailor the data for a specific type of AI?

    That could hinder other and maybe better approaches.

    • cs702 2 days ago

      That's why my comment was conditional (emphasizing the "if" here, for clarity): "... if people and organizations continue to adopt AI systems for more and more tasks".

      If people and organizations don't do that, the research evidently becomes pointless.

    • lyu07282 2 days ago

      It sounded to me like that's not what they are doing; it's more about making your existing data accessible via *hand-waving* "agentic" architectures (i.e. an unimaginably inefficient burning of tokens/s). It's all nonsense if you ask me.

frenchmajesty 2 days ago

The proposed design in this paper is bad, but the core of the idea is very interesting.

At a high level, 90% of the complexity of their data retrieval system can be deleted by simply attaching to every data store a `CLAUDE.md` file that is automatically kept up to date and that the agents can read.

High-throughput queries by an agent don't feel much different from the high-throughput querying that large-scale systems like Instagram and YouTube need to service on a daily basis. Whatever works for 10M active users per second on IG would also work for 50 agents making 1M queries per second.

I can still see a need for innovation in data stores. My little startup probably can't afford the same AWS bill as Meta, but the tide would lift all boats, not just AI-specific use cases.

apwell23 2 days ago

15 ppl worked on this glorified blogpost?

lyu07282 3 days ago

Is there an appendix with prompts separately somewhere?

Towaway69 2 days ago

Is the title to be taken seriously, or has “AI Overlords” become some type of well-meaning indication of the positivity of having overlords?

I thought AI could do anything, so why do I have to help it if it’s so smart and powerful and intelligent and useful? Is it really just a complex computer program that is actually trained to do very narrowly defined activities?

  • croes 2 days ago

    Current AI has its limits and now we must tailor our data in the hope it fixes some problems.

    Would be a shame to invest all those billions of dollars and resources to get unreliable mediocre results.

  • david_shaw 2 days ago

    > Is the title to be taken seriously, or has “AI Overlords” become some type of well-meaning indication of the positivity of having overlords?

    The abstract blurb (linked) doesn't mention AI overlords in either context, so I think it's mostly just an edgy title.

    • croes 2 days ago

      Maybe it really means the AI company overlords, because they are seen as the next rulers of the economy.

unisyncd 2 days ago

It is really a debate about who the primary visitors that Internet services serve will be.

A few decades ago, people reached each other directly over the IP protocol; it was people themselves who collected news, read information, and published new data.

After that, browsers visited each site over the HTTP protocol; it was browsers that collected data, rendered pages, and interacted with the user.

Nowadays, it is highly likely that AI will work its way into our daily life, and the pattern above becomes: AIs request each <what> using a <new> protocol; it is AIs that <do a lot of things> and interact with the user.

Information never becomes unavailable, but the main method for retrieving it does change. We could of course have kept driving command-line utilities instead of browsers when browsers became popular, and we can of course keep searching and clicking everywhere in a browser instead of using AI-enhanced search now that AI is hot. But the trend is that AI will bring us to a new stage of evolution in a fast-paced information era.

Users are the ones who sit behind the screen; they never change, but their methods/agents/proxies change over time.

  • rixed 2 days ago

    Exactly how I picture things. AI is the next step after good search engines. We dreamed about the semantic web but never really delivered on it. AI is the semantic search we were longing for: still a bit fuzzy, but already very useful.