Show HN: NERDs – Entity-centered long-term memory for LLM agents
nerdviewer.com

Long-running agents struggle to attend to relevant information as context grows, and eventually hit the wall when the context window fills up.
NERDs (Networked Entity Representation Documents) are Wikipedia-style entity pages that LLM agents build for themselves by reading a large corpus chunk-by-chunk. Instead of reprocessing the full text at query time, a downstream agent searches and reasons over these entity documents.
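Roughly, the build loop works like this. This is a minimal sketch, not the actual implementation: the function names (`build_nerds`, `extract_entities`) and the fixed-size chunking are illustrative assumptions, and the LLM call is stubbed out as a plain callback.

```python
def chunk(text, size=4000):
    """Split a corpus into fixed-size character chunks (illustrative;
    a real pipeline would chunk on token or chapter boundaries)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_nerds(corpus, extract_entities, nerds=None):
    """Read the corpus chunk-by-chunk, accumulating one doc per entity.

    `extract_entities(chunk_text, existing_titles)` stands in for an
    LLM call that returns {entity_title: [new_facts]} for the chunk,
    given the titles of the entity docs built so far.
    """
    nerds = {} if nerds is None else nerds
    for piece in chunk(corpus):
        updates = extract_entities(piece, list(nerds))
        for title, facts in updates.items():
            nerds.setdefault(title, []).extend(facts)
    return nerds
```

At query time the downstream agent searches over the resulting `nerds` dict instead of re-reading the corpus, which is where the token savings come from.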
The idea comes from a pattern that keeps showing up: brains, human cognition, knowledge bases, and transformer internals all organize complex information around entities and their relationships. NERDs apply that principle as a preprocessing step for long-context understanding.
We tested on NovelQA (86 novels, avg 200K+ tokens). On entity-tracking questions (characters, relationships, plot, settings) NERDs match full-context performance while using ~90% fewer tokens per question, and token usage stays flat regardless of document length. To highlight the method's limitations, we also tested it on counting tasks and locating specific passages (which aren't entity-centered), where it did not perform as well.
nerdviewer.com lets you browse all the entity docs we generated across the 86 novels. Click through them like a fan-wiki. It's a good way to build intuition for what the agent produces.
Paper: https://www.techrxiv.org/users/1021468/articles/1381483-thin...
I agree with Elevaes, this was absolutely fascinating, and I love the use of books to help understand the concepts. I could relate right away. The token usage reduction potential is massive, especially for enterprise usage and costs. Many companies are experiencing sticker shock because they weren't prepared for / didn't anticipate the usage. Better cost estimation with this process could have widespread (positive) impacts on financials and allow for more accurate pricing estimates and models.
If the agent builds entity pages incrementally while reading, how do you prevent early incorrect assumptions about relationships or attributes from propagating through the entity graph? Is there support for belief revision?
Yes, this sort of auto-regressive error propagation is a real concern, for the same reason it's a real concern with LLMs in general.
If you force the output of an LLM to begin with an error, the LLM tends to continue down that erroneous path.
In practice, we didn't see much of this kind of error propagation. One solution would be to give some agent the task of occasionally reviewing the NERDs for contradictions, along with the ability to search through the source material as needed. That of course creates the possibility of catastrophic forgetting, where the agent rewrites a NERD in an effort to remove a contradiction and ends up deleting something important.
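One way to sketch that reviewer idea while sidestepping catastrophic forgetting: let the review agent flag contradictions (with source evidence) but never delete existing facts. Everything here is hypothetical; `find_contradictions` and `search_source` stand in for an LLM call and a retrieval step.

```python
def review_nerd(title, facts, find_contradictions, search_source):
    """Append flags for contradictory fact pairs; never delete facts.

    `find_contradictions(facts)` -> list of (fact_a, fact_b) pairs;
    `search_source(title, a, b)` -> supporting passages from the corpus.
    Both are stand-ins for model/retrieval calls.
    """
    flags = []
    for a, b in find_contradictions(facts):
        evidence = search_source(title, a, b)  # look back at the source text
        flags.append({"pair": (a, b), "evidence": evidence})
    # Append-only: flagged pairs are left for a later pass to resolve.
    return facts + [f"[FLAGGED] {f['pair']}" for f in flags]
```

Since the reviewer only appends, a bad review can add noise but can't erase a correct fact.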
We didn't see a lot of error propagation, but one example where we did: in Harry Potter, Prof Dumbledore is introduced as a mysterious hooded character. So the NERD-writer would create a NERD for "mysterious hooded man." There's no tool for the agent to change the title of a NERD, so the system is stuck with that title. Sometimes the system would build the entire Dumbledore entry under "mysterious hooded man"; sometimes it would make a new Dumbledore entity and link a reference back to the "mysterious hooded man" entity; and sometimes it wouldn't link them at all. None of those outcomes are great.
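One possible fix for the stuck-title problem: keep an alias table alongside the docs, so an agent can merge two identities without rewriting either document. This is a sketch of my own, not the paper's design; the class and method names are made up.

```python
class NerdStore:
    """Entity docs plus an alias table for late identity merges."""

    def __init__(self):
        self.docs = {}      # canonical title -> list of facts
        self.aliases = {}   # alias -> canonical title

    def resolve(self, title):
        # Follow alias links until we reach a canonical title.
        while title in self.aliases:
            title = self.aliases[title]
        return title

    def add_fact(self, title, fact):
        self.docs.setdefault(self.resolve(title), []).append(fact)

    def merge(self, old_title, new_title):
        """Declare that `old_title` names the same entity as `new_title`."""
        old, new = self.resolve(old_title), self.resolve(new_title)
        if old == new:
            return  # already merged; also prevents alias cycles
        self.docs.setdefault(new, []).extend(self.docs.pop(old, []))
        self.aliases[old] = new
```

With this, the NERD-writer could keep writing to "mysterious hooded man" even after a `merge` into "Albus Dumbledore", and the facts land in one place.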
This is fascinating, I'm wondering if it works as well with other use cases like papers, conversations, or any other human written text.
We originally developed NERDs inside my last startup for monitoring the progress of solar developments. There are many different multi-modal event feeds you need to monitor for a holistic view of a project. NERDs helped glue the events together around entities.
Only later did we adapt the technique to work on long books. The existing long-book benchmarks seemed like the most appropriate way to show the core idea to a wider audience.
So yeah, I'm confident this central idea can be applied in many different domains.