Show HN: I built an open-source AI data layer that connects any LLM to any data
Excited to share a project I’ve been building for months! Would love honest feedback :)
My motivation: AI is clearly going to be the interface for data. But earlier attempts (text-to-SQL, etc.) fell short because they treated it like magic. The space has matured: teams now realize that AI + data needs structure, context, and rules. So I built a product to help teams deliver “chat with data” solutions fast, with full control and observability (agent tracing, quality scores, etc.). Am I wrong?
The product lets you connect any LLM to any data source with centralized context (instructions, dbt, code, AGENTS.md, Tableau) and governance. Users can chat with their data to build charts, dashboards, and scheduled reports, all through an agentic, observable loop. There’s a Slack integration as well!
* Centralized context management: instructions + external sources (dbt, Tableau, code, AGENTS.md), plus self-learning
* Agentic workflows (ReAct loops): reasoning, tool use, reflection (see the sketch after this list)
* Generate visuals, dashboards, scheduled reports via chat/commands
* Quality, accuracy, and performance scoring (LLM judges) to ensure reliability
* Advanced access & governance: RBAC, SSO/OIDC, audit logs, rule enforcement
* Deploy in your environment (Docker, Kubernetes, VPC) — full control over infrastructure
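To make the agentic loop concrete, here’s a minimal ReAct sketch in Python. The `llm` callable, the JSON step format, and the two tools are hypothetical placeholders for illustration, not the project’s actual API:

    import json

    # Placeholder tools; in practice these would hit the warehouse / metadata store.
    def run_sql(query: str) -> str:
        return "rows: [...]"

    def search_metadata(term: str) -> str:
        return "matching dbt models: [...]"

    TOOLS = {"run_sql": run_sql, "search_metadata": search_metadata}

    def react_loop(llm, question: str, max_steps: int = 8) -> str:
        # Reason -> act (tool call) -> observe, repeated until the model answers.
        history = [{"role": "user", "content": question}]
        for _ in range(max_steps):
            # Assumed format: the model returns {"thought": ..., "action": ..., "input": ...}
            step = json.loads(llm(history))
            if step["action"] == "final_answer":
                return step["input"]
            observation = TOOLS[step["action"]](step["input"])
            history.append({"role": "assistant", "content": json.dumps(step)})
            history.append({"role": "user", "content": f"Observation: {observation}"})
        return "Stopped: step budget exceeded."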
GitHub: github.com/bagofwords1/bagofwords
Docs / architecture / quickstart: docs.bagofwords.com
The hardest problems in building this weren’t in the LLM logic, but in everything around it: observability, access control, and managing context across dbt, Tableau, and code. Finding the balance between a strict semantic layer and LLM agency was tricky. Too rigid and you lose the LLM magic; too loose and reliability breaks.
What worked for me and my users was leaning on instructions + AGENTS.md + metadata as a lighter abstraction layer — structured enough for trust, but flexible enough to keep the model useful.
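To give a flavor, here’s a hypothetical slice of that layer (made-up models and rules, not a real config):

    # AGENTS.md (hypothetical example)
    ## Data conventions
    - Revenue questions must use the dbt model fct_revenue, never raw tables.
    - "Active user" = logged in within the last 30 days (dim_users.last_login_at).
    ## Rules
    - Never join across the pii schema unless the user explicitly asks.
    - Default date range is the last 90 days unless the user specifies otherwise.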
If you’ve been exploring similar ideas or trying to productionize AI analysts, I’d love to hear how you’re approaching it.
how do you make sure there's no context bloat?
Thanks for the question. Avoiding context bloat, and context engineering overall, is (still) most of the work. What’s been working:
- Role-scoped calls: data modeling and code gen are separate calls, and each gets its own tailored context
- Context is divided into sections (tables, dbt, instructions, code), and each gets a hard token budget (this took some experimentation; I liked Cursor’s priompt project). See the sketch after this list
- Agentic retrieval: agents can call tools to fetch or search data/metadata when needed
- Summaries for different objects: messages, widgets, reports, data samples/profiles
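Here’s a rough Python sketch of the per-section budgeting. The section names, budget numbers, and token counter are assumptions for illustration, not the actual implementation:

    # Hard per-section token budgets (illustrative numbers).
    BUDGETS = {"instructions": 2000, "tables": 4000, "dbt": 3000, "code": 2000}

    def count_tokens(text: str) -> int:
        return len(text) // 4  # crude stand-in; use a real tokenizer in practice

    def fit_to_budget(text: str, budget: int) -> str:
        # Naive truncation; ranking or summarizing could slot in here instead.
        while count_tokens(text) > budget:
            text = text[: int(len(text) * 0.9)]
        return text

    def build_context(sections: dict[str, str]) -> str:
        # Assemble only the known sections, each clipped to its hard limit.
        parts = [
            f"## {name}\n{fit_to_budget(sections[name], budget)}"
            for name, budget in BUDGETS.items()
            if name in sections
        ]
        return "\n\n".join(parts)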
I wrote more about how the agent and context work in the docs.