Dash: Open sourcing OpenAI’s in-house data agent

OpenAI recently published how they built their internal data agent: six layers of context, a self-learning memory system, and real lessons from running it in production. One of the best enterprise…

Dash: an open, self-learning data agent to watch

If you work with data, this is worth a few minutes of your attention. Recently, Ashpreet Bedi published a concise thread walking through how OpenAI built its internal data agent, and open-sourced a version of the idea called Dash. You can read the original thread here: https://x.com/ashpreetbedi/status/2018059495335764273.

At its core, Dash tackles a familiar pain, even if you haven’t named it: models hallucinate without context, SQL agents forget past fixes, and teams lose tribal knowledge when people move on. The project leans on six layers of context, which the author outlines as: table usage, human annotations, query patterns, institutional knowledge, memory, and runtime context. That stack helps generate SQL that’s actually grounded in how a company uses its data.
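To make the six layers concrete, here is a minimal sketch of how they might be assembled into a single prompt before SQL generation. It is not Dash's actual code: `ContextLayers`, `LAYER_ORDER`, and `build_prompt` are hypothetical names, and the prompt layout is an assumption chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical layer names mirroring the six layers listed in the thread.
# This is an illustrative sketch, not Dash's real API.
LAYER_ORDER = [
    "table_usage",
    "human_annotations",
    "query_patterns",
    "institutional_knowledge",
    "memory",
    "runtime_context",
]


@dataclass
class ContextLayers:
    """One list of short text notes per context layer."""
    table_usage: List[str] = field(default_factory=list)
    human_annotations: List[str] = field(default_factory=list)
    query_patterns: List[str] = field(default_factory=list)
    institutional_knowledge: List[str] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)
    runtime_context: List[str] = field(default_factory=list)


def build_prompt(question: str, layers: ContextLayers) -> str:
    """Join the non-empty layers, in a fixed order, ahead of the question."""
    sections = []
    for name in LAYER_ORDER:
        entries = getattr(layers, name)
        if not entries:
            continue  # skip layers that have nothing to contribute
        title = name.replace("_", " ").title()
        body = "\n".join(f"- {entry}" for entry in entries)
        sections.append(f"## {title}\n{body}")
    sections.append(f"## Question\n{question}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    layers = ContextLayers(
        table_usage=["orders: one row per purchase"],
        memory=["column `state` means status in this schema"],
    )
    print(build_prompt("What was revenue last month?", layers))
```

The point of the fixed order is that the model always sees grounding material in the same place. Which layers matter most, and how each is retrieved, is left open here.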

There’s a neat twist, and it’s practical. Instead of expensive fine-tuning, Dash learns through a hybrid of curated static knowledge and lightweight continuous learning, what the author calls *gpu-poor continuous learning* (no GPUs are harmed in these experiments). Over time it records error patterns, the fixes that resolved them, and mappings such as a column named state that actually means status elsewhere, or “revenue” denoting ARR rather than bookings. Small, repeatable wins add up.
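The learning loop described above can be sketched without any model training: store each lesson as plain data, then look it up when a related question arrives. The example below is an assumption-heavy illustration, not Dash's implementation. `LearningStore`, its JSON file format, and the substring-based `recall` are all hypothetical; a real system would likely use proper search or embeddings for retrieval.

```python
import json
from pathlib import Path


class LearningStore:
    """Hypothetical file-backed store of learnings (not Dash's real API)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load earlier learnings if the file already exists.
        if self.path.exists():
            self.learnings = json.loads(self.path.read_text())
        else:
            self.learnings = []

    def record(self, kind, trigger, lesson):
        """Save a lesson once; `trigger` is the term that should surface it."""
        entry = {"kind": kind, "trigger": trigger.lower(), "lesson": lesson}
        if entry not in self.learnings:  # avoid storing duplicates
            self.learnings.append(entry)
            self.path.write_text(json.dumps(self.learnings, indent=2))

    def recall(self, text):
        """Return lessons whose trigger appears in the text (naive matching)."""
        lowered = text.lower()
        return [e["lesson"] for e in self.learnings if e["trigger"] in lowered]


if __name__ == "__main__":
    store = LearningStore("learnings.json")
    store.record("mapping", "revenue", "Here, revenue means ARR, not bookings.")
    store.record("fix", "state", "Column `state` holds status, not a US state.")
    for lesson in store.recall("Total revenue by state"):
        print(lesson)
```

Recalled lessons would then feed the memory layer of the next prompt, which is how the agent can improve without touching model weights.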

Want to play with it? The repo is public on GitHub, and it ships with a UI and an evaluation suite (they demo with an F1 dataset). Check it out here, https://github.com/agno-agi/dash.

A quick thought from a fellow tinkerer: systems like this feel like the bridge between ad hoc analytics and an institutional memory that actually helps teams move faster. Expect more teams to adopt data agents, and expect those agents to get steadily smarter, not by burning GPUs, but by learning the little habits that make data usable.
