Every few years a new interface paradigm arrives and the analysts declare it a toy. Voice interfaces, chatbots, low-code builders — each one got a wave of breathless coverage followed by a chorus of "it doesn't really work for serious use cases." Natural language querying for data is next in line for that cycle. We think this time is genuinely different.
SQL has been the dominant interface for relational data for fifty years. It's powerful, expressive, and understood by maybe 0.5% of the people who need to answer questions with data. The other 99.5% have three options: learn SQL, wait for a data analyst, or make decisions without data. None of those are good.
The BI industry's answer for decades has been visual query builders — drag-and-drop interfaces that abstract SQL away. They work, sort of, for simple aggregations. But the moment you need a join, a subquery, or a window function over dates, you're back to either writing SQL or filing a ticket. Visual builders hide complexity rather than solving it.
Pre-LLM natural language interfaces for data (there were several, going back to the early 2010s) failed for a consistent reason: they were rigid. They relied on a fixed vocabulary and rule-based parsing. Ask "what were sales in Q1" and it handled it. Ask "how did the northeast region perform compared to where we were this time last year" and it broke entirely.
The brittleness wasn't a bug — it was the architecture. Rule-based NLP can't generalise. Every novel phrasing required a new rule. Real users phrase the same question a hundred different ways, and none of those products could keep up.
LLMs generalise. That's the whole thing. A model trained on billions of tokens of text and code has seen enough variation in how humans express data questions that it can handle phrasings that were never explicitly in the training set. "Show me where we're bleeding margin" produces valid SQL. So does "which customers are on the fence based on their last login date."
The accuracy numbers tell the story. Internal benchmarks on business data questions — the kind that real operations teams actually ask — show LLM-based text-to-SQL at 70–85% accuracy on first attempt, compared to 30–40% for the best pre-LLM systems. And with schema awareness (feeding the model your actual table and column names), the first-attempt number climbs further.
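As a sketch of what "schema awareness" can look like in practice — the table and column names below are hypothetical, and the prompt format is just one reasonable choice, not a fixed standard:

```python
# Build a text-to-SQL prompt that inlines the live schema, so the model can
# map phrases like "northeast region" onto real table and column names.
# The schema here is illustrative, not a real product schema.

def format_schema(schema: dict[str, list[str]]) -> str:
    """Render {table: [columns]} as a compact DDL-like listing."""
    return "\n".join(
        f"TABLE {table} ({', '.join(columns)})" for table, columns in schema.items()
    )

def build_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Assemble the text sent to the LLM for SQL generation."""
    return (
        "You translate business questions into SQL.\n"
        "Use only these tables and columns:\n"
        f"{format_schema(schema)}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )

schema = {
    "orders": ["id", "customer_id", "region", "revenue", "cost", "created_at"],
    "customers": ["id", "name", "last_login"],
}
prompt = build_prompt("How did the northeast region perform vs last year?", schema)
```

The design choice is the simple one: rather than fine-tuning, you constrain the model at inference time by telling it exactly which names exist, which is what moves the first-attempt accuracy number.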
We're not pretending it's perfect. Complex multi-step analysis, questions that require business context that isn't in the schema, and edge cases in dialect-specific SQL still trip up the model. The right mental model is: NL querying is extremely good for the 80% of questions that are straightforward aggregations, filters, and comparisons. For the 20% that require deep domain logic, you still want an analyst writing considered SQL.
But consider what that means in practice. If your team asks 50 questions a week and 40 of them can now be answered in 30 seconds by anyone on the team, your analyst's queue drops from 50 questions to 10 — five times as much attention for each of the questions that actually need them.
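Spelled out, using the 50-questions-a-week example above (illustrative numbers, not a measured benchmark):

```python
# Weekly question load before and after self-serve NL querying.
questions_per_week = 50
self_served = 40                                    # answered directly by the team
analyst_queue = questions_per_week - self_served    # 10 questions remain

# The same analyst hours now cover a queue one-fifth the size, so each
# remaining question can get 5x the attention (a 4x increase).
attention_multiplier = questions_per_week / analyst_queue
print(attention_multiplier)  # 5.0
```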
There's a subtler argument for NL interfaces that doesn't get made often enough: they change who asks questions. When the barrier to a data question is "write SQL or wait two days," only a certain type of person asks questions. When the barrier is "type a sentence," everyone asks questions. Marketing ops, finance, customer success, the founder — they all start pulling on threads in the data that they would never have touched before.
That cultural shift — from a world where data is mediated by specialists to one where everyone is a data consumer — is the real prize. The accuracy rate is a detail. The interface shift is the point.
Try it yourself
Connect your database and ask your first question in plain English. Most teams are seeing real results within the first session.
Get started free