AI Retrieval Systems: What’s The Missing Layer? | HackerNoon

By News Room | Published 6 January 2026, last updated 3:19 PM

I’ve spent a lot of time lately listening to people talk about AI retrieval (RAG) systems, and the conversation is almost always about retrieval quality: better embeddings, hybrid search, reranking, graph structures, or agentic reasoning that infers which sources to query. That makes sense, because retrieval is a hard technical problem. But there’s another aspect that gets far less attention than it should: whether users can tell the system where to look.

If you watch how people actually work with documents, you’ll notice they don’t search everything. They’ve already narrowed it down. They know which sources are relevant and which ones they can ignore. The problem is that most AI systems don’t give them a way to express that knowledge.

The Limits of Inference

There’s a hidden assumption buried in most RAG architectures: that the system can infer where to look from the query alone.

Sometimes it can. If I ask, “What are the acceptance criteria for this feature?” a well-designed agent can reasonably guess that our work tracking system is the right starting point. But what if the acceptance criteria reference a technical constraint documented in another system? What if there’s context in the code repository that changes the interpretation? Only I know which sources matter for this particular question.

The reasoning engine is guessing. And when it guesses wrong, users can’t easily correct it. Some enterprise AI platforms have built impressive connector ecosystems, with hundreds of integrations to different systems. But more connectors don’t solve the fundamental problem; the system still has to infer which of those connected sources matter for any given question.
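To make the inference problem concrete, here is a deliberately naive sketch (not the article’s actual system; all source names are hypothetical) of a router that guesses sources from the query text alone:

```python
# Illustrative sketch: a query-only source router. It picks the "obvious"
# source for a query, but it cannot know that the answer the user needs
# also lives in, say, the code repository -- only the user knows that.

def infer_sources(query: str) -> list[str]:
    """Guess which connected systems to search, from keywords alone."""
    rules = {
        "acceptance criteria": ["work_tracker"],
        "deploy": ["ci_system"],
        "billing": ["wiki"],
    }
    matched = [src
               for kw, srcs in rules.items() if kw in query.lower()
               for src in srcs]
    return matched or ["wiki"]  # fall back to a default source

print(infer_sources("What are the acceptance criteria for this feature?"))
# ['work_tracker']
```

Real agentic routers use an LLM rather than keyword rules, but the structural limitation is the same: the decision is a function of the query, not of what the user knows about their sources.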

Two Kinds of Knowledge

When I look at how retrieval systems are built, I see two kinds of knowledge at play. There’s the knowledge contained in the documents, and then there’s knowledge about which documents matter.

Current systems focus almost entirely on the first kind. They try to infer the second kind through reasoning. But the thing is, some knowledge can’t be inferred. It has to be provided.

The user knows things the system can’t figure out on its own. They know which version of the spec is current: the March update superseded the January one, even though both documents exist in the system and both match the query. When they ask about “the billing workflow,” they mean the one for their team, not the three other billing workflows documented elsewhere in the organization.

And that knowledge isn’t static either. It shifts with the task at hand. An engineer asking “How does the deployment pipeline work?” during onboarding is solving a very different problem than the same engineer asking the same question while a deploy is failing in production. A reasoning engine can only see queries and sources, but the user understands context. And that context changes as the work unfolds.

The primary lever most systems offer is the prompt itself. With more specificity, a good reasoning engine might find the right sources. But this requires users to articulate context they often hold tacitly, and to do it with every query, which hardly scales.

What would scale is letting users define their own boundaries once and invoke them when needed.

User-Defined Boundaries

This is the approach we took when we designed our internal RAG system. Instead of trying to infer everything through reasoning, we gave users the tools to define their own knowledge boundaries. There’s still an agentic layer that orchestrates across sources, but it operates within boundaries that users set.

We made knowledge base creation a first-class capability. Users can upload their own documents and create knowledge bases without filing a ticket or waiting for an admin. They can further select specific documents to include or exclude right in the chat UI. They can organize their knowledge around how they actually work and compose these knowledge bases with skills that connect to wikis, document repositories, work tracking systems, code repositories, and databases.

The key building block is what we call a persona skill. A persona skill goes beyond a prompt template. It defines how the system should approach a particular class of problems: what role it should adopt, what knowledge bases it should consult, what enterprise systems it should connect to, and what reasoning approach it should follow. Users create these themselves.
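The article doesn’t publish a schema for persona skills, but the description above maps naturally onto a small data structure. The following is a hypothetical sketch; every field name is an assumption:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of what a "persona skill" could hold, based on the
# four elements named above: role, knowledge bases, system connections,
# and reasoning approach. Field names are illustrative assumptions.

@dataclass
class PersonaSkill:
    name: str
    role_prompt: str                    # what role the assistant adopts
    knowledge_bases: list = field(default_factory=list)  # user-scoped KBs
    connectors: list = field(default_factory=list)       # live enterprise systems
    reasoning_approach: str = "plan-then-retrieve"

# Example in the spirit of the QA analyst's persona described below:
qa_skill = PersonaSkill(
    name="test-coverage-validator",
    role_prompt="You are a QA analyst validating test coverage.",
    knowledge_bases=["team-acceptance-criteria"],
    connectors=["work_tracker", "code_repo", "test_reports"],
)
```

The point of the structure is reuse: the user fills it in once, then invokes it by name instead of re-articulating that context in every prompt.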

We expected people to create personas for common tasks, like answering questions about policies. Some did. But the personas that caught our attention were the deeply specific ones. A nurse case manager created a persona that combines medical record search with fee schedule compliance rules. A DevOps engineer built one that generates AWS provisioning files from infrastructure requests, using their team’s specific naming conventions and security policies.

A QA analyst put together one that validates test coverage against acceptance criteria across three different systems. No product roadmap would have prioritized them. They exist because we gave the people doing the work the agency to build what they needed.

The reasoning engine is still there. It still plans, routes, and synthesizes. But it operates within boundaries the user defines through their persona skills, drawing from both their scoped knowledge bases and live connections to enterprise systems.

The Question of Maintenance

There’s an obvious objection, and it’s a fair one: user-created knowledge bases go stale. Repositories move, teams reorganize, and documents get superseded. If users create hundreds of personas, who maintains them?

But this objection assumes central governance is the alternative. Most enterprise AI platforms treat knowledge base creation as an admin function. Users consume what’s been configured for them. A central team defines the taxonomy: “Corporate Policies”, “Product X Knowledge Base”, “Development Practices.” Documents get assigned to buckets. Users query the buckets they’re granted access to.

The problem is that central governance can’t keep up with how work actually happens because the right scope may shift with each task. A product manager working on a billing feature needs documents that span engineering specs, compliance guidelines, customer feedback, and a handful of wiki pages that don’t fit neatly into any predefined category. They either query everything and drown in noise or pick one bucket and get incomplete answers.

Yes, decentralized ownership is messy. But central control creates the illusion of order while hiding decay. A taxonomy that looks clean in the admin console may be full of stale documents that nobody with actual domain knowledge has looked at in years. When a team owns a knowledge base they actively use, broken results affect their work directly. They notice when something stops working, and they have both the context to diagnose the problem and the motivation to fix it. Centrally administered knowledge bases, by contrast, decay invisibly until someone complains loud enough.

Usage patterns also surface quality on their own. Knowledge bases that work get shared; ones that don’t get abandoned, and those inactive long enough get flagged for review or automatically purged. The system doesn’t need a central team to evaluate quality when users are voting with their behavior.
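The lifecycle described above can be sketched in a few lines. This is an illustrative assumption about the mechanism, not the system’s actual implementation; the 180-day threshold and field names are invented:

```python
from datetime import datetime, timedelta

# Sketch of usage-based review: knowledge bases with no activity inside
# the window get flagged. The threshold is an illustrative assumption.
STALE_AFTER = timedelta(days=180)

def flag_stale(last_used: dict, now: datetime) -> list:
    """Return names of knowledge bases inactive longer than the window."""
    return [name for name, ts in last_used.items() if now - ts > STALE_AFTER]

usage = {
    "billing-team-kb": datetime(2025, 12, 20),    # recently used: kept
    "old-migration-notes": datetime(2025, 3, 1),  # inactive: flagged
}
print(flag_stale(usage, datetime(2026, 1, 6)))
# ['old-migration-notes']
```

The behavioral signal does the evaluation: a central team never has to judge content quality, only act on the inactivity flags.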

The Missing Layer

When building agentic AI tools, you can put agency in the system, letting the reasoning engine decide where to look, or you can put agency in the user’s hands, providing building blocks that let users define their own retrieval context.

These aren’t mutually exclusive. Inference works well for exploration, when users don’t yet know where to look. User-defined retrieval context works better for the repeated queries that make up most of the actual work, where users have already identified the sources that matter and want the system to remember that.
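The two modes compose in an obvious way: prefer the user’s declared scope when one exists, and fall back to inference for exploration. A minimal sketch, with hypothetical names:

```python
# Sketch of combining the two modes: user-defined scope wins when the
# user has invoked a persona; otherwise fall back to query inference.
# The fallback here is a crude keyword guess standing in for an LLM.

def resolve_scope(query: str, persona_sources=None) -> list:
    """Prefer the user's declared sources; otherwise infer from the query."""
    if persona_sources:          # the user already told us where to look
        return list(persona_sources)
    # exploratory fallback: infer from the query text alone
    return ["work_tracker"] if "acceptance" in query.lower() else ["wiki"]

print(resolve_scope("billing workflow?", ["team-billing-kb", "code_repo"]))
# ['team-billing-kb', 'code_repo']
print(resolve_scope("What are the acceptance criteria?"))
# ['work_tracker']
```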

The industry has invested enormously in the first approach with better reasoning and more connectors, but reasoning can only get you halfway there. There’s a whole category of knowledge that lives exclusively in the user’s head.

And that’s the missing layer in the AI retrieval stack: interfaces that let users compose their own retrieval configurations, combining curated knowledge, live system connections, and reasoning into reusable units that reflect how they actually work. In short, it’s asking users to contribute what they already know.

Most systems never ask. Perhaps because there’s an uncomfortable implication here that their value depends partly on what users bring to them, not just what they can figure out. That’s a less satisfying story than autonomous reasoning, but it might be the more honest one.
