The problem is daunting. Imagine layers of human medical records, stacked on top of each other, often jotted down in a hurry in barely legible handwriting. Diagnoses, lab results, medication adjustments, billing codes, doctor’s notes. Even the most seasoned archaeologist would be intimidated. And even when they have been fully digitized, electronic medical records (EMRs) are still a swamp to navigate. They are the problem nobody warns you about. A real challenge.
But Max Barinov is not afraid of such challenges.
The Promise of Large Language Models
When navigating EMRs, large language models are an appealing tool. But they have their drawbacks and limits. Feed them too much information, haphazardly and recklessly, and they become overwhelmed. That’s not something anyone involved in healthcare can afford. LLMs were supposed to make our lives easier, to eliminate the tedious grunt work.
With Adentris, an Austin, Texas-based, Y Combinator-backed company, Barinov has set to work on developing an AI-supported platform that helps hospitals keep their EMRs compliant with US healthcare regulations, in particular the Health Insurance Portability and Accountability Act (HIPAA) of 1996. This has not proven easy: these medical records are larger than any model’s context window, and checking the compliance of a single patient chart can be very, very expensive.
The Failure of the Naive Approach
To remain compliant with HIPAA, which was created in part to protect patient healthcare data, personally identifiable information such as patient names, Social Security numbers, and dates of birth is replaced with secure, unique identifiers called data tokens. Processing a token has a cost, of course, and if one were to take what is referred to as the “naive approach” and simply feed a tokenized EMR into a model and ask it to assess compliance, the costs would balloon. The elegance of LLMs was supposed to be their ability to crunch masses of information, but when it comes to EMRs, this approach falls flat.
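The tokenization step can be sketched in a few lines. This is a minimal illustration of the general idea, not Adentris’s actual implementation; the `TokenStore` type and `tokenize` function are hypothetical names:

```typescript
// Minimal sketch of PII tokenization: each piece of personally
// identifiable information is swapped for an opaque, unique token,
// and the mapping is kept in a secure store.
type TokenStore = Map<string, string>;

let counter = 0;

function tokenize(pii: string, store: TokenStore): string {
  // Reuse the same token for repeated occurrences of the same value.
  const existing = store.get(pii);
  if (existing) return existing;
  const token = `TKN-${(++counter).toString().padStart(6, "0")}`;
  store.set(pii, token);
  return token;
}

// Usage: de-identify a chart snippet before it ever reaches a model.
const store: TokenStore = new Map();
const chartSnippet = {
  patientName: tokenize("Jane Doe", store),
  ssn: tokenize("123-45-6789", store),
  diagnosis: "Type 2 diabetes", // clinical content stays in the clear
};
```

Each token is still billable model input, which is exactly why the naive approach described above becomes so expensive at EMR scale.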
First, all of those tokens are billable, making the use of LLMs unsustainable. Processing EMRs also takes more time, and clinicians see no benefit while the model grinds through all of those data tokens. Finally, the model itself breaks down in the face of such overload. Swamped by the data noise of an enormous EMR, it cannot do its job properly. The LLM makes faulty judgments about compliance, rendering its use almost pointless.
Max knew this was the case and last year set out to provide an off-the-shelf solution that could tame these massive EMRs. With more than a dozen years of programming under his belt, having moved from producing web applications into founding engineer roles at Y Combinator startups like Ziina, Max knew that resolving the issue would require some architectural redesign.
Cutting EMRs to Size, Chart by Chart
“My goal has been to build reliable AI systems that reduce cognitive load for users,” says Max. “And I do this by focusing on deterministic interaction layers, token efficiency, and measurable outcomes.”
He credits his experience with helping to hone this design philosophy. “Over time I gravitated to systems that make teams faster and products more reliable,” Max recalls, “and then applied that approach to AI and conversational systems where determinism and cost control are critical.”
This was the case with Adentris and the challenge of bloated EMRs. His solution was elegant in its simplicity: Max designed a multi-agent architecture that processed each record chart by chart, rather than naively, all at once. These Adentris AI agents were connected directly to hospitals’ EMR systems through a custom Model Context Protocol (MCP) server exposing structured EMR data contained in Kipu Health, a software platform that allows EMR sharing.
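The chart-by-chart pattern can be sketched roughly as follows. The `Chart` shape and the `checkChart` agent below are illustrative assumptions, not Adentris internals; here the “agent” is a trivial regex check standing in for an LLM-backed compliance call:

```typescript
// Sketch of chart-by-chart processing: instead of feeding one giant
// EMR into a single prompt, each chart is checked independently and
// the per-chart findings are merged at the end.
interface Chart {
  id: string;
  sections: Record<string, string>; // medications, diagnoses, notes...
}

interface Finding {
  chartId: string;
  compliant: boolean;
  issues: string[];
}

// Stand-in for an LLM-backed compliance agent: it merely flags
// sections that still contain an unmasked SSN-like pattern.
function checkChart(chart: Chart): Finding {
  const issues: string[] = [];
  for (const [name, text] of Object.entries(chart.sections)) {
    if (/\d{3}-\d{2}-\d{4}/.test(text)) {
      issues.push(`unmasked SSN in section "${name}"`);
    }
  }
  return { chartId: chart.id, compliant: issues.length === 0, issues };
}

function auditRecord(charts: Chart[]): Finding[] {
  // Each chart fits comfortably in a model's context window on its own.
  return charts.map(checkChart);
}
```

The payoff of this decomposition is that no single model call ever sees the whole record, only one digestible chart at a time.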
As the AI agents went to work, each chart was cut down to size for a compliance check. Information on medications, diagnoses, and doctor’s notes became bite-sized and digestible for the LLM. For once, the model was not overwhelmed; it processed the data in a systematic and efficient way. The data stack was well within reach for this project: Max used Nest.js microservices, React and Next.js interfaces, and MongoDB and PostgreSQL for storage, all containerized and deployed on Azure Kubernetes Service. He also kept its operations restrained and minimalistic, constraining the architecture to its primary purpose: maintaining compliance with HIPAA.
“This has been my defining engineering challenge at Adentris,” he says, “because medical records are far larger than any LLM’s context window.”
With Adentris, Max Barinov had developed a solution for a problem that was plaguing hospitals. The only remaining question was whether it worked.
Optimization and Evaluation
There was another innovation that helped make the Adentris platform worthwhile. As Barinov notes, medical records are constantly being added to, and an LLM would hypothetically need to parse all of that data over and over again to assess its compliance. That would keep processing costs high, which is why he hit upon another novel idea: track deltas. Rather than rescanning everything, Adentris’s system keeps tabs on what has already been scanned. Only updated chart sections trigger additional analysis; the rest of the data stays cached.
When Max carried out an assessment of the cost reduction, he found that token consumption with his system was 10 times lower. Inspired by OpenAI’s evals methodology, he also embedded an evaluation framework into the development lifecycle of Adentris’s platform. The system he had designed passed its first HIPAA audit and reduced clinician documentation time by about 80 percent. He was able to ship the system within three months, a real achievement in the slow-moving world of regulated healthcare. The company then began offering its tool to US hospitals in a series of commercial pilots. Adentris had arrived.
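An evals-style check of this kind amounts to running a fixed set of labeled cases against the compliance checker on every build and gating releases on the pass rate. The `EvalCase` shape and `runEvals` helper below are hypothetical sketches, not Adentris’s framework:

```typescript
// Sketch of an embedded eval: labeled test cases are replayed
// against the compliance checker and scored on every build.
interface EvalCase {
  input: string;            // a chart section to check
  expectCompliant: boolean; // ground-truth label
}

function runEvals(
  cases: EvalCase[],
  check: (text: string) => boolean, // the checker under evaluation
): number {
  const passed = cases.filter((c) => check(c.input) === c.expectCompliant);
  return passed.length / cases.length; // pass rate in [0, 1]
}
```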
For Max Barinov, it was par for the course. A computer science major at ITMO University, he had cut his teeth rebuilding a web platform at Ziina and once launched a UK-based investment platform from scratch in fewer than four months. “I have a fascination with making complex systems navigable,” he says. It’s a design mindset that he will continue to use in future projects.
:::tip
This article is published under HackerNoon’s Business Blogging program.
:::
