The best judge of artificial intelligence could be AI — at least that’s the idea behind Databricks Inc.’s new tool, Agent Bricks.
Built on Databricks’ Mosaic AI platform, Agent Bricks allows users to request task-specific agents and then generates a series of large language model “judges” to determine that agent’s reliability.
Databricks’ Jonathan Frankle talks about developing Agent Bricks.
“Agent Bricks is really the generalization of the best practices, the verticals that we saw, the styles that people use, the techniques that we saw work the best, all in one product,” said Jonathan Frankle (pictured), chief AI scientist of Databricks. “It reflects philosophically how we think people should build agents. It reflects what worked and what didn’t work. Now it’s ready for prime time.”
Frankle spoke with theCUBE’s John Furrier at the Databricks Data + AI Summit, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Agent Bricks evolved from internal best practices into a full-fledged product designed to evaluate AI with AI.
Agent Bricks teaches you to think like an engineer
The seed for Agent Bricks came from customers’ need to evaluate their agents, according to Frankle. Ensuring that an agent is reliable starts with defining criteria and a set of practices for comparing agent performance against them.
“AI is a little bit unpredictable, non-deterministic, fuzzy,” Frankle explained. “That’s where LLM judges come in. You have an LLM that evaluates when the LLM is working well. To do that, you have to make sure the LLM judge knows what you’re trying to do, knows how to measure it. It’s really about, ‘Does the LLM judge agree with a human judge?’”
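In code, the agreement check Frankle describes might look something like the sketch below. It is a minimal illustration rather than an Agent Bricks API; call_judge_llm is a hypothetical placeholder for whatever chat-completion endpoint a team actually uses.

```python
# Minimal sketch of an LLM judge and its agreement with human labels.
# call_judge_llm is a hypothetical placeholder, not an Agent Bricks API.

def call_judge_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your LLM of choice and return
    its one-word verdict, "pass" or "fail"."""
    raise NotImplementedError

def judge(question: str, answer: str, criteria: str) -> bool:
    """Ask the judge LLM whether the agent's answer meets the criteria."""
    prompt = (
        f"Criteria: {criteria}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Does the answer meet the criteria? Reply 'pass' or 'fail'."
    )
    return call_judge_llm(prompt).strip().lower() == "pass"

def agreement_rate(examples: list[tuple[str, str, bool]], criteria: str) -> float:
    """Fraction of (question, answer, human_passed) examples on which
    the LLM judge matches the human verdict."""
    matches = sum(
        judge(q, a, criteria) == human_passed
        for q, a, human_passed in examples
    )
    return matches / len(examples)
```

The number that matters is the agreement rate: if the judge tracks human verdicts closely enough, it can stand in for human review at scale.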
Getting all of the humans to agree on what the model should look like can be half the battle, Frankle suggested. That’s why humans are in the loop throughout the agent development process. Databricks has essentially created scaled reinforcement learning, wherein the judges can train an agent to behave how developers want it to.
“You don’t need to give a bunch of labeled data,” Frankle said. “Getting labeled data is really hard for humans. But getting a judge is not that hard. And we took a lot of time to figure out what was easy and hard for our customers to get, how we could do the science to make it possible to customize an LLM using that data.”
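That trade, a judge instead of labels, can be read as a reward signal. The sketch below illustrates one common pattern under that assumption, best-of-n selection, in which the judge’s score picks the strongest of several sampled responses. Here sample_agent and score_with_judge are hypothetical stand-ins, not Databricks functions, and this is not presented as Agent Bricks’ actual training loop.

```python
# Hedged sketch: judge scores used as a reward signal for best-of-n
# selection. sample_agent and score_with_judge are hypothetical
# stand-ins, not Databricks or Agent Bricks APIs.

def sample_agent(question: str) -> str:
    """Placeholder: one sampled response from the agent under test."""
    raise NotImplementedError

def score_with_judge(question: str, answer: str) -> float:
    """Placeholder: judge LLM maps a response to a score in [0, 1]."""
    raise NotImplementedError

def best_of_n(question: str, n: int = 8) -> str:
    """Sample n candidate answers and keep the one the judge scores
    highest; no hand-labeled data is required."""
    candidates = [sample_agent(question) for _ in range(n)]
    return max(candidates, key=lambda answer: score_with_judge(question, answer))
```

The winning responses can then seed further tuning, which is how a judge can substitute for hand-labeled data in a reinforcement-style loop.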
Despite the rise of vibe coding, which Databricks’ recent updates enable, Frankle hopes that tools such as Agent Bricks will push users to think more like software engineers. Agent Bricks forces customers to test and evaluate repeatedly until the model is reliable.
“An AI demo, you can slap together, you can show to your CEO, it’ll have some cool behaviors and everybody will be excited,” Frankle said. “That’s not how you get into production. AI engineering is building a system that is carefully calibrated to solve a particular problem. You can measure how well it’s solving that particular problem. When it doesn’t work the way you want it to, you add more measurement to make sure you never see that problem again.”
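One plain reading of “add more measurement” is a regression suite that only grows: every failure observed in production becomes a permanent test case. Below is a minimal sketch under that assumption, reusing the hypothetical judge from above.

```python
# Sketch of "add more measurement": each failure becomes a permanent
# regression case. run_agent and judge are hypothetical callables,
# with the same signatures as the earlier sketches.

regression_suite: list[dict] = []  # grows over the agent's lifetime

def record_failure(question: str, criteria: str) -> None:
    """Capture a case the agent got wrong so it is re-tested on every change."""
    regression_suite.append({"question": question, "criteria": criteria})

def run_regression_suite(run_agent, judge) -> float:
    """Re-run every recorded case through the agent; return the pass rate."""
    if not regression_suite:
        return 1.0
    passed = sum(
        judge(case["question"], run_agent(case["question"]), case["criteria"])
        for case in regression_suite
    )
    return passed / len(regression_suite)
```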
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Databricks Data + AI Summit:
Photo: SiliconANGLE