Databricks Inc. today unveiled a new suite of tools aimed at simplifying the development of artificial intelligence agents for enterprise use.
The centerpiece of the announcement is Mosaic Agent Bricks, a unified workspace that automates agent building and optimization using customers’ enterprise data and synthetic equivalents. Agent Bricks enters public beta at the company’s annual Data + AI Summit today in San Francisco, alongside new offerings that include serverless graphics processing unit support and MLflow 3.0, the latest version of Databricks’ platform for managing machine learning and generative AI applications.
Databricks also announced updates to its Unity Catalog that expand support for Apache Iceberg — the open-source table format designed for large analytical datasets in data lakes — and introduce new features designed to bridge the gap between data platforms and business users.
Databricks positions Agent Bricks as a response to a growing challenge in enterprise AI: the complexity and cost of bringing prototype agents into production. The company said many organizations rely on trial-and-error methods, spot-checking AI outputs or making subjective judgments about quality, in a process that is inconsistent, time-consuming and financially inefficient.
Evaluation shortfall
“One of the biggest things that keeps these models from getting into production is that there’s no good way to evaluate whether or not agents are going to do what you expect them to do,” said Joel Minnick (pictured), Databricks’ vice president of marketing.
Agent Bricks attempts to eliminate that guesswork through a series of automated steps. Users start by describing the task they want an agent to perform and connecting their enterprise data.
The platform then generates domain-specific synthetic data that mirrors the enterprise data the organization has connected. It also creates evaluation benchmarks using its optimization engine, Test-time Adaptive Optimization.
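Agent Bricks automates this step internally, but a minimal sketch of the underlying idea of turning enterprise documents into synthetic evaluation data might look like the following. It uses the OpenAI Python client only for illustration; the prompt wording, the model choice and the synthesize_eval_pairs helper are assumptions for this sketch, not part of the Databricks product.

```python
# Illustrative sketch only -- not the Agent Bricks API. Assumes an
# OPENAI_API_KEY environment variable and an arbitrary model choice.
import json
from openai import OpenAI

client = OpenAI()

def synthesize_eval_pairs(document: str, n: int = 5) -> list[dict]:
    """Ask an LLM to draft question/answer pairs grounded in one enterprise document."""
    prompt = (
        f"Read the document below and write {n} question/answer pairs that a "
        "domain expert could answer from it. Respond as JSON in the form "
        '{"pairs": [{"question": "...", "expected_answer": "..."}]}.\n\n'
        + document
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable model works; this choice is an assumption
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["pairs"]
```

Pairs generated this way become the benchmark the platform later scores candidate agent configurations against.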
Rob Strechay, managing director at theCUBE Research, a News sister company, said Databricks’ focus on cost and performance addresses “a huge fear for organizations moving or looking to move from proof-of-concept to production.” Agent Bricks’ ability to generate synthetic data targets “the lack of training data for specific use cases, giving agents a larger pool of information to learn from.”
Here come the judges
Large language model “judges” generate questions and expected answers to assess model performance. The system then iterates through various configurations of models, retrieval setups and tuning parameters, and suggests options that balance performance and cost.
Minnick said the approach allows organizations to experiment with the trade-offs between accuracy and cost. “Maybe we chose Llama 7B and were able to achieve 98% quality at this cost, while we achieved 85% quality using Anthropic at a much, much lower cost,” he said. “You have a lot of control over exactly how the LLM judges perform.”
Strechay called the AI judges “one of the most important parts” of the announcement. “It creates evaluation criteria, generates vetting data and provides detailed insights on agent performance,” he said. “It also creates a way for users to customize and create their own judging criteria for each agent.”
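In rough terms, the judging-and-sweep loop described above can be sketched as follows. This is not the Agent Bricks implementation: the judge model, JUDGE_PROMPT, judge_score, evaluate_config and the cost_per_answer figure are all hypothetical names introduced for this example.

```python
# Hedged sketch of an LLM-as-judge evaluation loop; assumes the judge
# replies with a bare number between 0 and 1.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are grading an AI agent.\nQuestion: {q}\nExpected answer: {a}\n"
    "Agent answer: {pred}\nReply with only a number from 0 to 1 for correctness."
)

def judge_score(question: str, expected: str, prediction: str) -> float:
    """Have a judge model grade one agent answer against the expected answer."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # judge model; an arbitrary choice for this sketch
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            q=question, a=expected, pred=prediction)}],
    )
    return float(resp.choices[0].message.content.strip())

def evaluate_config(answer_fn, eval_pairs: list[dict], cost_per_answer: float) -> dict:
    """Score one candidate configuration (model + retrieval + tuning) on the benchmark."""
    scores = [
        judge_score(p["question"], p["expected_answer"], answer_fn(p["question"]))
        for p in eval_pairs
    ]
    return {"quality": sum(scores) / len(scores), "cost": cost_per_answer * len(eval_pairs)}

# Sweeping evaluate_config over several candidate setups surfaces points on the
# quality/cost curve, much like the Llama-versus-Anthropic comparison Minnick describes.
```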
Built-in governance and integration with existing enterprise controls allow organizations to move AI projects from experimentation to deployment without requiring additional tooling or infrastructure.
Built-in problem-solvers
Agent Bricks comes with agents that address several common use cases.
The Information Extraction Agent pulls structured data from unstructured documents like PDFs and emails. This allows retail companies, for example, to extract pricing and product details from supplier catalogs with varied formatting.
A Knowledge Assistant Agent improves chatbot accuracy by grounding responses in verified internal documents. The agent aims to let technicians retrieve answers from standard operating procedures or manuals without manually searching through them.
A Custom LLM Agent tackles tasks like summarization or classification, with the ability to tailor output to industry-specific language. Healthcare providers, for example, can deploy models that reformat patient notes into clinician-friendly summaries, Databricks said.
A Multi-Agent Supervisor allows organizations to orchestrate several agents working in tandem. A financial services example combines agents specializing in intent detection, document search and compliance checks to deliver more personalized responses for advisors and clients.
Minnick said customers are seeing dramatic results in early trials. Biopharmaceutical company AstraZeneca plc “built an agent in 60 minutes that has parsed over 400,000 clinical trial documents, pulled out relevant information and compiled it for their researchers,” he said. Another healthcare use case is summarizing patient notes and lab results to support clinicians.
Broader Iceberg access
The updated Unity Catalog now supports Apache Iceberg managed tables through the Iceberg Representational State Transfer, or REST, Catalog application programming interfaces. Databricks said this makes Unity Catalog the only catalog that allows external engines such as the open-source Trino, Snowflake Inc., and Amazon Web Services Inc.’s Elastic MapReduce to read from and write to performance-optimized Iceberg tables with governance controls intact. The company said this eliminates table format lock-in and enables interoperability across data environments.
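What that interoperability looks like in practice can be sketched with the open-source PyIceberg client reading a table over the Iceberg REST Catalog API. The endpoint path, token, warehouse value and table name below are placeholders rather than documented Databricks settings.

```python
# Minimal sketch: an external engine (plain Python via PyIceberg) reading an
# Iceberg table exposed through a REST catalog. Connection values are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "unity",
    type="rest",
    uri="https://<workspace-host>/api/2.1/unity-catalog/iceberg",  # placeholder endpoint
    token="<personal-access-token>",
    warehouse="<catalog-name>",
)

table = catalog.load_table("sales.orders")   # hypothetical schema.table
df = table.scan(limit=100).to_pandas()       # read through the open Iceberg spec
print(df.head())
```

Because the table is reached through the open REST catalog interface, the same read works regardless of which engine wrote the data, which is the point of the lock-in claim above.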
Unity Catalog now offers three key Iceberg-related features: the ability to create Iceberg-managed tables, transparent governance of Iceberg tables in external catalogs, and integration with the Delta Sharing ecosystem. That means organizations can manage and share data regardless of table format or compute engine. Databricks Assistant, a natural language interface embedded in Unity Catalog, now helps users explore data, assess its quality, and understand context.
Databricks is also targeting business users with Unity Catalog Metrics and an internal marketplace that aims to make data more accessible by surfacing curated data assets organized by business domain.
Unity Catalog Metrics allows key performance indicators and business metrics to be defined and governed as first-class data assets. They can be queried using SQL and are decoupled from specific business intelligence tools for consistent reporting.
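As a rough illustration of “queried using SQL and decoupled from BI tools,” a plain Python script could read a governed metric through the open-source databricks-sql-connector. The metric view name, column names and the MEASURE() aggregation shown here are illustrative assumptions, not a documented schema.

```python
# Hypothetical sketch: querying a governed metric with plain SQL from Python.
# Connection values come from a Databricks SQL warehouse; names are made up.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>",
    http_path="<sql-warehouse-http-path>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute(
            "SELECT region, MEASURE(net_revenue) AS net_revenue "
            "FROM finance.metrics.revenue GROUP BY region"
        )
        for row in cursor.fetchall():
            print(row)
```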
Simpler GPU management
Serverless GPU support enables users to run machine learning and generative AI workloads without directly managing GPU infrastructure. This option, now in beta test, lowers the barrier for smaller teams or pilot projects that previously struggled with the complexity and cost of GPU provisioning.
In addition, Databricks released MLflow 3.0, the latest version of its open-source platform for managing machine learning workflows. The update includes a new architecture called LoggedModel, which directly ties model weights and code to training runs. Enhanced visualization and debugging tools allow teams to compare performance across environments, while tighter integration with the Databricks Lakehouse aims to simplify governance, traceability and production deployment.
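For readers unfamiliar with MLflow, the tie between a training run and its model can be sketched with open-source MLflow and scikit-learn. The toy dataset, parameter values and model name below are made up for illustration; only the general logging pattern is shown.

```python
# Rough sketch with open-source MLflow: parameters, metrics and the trained
# model are logged against one run. In MLflow 3 the logged model is tracked
# as its own LoggedModel entity linked to that run.
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50).fit(X, y)

with mlflow.start_run() as run:
    mlflow.log_param("n_estimators", 50)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # `name` supersedes the older `artifact_path` keyword in MLflow 3.
    info = mlflow.sklearn.log_model(model, name="iris_classifier")
    print(run.info.run_id, info.model_uri)
```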
The company said the three offerings constitute an effort to make AI development more predictable and cost-efficient for enterprises that may lack deep in-house machine learning expertise. It’s betting that many organizations will adopt advanced AI tools if the process can be simplified and performance made more transparent.
Agent Bricks and serverless GPU compute are available in beta test starting today. MLflow 3.0 is generally available.