Metadata management has become the practical dividing line between AI systems that scale and those that stall.
As organizations push AI from experimentation into sustained production, the limiting factor is no longer models but visibility into sprawling data estates. Survey results and field experience show that without usable metadata, operational expansion increases cost and complexity instead of delivering value, according to Frederic Van Haren, chief technology officer and founder of HighFens Inc., a technology consultancy and services firm specializing in high-performance computing, AI infrastructure and big data.
“AI by itself is all about scaling. Here’s how it goes,” Van Haren explained. “Somebody comes up with an idea, they have some data, they build a model and the most valuable data you can collect is actually the data from your customers. Scaling becomes a real challenge … every organization that is successful will have to deal with it.”
Van Haren spoke with theCUBE’s Rob Strechay at the Future of Data Platforms Summit, during an exclusive broadcast on theCUBE, News Media’s livestreaming studio. They discussed how metadata has become the foundation for scaling AI across growing, hybrid data environments.
Why metadata management now defines AI scale
As AI workloads multiply, organizations are discovering that simply storing more data does not translate into better outcomes. Growth without visibility introduces cost, inefficiency and risk, especially when teams cannot easily determine what data exists, where it lives or how it is being reused. Metadata management is emerging as the connective tissue that allows infrastructure, data engineering and AI teams to operate with shared context instead of isolated assumptions, Van Haren explained.
“The amount of data you have access to, I think today there are organizations that have to drop data, simply because they can’t afford to not only store it, but also process all of that data,” he said. “Just accumulating that data doesn’t help by itself; you have to process it.”
Data quality compounds the problem because relevance is contextual, not absolute. What improves one model may degrade another, making static definitions of “good data” unreliable at scale. Without metadata that captures usage, lineage and intent, teams are left guessing which data assets actually contribute value, slowing iteration and increasing waste, Van Haren added.
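The metadata Van Haren describes — usage, lineage and intent — can start as a simple structured record attached to each dataset. A minimal Python sketch of the idea (all names and fields here are hypothetical, not drawn from any particular catalog product):

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    # Hypothetical record capturing the context teams otherwise guess at:
    # what the data is, where it lives, and what it was meant for.
    name: str
    location: str                                  # where the data physically lives
    lineage: list = field(default_factory=list)    # upstream sources
    intended_use: str = ""                         # e.g., "cat/dog image classifier"
    used_by: list = field(default_factory=list)    # models that consumed it

    def is_relevant_for(self, task: str) -> bool:
        # Relevance is contextual, not absolute: the same data can be
        # "good" for one model and noise for another.
        return task == self.intended_use

pets = DatasetMetadata(
    name="pet_images_v2",
    location="s3://raw/pets/",
    lineage=["pet_images_v1"],
    intended_use="cat/dog image classifier",
)

print(pets.is_relevant_for("cat/dog image classifier"))      # True
print(pets.is_relevant_for("giraffe/elephant classifier"))   # False
```

Echoing Van Haren's cat-and-dog example, the same record lets a team answer "is this data good?" relative to a task rather than in the abstract.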
“There is no magic way to figure this out,” he said. “What does that really mean is let’s assume I’m building a model to recognize pictures of cats and dogs and somebody gives me pictures of elephants and giraffes. That could be very good data for recognizing giraffes and elephants, but to me, that’s not data quality.”
Why metadata matters more than formats in hybrid environments
The push toward best-of-breed architectures and open formats further elevates metadata’s role. As enterprises mix tools, clouds and data stores, interoperability becomes less about physical access and more about shared understanding. Open formats reduce friction, but metadata is what makes those formats usable across environments and teams, according to Van Haren.
“We have customers who spend probably 40% of their time just massaging data from format A to format B,” he said. “It sounds really simple, but if you have petabytes of data and you have to read that data, you have to reprocess it.”
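The "format A to format B" tax Van Haren describes shows up even in trivial cases. A toy Python sketch converting JSON Lines records to CSV illustrates why: every record must be read, transformed and rewritten, so the cost scales directly with data volume (the formats and field names are illustrative only):

```python
import csv
import io
import json

def jsonl_to_csv(jsonl_text: str, fields: list) -> str:
    # Reprocessing means touching every record; at petabyte scale this
    # read-transform-write loop is what consumes the 40% of time cited above.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        writer.writerow({k: record.get(k, "") for k in fields})
    return out.getvalue()

records = '{"id": 1, "label": "cat"}\n{"id": 2, "label": "dog"}'
print(jsonl_to_csv(records, ["id", "label"]))
```

Open formats reduce how often this loop runs; metadata tells teams whether it needs to run at all.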
Hybrid environments intensify the need for metadata-driven control. Training and inference increasingly live in different places, with inference often pushed to public cloud or edge environments powered by platforms such as Nvidia. In that model, metadata becomes the only consistent layer tying together where data is created, processed and reused over time.
“When I talk about storage and data platforms, I see two things,” Van Haren said. “One is the metadata, which is what is it, where is it, all kind of information around it. And then you have the actual, the bits and the bytes where you store things. I think that in 2026, there’s going to be a lot more focus on metadata and metadata management. I think that is really key.”
Here’s the complete video interview, part of News’s and theCUBE’s coverage of the Future of Data Platforms Summit:
