A common misconception about AI is to reduce it to mere ‘data processing’. While traditional systems transform data according to fixed rules, AI extracts, processes and generates information: it recognizes connections in data, interprets it and produces content that leads to new knowledge. In short, AI performs information processing built on top of data processing – and this transition from data processing to information processing explains the current paradigm shift driven by AI.
Against this background, it becomes clear why so many AI projects fail because of poor data quality. AI can only develop its full potential on a solid, well-prepared data basis. That basis consists not only of the data processed in various business applications, but also of unstructured content such as texts, images, videos and electronic communication. It is precisely this content that holds the real potential for AI, because it carries context, experience and implicit knowledge, for example in emails, meeting minutes, training videos or customer conversations.
The big problem in most companies is that there is no consistent, structured data basis used across all departments and business applications. On the one hand, structured data sits isolated in individual applications such as CRM, ERP or local databases. On the other hand, unstructured content is not in a form that AI can process. What is missing is a platform that enables both technical preparation and comprehensive content governance, i.e. clear responsibilities, quality standards and compliance rules. Without this framework, knowledge remains unused and AI cannot deliver on its promise.
The data stock is not capable of learning
This also explains why, in Bitkom’s most recent AI survey from September last year, 36 percent of respondents named the quality of results as one of the biggest obstacles in AI projects. The same study cites topics directly related to content governance as even greater hurdles: meeting high data protection requirements (48 percent) and the risk of data falling into the wrong hands (39 percent).
A high-profile MIT study shocked the AI industry last summer with the finding that, at that point, only 5 percent of all AI projects had reached production in a way that demonstrated measurable business value. The study also pointed to an effect the authors called a “learning gap”: many AI systems were unable to improve their output through user corrections and feedback – a problem whose cause clearly lies in the company’s data infrastructure.
In its State of AI in the Enterprise 2026, Deloitte identifies the main reason for the failure of AI projects as the fundamental mismatch between the requirements for pilot projects and those for productive operation. A pilot project can usually be carried out within an isolated environment and using cleaned content. Production operations, on the other hand, have significantly higher requirements, not only in terms of computing resources and integration with other systems, but also in terms of governance, compliance and the performance of the underlying data infrastructure.
Transform unstructured data into knowledge
Processing unstructured data is considered the biggest challenge in building an AI-ready data infrastructure. Yet it is essential for using AI profitably, because this is where most of a company’s knowledge lies – and AI requires that knowledge in a processable form. “Ignoring unstructured data is becoming increasingly expensive,” warns the management consultancy Bain & Company in a recent advisory. Instead, unstructured content should be treated as a strategic resource. A data platform must therefore do more than store information: it should make unstructured content available in a timely, usable and trustworthy manner.
While cleansing and preparing structured data takes effort, it can rely on long-established methods; for unstructured content, the discipline is far more complex. It can require tagging, semantic analysis and interpretation, and enrichment with metadata. Only then do the documents of all kinds scattered across the company become AI-ready resources. Without the right structure and metadata, however, even high-quality AI models remain prone to hallucinations, inaccuracies and missing insights.
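To make the idea of metadata enrichment concrete, here is a minimal, illustrative sketch in Python. It is not a real content platform: the names (`enrich`, `EnrichedDocument`) are invented for this example, the keyword tagging is a naive word-frequency heuristic, and real systems would use NLP pipelines or embedding models instead.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Tiny stopword list for the demo; real pipelines use proper language resources.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

@dataclass
class EnrichedDocument:
    """Raw text plus the metadata that makes it findable and usable for AI."""
    text: str
    tags: list = field(default_factory=list)
    dates: list = field(default_factory=list)

def enrich(text: str, max_tags: int = 5) -> EnrichedDocument:
    """Attach simple metadata: frequency-based keyword tags and detected dates."""
    words = re.findall(r"[a-zA-Z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    tags = [w for w, _ in counts.most_common(max_tags)]
    # Naive ISO-date detection, e.g. "2024-09-15"; real systems parse many formats.
    dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return EnrichedDocument(text=text, tags=tags, dates=dates)

doc = enrich(
    "Meeting minutes 2024-09-15: the customer reported delivery delays. "
    "Customer asked for a delivery update; delivery rescheduled."
)
print(doc.tags)
print(doc.dates)
```

The point of the sketch is the shape of the result, not the heuristics: once a document carries tags and extracted entities as structured metadata, it can be indexed, filtered and governed like any other data asset.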
AI pioneers achieve higher ROI
Part of a modern data infrastructure also includes effective governance mechanisms for the entire database and its use in AI applications. “In the age of the AI-driven economy, effective governance has emerged as a critical enabler of sustainable AI adoption, balancing innovation and appropriate risk management,” notes Box’s The State of AI in the Enterprise study. The sensitivity of the data processed by AI systems and the potential consequences of algorithmic decisions are therefore increasingly making this topic a top priority in companies.
In the Box study, 74 percent of the companies surveyed named data protection and data security as their most important concern when implementing AI applications, and almost as many (73 percent) considered data security and compliance the most important criteria when selecting an AI platform that processes content and unstructured data. However, only 24 percent of the companies have governance frameworks with uniform guidelines across all AI initiatives; in most companies (47 percent), the framework covers only some governance aspects.
If companies cannot rely on the quality and correctness of their AI applications’ results, the operational risks are unacceptable. In addition to the GDPR, companies in the EU must now also comply with regulations such as the AI Act and the new Product Liability Directive. The problem, from a company’s perspective, is that many details of their implementation have not yet been determined. Accordingly, in the Bitkom survey, implementing and complying with legal requirements whose details are not yet finalized is the most frequently cited obstacle, at 53 percent.
In the Box study, the AI pioneers show how much companies can benefit from a modern data infrastructure and governance: they achieved an average productivity increase of 37 percent with their AI projects, compared with 24 percent for the “advanced” group and 15 percent for the “developing” group. The pioneers are also better able to automate routine activities, reduce operating costs, increase customer satisfaction and grow their revenue.
A DACH study that will be published shortly will systematically show for the first time how far companies actually are in terms of AI governance and production maturity – and where the biggest gaps lie between ambition and operational reality.
