It has probably happened to you. You upload a PDF to an artificial intelligence chatbot hoping it will summarize a report, extract a table or find a specific piece of information in a matter of seconds. Sometimes it succeeds. Other times the result is disconcerting: mixed-up columns, footnotes embedded in the middle of the text, tables turned into an illegible block, or answers that do not faithfully reflect what the document says. The paradox is evident: systems that already show clear advances in mathematics and programming keep stumbling over something as everyday as a PDF. And there is more to it than an isolated glitch.
A change of mindset. Although for us a PDF is a document with well-defined paragraphs, titles and tables, for the system that processes it the situation can be very different. PDF is, first and foremost, a way to describe visually how a page should be rendered. When a chatbot like Gemini or ChatGPT tries to work with one, it does not always get an ordered structure, but rather a set of graphical instructions that must first be reconstructed before it can respond coherently. That difference is easier to understand when we look at how a PDF “saves” information.
How a PDF actually organizes information. Unlike a web page, where the content follows a logical order defined in the code, a PDF can store text as independent fragments placed at specific positions on the page. The file often preserves coordinates and placement instructions, but not necessarily explicit relationships between one sentence and the next. This means that the order in which the text comes out when extracted does not always match the order in which we read it. If the document includes multiple columns, tables or overlapping elements, the system must work out how they fit together. And that inference is not always trivial.
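The effect can be illustrated with a minimal Python sketch. It assumes a hypothetical page whose text has already been extracted as fragments with coordinates (the `Fragment` class, the sample data and the column boundary of 300 points are all invented for illustration, and real PDF libraries expose this information in their own ways). Sorting purely by position interleaves the two columns, while a reader would finish the left column first.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text placed at a position on the page (hypothetical model)."""
    x: float  # distance from the left edge, in points
    y: float  # distance from the top edge, in points (assumed to grow downward)
    text: str


# Hypothetical two-column page: the file only knows where each fragment sits.
FRAGMENTS = [
    Fragment(50, 100, "Left column, line 1."),
    Fragment(320, 100, "Right column, line 1."),
    Fragment(50, 120, "Left column, line 2."),
    Fragment(320, 120, "Right column, line 2."),
]


def naive_order(fragments):
    """Read top to bottom, then left to right, ignoring columns."""
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]


def column_aware_order(fragments, column_boundary):
    """Read the whole left column first, then the right one.

    column_boundary is an assumed x position separating the columns;
    a real system would have to infer it from the layout.
    """
    left = [f for f in fragments if f.x < column_boundary]
    right = [f for f in fragments if f.x >= column_boundary]
    ordered = sorted(left, key=lambda f: f.y) + sorted(right, key=lambda f: f.y)
    return [f.text for f in ordered]


if __name__ == "__main__":
    print("Naive:       ", naive_order(FRAGMENTS))
    print("Column-aware:", column_aware_order(FRAGMENTS, column_boundary=300))
```

The column-aware version only works here because the boundary is handed to it. Inferring that boundary on arbitrary pages, with tables or overlapping elements, is the non-trivial part.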
What happens with HTML. On a web page, content is organized in an explicit hierarchy: there are tags that indicate what is a title, what is a paragraph, what is a table, and how those elements relate to each other. This structure is part of the file itself and makes it easier for other systems to read, index and process it. In a PDF, as we have seen, that semantic layer may not exist or may not be clearly defined. In practice, therefore, extracting information from a web page tends to be a more predictable process, while doing so from a PDF is more complicated.
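As a contrast, here is a small sketch using Python's standard `html.parser` module. The `OutlineParser` class, the set of tracked tags and the sample markup are assumptions made for illustration. Because the tags already say what each piece of text is, the program never has to guess at structure.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (tag, text) pairs for a handful of structural tags."""

    TRACKED = {"h1", "h2", "p", "th", "td"}  # illustrative subset of tags

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # tag currently being read, if any
        self._buffer = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.items.append((tag, "".join(self._buffer).strip()))
            self._current = None


def extract_outline(html):
    """Return the tracked elements in document order."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page fragment used only for this example.
SAMPLE = (
    "<h1>Annual report</h1>"
    "<p>Revenue grew.</p>"
    "<table><tr><th>Year</th><td>2024</td></tr></table>"
)

if __name__ == "__main__":
    for tag, text in extract_outline(SAMPLE):
        print(f"{tag}: {text}")
```

The title, the paragraph and the table cells arrive already labeled and in reading order, which is the information a PDF may simply not contain.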
So what about OCR? It is the first solution that comes to mind. If the problem is that the text is poorly structured, or even “drawn” as an image, optical character recognition should convert it into something machine-readable. And in part it does. OCR has been used for decades to turn images of words into text, but converting an image to text is not the same as reconstructing the logic of the document. When a page mixes elements such as columns, tables and footnotes, the system can recognize each word without knowing exactly how they fit together. The failure then lies not in reading the characters, but in organizing the information.
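A rough sketch of that gap, again in Python: suppose an OCR step has returned every word correctly, each with an approximate position (the `(text, x, y)` tuples, the sample values and the 5-point tolerance below are invented for illustration and do not come from any particular OCR engine). The words alone are just a list. Turning them back into table rows takes an extra layout step, shown here with a simple heuristic that groups words whose vertical positions are close.

```python
# Hypothetical OCR output: (text, x, y), with y growing downward.
# Every word is recognized correctly, but the order carries no structure.
OCR_WORDS = [
    ("9.50", 200, 129),
    ("Item", 40, 100),
    ("Widget", 40, 130),
    ("Price", 200, 101),
]


def group_into_rows(words, y_tolerance=5.0):
    """Rebuild rows by grouping words with similar vertical positions.

    This is a simple heuristic, not a general table-detection method:
    it assumes rows are well separated and cells do not wrap.
    """
    rows = []
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Inside each row, order the cells from left to right.
    return [[text for _, text in sorted(row["cells"])] for row in rows]


if __name__ == "__main__":
    print("Words as recognized:", [w[0] for w in OCR_WORDS])
    print("Rows after grouping:", group_into_rows(OCR_WORDS))
```

With tidy data the heuristic recovers the header row and the data row. On real scans, with skewed pages, merged cells or multi-line entries, a fixed tolerance like this is easily fooled, which is where the organization problem begins.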
Why don’t we abandon PDF? The answer is more pragmatic than technological. As reported by The Verge, citing the head of the PDF Association, the format became established precisely because it allows a document to look the same today as it will in ten or twenty years, regardless of the device or software used to open it. A web page can change depending on the browser, an editable sheet can be modified or overwritten, but a PDF keeps its appearance and visual integrity. That stability is exactly what lawyers, engineers, public administrations and any organization that must keep reliable records need. The challenge is not to replace the format, but to learn to interpret it better.
The news “AI solves equations and chops code, but continues to crash with PDFs: the explanation shows its limits” was originally published in WorldOfSoftware by Javier Marquez.
