Mistral AI Launches API For LLM-Based OCR Of Multimodal Documents

Last updated: 2025/03/31 at 7:33 AM

News Room Published 31 March 2025

Now available on Mistral’s la Plateforme SaaS, Mistral OCR aims to provide an OCR solution for digitizing complex documents that interleave text and images, tables, mathematical expressions, and advanced layouts. This makes it particularly suitable for digitizing scientific research, historical documents and artifacts, user manuals, and more, the company says.

Mistral OCR uses Mistral LLMs to understand the content extracted by OCR-ing a document. This helps it capture the document's context and the relationships between its elements, which makes it suitable for use with RAG systems that take multimodal documents as input.

According to the company’s own benchmarking, Mistral OCR outperforms other leading OCR solutions, including Google Document AI, Azure OCR, Gemini 1.5 and 2.0, and GPT-4o.

Unlike other models, Mistral OCR comprehends each element of documents (media, text, tables, equations) with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts content as ordered, interleaved text and images.

Mistral AI maintains that its OCR API is the only one that extracts embedded images from documents along with text. The resulting text and images are exported into a markdown file. Additional formats, such as JSON, are supported for structured output, so that OCR output can be chained into a more complex workflow, which can be useful for building agents.

When it comes to multilingual support, Mistral AI emphasizes its solution can parse, understand, and transcribe thousands of scripts, fonts, and languages.

Mistral OCR is already powering Mistral’s le Chat LLM-powered chat solution and will be available soon for on-premises deployments. According to the company, it can process up to 2000 pages per minute on a single node.

To use the Mistral OCR API from Python, you install the mistralai package, which handles authentication and gives access to all the capabilities of the Mistral API. To process a file, you need to upload it first, as shown in the following snippet:


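import json
import os
from pathlib import Path

from mistralai import DocumentURLChunk, Mistral

# Assumptions for this sketch: the API key is read from the
# MISTRAL_API_KEY environment variable, and "document.pdf" is a placeholder path
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
pdf_file = Path("document.pdf")

# Upload PDF file to Mistral's OCR service
uploaded_file = client.files.upload(
    file={
        "file_name": pdf_file.stem,
        "content": pdf_file.read_bytes(),
    },
    purpose="ocr",
)

# Get a signed URL for the uploaded file
signed_url = client.files.get_signed_url(file_id=uploaded_file.id, expiry=1)

# Process PDF with OCR, including embedded images
pdf_response = client.ocr.process(
    document=DocumentURLChunk(document_url=signed_url.url),
    model="mistral-ocr-latest",
    include_image_base64=True,
)

# Convert the response to a plain Python dict via its JSON form
response_dict = json.loads(pdf_response.model_dump_json())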
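
Once the response is available as a dictionary, the per-page markdown can be stitched into a single document with the extracted images inlined. The following is a minimal sketch, not an official recipe: it assumes that the dictionary has a "pages" list, that each page carries a "markdown" string and an "images" list, and that each image has "id" and "image_base64" fields, with the markdown referring to images as ![id](id). The helper names and the sample data are hypothetical, so check the field names against the actual response before relying on them.

```python
def inline_images(page: dict) -> str:
    """Return one page's markdown with image references replaced by their base64 data."""
    markdown = page["markdown"]
    for image in page.get("images", []):
        image_id = image["id"]
        data = image.get("image_base64")
        if not data:
            # No image data was returned for this image; leave the reference untouched
            continue
        markdown = markdown.replace(
            f"![{image_id}]({image_id})", f"![{image_id}]({data})"
        )
    return markdown


def combine_pages(response_dict: dict) -> str:
    """Join the markdown of all pages, in order, separated by blank lines."""
    return "\n\n".join(inline_images(page) for page in response_dict.get("pages", []))


# Hypothetical sample mimicking the assumed response shape
sample = {
    "pages": [
        {
            "markdown": "# Title\n\n![img-0.jpeg](img-0.jpeg)",
            "images": [
                {"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,AAAA"}
            ],
        },
        {"markdown": "Second page", "images": []},
    ]
}

combined = combine_pages(sample)
print(combined)
```

With a real response, combine_pages(response_dict) would be written to a .md file; working on the plain dictionary keeps the helper independent of the client library's response classes.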

The API is currently limited to files of at most 50MB in size and 1,000 pages in length. Pricing is set at 1,000 pages per USD, or 2,000 pages per USD when using batch OCR.
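
The limits and prices above translate into a simple pre-flight check before uploading. This sketch uses only the figures quoted in this article; the function and constant names are hypothetical, and it assumes that 50MB means 50 * 1024 * 1024 bytes, which the source does not specify.

```python
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" interpreted as 50 MiB
MAX_PAGES = 1_000                  # page limit per file, as quoted above
PAGES_PER_USD = 1_000              # standard pricing, as quoted above
PAGES_PER_USD_BATCH = 2_000        # batch OCR pricing, as quoted above


def within_limits(size_bytes: int, pages: int) -> bool:
    """Return True if a file respects both the size limit and the page limit."""
    return size_bytes <= MAX_FILE_BYTES and pages <= MAX_PAGES


def estimated_cost_usd(pages: int, batch: bool = False) -> float:
    """Estimate the OCR cost in USD for a given number of pages."""
    rate = PAGES_PER_USD_BATCH if batch else PAGES_PER_USD
    return pages / rate


print(within_limits(10_000_000, 250))
print(estimated_cost_usd(500))
print(estimated_cost_usd(500, batch=True))
```

At these rates a 500-page document costs 0.50 USD with the standard API and 0.25 USD with batch OCR.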
