Chinese AI startup DeepSeek on Tuesday released a research paper and open-sourced its latest optical character recognition (OCR) model, DeepSeek-OCR 2, aiming to improve how machines interpret and process visual information. The company said the model is built on its DeepEncoder V2 architecture, which replaces rigid scanning-based visual encoding with a semantic reasoning approach, enabling AI systems to rearrange image components dynamically according to context and meaning.
DeepSeek said the model improves data compression efficiency and needs only 256 to 1,120 visual tokens to process a complex document page, cutting computational costs for downstream large language models. In benchmark tests on OmniDocBench v1.5, DeepSeek-OCR 2 achieved an overall score of 91.09%, 3.73 percentage points higher than the previous generation, with strong performance in reading order recognition.
The release comes as Chinese AI developers intensify efforts to improve foundational models and open-source capabilities, amid growing competition in large language models and multimodal AI systems. [Technode Reporting]
