DeepSeek Dropped Another Open-Source AI Model, Janus Pro

Last updated: 2025/01/31 at 5:56 AM

News Room Published 31 January 2025

DeepSeek has released Janus-Pro, an updated version of its multimodal model, Janus. The new model refines the training strategy, scales up the training data, and increases the model size, which improves both multimodal understanding and text-to-image generation.

Janus-Pro follows an autoregressive framework that separates visual encoding into distinct pathways for multimodal understanding and for generation, while still using a single transformer architecture. According to DeepSeek, this separation addresses stability and performance issues: it reduces the conflict between the visual encoder’s two roles and increases flexibility, letting the model achieve performance competitive with task-specific models while keeping a unified structure. The model also incorporates synthetic aesthetic data to enhance text-to-image generation.

Janus-Pro improves multimodal understanding and visual generation performance. Multimodal understanding is measured using the average accuracy of POPE, MME-Perception (scaled), GQA, and MMMU. Visual generation is evaluated on GenEval and DPG-Bench. Janus-Pro outperforms previous unified multimodal models and some task-specific models.
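As a rough illustration of how an averaged understanding score of this kind could be computed, the sketch below rescales MME-Perception to a 0 to 100 range and then takes the mean across the four benchmarks. The divisor of 20 (which assumes a 2000-point maximum for MME-Perception), the function name, and every score in the example are placeholders for illustration, not figures reported by DeepSeek.

```python
from statistics import mean

# Assumption: MME-Perception has a 2000-point maximum, so dividing by 20
# maps it onto the same 0-100 range as the other three benchmarks.
MME_PERCEPTION_SCALE = 20.0


def average_understanding_score(pope: float, mme_perception: float,
                                gqa: float, mmmu: float) -> float:
    """Mean of four benchmark scores after rescaling MME-Perception."""
    scaled_mme = mme_perception / MME_PERCEPTION_SCALE
    return mean([pope, scaled_mme, gqa, mmmu])


# Placeholder inputs only; these are not results from any real model.
example = average_understanding_score(
    pope=80.0, mme_perception=1500.0, gqa=60.0, mmmu=40.0
)
print(f"average understanding score: {example:.2f}")
```

Rescaling before averaging keeps one benchmark with a much larger point range from dominating the mean.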

The model comes in two sizes, built on DeepSeek-LLM-1.5B and DeepSeek-LLM-7B. The larger model performs better on benchmarks like MMBench and GenEval. Both use SigLIP-L as the vision encoder and support 384×384 image inputs. Image generation relies on a tokenizer with a downsampling rate of 16.
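To make the tokenizer figure concrete, the short sketch below works out how many discrete tokens a single image would occupy. It assumes that the downsampling rate of 16 applies to each spatial dimension and that generated images use the same 384×384 resolution quoted for inputs; neither assumption is stated in the article, and the function name is illustrative only.

```python
def image_token_grid(image_size: int, downsample_rate: int) -> tuple[int, int]:
    """Return (tokens_per_side, total_tokens) for a square image.

    Assumes the tokenizer shrinks height and width by the same factor.
    """
    if image_size % downsample_rate != 0:
        raise ValueError("image size must be divisible by the downsampling rate")
    side = image_size // downsample_rate
    return side, side * side


side, total = image_token_grid(384, 16)
print(f"{side} x {side} grid, {total} tokens per image")
```

Under these assumptions a 384×384 image maps to a 24×24 grid, or 576 tokens, which is the sequence length an autoregressive model would need to produce for one image.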

DeepSeek’s Janus-Pro-7B and OpenAI’s DALL-E 3 are both advanced models in text-to-image generation. According to DeepSeek, Janus-Pro-7B outperforms DALL-E 3 in benchmarks such as GenEval and DPG-Bench. This performance is attributed to Janus-Pro-7B’s improved training processes, data quality, and model size, which contribute to more stable and detailed images.

The release of Janus-Pro has generated significant buzz and commentary. Vedang Vatsa FRSA shared:

DeepSeek’s Janus-Pro-7B is here. Outperforms DALL-E 3 & Stable Diffusion on GenEval/DPG-Bench. Separates understanding/generation, scales data/models for stable image gen. Unified, flexible, cost-efficient. Open-source win!

And, AI expert Huzaifa Shoukat posted:

DeepSeek’s new Janus Pro model is impressive. It’s a multimodal LLM that understands images and generates them too. The 1B model runs in the browser using WebGPU via Transformers.js.

The Janus-Pro code is available on GitHub under the MIT License, while use of the model weights is governed by the DeepSeek Model License. Users can refer to the repository for setup instructions.
