Copyright © All Rights Reserved. World of Software.
Qwen Team Open Sources State-of-the-Art Image Model Qwen-Image

News Room
Last updated: 2025/08/26 at 10:59 AM
News Room Published 26 August 2025

Qwen Team recently open sourced Qwen-Image, an image foundation model. Qwen-Image supports text-to-image (T2I) generation and text-image-to-image (TI2I) editing tasks, and outperforms other models on a variety of benchmarks.

Qwen-Image uses a Qwen2.5-VL model for text inputs, a Variational AutoEncoder (VAE) for image inputs, and a Multimodal Diffusion Transformer (MMDiT) for image generation. The combined model “excels” at text rendering, including both English and Chinese text. Qwen evaluated the model on a suite of T2I and TI2I benchmarks, including DPG, GenEval, GEdit and ImgEdit, where it achieved the highest overall score. On image understanding tasks, Qwen-Image does not match specially trained models, but its performance is “remarkably close” to theirs. In addition, Qwen created AI Arena, a comparison site where human evaluators can rate pairs of generated images. Qwen-Image currently ranks third, in competition with five high-quality closed models including GPT Image 1. According to Qwen:

Qwen-Image is more than a state-of-the-art image generation model—it represents a paradigm shift in how we conceptualize and build multimodal foundation models. Its contributions extend beyond technical benchmarks, challenging the community to rethink the roles of generative models in perception, interface design, and cognitive modeling…As we continue to scale and refine such systems, the boundary between visual understanding and generation will blur further, paving the way for truly interactive, intuitive, and intelligent multimodal agents.
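For readers curious how pairwise votes like those collected on AI Arena can become a ranking, here is a minimal sketch of an Elo-style update. The article does not say which rating scheme AI Arena uses, so the Elo formula, the starting ratings and the K-factor below are all assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head vote.

    k is a hypothetical step size; real leaderboards tune it.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - exp_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start level; an evaluator prefers model A's image.
    print(elo_update(1500.0, 1500.0, a_wins=True))
```

With two equally rated models, a single vote moves each rating by half of k, and upsets against a higher-rated model move ratings further.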

To create the model’s training dataset, the Qwen Team “collected and annotated billions of image-text pairs” with images from four main categories: nature, design, people, and “synthetic data.” Nature images are about 55% of the data. Design, which includes images of paintings, posters, and GUIs, is about 27% of the data and includes many images with “rich textual elements.” This initial dataset was heavily filtered to remove low-quality images. They also designed an annotation framework to generate detailed captions and metadata for each image. 

Qwen-Image Model Architecture. Image Source: Qwen-Image Tech Report

The Qwen Team designed a pre-training curriculum with multiple strategies that progressively improved the model’s output. The first strategy involved upscaling images from 256×256 to 640×640 and then to 1328×1328 pixels. The other strategies involved introducing images containing rendered text, images with a more varied distribution of domains and resolutions, and synthetic images with “surrealistic styles or…extensive textual content.”
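To give a sense of scale for the resolution schedule, the short sketch below computes how many pixels each stage contains relative to the first. Only the three side lengths come from the article; treating pixel count as a proxy for per-image cost is an assumption, since the real cost depends on VAE compression and patching details not covered here.

```python
# Side lengths in pixels for the three resolution stages named in the article.
STAGES = [256, 640, 1328]


def pixel_count(side: int) -> int:
    """Number of pixels in a square image with the given side length."""
    return side * side


def relative_pixels(stages):
    """Pixel count of each stage divided by the pixel count of the first stage."""
    base = pixel_count(stages[0])
    return [pixel_count(side) / base for side in stages]


if __name__ == "__main__":
    for side, ratio in zip(STAGES, relative_pixels(STAGES)):
        print(f"{side}x{side}: {pixel_count(side)} pixels, {ratio:.2f}x the first stage")
```

Because pixel count grows with the square of the side length, the final stage holds roughly 27 times as many pixels per image as the first.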

Finally, the model was post-trained in two stages. First was supervised fine-tuning (SFT) on a dataset with “meticulous human annotation” to produce detailed and realistic images. Next was reinforcement learning (RL) using two different policy optimization strategies, where the model produced multiple images for a prompt and human judges picked the best and worst.
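The best-and-worst picks described above map naturally onto preference pairs. The sketch below shows one way such a pair could be recorded and scored with a logistic (Bradley-Terry style) preference loss, a form common in preference optimization. The article does not name the two strategies Qwen used, so the function names, the scalar scores and the beta parameter are hypothetical and this is not the team's implementation.

```python
import math


def make_preference_pair(prompt, images, best_idx, worst_idx):
    """Turn a judge's best/worst picks for one prompt into a training pair."""
    if best_idx == worst_idx:
        raise ValueError("best and worst picks must be different images")
    return {
        "prompt": prompt,
        "chosen": images[best_idx],
        "rejected": images[worst_idx],
    }


def pairwise_preference_loss(score_chosen, score_rejected, beta=1.0):
    """Compute -log(sigmoid(beta * (score_chosen - score_rejected))).

    The scores are hypothetical scalars the model assigns to each image.
    The loss is small when the chosen image scores well above the rejected one.
    """
    margin = beta * (score_chosen - score_rejected)
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    candidates = ["img_0.png", "img_1.png", "img_2.png", "img_3.png"]
    pair = make_preference_pair("a red bicycle", candidates, best_idx=2, worst_idx=0)
    print(pair)
    print(pairwise_preference_loss(score_chosen=1.5, score_rejected=0.2))
```

Training on such a loss pushes the model to score the judge's preferred image above the rejected one for the same prompt.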

Hacker News users generally praised the model’s performance, comparing it to gpt-image-1. One user said of the release, “this seems huge.” Another wrote:

Besides style transfer, object additions and removals, text editing, manipulation of human poses, it also supports object detection, semantic segmentation, depth/edge estimation, super-resolution and novel view synthesis (NVS) i.e. synthesizing new perspectives from a base image. It’s quite a smorgasbord! Early results indicate to me that gpt-image-1 has a bit better sharpness and clarity but I’m honestly not sure if OpenAI doesn’t simply do some basic unsharp mask or something as a post-processing step?

The Qwen-Image code is available on GitHub, and the model files can be downloaded from Hugging Face.
