
Allen Institute for AI rivals Google, Meta and OpenAI with open-source AI vision model

News Room
Last updated: 2025/12/16 at 11:31 AM
News Room Published 16 December 2025
A demo video from Ai2 shows Molmo tracking a specific ball in this cat video, even when it goes out of frame. (Allen Institute for AI Video)

How many penguins are in this wildlife video? Can you track the orange ball in the cat video? Which teams are playing, and who scored? Can you give me step-by-step instructions from this cooking video?

Those are examples of queries that can be fielded by Molmo 2, a new family of open-source AI vision models from the Allen Institute for AI (Ai2) that can watch, track, analyze and answer questions about videos: describing what’s happening, and pinpointing exactly where and when.

Ai2 cites benchmark tests showing Molmo 2 beating open-source models on short video analysis and tracking, and surpassing closed systems like Google’s Gemini 3 on video tracking, while approaching their performance on other image and video tasks.

In a recent series of demos for reporters at the Ai2 offices in Seattle, researchers showed how Molmo 2 could analyze a variety of short video clips in different ways.

  • In a soccer clip, researchers asked what defensive mistake led to a goal. The model analyzed the sequence and pointed to a failure to clear the ball effectively.
  • In a baseball clip, the AI identified the teams (Angels and Mariners), the player who scored (#55), and explained how it knew the home team by reading uniforms and stadium branding.
  • Given a cooking video, the model returned a structured recipe with ingredients and step-by-step instructions, including timing pulled from on-screen text.
  • Asked to count how many flips a dancer performed, the model didn’t just say “five” — it returned timestamps and pixel coordinates for each one.
  • In a tracking demo, the model followed four penguins as they moved around the frame, maintaining a consistent ID for each bird even when they overlapped.
  • When asked to “track the car that passes the #13 car in the end,” the model watched an entire racing clip first, understood the query, then went back and identified the correct vehicle. It tracked cars that went in and out of frame.
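
For developers wondering what "timestamps and pixel coordinates" or "a consistent ID for each bird" might look like as data, here is a minimal Python sketch. The class names, field names and sample values are all hypothetical, invented for illustration. The article does not describe Molmo 2's actual output format, and this is not it.

```python
from dataclasses import dataclass, field


@dataclass
class PointEvent:
    """One located moment: when it happened and where in the frame.

    Hypothetical schema; not Molmo 2's real output format.
    """
    time_s: float  # seconds from the start of the clip
    x: int         # pixel column
    y: int         # pixel row


@dataclass
class Track:
    """One tracked object that keeps the same ID across frames (hypothetical)."""
    object_id: int
    points: list[PointEvent] = field(default_factory=list)


def count_events(events: list[PointEvent]) -> int:
    """A count is just the number of located events."""
    return len(events)


def ids_visible_at(tracks: list[Track], time_s: float) -> list[int]:
    """Return the IDs of tracks that have a point at exactly this timestamp."""
    return sorted(
        t.object_id for t in tracks if any(p.time_s == time_s for p in t.points)
    )


# Made-up values standing in for the "five flips" answer.
flips = [
    PointEvent(1.0, 320, 240),
    PointEvent(2.5, 330, 236),
    PointEvent(4.0, 341, 250),
    PointEvent(5.5, 335, 244),
    PointEvent(7.0, 329, 239),
]

# Made-up tracks: object 2 leaves the frame at t=1.0 and returns at t=2.0.
tracks = [
    Track(1, [PointEvent(0.0, 100, 200), PointEvent(1.0, 110, 205), PointEvent(2.0, 120, 210)]),
    Track(2, [PointEvent(0.0, 400, 220), PointEvent(2.0, 380, 215)]),
]
```

In this sketch an object that goes out of frame simply has no point for those timestamps, and the same `object_id` resumes when it reappears.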

Big year for Ai2

Molmo 2, announced Tuesday morning, caps a year of major milestones for the Seattle-based nonprofit, which has developed a loyal following in business and scientific circles by building fully open AI systems. Its approach contrasts sharply with the closed or partially open approaches of industry giants like OpenAI, Google, Microsoft, and Meta.

Founded in 2014 by the late Microsoft co-founder Paul Allen, Ai2 this year landed $152 million from the NSF and Nvidia, partnered on an AI cancer research initiative led by Seattle’s Fred Hutch, and released Olmo 3, a text model rivaling Meta, DeepSeek and others.

Ai2 has seen more than 21 million downloads of its models this year and nearly 3 billion queries across its systems, said Ali Farhadi, the Ai2 CEO, during the media briefing last week at the institute’s new headquarters on the northern shore of Seattle’s Lake Union. 

Ai2 CEO Ali Farhadi. (GeekWire File Photo / Todd Bishop)

As a nonprofit, Ai2 isn’t trying to compete commercially with the tech giants — it’s aiming to advance the state of the art and make those advances freely available.

The institute has released open models for text (OLMo), images (the original Molmo), and now video, building toward what Farhadi described as a unified model that reasons across all modalities.

“We’re basically building models that are competitive with the best things out there,” Farhadi said — but in a completely open manner, for a succession of different media and situations.

In addition to Molmo 2, Ai2 on Monday released Bolmo, an experimental text model that processes language at the character level rather than in word fragments — a technical shift that improves handling of spelling, rare words, and multilingual text.
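
To see why character-level processing can help with spelling, compare it with a word-fragment split in a toy Python sketch. The function names and the two-entry vocabulary below are assumptions made up for illustration. This is not Bolmo's tokenizer, and real fragment vocabularies are learned from data and far larger.

```python
def char_tokens(text: str) -> list[str]:
    """Character-level view: every character is its own token."""
    return list(text)


def fragment_tokens(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split into known fragments.

    Characters not covered by any fragment fall back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


toy_vocab = {"straw", "berry"}  # hypothetical fragment vocabulary

fragments = fragment_tokens("strawberry", toy_vocab)  # ['straw', 'berry']
characters = char_tokens("strawberry")                # ten single letters
r_count = characters.count("r")                       # 3
```

With fragments, the individual letters are hidden inside two opaque units. With characters, a spelling question such as "how many r's?" becomes a direct count over the tokens.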

Expanding into video analysis

With the newly released Molmo 2, the focus is video. To be clear: the model analyzes video, it doesn’t generate video — think understanding footage rather than creating it.

The original Molmo, released in September 2024, could analyze static images with precision rivaling closed-source competitors. It introduced a “pointing” capability that let it identify specific objects within a frame. Molmo 2 brings that same approach to video and multi-image understanding.

An Ai2 analysis benchmarks Molmo 2 against a variety of closed-source models.

The concept isn’t new. Google’s Gemini, OpenAI’s GPT-4o, and Meta’s Perception LM can all process video. But in line with Ai2’s broader mission as a nonprofit institute, Molmo 2 is fully open, with its model weights, training code, and training data all publicly released.

That’s different from “open weight” models that release the final product but not the original recipe, and a stark contrast to closed systems from Google, OpenAI and others.

The distinction is not just an academic principle. Ai2’s approach means developers can trace a model’s behavior back to its training data, customize it for specific uses, and avoid being locked into a vendor’s ecosystem.

Ai2 also emphasizes efficiency. For example, Meta’s Perception LM was trained on 72.5 million videos. Molmo 2 used about 9 million, relying on high-quality human annotations.

The result, Ai2 claims, is a smaller, more efficient model that outperforms the institute’s own much larger model from last year, and comes close to matching commercial systems from Google and OpenAI, while being simple enough to run on a single machine.

When the original Molmo introduced its pointing capability last year — allowing the model to identify specific objects in an image — competing models quickly adopted the feature.

“We know they adopted our data because they perform exactly as well as we do,” said Ranjay Krishna, who leads Ai2’s computer vision team. Krishna is also a University of Washington assistant professor, and several of his graduate students work on the project.

Farhadi frames the competitive dynamic differently than most in the industry.

“If you do real open source, I would actually change the word competition to collaboration,” he said. “Because there is no need to compete. Everything is out there. You don’t need to reverse engineer. You don’t need to rebuild it. Just grab it, build on top of it, do the next thing. And we love it when people do that.”

A work in progress

At the same time, Molmo 2 has some clear constraints. The tracking capability — following objects across frames — currently tops out at about 10 items. Ask it to track a crowd or a busy highway, and the model can’t keep up.

“This is a very, very new capability, and it’s one that’s so experimental that we’re starting out very small,” Krishna said. “There’s no technological limit to this, it just requires more data, more examples of really crowded scenes.”

Long-form video also remains a challenge. The model performs well on short clips, but analyzing longer footage requires compute that Ai2 isn’t yet willing to spend. In the playground launching alongside Molmo 2, uploaded videos are limited to 15 seconds.

And unlike some commercial systems, Molmo 2 doesn’t process live video streams. It analyzes recordings after the fact. Krishna said the team is exploring streaming capabilities for applications like robotics, where a model would need to respond to observations in real time, but that work is still early.

“There are methods that people have come up with in terms of processing videos over time, streaming videos,” Krishna said. “Those are directions we’re looking into next.”

Molmo 2 is available starting today on Hugging Face and Ai2’s playground.
