The Allen Institute for AI (Ai2) released a new AI robotics system that uses novel approaches to help robots navigate messy real-world environments, while making all of the model’s code, data, and training methods publicly available under open-source principles.
The system, called MolmoAct, converts 2D images into 3D visualizations, previews its movements before acting, and lets human operators adjust those actions in real time. It differs from existing robotics models that often work as opaque black boxes, trained on proprietary datasets.
Ai2 expects the system to be used by robotics researchers, companies, and developers as a foundation for building robots that can operate in unstructured environments such as homes, warehouses, and disaster response scenes.
In demos last week at Ai2’s new headquarters north of Seattle’s Lake Union, researchers showed MolmoAct interpreting natural language commands to direct a robotic arm to pick up household objects, such as cups and plush toys, and move them to specific locations.
Researchers described it as part of Ai2’s broader efforts to create a comprehensive set of open-source AI tools and technologies. The Seattle-based research institute was founded in 2014 by the late Microsoft co-founder Paul Allen, and is funded in part by his estate.
Ai2’s flagship OLMo large language model is a fully transparent alternative to proprietary systems, with openly available training data, code, and model weights, designed to support research and public accountability in AI development.
The institute’s projects are moving in “one big direction” — toward a unified AI model “that can do reasoning and language, that can understand images, videos, that can control a robot, and that can make sense of space and actions,” said Ranjay Krishna, Ai2’s research lead for computer vision and an assistant professor at the University of Washington’s Allen School.
MolmoAct builds on Ai2’s Molmo multimodal AI model — which can understand and describe images — by adding the ability to reason in 3D and direct robot actions.