MolmoAct 2
MolmoAct 2 is an open-source robotics foundation model that brings capable, reasoning-driven robot control into real-world environments. Built by Ai2, it outperforms proprietary alternatives on industry benchmarks while remaining fully transparent for researchers to study, extend, and deploy.
Product Highlights
- Adaptive 3D Reasoning: MolmoAct 2-Think uses depth perception tokens with intelligent routing to reason deeply about spatial structure only when needed, improving performance while keeping the added latency small.
- 37x Faster Inference: Action call latency drops from 6,700 ms to 180 ms for the base model (about 37x faster) or 790 ms with adaptive reasoning (about 8.5x faster), enabling near real-time robot responsiveness.
- Bimanual Manipulation Ready: Unlike its predecessor, MolmoAct 2 includes dual-arm coordination capabilities directly in the base model—no per-task fine-tuning required.
- Fully Open Ecosystem: Model weights, training datasets (including the 720-hour MolmoAct 2-Bimanual YAM dataset), code, and the open MolmoAct 2-FAST Tokenizer are all publicly available.
- Embodied Reasoning Backbone: Built on Molmo 2-ER, which achieves an average score of 63.8 across 13 embodied-reasoning benchmarks, surpassing GPT-5, Gemini 2.5 Pro, and other leading systems.
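The latency figures in the highlights can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, the source does not say which system or hardware the 6,700 ms starting figure refers to, and the calls-per-second value assumes strictly back-to-back calls with no overlap, batching, or multi-step action chunks.

```python
# Latency figures quoted in the highlights, in milliseconds.
# Names are hypothetical; only the three numbers come from the text.
BASELINE_MS = 6700.0  # starting latency quoted in the highlights
BASE_MS = 180.0       # MolmoAct 2, base model
THINK_MS = 790.0      # MolmoAct 2 with adaptive reasoning


def speedup(old_ms: float, new_ms: float) -> float:
    """Return how many times shorter new_ms is than old_ms."""
    if old_ms <= 0 or new_ms <= 0:
        raise ValueError("latencies must be positive")
    return old_ms / new_ms


def calls_per_second(latency_ms: float) -> float:
    """Upper bound on sequential action calls per second.

    Assumes each call starts only after the previous one finishes.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for name, ms in (("base", BASE_MS), ("adaptive reasoning", THINK_MS)):
        print(
            f"{name}: {speedup(BASELINE_MS, ms):.1f}x faster, "
            f"up to {calls_per_second(ms):.2f} calls/s"
        )
```

The headline 37x figure corresponds to the base model (6,700 / 180 is roughly 37.2), while the adaptive-reasoning configuration works out to roughly 8.5x.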
Use Cases
- Laboratory Automation: Deploy in wet-lab environments for precise, repetitive tasks such as CRISPR gene-editing workflows, sample handling, and equipment operation. Tested with Stanford School of Medicine researchers.
- Household & Service Robotics: Handle kitchen organization, table bussing, towel folding, and object manipulation in unstructured home environments without environment-specific training.
- Research & Development: Study and extend a complete open vision-language-action (VLA) pipeline, including novel adapter architectures and adaptive reasoning mechanisms.
- Low-Cost Robot Deployment: Leverage compatibility with affordable open-source hardware like SO-100/SO-101 arms to build accessible robotics solutions.
Target Audience
MolmoAct 2 is aimed at robotics researchers, AI engineers, and academic institutions seeking a transparent, high-performance foundation model for embodied AI. It also suits automation engineers in laboratories and service industries who need reliable manipulation capabilities without proprietary lock-in.