
@allen_ai: Today we're releasing MolmoWeb...

@allen_ai
Mar 25, 2026
1
Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf.

Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵
2
MolmoWeb works by looking at the same screen you do.

Given a task and a live webpage, it views the screenshot, decides what to do next, and takes an action: clicking, typing, scrolling, switching tabs, or returning information to you.
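To make that observe, decide, act loop concrete, here is a minimal Python sketch. It is not MolmoWeb's code: the `Action` fields, the `Browser` protocol, `run_agent`, `FakeBrowser`, `scripted_policy`, and the 30-step limit are all hypothetical, and the policy is a stand-in for the call to the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol

# Hypothetical action space covering the actions named in the thread.
ACTION_KINDS = {"click", "type", "scroll", "switch_tab", "answer"}


@dataclass
class Action:
    kind: str
    x: Optional[int] = None     # screen coordinates for "click"
    y: Optional[int] = None
    text: Optional[str] = None  # text to type, or the final answer
    dy: int = 0                 # scroll distance in pixels
    tab: Optional[int] = None   # target tab index for "switch_tab"

    def __post_init__(self) -> None:
        if self.kind not in ACTION_KINDS:
            raise ValueError(f"unknown action kind: {self.kind}")


class Browser(Protocol):
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# Stand-in for the model: (task, screenshot, past actions) -> next action.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(task: str, browser: Browser, policy: Policy, max_steps: int = 30) -> Optional[str]:
    history: List[Action] = []
    for _ in range(max_steps):
        shot = browser.screenshot()           # observe: only pixels, no site-specific API
        action = policy(task, shot, history)  # decide the next step
        history.append(action)
        if action.kind == "answer":           # hand information back to the user
            return action.text
        browser.execute(action)               # act on the live page
    return None                               # step budget exhausted


class FakeBrowser:
    """Toy browser that records executed action kinds instead of driving a real page."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        return b"fake-png"

    def execute(self, action: Action) -> None:
        self.log.append(action.kind)


def scripted_policy(steps: List[Action]) -> Policy:
    """Replays a fixed list of actions, ignoring the screenshot."""
    it = iter(steps)
    return lambda task, shot, history: next(it)


browser = FakeBrowser()
answer = run_agent(
    "Find the cheapest laptop",
    browser,
    scripted_policy([Action("click", x=120, y=40), Action("type", text="laptop"), Action("answer", text="$499")]),
)
print(answer, browser.log)  # $499 ['click', 'type']
```

In a real setup, `execute` would drive an actual browser and `policy` would send the screenshot and task to the model; keeping both behind small interfaces lets the loop itself stay site-agnostic.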
3
MolmoWeb can handle a wide range of everyday tasks, including navigating websites, filling out forms, searching and filtering product listings, and finding information, all without needing specialized APIs for each site.

youtube.com/watch?v=rzkBE8…
4
MolmoWeb was trained on a mix of datasets including:
◎ Trajectories generated by an AxTree-based LLM agent
◎ Human demonstrations collected via a custom Chrome extension
◎ Data that teaches the model to read & interpret what's on screen
5
MolmoWeb outperforms all open-weight models on every benchmark we tested, and even surpasses visual agents built on much larger models like GPT-4o-based SoM Agents.

It also beats OpenAI CUA on 3 out of 4 benchmarks.
6
Performance improves further by scaling compute at inference time.

On both WebVoyager and Online-Mind2Web, MolmoWeb with 4 parallel attempts surpasses the best single-attempt performance of every model we evaluated, including agents powered by GPT-5 and Gemini CU Preview.
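The thread doesn't say how the 4 parallel attempts are combined. One common way to score this kind of inference-time scaling is pass@k, where a task counts as solved if any of k attempts succeeds; if a selector instead picks a single attempt, the numbers would differ. A minimal sketch, assuming the pass@k interpretation and using the unbiased estimator from the Codex evaluation paper (Chen et al., 2021); the function names and toy outcomes are hypothetical.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k attempts succeeds), given n attempts with c successes."""
    if k > n:
        raise ValueError("need at least k attempts per task")
    if n - c < k:
        return 1.0  # every size-k subset of attempts contains a success
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(outcomes: List[List[bool]], k: int) -> float:
    """Average pass@k over tasks; outcomes[i] holds the success flags of task i's attempts."""
    scores = [pass_at_k(len(o), sum(o), k) for o in outcomes]
    return sum(scores) / len(scores)


# Toy results: 3 tasks, 4 attempts each (made-up numbers, not MolmoWeb results).
outcomes = [
    [True, False, False, False],
    [False, False, False, False],
    [False, True, True, False],
]
print(benchmark_pass_at_k(outcomes, k=1))  # 0.25 (expected single-attempt success)
print(benchmark_pass_at_k(outcomes, k=4))  # 0.6666666666666666 (any of the 4 attempts succeeds)
```

With n = k = 4 the estimator reduces to "did any attempt succeed", which is why pass@4 can exceed the single-attempt score of a stronger model.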
7
While leading the pack among open models, MolmoWeb has limitations.

It can misread text, lose track after a wrong action, & struggle with vague prompts. For safety reasons, it’s also not trained on tasks with logins/financial transactions. These remain active research areas.
8
We're also releasing MolmoWebMix, a dataset for training web agents. It includes:
⁌ 150K+ trajectories, including 30K+ human trajectories
⁌ 7M GUI grounding examples
⁌ 2.2M screenshot QA examples

Everything needed to inspect, reproduce, & fine-tune MolmoWeb is openly available.
9
The web is the world's largest software platform. Agents that can navigate it reliably could dramatically expand access to information and digital services.

MolmoWeb gives the community a strong open foundation to build on.