Ai2
@allen_ai
Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf.

Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵
Ai2
@allen_ai
MolmoWeb works by looking at the same screen you do.

Given a task and a live webpage, it views the screenshot, decides what to do next, and takes action—clicking, typing, scrolling, switching tabs, or returning information back to you.
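The loop described above (look at the screen, decide, act, repeat) can be sketched in a few lines. This is a minimal illustration, not MolmoWeb's actual code: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names, the "screenshot" is a plain string rather than pixels, and the policy is a fixed script standing in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action type; kinds mirror the thread's list:
    # "click", "type", "scroll", "switch_tab", or "answer".
    kind: str
    argument: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a live browser; a real one would return screenshot pixels."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        self.log.append(action)


# A policy maps (task, current observation, past actions) to the next action.
Policy = Callable[[str, str, List[Action]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy, max_steps: int = 10) -> str:
    """Observe, decide, act, until the policy answers or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = browser.screenshot()            # view the screen
        action = policy(task, observation, history)   # decide what to do next
        history.append(action)
        if action.kind == "answer":                   # return information to the user
            return action.argument
        browser.apply(action)                         # click, type, scroll, ...
    return ""  # step budget exhausted without an answer


def scripted_policy(task: str, observation: str, history: List[Action]) -> Action:
    """Fixed three-step script standing in for the model (assumption for the demo)."""
    script = [Action("click", "search box"), Action("type", task), Action("answer", "done")]
    return script[len(history)]


browser = FakeBrowser()
result = run_agent("find running shoes", browser, scripted_policy)
print(result, [a.kind for a in browser.log])
```

In a real agent the policy would be a vision-language model call and the browser a driver for a live page; the control flow is the part this sketch is meant to show.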
Ai2
@allen_ai
MolmoWeb can handle a wide range of everyday tasks, including navigating websites, filling out forms, searching and filtering product listings, and finding information—all without needing specialized APIs for each site.

youtube.com/watch?v=rzkBE8…
Ai2
@allen_ai
MolmoWeb was trained on a mix of datasets including:
◎ Trajectories generated by an AxTree-based LLM agent
◎ Human demonstrations collected via a custom Chrome extension
◎ Data that teaches the model to read & interpret what's on screen
Ai2
@allen_ai
MolmoWeb outperforms all open-weight models on every benchmark we tested, and even surpasses visual agents built on much larger models like GPT-4o-based SoM Agents.

It also beats OpenAI CUA on 3 out of 4 benchmarks.
Ai2
@allen_ai
Performance improves further by scaling compute at inference time.

On both WebVoyager and Online-Mind2Web, MolmoWeb with 4 parallel attempts surpasses the best single-attempt performance of every model we evaluated, including agents powered by GPT-5 and Gemini CU Preview.
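To see why extra attempts help, here is a rough back-of-the-envelope sketch. It assumes a task counts as solved if any one of the parallel attempts succeeds and that attempts succeed independently with the same probability; the thread states neither, and the helper name `pass_at_k` and the 0.5 rate are illustrative, not reported results.

```python
def pass_at_k(p_single: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds.

    Assumes every attempt succeeds with the same probability p_single
    and that attempts are independent (a simplification).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p_single) ** k


# Illustrative number only: a 50% single-attempt success rate.
print(pass_at_k(0.5, 1))
print(pass_at_k(0.5, 4))
```

Real attempts on the same task are correlated, so the gain in practice is usually smaller than this formula suggests.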
Ai2
@allen_ai
While leading the pack among open models, MolmoWeb has limitations.

It can misread text, lose track after a wrong action, & struggle with vague prompts. For safety reasons, it's also not trained on tasks involving logins or financial transactions. These remain active research areas.
Ai2
@allen_ai
We're also releasing MolmoWebMix, a dataset for training web agents. It includes 150K+ trajectories:
⁌ 30K+ human trajectories
⁌ 7M GUI grounding examples
⁌ 2.2M screenshot QA examples

Everything needed to inspect, reproduce, & fine-tune MolmoWeb is openly available.
Ai2
@allen_ai
The web is the world's largest software platform. Agents that can navigate it reliably could dramatically expand access to information and digital services.

MolmoWeb gives the community a strong open foundation to build on.