Visualize Thread by @allen_ai

Ai2

@allen_ai

Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf.

Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵

MolmoWeb works by looking at the same screen you do.

Given a task and a live webpage, it views the screenshot, decides what to do next, and takes action—clicking, typing, scrolling, switching tabs, or returning information back to you.
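
A rough sketch of that observe-decide-act loop, for readers who want to see the shape of it. This is not MolmoWeb's code or API: `Action`, `run_agent`, and the `observe`, `decide`, and `act` callables are hypothetical names introduced here, and the step limit is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    # Hypothetical action record: "click", "type", "scroll",
    # "switch_tab", or "answer" (return information to the user).
    kind: str
    argument: str = ""


def run_agent(
    task: str,
    observe: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> Optional[str]:
    """Loop: take a screenshot, pick the next action, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = observe()                       # look at the current screen
        action = decide(task, screenshot, history)   # model picks the next step
        history.append(action)
        if action.kind == "answer":                  # agent reports back and stops
            return action.argument
        act(action)                                  # click, type, scroll, ...
    return None                                      # gave up after max_steps
```

In a real setup, `observe` and `act` would wrap a browser automation layer and `decide` would call the model; here they are left as plain callables so the loop itself stays easy to read.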

MolmoWeb can handle a wide range of everyday tasks, including navigating websites, filling out forms, searching and filtering product listings, and finding information—all without needing specialized APIs for each site.

youtube.com/watch?v=rzkBE8…

MolmoWeb was trained on a mix of datasets including:
◎ Trajectories generated by an AxTree-based LLM agent
◎ Human demonstrations collected via a custom Chrome extension
◎ Data that teaches the model to read & interpret what's on screen

MolmoWeb outperforms all open-weight models on every benchmark we tested, and even surpasses visual agents built on much larger models like GPT-4o-based SoM Agents.

It also beats OpenAI CUA on 3 out of 4 benchmarks.

Performance improves further by scaling compute at inference time.

On both WebVoyager and Online-Mind2Web, MolmoWeb with 4 parallel attempts surpasses the best single-attempt performance of every model we evaluated, including agents powered by GPT-5 and Gemini CU Preview.
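
One way to read "4 parallel attempts" is as a pass@k-style score, where a task counts as solved if any of the k attempts succeeds. The thread does not say how attempts are scored or selected, so treat that as an assumption. The function name `pass_at_k` and the example outcomes below are made up for illustration.

```python
from typing import Dict, List


def pass_at_k(outcomes: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks where at least one of the first k attempts succeeded.

    `outcomes` maps a task id to the success flag of each attempt.
    Assumption: a task is "solved" if any attempt succeeds.
    """
    if not outcomes:
        raise ValueError("no tasks to score")
    solved = sum(any(attempts[:k]) for attempts in outcomes.values())
    return solved / len(outcomes)


# Made-up outcomes for four tasks, four attempts each.
example = {
    "task_a": [False, True, False, False],
    "task_b": [False, False, False, False],
    "task_c": [True, True, False, True],
    "task_d": [False, False, False, True],
}
single_attempt = pass_at_k(example, k=1)   # only task_c succeeds first try
four_attempts = pass_at_k(example, k=4)    # task_a, task_c, task_d succeed
```

With these toy numbers the score rises from 0.25 with one attempt to 0.75 with four, which is the kind of gain extra inference-time compute can buy.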

While leading the pack among open models, MolmoWeb has limitations.

It can misread text, lose track after a wrong action, & struggle with vague prompts. For safety reasons, it’s also not trained on tasks with logins/financial transactions. These remain active research areas.

We're also releasing MolmoWebMix, a dataset for training web agents. It includes 150K+ trajectories:
⁌ 30K+ human trajectories
⁌ 7M GUI grounding examples
⁌ 2.2M screenshot QA examples

Everything needed to inspect, reproduce, & fine-tune MolmoWeb is openly available.

The web is the world's largest software platform. Agents that can navigate it reliably could dramatically expand access to information and digital services.

MolmoWeb gives the community a strong open foundation to build on.

🤖 Models: huggingface.co/collections/al…
🎮 Demo: molmoweb.allen.ai
📊 Data: huggingface.co/collections/al…
💻 Code: github.com/allenai/molmow…
📝 Blog: allenai.org/blog/molmoweb
