Forecasting Research Institute
@Research_FRI
🏆 In October, we invited external teams to submit to ForecastBench, our AI forecasting benchmark.

The challenge? Beat superforecasters, using any tools available (scaffolding, ensembling, etc.).

The result? External submissions are now the most accurate models on our leaderboard—though superforecasters still hold #1.

@xai's model (grok-4-fast) is the leading external submission, at #2.

One of Cassi's entries takes the #3 spot.

Here's what changed. 🧵
Forecasting Research Institute
@Research_FRI
In October, we opened up ForecastBench’s tournament leaderboard to external submissions. Teams are free to use any tools they choose.

Several teams responded, including @xai, Cassi, @fractalai, @lightningrodai, and @_Mantic_AI. Thanks to all of them for taking part in this challenging benchmark.

Models from @xai and Cassi outperformed all our baseline LLM configurations.
Forecasting Research Institute
@Research_FRI
Here are the headline Brier scores (lower is better):

• Superforecasters: 0.083

• grok-4-fast (external submission from @xai): 0.098

• ensemble_2_crowdadj (external submission from Cassi): 0.099

• @OpenAI’s GPT-5 (our own baseline run): 0.100

• @GoogleDeepMind’s Gemini-2.5-Pro (our own baseline run): 0.102

• @AnthropicAI’s Claude-Sonnet-4-5 (our own baseline run): 0.103

External submissions hold #2 and #3, ahead of all our baseline runs. However, all LLMs still lag behind superforecasters.
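For readers new to the metric, here is a minimal sketch of a plain Brier score: the mean squared difference between forecast probabilities and binary outcomes. The function name `brier_score` is illustrative, and any adjustments the benchmark applies before reporting its leaderboard numbers are not reproduced here. The `HEADLINE_SCORES` values are copied from the list above.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Headline scores as quoted in this thread.
HEADLINE_SCORES = {
    "Superforecasters": 0.083,
    "grok-4-fast": 0.098,
    "ensemble_2_crowdadj": 0.099,
    "GPT-5": 0.100,
    "Gemini-2.5-Pro": 0.102,
    "Claude-Sonnet-4-5": 0.103,
}

# Sort ascending, since a lower Brier score means a more accurate forecaster.
ranked = sorted(HEADLINE_SCORES.items(), key=lambda item: item[1])

if __name__ == "__main__":
    for rank, (name, score) in enumerate(ranked, start=1):
        print(f"#{rank} {name}: {score:.3f}")
```

A forecaster who always answers 0.5 scores 0.25 under this definition, which is a useful reference point when reading the numbers above.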
Forecasting Research Institute
@Research_FRI
Updated trend extrapolations for LLM-superforecaster parity:

• Overall: Oct 2026 (95% CI: Dec 2025 – Sep 2027)

• Dataset: May 2026 (95% CI: Oct 2025 – Jan 2027)

• Market: Apr 2026 (95% CI: Apr 2025 – Jul 2029)

Estimates remain stable (within ~1 month of our October projections) despite new models and more resolved questions.
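To make the idea of a parity extrapolation concrete, here is a rough sketch that fits a straight line to scores over time and finds where the line reaches a target score. This is an assumption-laden illustration, not the method behind the projections above: the function name `parity_date`, the ordinary least-squares fit, and the sample observations are all hypothetical, and no confidence interval is computed.

```python
from datetime import date, timedelta


def parity_date(observations, target):
    """Fit score = intercept + slope * days and return the date the fit hits `target`.

    `observations` is a list of (date, score) pairs; days are counted from the
    earliest date to keep the arithmetic well conditioned.
    """
    if len(observations) < 2:
        raise ValueError("need at least two observations to fit a line")
    start = min(d for d, _ in observations)
    xs = [(d - start).days for d, _ in observations]
    ys = [score for _, score in observations]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("observations must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("scores are not improving, so the fit never reaches the target")
    intercept = mean_y - slope * mean_x
    days_to_target = (target - intercept) / slope
    return start + timedelta(days=round(days_to_target))


# Made-up observations for illustration only; these are NOT leaderboard data.
synthetic = [
    (date(2024, 1, 1), 0.13),
    (date(2024, 1, 1) + timedelta(days=100), 0.12),
    (date(2024, 1, 1) + timedelta(days=200), 0.11),
]

if __name__ == "__main__":
    print(parity_date(synthetic, target=0.083))
```

A single straight-line fit like this is very sensitive to which models and dates are included, which is one reason the projections above are reported with wide intervals.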
Forecasting Research Institute
@Research_FRI
More models are coming soon.

Mid-January:

• GPT-5.1
• Gemini 3 Pro
• Grok-4.1
• GLM-4.6
• Kimi K2 Thinking

End of January:

• Claude Opus 4.5

We add models 50 days after their first forecast to ensure enough questions have resolved for stable rankings.
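The 50-day rule is simple date arithmetic. The sketch below uses a hypothetical helper, `leaderboard_date`, and a made-up first-forecast date purely for illustration.

```python
from datetime import date, timedelta

# Waiting period stated in the thread.
WAITING_PERIOD_DAYS = 50


def leaderboard_date(first_forecast):
    """Return the earliest date a model would be added under the 50-day rule."""
    return first_forecast + timedelta(days=WAITING_PERIOD_DAYS)


if __name__ == "__main__":
    # Hypothetical first-forecast date, not taken from the benchmark.
    print(leaderboard_date(date(2025, 12, 1)))
```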
Forecasting Research Institute
@Research_FRI
Think you can do better? Submit to the public leaderboard. 🚀

How to submit: github.com/forecastingres…

Explore the data:

• Leaderboards: forecastbench.org
• Full datasets: forecastbench.org/datasets/