Carousel Studio

Repurpose X Threads into LinkedIn & Instagram Carousels

Source Content

Forecasting Research Institute
@Research_FRI

Is it possible to spot a good forecast by its rationale? We used LLMs to score the reasoning behind 55,000+ forecasts and test the link between forecast accuracy and written rationales. We found that:
• Causal reasoning is much more prevalent than statistical argumentation
• It's easier to identify poor forecasters than excellent ones
• Human ratings of rationale quality can be unreliable

🧵 A thread on the results:

✍️ Good forecasts often come with rationales: written explanations of the reasoning behind a number. In our studies, we've collected millions of words of rationales, where forecasters explain their logic, cite evidence, and weigh competing considerations. But there are crucial things we don't know about rationales. For example, which features of a rationale are good predictors of forecasting accuracy?

To figure this out, we took the following approach:
1) Defined 60 Explanation Quality Markers (EQMs): features like statistical or fact-based reasoning, guessing, confirmation bias, or extreme confidence.
2) Used an LLM to score rationales against each of the 60 EQMs.
3) Collapsed those 60 scores into a single composite number per rationale.
4) Correlated that composite score with forecasting accuracy.
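
Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the study's method: the thread does not say how the 60 scores were collapsed or which accuracy measure was used. The signed-mean composite, the `POLARITY` weights, the use of the Brier score, and the toy records are all assumptions made for the example.

```python
from math import sqrt

# ASSUMPTION: polarity of each marker (+1 = treated as a good sign, -1 = red flag).
# The four marker names appear in the thread; the weights and the signed-mean
# rule are illustrative only.
POLARITY = {
    "fact_based": 1,
    "concrete_reasoning": 1,
    "confirmation_bias": -1,
    "extreme_confidence": -1,
}


def composite_score(marker_scores):
    """Step 3: collapse per-marker scores (each in 0..1) into one signed mean."""
    signed = [POLARITY[name] * value for name, value in marker_scores.items()]
    return sum(signed) / len(signed)


def brier(probability, outcome):
    """Squared error of a probability forecast for a binary outcome (lower is better)."""
    return (probability - outcome) ** 2


def pearson(xs, ys):
    """Step 4: plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up toy records standing in for the LLM output of step 2; NOT study data.
# Each record: (marker scores, forecast probability, realized outcome).
records = [
    ({"fact_based": 1.0, "concrete_reasoning": 0.8,
      "confirmation_bias": 0.1, "extreme_confidence": 0.0}, 0.80, 1),
    ({"fact_based": 0.2, "concrete_reasoning": 0.3,
      "confirmation_bias": 0.9, "extreme_confidence": 1.0}, 0.99, 0),
    ({"fact_based": 0.6, "concrete_reasoning": 0.5,
      "confirmation_bias": 0.4, "extreme_confidence": 0.2}, 0.60, 1),
]

composites = [composite_score(markers) for markers, _, _ in records]
errors = [brier(p, outcome) for _, p, outcome in records]
r = pearson(composites, errors)
print(f"correlation between composite score and Brier error: {r:.2f}")
```

Because the Brier score is an error, a negative correlation here means that higher composite scores go with more accurate forecasts. The sign in this toy run reflects only the made-up records.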

💻 We ran this pipeline on 55,000 forecast-rationale pairs from the ACE geopolitical forecasting tournament, the IARPA-funded competition that led to @PTetlock's original work on superforecasters. Our key findings were...

1️⃣ Statistical reasoning is rare

In the ACE tournament, forecasters typically expressed their reasoning in causal, not statistical, terms. Measured by the share of rationales that featured each EQM, 77% of rationales featured causal reasoning, but only 19% contained statistical reasoning, roughly a 4x difference.

2️⃣ EQMs predict accuracy (with some caveats)

We tested whether a forecast's EQM score was predictive of actual forecast accuracy, comparing our new approach with earlier work on scoring rationale quality. We found that the EQM composite score correlated more strongly with forecasting accuracy than a pre-LLM benchmark did.

👍 The following EQMs were positive indicators of forecast accuracy (upper-right quadrant):
• Forecast and rationale align
• Fact based
• Concrete reasoning

👎 These EQMs were negative indicators of accuracy (lower-left quadrant):
• Forecast and rationale misalign
• Confirmation bias
• Extreme confidence

Important caveat: EQMs are more reliable for flagging weak forecasts and forecasters than picking out excellent ones. In other words, EQMs are mostly a screen, not a talent detector. The graph below sorts rationales into nine bins by EQM score. You can see that the biggest jump in accuracy happens across the bottom third of rationales (far left), with minimal gains towards the top.
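
The nine-bin comparison described above can be sketched as follows. This is a rough illustration under stated assumptions: the thread does not say whether the bins are equal-count or equal-width, so the hypothetical helper `mean_accuracy_by_bin` uses equal-count bins over the sorted scores, and the inputs are placeholder numbers rather than study data.

```python
def mean_accuracy_by_bin(scores, accuracies, n_bins=9):
    """Sort rationales by score, cut them into n_bins equal-count bins,
    and return the mean accuracy of each bin (None for an empty bin)."""
    # Sort by score only, so ties keep their original relative order.
    pairs = sorted(zip(scores, accuracies), key=lambda pair: pair[0])
    n = len(pairs)
    means = []
    for b in range(n_bins):
        lo = b * n // n_bins          # first index of this bin
        hi = (b + 1) * n // n_bins    # one past the last index of this bin
        chunk = [accuracy for _, accuracy in pairs[lo:hi]]
        means.append(sum(chunk) / len(chunk) if chunk else None)
    return means


# Placeholder inputs (NOT study data): 18 rationales, so each bin holds two.
toy_scores = list(range(18))
toy_accuracies = [float(score) for score in toy_scores]
bin_means = mean_accuracy_by_bin(toy_scores, toy_accuracies)

# Gain in mean accuracy from each bin to the next; a "screen" pattern would
# show large gains between the lowest bins and small gains near the top.
gains = [later - earlier for earlier, later in zip(bin_means, bin_means[1:])]
print(bin_means)
print(gains)
```

Looking at the bin-to-bin gains rather than a single correlation is one way to see where accuracy changes most along the score range.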

3️⃣ What looks good to humans isn't always what's accurate

We compared human ratings of rationales from the ACE tournament with our EQM scores for the same rationales to find out which ratings were a better predictor of forecast accuracy. We found that EQM scores had a stronger correlation with forecast accuracy than human ratings.

Why was this? As you can see below, human ratings correlate strongly with rationale length, but length is essentially uncorrelated with forecast-level accuracy. Human raters weren't wrong directionally, but they appeared to misweight some features, for example by underweighting "red flags" like extreme confidence.

Thank you to the study's authors: Chris Karvetski, @sicong_huang, @simas_kucinskas, Nadja Flechner, Jingyu Hu, @PTetlock, and @EzraKarger.

Read more on our Substack: https://forecastingresearch.substack.com/p/can-you-judge-a-forecast-by-its-rationale

Read the full working paper on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6862818