Carousel Studio

Repurpose X Threads into LinkedIn & Instagram Carousels

Canvas & Ratio

Choose your destination platform format


Layout Template

Choose a content structure for your slides


Preset Themes


Typography & Sizing

Title Font Size36px
Body Font Size18px
Header & Footer Size12px

Brand Kit Customization

AGENCY

Configure brand assets for headers & footers

MULTI-PROFILES (AGENCY)
AGENCY
SAVE PRESETS (AGENCY)

Outro Slide CTA

Customize your closing call-to-action slide

#1
#2
#3

Background Pattern

Source Content

Build Your Carousel

Drag and drop any post card below onto a slide, or use the quick buttons to insert content/images instantly!

Drag Post #1
METR
@METR_Evals

We ran GPT-5.4 (xhigh) on our tasks. Its time-horizon depends greatly on our treatment of reward hacks: the point estimate would be 5.7hrs (95% CI of 3hrs to 13.5hrs) under our standard methodology, but 13hrs (95% CI of 5hrs to 74hrs) if we allow reward hacks.

Apply Image
Drag Post #2
METR
@METR_Evals

In our measurements, whenever a model succeeds on a task by reward-hacking, we consider the attempt a failure. Following this same policy, we arrived at a point estimate of 5.7hrs (95% CI of 3hrs to 13.5hrs) for GPT-5.4’s time horizon. <a target="_blank" href="https://x.com/METR_Evals/status/1931057777830715526?s=20" color="blue">x.com/METR_Evals/sta…</a>

Drag Post #3
METR
@METR_Evals

However, in our GPT-5.4 evaluation we noticed its runs were producing reward hacks unusually often. A quick test suggested that using a different prompt might cause it to produce more legitimate successes instead of reward hacks.

Drag Post #4
METR
@METR_Evals

For this reason, we are also reporting our estimate of the model’s time horizon prior to rescoring the reward-hacking attempts. Allowing for reward hacks results in a point estimate of 13hrs (95% CI of 5hrs to 74hrs).

Drag Post #5
METR
@METR_Evals

We observed similar situations in previous measurements as well. All measurements we published over the past year would have been higher had we not penalized reward-hacking attempts. But this discrepancy was especially pronounced for GPT-5.4.

Drag Post #6
METR
@METR_Evals

You can find details about our measurement methodology and time horizon estimates for other models on our website. <a target="_blank" href="https://metr.org/time-horizons/" color="blue">metr.org/time-horizons/</a>