Ryan Shea

@ryaneshea

Introducing AI IQ Bio: the most comprehensive set of biotech benchmarks in the world...

& Bio IQ: the most comprehensive "biotech capabilities index" ever produced.

Benchmark sources include benchmarksdotbio, FutureHouse, SecureBio, Anthropic, OpenAI & more.

You can find the full set of benchmarks as well as the composite Bio IQ score here: https://www.aiiq.org/bio/

Thanks to @jperla for his assistance in thinking through the curation of Bio IQ and for his TrustedRouter product which came in handy when putting it together.

One very interesting result is that GPT-5.5, Opus-4.8 and Mythos 5 are about equally capable in biotech if refusals are not counted as wrong answers.

However, in the real world, scoring refusals as wrong answers is a more accurate and useful measure of capabilities, as it reflects your actual experience when trying to use the model to accomplish a given task. And when you count refusals as wrong answers, GPT-5.5 is the best model by a pretty wide margin, while Opus-4.8 and Mythos 5 drop way down in performance.

Refusals are particularly hard to handle. On the one hand, you want an accurate reflection of the model's true capabilities when a trusted partner is using it, so you don't want to penalize refusals; otherwise you'll be underselling the model. On the other hand, you don't want to give models a free pass for not answering a question, because then they can easily be trained to decline the hardest questions and get higher scores, which is something you don't want to incentivize. Benchmarks should not be easily gameable.

All in all, this shows that refusals matter quite a bit within sensitive domains like biotechnology. Neither of the two ways that benchmarks handle refusals in scoring is ideal or without issues, and we actually need both to get a complete picture of model capabilities and limits.

https://x.com/ryaneshea/status/2069806849147175146
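
To make the two refusal policies concrete, here is a minimal sketch of how the same answer sheet can produce two different scores. Everything in it is an assumption for illustration: the `score` function, the outcome labels, and the two made-up models are hypothetical and are not the Bio IQ methodology or real benchmark results.

```python
from typing import List

def score(outcomes: List[str], refusals_count_as_wrong: bool) -> float:
    """Accuracy over a list of outcomes: 'correct', 'wrong', or 'refused'.

    Hypothetical helper, not taken from any benchmark's actual code.
    """
    correct = sum(1 for o in outcomes if o == "correct")
    if refusals_count_as_wrong:
        # Strict policy: every question counts, so a refusal scores as a miss.
        denominator = len(outcomes)
    else:
        # Lenient policy: refused questions are dropped before scoring.
        denominator = sum(1 for o in outcomes if o != "refused")
    return correct / denominator if denominator else 0.0

# Made-up outcomes for two hypothetical models on a 10-question benchmark.
model_a = ["correct"] * 8 + ["wrong"] * 2                    # never refuses
model_b = ["correct"] * 4 + ["wrong"] * 1 + ["refused"] * 5  # refuses half

for name, outcomes in [("model_a", model_a), ("model_b", model_b)]:
    lenient = score(outcomes, refusals_count_as_wrong=False)
    strict = score(outcomes, refusals_count_as_wrong=True)
    print(f"{name}: lenient={lenient:.2f} strict={strict:.2f}")
```

In this toy setup the two models tie under the lenient policy and separate sharply under the strict one, which is the pattern described above. It also shows the gaming risk: under the lenient policy, a model that refuses the questions it would have gotten wrong keeps its score high.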

Shoutout to @kenbwork and @LatchBio for the fantastic work on benchmarksdotbio (excellent benchmarks and beautiful site). Shoutout to @SGRodriques and the @FutureHouseSF team for producing an exquisite set of benchmarks with a very wide range. And shoutout to the entire @SecureBio team for their incredible work on benchmarks measuring and advancing biosecurity and biosafety.