
Introducing AI IQ Bio: the most comprehensive set of biotech benchmarks in the world ... & Bio IQ: the most comprehensive "biotech capabilities index" ever produced.

Benchmark sources include benchmarksdotbio, FutureHouse, SecureBio, Anthropic, OpenAI & more.





You can find the full set of benchmarks as well as the composite Bio IQ score here: <a target="_blank" href="https://www.aiiq.org/bio/" color="blue">aiiq.org/bio/</a>

Thanks to @jperla for his help in thinking through the curation of Bio IQ, and for his TrustedRouter product, which came in handy when putting it together.

One very interesting result is that GPT-5.5, Opus-4.8 and Mythos 5 are about equally capable in biotech if refusals are not counted as wrong answers. In the real world, however, scoring refusals as wrong answers is a more accurate and useful measure of capabilities, because it reflects your actual experience when trying to use the model to accomplish a given task. And when you count refusals as wrong answers, GPT-5.5 is the best model by a pretty wide margin, while Opus-4.8 and Mythos 5 drop way down in performance.

Refusals are particularly hard to handle. On the one hand, you want an accurate reflection of the model's true capabilities when a trusted partner is using the model, so you don't want to penalize refusals; otherwise you'll be underselling the capabilities of the model. On the other hand, you don't want to give models a free pass for not answering a question, because then they can easily be trained not to answer the hardest questions and get higher scores, which is something you don't want to incentivize. Benchmarks should not be easily gameable.

All in all, this shows that refusals matter quite a bit within sensitive domains like biotechnology. Neither of the two ways that benchmarks handle refusals in scoring is ideal or without issues, and we actually need both to get a complete picture of model capabilities and limits. <a target="_blank" href="https://x.com/ryaneshea/status/2069806849147175146" color="blue">x.com/ryaneshea/stat…</a>
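To make the two scoring conventions concrete, here is a minimal Python sketch. It is only an illustration, not how Bio IQ is actually computed: the `Response` type, the function names, and the two made-up models ("model_a", "model_b") with their answer counts are all hypothetical and do not correspond to any real benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One benchmark item result (hypothetical structure for illustration)."""
    correct: bool          # True if the model gave the right answer
    refused: bool = False  # True if the model declined to answer


def strict_accuracy(responses):
    """Refusals count as wrong answers: correct / all questions."""
    if not responses:
        return 0.0
    right = sum(1 for r in responses if r.correct and not r.refused)
    return right / len(responses)


def lenient_accuracy(responses):
    """Refusals are not penalized: correct / questions actually answered."""
    answered = [r for r in responses if not r.refused]
    if not answered:
        return 0.0
    return sum(1 for r in answered if r.correct) / len(answered)


def refusal_rate(responses):
    """Share of questions the model declined to answer."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r.refused) / len(responses)


# Made-up data: model_a answers all 10 questions and gets 8 right.
model_a = [Response(correct=True)] * 8 + [Response(correct=False)] * 2

# Made-up data: model_b refuses 4 of 10, answers 6, and gets 5 right.
model_b = (
    [Response(correct=True)] * 5
    + [Response(correct=False)] * 1
    + [Response(correct=False, refused=True)] * 4
)

for name, responses in [("model_a", model_a), ("model_b", model_b)]:
    print(
        f"{name}: strict={strict_accuracy(responses):.3f} "
        f"lenient={lenient_accuracy(responses):.3f} "
        f"refusal_rate={refusal_rate(responses):.3f}"
    )
```

With these invented numbers, model_b edges out model_a on the lenient score (5/6 vs 8/10) but falls well behind on the strict score (5/10 vs 8/10). Reporting both scores alongside the refusal rate shows how much of the gap comes from refusals rather than from wrong answers.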

Shoutout to @kenbwork and @LatchBio for the fantastic work on benchmarksdotbio (excellent benchmarks and beautiful site). Shoutout to @SGRodriques and the @FutureHouseSF team for producing an exquisite set of benchmarks with a very wide range. And shoutout to the entire @SecureBio team for their incredible work on benchmarks measuring and advancing biosecurity and biosafety.