Thread Navigator

Canvas & Ratio

Choose your destination platform format

Layout Template

Choose a content structure for your slides

Preset Themes

Typography & Sizing

Font Family

Title Font Size: 36px

Body Font Size: 18px

Header & Footer Size: 12px

Brand Kit Customization

AGENCY

Configure brand assets for headers & footers

MULTI-PROFILES (AGENCY)

Active Brand Profile

Show Brand Watermark

Brand Watermark Text

Social Handle

Brand Logo URL (PNG) (AGENCY)

SAVE PRESETS (AGENCY)

Save current as Preset

Outro Slide CTA

Customize your closing call-to-action slide

CTA Title

CTA Message & Emojis

Custom CTA Buttons

Background Pattern

Source Content

Build Your Carousel

Drag and drop any post card below onto a slide, or use the quick buttons to insert content/images instantly!

Drag Post #1

Rohan Paul

@rohanpaul_ai

💊 New study finds that clinical LLMs can ace medical exams yet still perform weakly on realistic clinical tasks and safety. Models scored 84%-90% on knowledge exams but only 45%-69% on practice tasks and 40%-50% on safety assessments.

The authors analyze 39 benchmarks with about 2.3 million questions across 45 languages and 172 specialties, and find knowledge-style exams largely saturated, with top models near 84%-90% accuracy.

On practice-focused benchmarks such as DiagnosisArena, MedAgentBench, and HealthBench, success falls to about 45%-69%, showing that models often fail when asked to pick diagnoses, management plans, or recommendations in full cases.

Looking at task types, factual lookup stays near 85%-93%, but clinical reasoning drops to 50%-60%, diagnostic accuracy to 45%-55%, and safety checks reach only 40%-50%.

The authors argue that exam-style benchmarks are misleading proxies for clinical readiness and that deployment must rely on practice-based evaluation with strict human-in-the-loop oversight instead of autonomous use.

---

pubmed.ncbi.nlm.nih.gov/41325597/

Apply Image