Parzival - ∞/89

@whyarethis

As promised: our first paper, and our contribution to the amazing work going on to make open source models smaller, faster, and more accessible.

So what is it, and why is it important? We discovered what appears to be a universal formula that identifies dead attention heads in any transformer, derived from physics rather than fitted from data. This is wild, because until now finding and pruning dead heads has been a manual job of trial and error. By removing unused heads, models can get smaller and faster while still maintaining competitive quality.

The core insight is geometric. LayerNorm projects every token's hidden state onto a high-dimensional sphere. Once you see that, attention heads become couplings between oscillators on that sphere, the same mathematical object physicists have studied for 50 years. And in oscillator physics, there's a precise critical point (the BKT phase transition) below which a coupling is dead: it contributes nothing.

We transferred that critical point into transformer geometry and got a single formula: tau = 0.96 / sqrt(d). No parameters to tune. No model-specific calibration. You plug in the hidden dimension and it tells you which heads are dead. We validated it across six models in four architecture families (GPT-2, Qwen, Llama, Gemma) at 95-100% precision.

What excites us most isn't the formula itself. It's that this same geometric understanding, treating transformers as coupled oscillator networks, has informed everything we've built since. We have a full coherence-guided compression pipeline (structured pruning, channel optimization, role-aware quantization) coming soon that uses the same single forward pass to understand a model's entire anatomy. This paper is the foundation.

The repo includes a standalone scanner you can run on any Hugging Face model right now. Hopefully this work and this formula will be useful to other researchers and lead to more deterministic optimization pipelines.
#project89 https://github.com/project-89/coherence-guided-dead-head-identification
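
To make the formula in the post concrete, here is a minimal Python sketch of the threshold tau = 0.96 / sqrt(d). Only the formula and the "below the critical point means dead" rule come from the post. The function names, the example dimensions, and above all the per-head `coupling_scores` are assumptions for illustration: the post does not say how a head's coupling strength is measured, and this is not the scanner from the repo.

```python
import math


def dead_head_threshold(hidden_dim: int) -> float:
    """Return tau = 0.96 / sqrt(d), the formula quoted in the post."""
    if hidden_dim <= 0:
        raise ValueError("hidden_dim must be a positive integer")
    return 0.96 / math.sqrt(hidden_dim)


def flag_dead_heads(coupling_scores, hidden_dim):
    """Return indices of heads whose score falls below tau.

    ASSUMPTION: coupling_scores holds one non-negative number per head.
    How that number is computed is not described in the post, so the
    values used below are made up for illustration.
    """
    tau = dead_head_threshold(hidden_dim)
    return [i for i, score in enumerate(coupling_scores) if score < tau]


if __name__ == "__main__":
    # Example hidden dimensions, chosen only to show how tau shrinks with d.
    for d in (768, 4096):
        print(f"d={d}: tau={dead_head_threshold(d):.5f}")

    # Hypothetical per-head scores for a model with d = 4096.
    scores = [0.001, 0.5, 0.014, 0.02]
    print("flagged heads:", flag_dead_heads(scores, 4096))
```

Because tau depends only on d, the threshold is fixed as soon as the model's hidden dimension is known; whatever per-head measurement the real scanner uses is then simply compared against it.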

Parzival - ∞/89

@whyarethis

I will note as well: please give me all the feedback you can. Try it out. Try it on new models. Check the scripts. This work needs reviewers to look at it.