Visualize Thread by @itsalexvacca

Alex Vacca

@itsalexvacca

8 Google engineers wrote the paper that every AI company now uses as their bible. OpenAI built GPT on it, Anthropic built Claude on it, and Meta built LLaMA on it.

Every LLM worth billions uses this paper's transformer architecture as the foundation...

Before 2017, teaching computers human language was torture.
AI would read text like humans reading through a keyhole - one word at a time.

Those models were slow, forgot context, and choked on long passages.
Then 8 researchers decided to flip the approach on its head...

They published an 8-page paper titled "Attention Is All You Need"

The idea was simple: Instead of reading word by word, why not look at everything at once? Like how you can glance at a page and immediately see which words relate to each other.

They called it a Transformer.

An example: "The bank by the river bank was full of cash."

Old AI would get confused. Two banks?

Transformers see everything at once. "Bank" near "river" = riverbank. "Bank" near "cash" = financial institution.

One formula makes this work & it's worth more than most countries.

Attention(Q,K,V) = softmax(QK^T/√d_k)V

That's it. This equation alone created trillions in AI market value.

Every word calculates relevance with every other word. "Apple" + "stock" = company. "Apple" + "pie" = fruit.
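To make the formula concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes Q, K, and V are small lists of row vectors (one row per word) and that the scale factor is the square root of the key dimension. The toy vectors are made up for illustration; a real model learns them.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors, one row per token.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Relevance score of this token's query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # non-negative, sums to 1
        # Output is the weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Toy, made-up 2-d vectors for three tokens (not learned embeddings).
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(Q, K, V)
```

Each output row is a blend of all the value vectors, weighted by how strongly that word's query matches every key. That is the "every word looks at every other word" step.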

But they didn't stop at one attention mechanism.

Eight attention "heads" ran in parallel.

One might track grammar
Another might find subject-verb connections
A third might link pronouns to the nouns they refer to
The other five pick up other patterns. All simultaneously.
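A rough sketch of the parallel-heads idea, under a big simplification: each head here just gets its own slice of the input vector, whereas the paper gives every head its own learned projection matrices and adds a final output projection. The function names and the toy input are assumptions for illustration only.

```python
import math

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

def multi_head_attention(X, num_heads):
    """Self-attention with num_heads heads that run independently.

    Simplification: each head sees a slice of the input vector instead of
    a learned projection of it.
    """
    d_model = len(X[0])
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        chunk = [row[h * d_head:(h + 1) * d_head] for row in X]
        head_outputs.append(attention(chunk, chunk, chunk))
    # Concatenate the heads back into one vector per token.
    return [sum((head[i] for head in head_outputs), []) for i in range(len(X))]

# Made-up input: 4 tokens, 16 numbers each, split across 8 heads.
X = [[float((i + j) % 3) for j in range(16)] for i in range(4)]
out = multi_head_attention(X, num_heads=8)
```

Because the heads do not depend on each other, they can all be computed at the same time, which is a large part of why this design trains quickly on GPUs.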

When tested on machine translation, it set new records.

Previous best translation models (WMT 2014 English-to-German): about 26.3 BLEU, weeks to train
Their Transformer: 28.4 BLEU, just 3.5 days on 8 GPUs

A 2-point jump is like going from dial-up to broadband. And it came at a fraction of the training cost.

But OpenAI saw something in those pages that even Google missed.

OpenAI made one surgical change that created ChatGPT.

The original Transformer had an encoder (understands text) and a decoder (generates text). OpenAI threw away the encoder entirely. Just kept the decoder.

Why would removing half the system make it better?

The encoder-decoder translation setup needs paired data - English sentence, German translation.
A decoder on its own only needs raw text, maybe the entire internet.

Just predict the next word. No translations needed.
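Here is a small sketch of why raw text is enough: any sentence can be cut into (context, next word) training pairs with no labels or translations. The whitespace split is a stand-in assumption for a real tokenizer, and `next_token_examples` is a hypothetical helper name.

```python
def next_token_examples(text):
    """Turn raw text into (context, next_word) training pairs."""
    tokens = text.split()  # whitespace split stands in for a real tokenizer
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_examples("attention is all you need")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

A five-word sentence already yields four training examples, so a large text corpus turns into a huge supply of practice questions for free.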

OpenAI turned Google's translation machine into a universal intelligence engine.

Anthropic took transformers and made them "safe." First, they had Claude critique its own outputs.

"Am I being harmful? Biased? Lying?"
During training, the AI argues with itself about ethics and rewrites its own drafts.
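A loose sketch of what a critique-then-revise loop could look like. This is not Anthropic's code: `critique_and_revise`, the prompt wording, the example principles, and the fake `toy_model` are all hypothetical, and `model` is assumed to be any function that takes a string and returns a string.

```python
def critique_and_revise(prompt, model, principles):
    """One pass of a critique-then-revise loop.

    `model` is any callable str -> str (a hypothetical stand-in for an LLM call).
    """
    draft = model(prompt)
    for principle in principles:
        critique = model(
            f"Critique this reply against the principle '{principle}':\n{draft}"
        )
        draft = model(
            f"Rewrite the reply to address the critique.\nReply: {draft}\nCritique: {critique}"
        )
    return draft

def toy_model(text):
    # Fake "LLM" so the sketch runs: answers based on what it was asked to do.
    if text.startswith("Critique"):
        return "could be more careful"
    if text.startswith("Rewrite"):
        return "revised reply"
    return "first draft"

final = critique_and_revise("How do I pick a lock?", toy_model, ["be harmless", "be honest"])
```

The revised replies produced this way can then serve as training examples, so the written principles do some of the work human reviewers would otherwise do.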

They called it Constitutional AI. But that wasn't enough.

Then there's RLHF (reinforcement learning from human feedback) - humans compare Claude's responses and pick the better one.

Do this at massive scale. The transformer learns what humans actually want.
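One common way to turn human ratings into a training signal is a pairwise preference loss on a reward model: the loss is small when the response the human preferred gets the higher score. The sketch below assumes that setup; the reward numbers are made up and `preference_loss` is an illustrative name.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(x)) is the same as log(1 + exp(-x)).
    return math.log(1.0 + math.exp(-diff))

# Made-up reward scores for illustration.
agree = preference_loss(2.0, -1.0)     # reward model agrees with the human
disagree = preference_loss(-1.0, 2.0)  # reward model disagrees with the human
```

Here `agree` comes out much smaller than `disagree`, so training pushes the reward model toward the human's choice. The language model is then tuned to produce responses that score well.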

Same 8-page architecture underneath. But Meta went even further.

Meta spent millions training LLaMA, with supercomputers running 24/7 for months.

Then they released the actual AI brain - the weight files that are the model. Llama 2 came in small (7B), medium (13B), and large (70B) versions.
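Some back-of-the-envelope arithmetic on how big those files are. It assumes the weights dominate the size and compares 16-bit storage with 4-bit quantization. The 4-bit figure is an assumption added here for comparison, and the estimate ignores the extra memory needed while the model is running.

```python
def weights_gb(params_billion, bits_per_param):
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for size in (7, 13, 70):
    print(
        f"{size}B: {weights_gb(size, 16):.1f} GB at 16-bit, "
        f"{weights_gb(size, 4):.1f} GB at 4-bit"
    )
```

By this estimate the 7B model is about 14 GB at 16-bit and about 3.5 GB at 4-bit, while the 70B model is about 140 GB at 16-bit.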

You could run the smaller versions locally on your laptop. But why give away $100M models?

Zuck's play: Let 100,000 developers improve LLaMA. They debug it, optimize it and build tools. Meta gets all innovations back.

While Google and OpenAI charge fees, Meta built an army of unpaid developers. Genius move? I don't know.

Today, transformers power everything:

ChatGPT: Decoder-only transformer
Claude: Transformer language model
DALL-E: Transformer over text and image tokens (the original version)
Copilot: Transformer trained on code

Same architecture. Different products.

Thanks for making it to the end!

I'm Alex, co-founder at ColdIQ. Built a $6M ARR business in under 2 years. We're a remote team across 10 countries, helping 400+ businesses.

Here's how I make $450k+ every month with AI:
tinyurl.com/5n79rd5w

RT the first tweet if you found this thread valuable.

Follow me @itsalexvacca for more threads on outbound and GTM strategy, AI-powered sales systems, and how to build profitable businesses that don't depend on you.

I share what worked (and what didn't) in real time.
