Akshay 🚀

@akshay_pachaar

Self-attention in LLMs, clearly explained:

Before we start, a quick primer on tokenization!

Raw text → Tokenization → Embedding → Model

An embedding is a meaningful representation of each token (roughly a word) as a vector of numbers. This embedding is what we provide as input to our language models. Check this👇
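The pipeline above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: whitespace splitting stands in for a real subword tokenizer, the embedding table is random rather than learned, and the helper names (`tokenize`, `build_vocab`, `make_embedding_table`) are hypothetical.

```python
import random

def tokenize(text):
    """Split on whitespace (assumption: real tokenizers use subword schemes)."""
    return text.lower().split()

def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def make_embedding_table(vocab_size, dim, seed=0):
    """Random table standing in for learned embedding weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

text = "the cat sat on the mat"
tokens = tokenize(text)                      # raw text -> tokens
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]       # tokens -> integer ids
table = make_embedding_table(len(vocab), dim=4)
embeddings = [table[i] for i in token_ids]   # ids -> one vector per token

print(token_ids)
print(len(embeddings), "vectors of size", len(embeddings[0]))
```

Both occurrences of "the" map to the same id and therefore to the same embedding vector; context only enters later, through attention.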

The core idea of language modeling is to understand the structure and patterns within language. By modeling the relationships between words (tokens) in a sentence, we can capture the context and meaning of the text.

Now, self-attention is a communication mechanism that helps establish these relationships, expressed as probability scores. Each token assigns the highest score to itself and additional scores to other tokens based on their relevance. You can think of it as a directed graph 👇
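To make "probability scores" concrete, here is a small sketch that turns raw relevance scores into probabilities with a softmax. The raw scores are made-up numbers chosen so that each token scores itself highest, as the post describes; they are not taken from any model.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["the", "cat", "sat"]

# Hypothetical raw scores: row i holds token i's score for every token.
raw_scores = [
    [2.0, 0.5, 0.1],
    [0.3, 2.0, 1.0],
    [0.1, 1.2, 2.0],
]

# One probability distribution per token (per row).
attention = [softmax(row) for row in raw_scores]

for tok, row in zip(tokens, attention):
    print(tok, [round(p, 2) for p in row])
```

Each row is a separate distribution, so the scores a token hands out always sum to 1.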

To understand how these probability/attention scores are obtained, we must understand 3 key terms:

- Query Vector
- Key Vector
- Value Vector

These vectors are created by multiplying the input embedding by three trainable weight matrices. Check this out 👇
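As a rough sketch of that multiplication, the snippet below projects a small batch of input embeddings into queries, keys, and values using plain Python lists. The sizes, the random initialization, and the names `W_q`, `W_k`, `W_v` are assumptions for illustration; in a real model the three matrices are learned during training.

```python
import random

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    """Random values standing in for embeddings or trainable weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_head = 3, 4, 2           # assumed toy sizes

X = random_matrix(seq_len, d_model, rng)     # input embeddings, one row per token
W_q = random_matrix(d_model, d_head, rng)    # query weights
W_k = random_matrix(d_model, d_head, rng)    # key weights
W_v = random_matrix(d_model, d_head, rng)    # value weights

Q = matmul(X, W_q)   # one query vector per token
K = matmul(X, W_k)   # one key vector per token
V = matmul(X, W_v)   # one value vector per token

print(len(Q), "queries of size", len(Q[0]))
```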

Now here's a broader picture of how input embeddings are combined with Keys, Queries & Values to obtain the actual attention scores. After acquiring keys, queries, and values, we merge them to create a new set of context-aware embeddings. Check this out👇
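One common way to do this merge is scaled dot-product attention: compare every query with every key, normalize the scores with a softmax, and use them to take a weighted average of the values. The thread does not mention the division by the square root of the key size; it is included here as an assumption because it is the standard formulation. The input numbers are made up.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Return (context-aware embeddings, attention weights)."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))                        # query-key similarity
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]              # each row sums to 1
    return matmul(weights, V), weights                      # weighted average of values

# Hypothetical toy inputs: 3 tokens, vectors of size 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

context, weights = attention(Q, K, V)
for row in context:
    print([round(v, 3) for v in row])
```

Because every output row is a weighted average of the value vectors, each token's new embedding mixes in information from the tokens it attends to.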

Implementing self-attention using PyTorch doesn't get easier! 🚀 It's very intuitive! 💡 Check this out 👇
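The original image is not available here, so the following is a minimal single-head sketch in PyTorch rather than the author's code. The class name `SelfAttention`, the layer sizes, and the scaling by the square root of the head size are assumptions.

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention (illustrative sketch, no masking or dropout)."""

    def __init__(self, d_model, d_head):
        super().__init__()
        # Three trainable projections for queries, keys, and values.
        self.query = nn.Linear(d_model, d_head, bias=False)
        self.key = nn.Linear(d_model, d_head, bias=False)
        self.value = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x):
        # x has shape (batch, seq_len, d_model)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        weights = torch.softmax(scores, dim=-1)   # each row sums to 1
        return weights @ v, weights

torch.manual_seed(0)
layer = SelfAttention(d_model=8, d_head=4)
x = torch.randn(2, 5, 8)                          # 2 sequences of 5 tokens
out, weights = layer(x)
print(out.shape, weights.shape)
```

The output has one context-aware vector per token, and `weights` holds one attention distribution per token.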

I'll leave you with this visual, which intuitively explains self-attention as a communication mechanism between tokens. This communication can be represented by a directed graph 👇
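The graph view can be sketched directly: treat each token as a node and each attention weight as a directed, weighted edge from the attending token to the token it attends to. The weights below are made-up numbers, and the `threshold` parameter is a hypothetical convenience for hiding weak edges.

```python
def attention_to_edges(tokens, weights, threshold=0.0):
    """Return (source, target, weight) edges; source attends to target."""
    edges = []
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if w > threshold:
                edges.append((tokens[i], tokens[j], w))
    return edges

tokens = ["the", "cat", "sat"]

# Hypothetical attention weights; each row sums to 1.
weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.05, 0.35, 0.6],
]

edges = attention_to_edges(tokens, weights, threshold=0.15)
for src, dst, w in edges:
    print(f"{src} -> {dst}: {w}")
```

The graph is directed because the weight from "cat" to "sat" need not equal the weight from "sat" to "cat".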

If you found it insightful, reshare with your network.

Find me → @akshay_pachaar ✔️

For more insights and tutorials on LLMs, AI Agents, and Machine Learning!

https://twitter.com/703601972/status/1930603077356462471