Carousel Studio

Repurpose X Threads into LinkedIn & Instagram Carousels

Avi Chawla
@_avichawla

The growth of LLM context length with time:
- GPT-3.5-turbo → 4k tokens
- OpenAI GPT-4 → 8k tokens
- Claude 2 → 100k tokens
- Llama 3.1 → 128k tokens
- Gemini → 1M tokens

Let's understand how they extend the context length of LLMs:

In a traditional transformer, a model processing "8x" tokens requires 64 times as much attention computation (quadratic growth) as one handling "x" tokens.

Thus, getting a longer context window is not as easy as simply increasing the size of the matrices.
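To make the 64x figure concrete, here is a minimal sketch that counts the query-key scores one full-attention layer computes. The function names `attention_pairs` and `cost_ratio` are hypothetical, and the count ignores the hidden dimension and other constant factors.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of query-key scores in full self-attention (one head, one layer)."""
    return num_tokens * num_tokens


def cost_ratio(base_tokens: int, multiplier: int) -> float:
    """How many times more scores are needed for `multiplier * base_tokens` tokens."""
    return attention_pairs(base_tokens * multiplier) / attention_pairs(base_tokens)


if __name__ == "__main__":
    # 8x the tokens means 8**2 = 64x the pairwise scores.
    print(cost_ratio(4_000, 8))  # prints 64.0
```

The ratio does not depend on the base length: multiplying the token count by k always multiplies the pairwise work by k squared.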

1) Sparse Attention

It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.

But this has a trade-off between computational complexity and performance.
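A rough sketch of the local-attention variant, assuming a fixed symmetric window (each token sees `window` neighbors on each side, plus itself). The function names and the window convention are illustrative assumptions, and the learned-sparsity variant is not covered here.

```python
def local_attention_pairs(num_tokens: int, window: int) -> int:
    """Count (query, key) pairs when each token attends only to keys
    within `window` positions on either side of itself (itself included)."""
    total = 0
    for i in range(num_tokens):
        lo = max(0, i - window)               # clip the window at the left edge
        hi = min(num_tokens - 1, i + window)  # clip the window at the right edge
        total += hi - lo + 1
    return total


def full_attention_pairs(num_tokens: int) -> int:
    """Count (query, key) pairs when every token attends to every token."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    n = 4096
    print(full_attention_pairs(n), local_attention_pairs(n, window=64))
```

With a fixed window the pair count grows linearly in the number of tokens instead of quadratically. The price is that distant tokens never interact directly within a single layer.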

A similar idea was used in ModernBERT. It is an upgraded version of BERT with:
- 16x larger sequence length
- Much better downstream performance, and
- The most memory-efficient encoder

They used alternating attention.

Here's the idea:
- Use full global attention in every third layer.
- Use local attention otherwise, where a token attends to 128 tokens.

This allows ModernBERT to process longer sequences, while also being significantly faster than other encoder models.
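The alternating pattern above can be sketched as a per-layer schedule. This is an illustration, not ModernBERT's actual code: the function names are hypothetical, the choice of which layer index starts the global pattern (index 0 here) is an assumption, and tokens near the sequence edges are treated as if they still see the full 128-token span.

```python
from typing import List


def layer_schedule(num_layers: int, global_every: int = 3) -> List[str]:
    """Label each layer 'global' or 'local'; every `global_every`-th layer is global.
    Assumption: the pattern starts with a global layer at index 0."""
    return ["global" if i % global_every == 0 else "local" for i in range(num_layers)]


def total_pairs(num_layers: int, seq_len: int,
                global_every: int = 3, local_span: int = 128) -> int:
    """Approximate query-key pairs summed over all layers."""
    total = 0
    for kind in layer_schedule(num_layers, global_every):
        # Global layers see the whole sequence, local layers see a fixed span.
        keys_per_token = seq_len if kind == "global" else min(local_span, seq_len)
        total += seq_len * keys_per_token
    return total


if __name__ == "__main__":
    print(layer_schedule(6))
    # Compare against global attention in all 6 layers.
    print(total_pairs(6, 1000), 6 * 1000 * 1000)
```

In this toy count, six layers over 1,000 tokens need about 2.5M pairs instead of 6M, and the gap widens as the sequence grows because only a third of the layers keep the quadratic term.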

Here's an intuitive explanation taken from the paper:

Picture yourself reading a book. For every sentence you read, do you need to be fully aware of the entire plot to understand most of it (full global attention)? Or is awareness of the current chapter enough (local attention), as long as you occasionally think back on its significance to the main plot (global attention)?

In the vast majority of cases, it's the latter.

2) Flash Attention

This is a fast and memory-efficient method that retains the exactness of traditional attention mechanisms, i.e., it uses global attention but efficiently.

The whole idea revolves around optimizing the data movement within GPU memory. Let's understand!

Some background details:
- A thread is the smallest unit of execution.
- Several threads form a block.

Also:
- Threads in a block share a fast (but scarce) memory called SRAM.
- All blocks share a global memory called HBM (abundant but slow).

Attention moves large matrices between SRAM and HBM:

To compute QK:
- distribute matrices to threads
- compute, and
- send the product to HBM

To compute softmax:
- distribute product to threads
- compute, and
- send output to HBM

Repeat for all layers.
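A back-of-the-envelope sketch of the traffic described above, counting matrix elements that cross the HBM boundary in each step. This is a simplified accounting model, not a measurement: the function name is hypothetical, it covers a single head, and it models only the two steps listed (the QK product and the softmax).

```python
from typing import Dict


def standard_hbm_elements(seq_len: int, head_dim: int) -> Dict[str, int]:
    """Elements read from or written to HBM per step, for one attention head."""
    qk_input = seq_len * head_dim  # size of Q (and of K)
    scores = seq_len * seq_len     # size of the QK product
    return {
        # Read Q and K, then write the full score matrix back to HBM.
        "qk_step": 2 * qk_input + scores,
        # Read the score matrix again, then write the softmax output.
        "softmax_step": scores + scores,
    }


if __name__ == "__main__":
    for n in (1024, 8192):
        print(n, standard_hbm_elements(n, head_dim=64))
```

Under this model the seq_len x seq_len intermediates dominate the traffic once the sequence is much longer than the head dimension, which is why the round trips to HBM become the bottleneck.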

Flash attention is a hardware-aware optimization: it uses SRAM to cache the intermediate results.

This way, it reduces redundant data movement between SRAM and HBM, offering a speedup of up to 7.6x over standard attention methods.
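The thread does not spell out how the result stays exact when the work is done in small pieces. The FlashAttention paper does this by processing keys and values in blocks while keeping a running softmax. Below is a CPU-only sketch of that arithmetic for a single query, in plain Python. The function names and the `block_size` parameter are illustrative, the usual 1/sqrt(d) scaling is omitted, and nothing here models real GPU memory.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values) -> List[float]:
    """Reference: materialize every score, then softmax, then weight the values."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / denom for j in range(dim)]


def tiled_attention(q, keys, values, block_size: int = 2) -> List[float]:
    """Same result, but only one block of scores exists at any time."""
    dim = len(values[0])
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = [0.0] * dim            # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.3]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(naive_attention(q, keys, values))
    print(tiled_attention(q, keys, values, block_size=2))
```

Both functions compute the same softmax-weighted average up to floating-point rounding, which is the sense in which the method stays exact. On a GPU, each block would be small enough to sit in SRAM, so the full score matrix never has to be written to HBM.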

That's a wrap! If you found it insightful, reshare it with your network.

Find me → @_avichawla

Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

https://twitter.com/1175166450832687104/status/1959141055301132516