New (2h13m 😅) lecture: "Let's build the GPT Tokenizer". Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set and their own training algorithm (Byte Pair Encoding), and after training they implement two functions: encode() from strings to tokens, and decode() back from tokens to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI.
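To make the train/encode/decode split concrete, here is a minimal sketch of byte-level BPE in Python. It is not the lecture's or minbpe's actual code, and the function names (`get_stats`, `merge`, `train`, `build_vocab`, `encode`, `decode`) are illustrative. It assumes the 256 raw UTF-8 byte values as the base vocabulary and leaves out the extra machinery of the real GPT tokenizers.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def get_stats(ids: List[int]) -> Dict[Pair, int]:
    """Count how often each adjacent pair of token ids occurs."""
    counts: Dict[Pair, int] = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace each occurrence of `pair` (scanning left to right) with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text: str, vocab_size: int) -> Dict[Pair, int]:
    """Learn up to vocab_size - 256 merges on top of the 256 byte values."""
    assert vocab_size >= 256
    ids = list(text.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    for new_id in range(256, vocab_size):
        stats = get_stats(ids)
        if not stats:
            break  # fewer than two tokens left, nothing to merge
        pair = max(stats, key=lambda p: stats[p])  # most frequent pair
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def build_vocab(merges: Dict[Pair, int]) -> Dict[int, bytes]:
    """Map every token id to the byte string it stands for."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dict order == merge order
        vocab[new_id] = vocab[a] + vocab[b]
    return vocab


def encode(text: str, merges: Dict[Pair, int]) -> List[int]:
    """String -> token ids, replaying merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        stats = get_stats(ids)
        # Earliest-learned merge has the lowest new id; unknown pairs rank last.
        pair = min(stats, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies anymore
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids: List[int], vocab: Dict[int, bytes]) -> str:
    """Token ids -> string; invalid UTF-8 byte runs become U+FFFD."""
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


text = "aaabdaaabac"
merges = train(text, vocab_size=259)  # learns 3 merges
vocab = build_vocab(merges)
ids = encode(text, merges)
print(ids)
assert decode(ids, vocab) == text
```

Decoding only concatenates byte strings, so any UTF-8 string round-trips, including characters never seen during training (they simply stay as raw byte tokens). The `errors="replace"` argument covers arbitrary id sequences whose bytes do not form valid UTF-8.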


We will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and explain why, ideally, someone out there finds a way to delete this stage entirely.


Also releasing a new repository on GitHub: minbpe. Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization: <a target="_blank" href="https://github.com/karpathy/minbpe" color="blue">github.com/karpathy/minbpe</a>. In the video we essentially build minbpe from scratch. Don't miss the exercise.md in the repo to build your own.

The actual link to the lecture: <a target="_blank" href="https://www.youtube.com/watch?v=zduSFxRajkE" color="blue">youtube.com/watch?v=zduSFx…</a> (placed at the end of the thread, sorry, because otherwise X really, really dislikes external links and would bury this post. I could eventually upload the video here too, but for now X is missing a lot of very nice features, chapters especially.)