Akshay 🚀
@akshay_pachaar
How LLMs work, clearly explained:
Before diving into LLMs, we must understand conditional probability.

Let's consider a population of 14 individuals:

- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither

Here's how it looks 👇
Thread image
So what is conditional probability ⁉️

It's a measure of the probability of an event given that another event has occurred.

If the events are A and B, we denote this as P(A|B).

This reads as "probability of A given B"

Check this illustration 👇
Thread image
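To make P(A|B) concrete, here is a minimal Python sketch using the tennis/football population from above. The thread's image isn't reproduced here, so the four group sizes are assumed values chosen only so that they add up to 14; the names `conditional_probability`, `tennis_only`, and so on are illustrative.

```python
from fractions import Fraction

# Hypothetical counts for the 14 individuals (assumed for illustration;
# the actual numbers in the thread's image are not available here).
tennis_only = 4
football_only = 5
both = 3
neither = 2
total = tennis_only + football_only + both + neither  # 14


def conditional_probability(joint_count: int, given_count: int) -> Fraction:
    """P(A|B) = P(A and B) / P(B), computed from raw counts."""
    if given_count == 0:
        raise ValueError("P(A|B) is undefined when B never occurs")
    return Fraction(joint_count, given_count)


likes_football = football_only + both           # everyone in event B
p_tennis = Fraction(tennis_only + both, total)  # P(A), no condition
p_tennis_given_football = conditional_probability(both, likes_football)

print(f"P(Tennis) = {p_tennis}")
print(f"P(Tennis | Football) = {p_tennis_given_football}")
```

With these assumed counts, knowing that someone likes football changes the probability that they like tennis, which is exactly what conditioning on B means.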
For instance, if we're predicting whether it will rain today (event A), knowing that it's cloudy (event B) might impact our prediction.

As it's more likely to rain when it's cloudy, we'd say the conditional probability P(A|B) is high.

That's conditional probability for you! 🎉

Now, how does this apply to LLMs like GPT-4❓

These models are tasked with predicting the next word in a sequence.

This is a question of conditional probability: given the words that have come before, what is the most likely next word?
Thread image
To predict the next word, the model calculates the conditional probability for each possible next word, given the previous words (context).

The word with the highest conditional probability is chosen as the prediction.
Thread image
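As a rough illustration of "probability of each next word given the context", here is a toy sketch that estimates P(next word | previous word) by counting word pairs in a tiny made-up corpus, then picks the most likely one. This is a deliberate simplification: a real LLM conditions on the whole preceding context with a neural network, not on a one-word count table. The corpus and function names are assumptions for the example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, used only to illustrate the idea.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1


def next_word_distribution(prev: str) -> dict[str, float]:
    """Estimate P(next word | previous word) from the pair counts."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def greedy_next_word(prev: str) -> str:
    """Pick the word with the highest conditional probability."""
    dist = next_word_distribution(prev)
    return max(dist, key=dist.get)


print(next_word_distribution("the"))
print(greedy_next_word("the"))
```

The sketch only handles words that appear in the corpus with at least one follower; an unseen word would have an empty distribution.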
The LLM learns a high-dimensional probability distribution over sequences of words.

And the parameters of this distribution are the trained weights!

The training, or rather pre-training, is self-supervised: the next word in the text itself serves as the label.

I'll talk about the different training steps next time!

Check this 👇
Thread image
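One standard way to write down the "distribution over sequences of words" is the chain rule of probability, which breaks a sequence's probability into next-word conditionals. The notation below is an assumption for illustration: w_t is the t-th word (token) and θ stands for the trained weights.

```latex
% Chain rule: a sequence's probability is a product of next-word conditionals.
P_\theta(w_1, \dots, w_n) = \prod_{t=1}^{n} P_\theta(w_t \mid w_1, \dots, w_{t-1})

% Pre-training typically adjusts \theta to minimize the negative log-likelihood
% of the training text, i.e. to make each actual next word more probable.
\mathcal{L}(\theta) = -\sum_{t=1}^{n} \log P_\theta(w_t \mid w_1, \dots, w_{t-1})
```

This is why next-word prediction is enough to define a distribution over whole sequences.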
But there's a problem❗️

If we always pick the word with the highest probability, we end up with repetitive outputs, making LLMs almost useless and stifling their creativity.
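A toy sketch of why always picking the top word can get repetitive: with the invented probability table below (not taken from any real model), greedy selection falls into a loop and never leaves it.

```python
# Hypothetical next-word probabilities, invented for this illustration.
next_word_probs = {
    "I": {"like": 0.6, "am": 0.4},
    "like": {"that": 0.5, "tea": 0.3, "you": 0.2},
    "that": {"I": 0.7, "tea": 0.3},
    "am": {"happy": 1.0},
    "tea": {"I": 1.0},
    "you": {"like": 1.0},
    "happy": {"I": 1.0},
}


def greedy_generate(start: str, steps: int) -> list[str]:
    """Always follow the single most probable next word."""
    words = [start]
    for _ in range(steps):
        dist = next_word_probs[words[-1]]
        words.append(max(dist, key=dist.get))
    return words


print(" ".join(greedy_generate("I", 8)))
```

Because the choice at each step is deterministic, the same context always leads to the same continuation, so the cycle repeats forever.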

This is where temperature comes into the picture.

Check this before we understand more about it... 👇
Thread image
However, a high temperature value produces gibberish.

Let's understand what's going on... 👇
Thread image
So, instead of selecting the best token (for simplicity, let's think of tokens as words), LLMs "sample" the prediction.

So even if “Token 1” has the highest score, it may not be chosen, since we are sampling.
Thread image
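Here is a small sketch of the difference between picking the best token and sampling. The three tokens and their probabilities are made up for illustration; `random.choices` from the Python standard library does the weighted draw.

```python
import random

# Hypothetical next-token probabilities (made up for illustration).
tokens = ["Token 1", "Token 2", "Token 3"]
probs = [0.6, 0.3, 0.1]


def greedy_pick(tokens, probs):
    """Always return the single most probable token."""
    return tokens[probs.index(max(probs))]


def sample_pick(tokens, probs, rng):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_pick(tokens, probs, rng) for _ in range(10_000)]
counts = {t: draws.count(t) for t in tokens}

print("greedy :", greedy_pick(tokens, probs))
print("sampled:", counts)
```

Over many draws, "Token 1" still wins most often, but "Token 2" and "Token 3" get picked roughly in proportion to their probabilities.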
Now, temperature introduces the following tweak in the softmax function, which, in turn, influences the sampling process:
Thread image
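The image isn't reproduced here, but the usual form of this tweak is the temperature-scaled softmax. The symbols are assumptions for illustration: z_i is the model's raw score (logit) for token i and T > 0 is the temperature.

```latex
% Temperature-scaled softmax: divide every logit by T before normalizing.
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}

% T = 1        : the ordinary softmax
% T \to 0^{+}  : probability mass concentrates on the largest logit
% T \to \infty : the distribution approaches uniform
```

Sampling then draws the next token from these p_i instead of from the unscaled softmax.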
Let's take a code example!

At low temperature, probabilities concentrate around the most likely token, resulting in nearly greedy generation.

At high temperature, probabilities become more uniform, producing highly random and stochastic outputs.

Check this out 👇
Thread image
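The thread's code image isn't reproduced here, so the following is a minimal stand-in sketch rather than the original code. It applies a temperature-scaled softmax to three hypothetical logits using only the standard library; the function name `softmax_with_temperature` and the logit values are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t:>4}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At T=0.1 nearly all the probability lands on the first token, while at T=10 the three probabilities are close to equal, matching the low- and high-temperature behavior described above.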
That's a wrap!

Hopefully, this guide has demystified some of the magic behind LLMs.

And, if you enjoyed this breakdown:

Find me → @akshay_pachaar ✔️
For more insights and tutorials on AI and Machine Learning.