
@akshay_pachaar: How LLMs work, clearly explain...

@akshay_pachaar
18 views May 06, 2025
1
How LLMs work, clearly explained:
2
Before diving into LLMs, we must understand conditional probability.

Let's consider a population of 14 individuals:

- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither

Here's how it looks 👇
Media image
3
So what is Conditional probability ⁉️

It's a measure of the probability of an event given that another event has occurred.

If the events are A and B, we denote this as P(A|B).

This reads as "probability of A given B"

Check this illustration 👇
Media image
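To make P(A|B) concrete, here is a minimal sketch on a population of 14. The exact split between tennis and football fans lives in the image, so the group sizes below are assumptions picked purely for illustration.

```python
from fractions import Fraction

# Hypothetical split of the 14 individuals (the real counts are in the image).
population = set(range(14))
tennis = set(range(0, 6))     # assumed: people 0..5 like tennis
football = set(range(4, 11))  # assumed: people 4..10 like football
# Under these assumptions, people 4 and 5 like both and 11..13 like neither.

def conditional_probability(a, b):
    """P(A|B) = |A and B| / |B| when every individual is equally likely."""
    if not b:
        raise ValueError("P(A|B) is undefined when B is empty")
    return Fraction(len(a & b), len(b))

p_tennis = Fraction(len(tennis), len(population))
p_tennis_given_football = conditional_probability(tennis, football)

print("P(tennis) =", p_tennis)
print("P(tennis | football) =", p_tennis_given_football)
```

Knowing that someone likes football shrinks the population we look at to football fans only, which is why the two numbers differ.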
4
For instance, if we're predicting whether it will rain today (event A), knowing that it's cloudy (event B) might impact our prediction.

As it's more likely to rain when it's cloudy, we'd say the conditional probability P(A|B) is high.

That's conditional probability for you! 🎉
5
Now, how does this apply to LLMs like GPT-4❓

These models are tasked with predicting the next word in a sequence.

This is a question of conditional probability: given the words that have come before, what is the most likely next word?
Media image
6
To predict the next word, the model calculates the conditional probability for each possible next word, given the previous words (context).

The word with the highest conditional probability is chosen as the prediction.
Media image
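A toy sketch of this idea, assuming a tiny made-up corpus and a context of just one previous word (a bigram model). A real LLM conditions on the whole context with a neural network rather than raw counts, so treat this only as an illustration of "pick the word with the highest conditional probability".

```python
from collections import Counter, defaultdict

# Made-up corpus for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_word_probs(context_word):
    """Estimate P(next word | context word) from the counts."""
    counts = follow_counts[context_word]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

probs = next_word_probs("the")
prediction = max(probs, key=probs.get)  # word with the highest probability

print(probs)
print("prediction:", prediction)
```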
7
The LLM learns a high-dimensional probability distribution over sequences of words.

And the parameters of this distribution are the trained weights!

The training, or rather pre-training**, is self-supervised: the labels are just the next words in the training text, so no manual annotation is needed.

** I'll talk about the different training steps next time!

Check this 👇
Media image
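One common way to write a distribution over word sequences is the chain rule of probability, which ties it back to next-word prediction. The symbols here are assumptions for illustration: w_t is the t-th word and θ stands for the trained weights.

```latex
P_\theta(w_1, \dots, w_n) = \prod_{t=1}^{n} P_\theta\left(w_t \mid w_1, \dots, w_{t-1}\right)
```

Each factor on the right is exactly the "next word given the previous words" probability the model outputs.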
8
But there's a problem❗️

If we always pick the word with the highest probability, we end up with repetitive outputs, making LLMs almost useless and stifling their creativity.

This is where temperature comes into the picture.

Check this before we understand more about it... 👇
Media image
9
However, a high temperature value produces gibberish.

Let's understand what's going on... 👇
Media image
10
So, instead of selecting the best token (for simplicity, let's think of tokens as words), LLMs "sample" the prediction.

So even if “Token 1” has the highest score, it may not be chosen since we are sampling.
Media image
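Here is a minimal sketch of greedy selection versus sampling. The three tokens and their probabilities are made up for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random

# Made-up next-token distribution for illustration.
tokens = ["Token 1", "Token 2", "Token 3"]
probs = [0.6, 0.3, 0.1]

# Greedy: always take the highest-probability token.
greedy = tokens[probs.index(max(probs))]

# Sampling: draw tokens in proportion to their probabilities.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = rng.choices(tokens, weights=probs, k=1000)
counts = {t: samples.count(t) for t in tokens}

print("greedy pick:", greedy)
print("sampled counts:", counts)
```

Greedy returns "Token 1" every time, while sampling picks it most often but still lets the other tokens through now and then.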
11
Now, temperature introduces the following tweak in the softmax function, which, in turn, influences the sampling process:
Media image
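The formula itself is in the image, so what follows is the standard form of softmax with temperature, written with assumed symbols: z_i is the score (logit) of token i, T is the temperature, and p_i is the resulting probability.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

With T = 1 this is the plain softmax. A smaller T stretches the gaps between scores, and a larger T squeezes them.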
12
Let's take a code example!

At low temperature, probabilities concentrate around the most likely token, resulting in nearly greedy generation.

At high temperature, probabilities become more uniform, producing highly random and stochastic outputs.

Check this out 👇
Media image
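The code from the image isn't reproduced here, so this is a minimal stand-in sketch rather than the original snippet. It assumes the usual approach of dividing the scores by the temperature before softmax. The function name and the logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three tokens

low = softmax_with_temperature(logits, temperature=0.1)
default = softmax_with_temperature(logits, temperature=1.0)
high = softmax_with_temperature(logits, temperature=100.0)

for name, p in [("T=0.1", low), ("T=1", default), ("T=100", high)]:
    print(name, [round(x, 3) for x in p])
```

At T=0.1 almost all of the probability lands on the first token (nearly greedy), while at T=100 the three probabilities come out close to one third each.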
13
That's a wrap!

Hopefully, this guide has demystified some of the magic behind LLMs.

And, if you enjoyed this breakdown:

Find me → @akshay_pachaar ✔️
For more insights and tutorials on AI and Machine Learning.