You don't need billions to train the next ChatGPT
All you need is $100 and Andrej Karpathy's nanochat.
I've used it for the last week. Here is what I found.

---
Disclaimer: the cost of compute is expected to go down over the next decade. My headline is hyperbolic, but you really can get a usable model for less than $100. This is a build write-up, not a bold claim. I agree that the capex needed to train frontier AI models right now is insanely high, but I expect a day will come when we can train awesome frontier models at really economical prices.
---
I spent ~$100 and one weekend training a ChatGPT-style model from scratch on my own notes, writing, and exported AI chats.
It now answers in my voice and recalls my own ideas, with no API and no rented brain.
This guide is the version I wish I'd had: every command, every code change, and plain-English explanations of the jargon so you don't get stuck.
If you've never trained a model before, you're the target reader. Take it one step at a time.
## Read this first (what you're signing up for)
What you'll end up with: a small GPT, roughly as capable as OpenAI's original GPT-2 (2019), fine-tuned on your own data so it sounds like you and knows your stuff. You can chat with it in a ChatGPT-style web page.
Honest expectations: this is not GPT-4. It's "a kindergartener with your memories": charming, useful for recall and drafting, and sometimes confidently wrong. The magic isn't raw IQ; it's that it's yours, it's private, and you understand every part of it.
What it costs: about $48–$100 in rented GPU time for the full run. You can learn the entire pipeline for ~$0 first (more on that below).
Skills you need:
• Comfort typing commands into a terminal (copy-paste is fine).
• Basic Python literacy helps for the data step, but I'll give you working scripts.
• No machine-learning background required. I'll explain the concepts as we go.
Time: budget a weekend. The actual training is ~3 hours; the rest is setup and preparing your data.
## The 60-second mental model
Training a chatbot happens in two big phases. Keep these straight and everything else makes sense.
1. Pretraining → produces the base model. The model reads a huge pile of internet text and learns one skill: predict the next word. This is where it learns grammar, facts, and reasoning. It's expensive (this is the ~3 hours of GPU time). The result talks like the internet: it can complete text, but it can't chat.
2. Fine-tuning (SFT) → produces the chat model. You show the base model thousands of example conversations so it learns to answer like an assistant. This is cheap and fast (minutes). This is where your personal data goes in.
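If "predict the next word" still feels abstract, here is a toy sketch of both phases in plain Python. It is not nanochat's code and not how a real GPT works inside: it swaps the neural network for a simple word-pair counter, and the names `train_bigram`, `predict_next`, `complete`, and `render_chat` are all made up for this illustration. The `[USER]` / `[ASSISTANT]` markers are placeholders too, not the real chat format.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """'Pretraining' in miniature: count which word follows which."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def complete(counts, prompt, max_words=5):
    """Extend a prompt one predicted word at a time (text completion)."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


def render_chat(turns):
    """Flatten (role, text) turns into one training string.

    Fine-tuning data is still just text to predict; the role markers
    here are hypothetical placeholders for this sketch.
    """
    return "\n".join(f"[{role.upper()}] {text}" for role, text in turns)


corpus = "the cat sat on the mat . the cat sat on the rug ."
model = train_bigram(corpus)

print(predict_next(model, "the"))              # cat
print(complete(model, "the cat", max_words=3)) # the cat sat on the
print(render_chat([("user", "hi"), ("assistant", "hello")]))
```

The takeaway: a base model only continues text. Fine-tuning does not add a new skill so much as feed it conversations written out as text, so that "continuing the text" ends up looking like answering you.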
