@TheAhmadOsman
Jan 22, 2026
1
INCREDIBLE
Someone on r/LocalLLaMA did an incredibly practical thing
They took a tiny 0.6B model that was trash at its task (Text2SQL)
Created a knowledge distillation agent with a Claude Code skill
And made the 0.6B model behave like a specialist using 100 examples
The problem
> Small Language Models are "generally helpful"
> but specialized tasks are "exact or you die"
> you ask: "Which artists have >1M album sales?"
> the model answers: "check if genre is NULL"
The old way to fix this
> Finetune the model:
> collect + clean data
> build training pipeline
> tune hparams
> rerun when it's wrong
> accidentally become the unpaid
> intern of your own experiment
The new way
> Knowledge distillation via a Claude skill
> use a strong teacher (DeepSeek-V3)
> generate synthetic pairs from a small seed set
> train a tiny student to imitate the teacher on your task
> ship it as GGUF / HF / LoRA
> run it locally
Distillation isn't "creating skill"
It's compressing skill
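A minimal sketch of the "generate synthetic pairs from a small seed set" step, assuming a teacher you can call as a plain function. Everything here is hypothetical: `build_distillation_set`, the prompt templates, and `fake_teacher` (a stand-in for a real call to a strong model) are illustration only, not the skill's actual code.

```python
import random
from typing import Callable


def build_distillation_set(
    seeds: list[dict],
    teacher: Callable[[str], str],
    variants_per_seed: int = 3,
    rng_seed: int = 0,
) -> list[dict]:
    """Expand a small seed set into teacher-labelled (question, answer) pairs."""
    rng = random.Random(rng_seed)
    # Hypothetical rephrasing templates; a real pipeline would likely ask
    # the teacher to invent new questions as well.
    templates = ["{q}", "Please answer: {q}", "{q} Return only SQL."]
    pairs = []
    for seed in seeds:
        chosen = rng.sample(templates, k=min(variants_per_seed, len(templates)))
        for template in chosen:
            question = template.format(q=seed["question"])
            # The teacher's output becomes the training target for the student.
            pairs.append({"question": question, "answer": teacher(question)})
    return pairs


def fake_teacher(question: str) -> str:
    # Stand-in for a strong teacher model; always returns the same SQL.
    return "SELECT 1;"
```

The student is then fine-tuned on these pairs, which is why the teacher's quality sets the ceiling.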
THE REAL HACK: agent-as-interface
> They wrapped the whole distillation loop in an agent "skill":
> picks task type (QA / classification / tool calling / RAG)
> converts messy inputs into clean JSONL
> runs teacher eval first
> kicks off distillation + monitors progress
> packages weights for you to run locally
This is the quiet unlock
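To make "converts messy inputs into clean JSONL" concrete, here is a rough sketch. The input shape (dicts with `user` and `assistant` keys) and the output field names are assumptions; the post does not say what schema the skill uses.

```python
import json


def traces_to_jsonl(traces: list[dict]) -> str:
    """Turn raw conversation traces into one JSON object per line."""
    lines = []
    for trace in traces:
        # Assumed keys: "user" and "assistant". Missing values become "".
        question = (trace.get("user") or "").strip()
        answer = (trace.get("assistant") or "").strip()
        if not question or not answer:
            continue  # drop incomplete traces instead of training on them
        record = {"question": question, "answer": answer}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```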
Why "teacher eval first" is elite behavior
> distillation amplifies competence and incompetence
> if the teacher is wrong, the student learns wrong faster
> garbage in -> efficient garbage out
Adult supervision, but for models
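The "teacher eval first" idea can be sketched as a simple gate: score the teacher on held-out examples and refuse to distill if it is not good enough. The function names and the 0.75 threshold are assumptions for illustration, not values from the post.

```python
def teacher_accuracy(judgements: list[bool]) -> float:
    """Fraction of teacher answers that a judge marked as correct."""
    if not judgements:
        raise ValueError("need at least one judged example")
    return sum(judgements) / len(judgements)


def should_distill(judgements: list[bool], min_accuracy: float = 0.75) -> bool:
    # Hypothetical threshold: below it, fix the teacher or the prompt first.
    return teacher_accuracy(judgements) >= min_accuracy
```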
The run breakdown:
> seed: ~100 raw conversation traces
> teacher (LLM-as-judge): ~80%
> base 0.6B: ~36%
> distilled 0.6B: ~74%
> output: ~2.2GB GGUF
> runs locally with llama.cpp
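One way to read those numbers is as a share of the teacher-student gap that distillation closed. This is just arithmetic on the approximate figures above; the helper name `gap_recovered` is made up here.

```python
def gap_recovered(base: float, student: float, teacher: float) -> float:
    """Share of the base-to-teacher gap closed by the distilled student."""
    return (student - base) / (teacher - base)


# Approximate scores from the run: base 36%, distilled 74%, teacher 80%.
ratio = gap_recovered(base=0.36, student=0.74, teacher=0.80)
print(f"gap recovered: {ratio:.0%}")
```

With these inputs the student recovers roughly 86% of the gap (0.38 out of 0.44).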
Before vs after (the entire reason you do this)
> before: wrong tables, wrong logic, nonsense SQL
> after: correct JOINs, GROUP BY, HAVING
> aka "this query actually executes and answers the question"
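For a sense of what "correct JOINs, GROUP BY, HAVING" means for the ">1M album sales" question, here is a small runnable sketch using Python's built-in `sqlite3`. The `artists` / `albums` schema and the sample rows are invented for illustration; the real benchmark schema is not shown in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE albums (
        id INTEGER PRIMARY KEY,
        artist_id INTEGER REFERENCES artists(id),
        title TEXT,
        sales INTEGER
    );
    INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO albums VALUES
        (1, 1, 'First', 700000),
        (2, 1, 'Second', 600000),
        (3, 2, 'Only', 400000);
    """
)

# The filter applies to an aggregate (total sales per artist),
# so it belongs in HAVING, not WHERE.
query = """
SELECT a.name, SUM(al.sales) AS total_sales
FROM artists AS a
JOIN albums AS al ON al.artist_id = a.id
GROUP BY a.id, a.name
HAVING SUM(al.sales) > 1000000
ORDER BY total_sales DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the artist whose albums sum past one million comes back, which is the kind of answer "check if genre is NULL" never gets you.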
What this really means (bigger than Text2SQL)
You don't need a giant model for every job
You need tiny specialists that understand your world:
> internal schemas
> service / OS logs
> tool outputs
> company-specific workflows
TL;DR
> "fine-tuning is hard" is mostly "the pipeline is annoying"
> distillation skill turns 10–100 examples into a real specialist
> the agent wrapper turns the whole thing into a conversation
> this is how you get practical local SLMs
> without becoming an MLOps monk
Small & Specialized models
> High-leverage
> Boringly effective
> Exactly where this is going
The future is
Local inference
Lower latency
Fewer secrets leaving the building
2
Skill: github.com/distil-labs/di…
Full example with data: github.com/distil-labs/di…
Detailed walkthrough: distillabs.ai/blog/train-you…
Reddit Thread: old.reddit.com/r/LocalLLaMA/c…
