INCREDIBLE
Someone on r/LocalLLaMA did an incredibly practical thing
They took a tiny 0.6B model that was trash at a task (Text2SQL)
Created a knowledge distillation agent with a Claude Code skill
And made the 0.6B model behave like a specialist using 100 examples
The problem
> Small Language Models are “generally helpful”
> but specialized tasks are “exact or you die”
> you ask: “Which artists have >1M album sales?”
> the model answers: “check if genre is NULL”
The old way to fix this
> Finetune the model:
> collect + clean data
> build training pipeline
> tune hparams
> rerun when it’s wrong
> accidentally become the unpaid intern of your own experiment
The new way
> Knowledge distillation via a Claude skill
> use a strong teacher (DeepSeek-V3)
> generate synthetic pairs from a small seed set
> train a tiny student to imitate the teacher on your task
> ship it as GGUF / HF / LoRA
> run it locally
Distillation isn’t “creating skill”
It’s compressing skill
THE REAL HACK: agent-as-interface
> They wrapped the whole distillation loop in an agent “skill”:
> picks task type (QA / classification / tool calling / RAG)
> converts messy inputs into clean JSONL
> runs teacher eval first
> kicks off distillation + monitors progress
> packages weights for you to run locally
This is the quiet unlock
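For a feel of what "converts messy inputs into clean JSONL" could look like, here is a minimal sketch. The `Q:` / `A:` trace layout and the `question` / `answer` keys are assumptions made for illustration, not the skill's actual format.

```python
import json

def traces_to_jsonl(raw_traces):
    """Turn loosely formatted 'Q: ... A: ...' traces into JSONL text.

    The Q:/A: layout and the question/answer keys are hypothetical.
    """
    lines = []
    for trace in raw_traces:
        if "Q:" not in trace or "A:" not in trace:
            continue  # skip traces that cannot be parsed
        question_part, answer_part = trace.split("A:", 1)
        # Collapse stray whitespace and newlines into single spaces.
        question = " ".join(question_part.replace("Q:", "", 1).split())
        answer = " ".join(answer_part.split())
        if question and answer:
            lines.append(json.dumps({"question": question, "answer": answer}))
    return "\n".join(lines)
```

Each output line is one standalone JSON object, so a bad record can be dropped without breaking the rest of the file.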
Why “teacher eval first” is elite behavior
> distillation amplifies competence and incompetence
> if the teacher is wrong, the student learns wrong faster
> garbage in -> efficient garbage out
Adult supervision, but for models
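A "teacher eval first" gate can be very small. This is a sketch under assumptions: `teacher` is any callable you supply, `is_correct` is your own scoring function, and the 0.7 threshold is a made-up default, not a number from the thread.

```python
def teacher_passes_gate(teacher, eval_set, is_correct, threshold=0.7):
    """Return (ok, accuracy) for a teacher on a labeled eval set.

    teacher: callable mapping a question string to an answer.
    eval_set: list of (question, reference) pairs.
    is_correct: callable(prediction, reference) -> bool.
    threshold: hypothetical cutoff; pick one that suits the task.
    """
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(1 for question, ref in eval_set if is_correct(teacher(question), ref))
    accuracy = hits / len(eval_set)
    return accuracy >= threshold, accuracy
```

If the gate fails, fix the prompt or swap the teacher before spending any compute on the student.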
The run breakdown:
> seed: ~100 raw conversation traces
> teacher, scored by LLM-as-judge: ~80%
> base 0.6B: ~36%
> distilled 0.6B: ~74%
> output: ~2.2GB GGUF
> runs locally with llama.cpp
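One way to read those scores is to ask how much of the base-to-teacher gap the student closed. The helper below is just that arithmetic, using the approximate figures quoted above; `gap_recovered` is a name invented here.

```python
def gap_recovered(base, student, teacher):
    """Fraction of the base-to-teacher gap closed by the distilled student."""
    if teacher == base:
        raise ValueError("teacher and base scores are equal; gap is undefined")
    return (student - base) / (teacher - base)

# Approximate scores from the thread, in percent.
print(round(gap_recovered(base=36, student=74, teacher=80), 2))
```

With these numbers the student recovers roughly 86% of the gap (38 of 44 points), at a fraction of the teacher's size.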
Before vs after (the entire reason you do this)
> before: wrong tables, wrong logic, nonsense SQL
> after: correct JOINs, GROUP BY, HAVING
> aka “this query actually executes and answers the question”
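"Executes and answers the question" can be checked mechanically. A rough sketch with the standard-library `sqlite3` module: run the predicted query and a reference query on the same database and compare rows. The toy schema and data are invented for illustration, and ">1M album sales" is assumed to mean total sales per artist.

```python
import sqlite3

def same_result(conn, predicted_sql, reference_sql):
    """True if the predicted query runs and returns the same rows (any order)."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # the query does not even execute
    reference = conn.execute(reference_sql).fetchall()
    return sorted(predicted) == sorted(reference)

# Toy schema and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE albums (artist_id INTEGER, title TEXT, sales INTEGER);
    INSERT INTO artists VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO albums VALUES (1, 'One', 800000), (1, 'Two', 400000), (2, 'Solo', 300000);
""")

reference = """
    SELECT a.name FROM artists a
    JOIN albums b ON b.artist_id = a.id
    GROUP BY a.id HAVING SUM(b.sales) > 1000000
"""
good = ("SELECT name FROM artists WHERE id IN "
        "(SELECT artist_id FROM albums GROUP BY artist_id HAVING SUM(sales) > 1000000)")
bad = "SELECT name FROM artists WHERE genre IS NULL"  # column does not exist
```

Here `same_result(conn, good, reference)` holds even though the SQL text differs, while `bad` fails because it references a column the schema does not have. Comparing results rather than strings avoids punishing a correct query for being written differently.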
What this really means (bigger than Text2SQL)
You don’t need a giant model for every job
You need tiny specialists that understand your world:
> internal schemas
> service / OS logs
> tool outputs
> company-specific workflows
TL;DR
> “fine-tuning is hard” is mostly “the pipeline is annoying”
> distillation skill turns 10–100 examples into a real specialist
> the agent wrapper turns the whole thing into a conversation
> this is how you get practical local SLMs without becoming an MLOps monk
Small & Specialized models
> High-leverage
> Boringly effective
> Exactly where this is going
The future is
Local inference
Lower latency
Fewer secrets leaving the building

Skill: github.com/distil-labs/di…
Full example with data: github.com/distil-labs/di…
Detailed walkthrough: distillabs.ai/blog/train-you…
Reddit Thread: old.reddit.com/r/LocalLLaMA/c…
