Visualize Thread by @TheAhmadOsman

✨ Visual Editor

palette Canvas & Background

Presets

Custom Colors

Gradient:arrow_forward

Text Color:

Gradient Angle135°

Background Pattern

Grain Texture

Aspect Ratio

style Card Style

Preset

Padding40px

Card Radius16px

Enable Card Shadow

Glassmorphism Effect

Show Watermark AGENCY

Show Timestamps

Show X Logo

text_fields Typography

Font Family

Font Size16px

Ahmad

@TheAhmadOsman

INCREDIBLE

Someone on r/LocalLLaMA did an incredibly practical thing

They took a tiny 0.6B model that was trash at task (Text2SQL)
Created a knowledge distiliation agent with a Claude Code skill
And made the 0.6B model behave like a specialist using 100 examples

The problem
> Small Language Models are “generally helpful”
> but specialized tasks are “exact or you die”
> you ask: “Which artists have >1M album sales?”
> the model answers: “check if genre is NULL”

The old way to fix this
> Finetune the model:
> collect + clean data
> build training pipeline
> tune hparams
> rerun when it’s wrong
> accidentally become the unpaid
> intern of your own experiment

The new way
> Knowledge distillation via a Claude skill
> use a strong teacher (DeepSeek-V3)
> generate synthetic pairs from a small seed set
> train a tiny student to imitate the teacher on your task
> ship it as GGUF / HF / LoRA
> run it locally

Distillation isn’t “creating skill”
It’s compressing skill

THE REAL HACK: agent-as-interface
> They wrapped the whole distillation loop in an agent “skill”:
> picks task type (QA / classification / tool calling / RAG)
> converts messy inputs into clean JSONL
> runs teacher eval first
> kicks off distillation + monitors progress
> packages weights for you to run locally
This is the quiet unlock

Why “teacher eval first” is elite behavior
> distillation amplifies competence and incompetence
> if the teacher is wrong, the student learns wrong faster
> garbage in -> efficient garbage out
Adult supervision, but for models

The run breakdown:
> seed: ~100 raw conversation traces
> teacher (LLM-as-judge): ~80%
> base 0.6B: ~36%
> distilled 0.6B: ~74%
> output: ~2.2GB GGUF
> runs locally with llama.cpp

Before vs after (the entire reason you do this)
> before: wrong tables, wrong logic, nonsense SQL
> after: correct JOINs, GROUP BY, HAVING
> aka “this query actually executes and answers the question”

What this really means (bigger than Text2SQL)
You don’t need a giant model for every job

You need tiny specialists that understand your world:
> internal schemas
> service / OS logs
> tool outputs
> company-specific workflows

TL;DR
> “fine-tuning is hard” is mostly “the pipeline is annoying”
> distillation skill turns 10–100 examples into a real specialist
> the agent wrapper turns the whole thing into a conversation
> this is how you get practical local SLMs
> without becoming an MLOps monk

Small & Specialized models
> High-leverage
> Boringly effective
> Exactly where this is going

The future is
Local inference
Lower latency
Fewer secrets leaving the building

Ahmad

@TheAhmadOsman

Skill: github.com/distil-labs/di…
Full example with data: github.com/distil-labs/di…
Detailed walkthrough: distillabs.ai/blog/train-you…
Reddit Thread: old.reddit.com/r/LocalLLaMA/c…

Generated by Thread Navigator

100%

view_carousel Carousel Studio NEW

Press ⌘ + S to quick-export