kolm / distill
Pick a frontier teacher. Point it at your task. Get a smaller, faster, cheaper model tuned for it. The whole thing ships as one signed .kolm. Every call produces a cryptographic receipt. Frontier prompt cache is on by default.
How distillation works in kolm
Same process band as the homepage. See every technique we use →
Three runs we have shipped. Numbers are reproducible from the recipe and the K-score gate; see the per-run receipt for the exact prompt-cache attribution.
Distill Claude Sonnet 4.6 into Llama 3.1 8B for SQL-from-English on a domain schema. 4k synthetic pairs, SFT plus DPO; the data step is sketched after this list.
Distill GPT-4o into Phi-3.5 Mini for a 6-class email router. Synthetic data from 60 seed examples, ORPO alignment.
Distill Gemini 2.0 Pro into Qwen 2.5 7B for grounded policy lookup. Recall-grounded distill, judge-as-recipe verification.
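A hedged sketch of the data step for the first run: the schema file, seed questions, and model id below are illustrative stand-ins, and the call goes straight through the official anthropic Python SDK rather than kolm's bridge.

```python
# Sketch of drawing SQL-from-English pairs from the teacher (the run used ~4k).
# SCHEMA_DDL, the seed questions, and TEACHER_MODEL are illustrative placeholders.
import json
import anthropic

client = anthropic.Anthropic()
TEACHER_MODEL = "claude-sonnet-4-6"        # per the run description; verify against the live model list
SCHEMA_DDL = open("schema.sql").read()     # the domain schema the student is tuned for

def teacher_sql(question: str) -> str:
    resp = client.messages.create(
        model=TEACHER_MODEL,
        max_tokens=512,
        system=f"Translate the question into SQL for this schema:\n{SCHEMA_DDL}",
        messages=[{"role": "user", "content": question}],
    )
    return resp.content[0].text

questions = [
    "How many open tickets per account this week?",
    "Which accounts churned last quarter?",
]  # illustrative; a real run draws a few thousand domain questions

with open("sft_pairs.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps({"prompt": q, "completion": teacher_sql(q)}) + "\n")

# SFT fine-tunes Llama 3.1 8B on these pairs; DPO then prefers the teacher's
# SQL over the student's own earlier drafts on the same prompts.
```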
Cost. A distilled 8B model serving the same task as Claude Opus is roughly 10–30× cheaper per call. For any production workload past the demo phase, the teacher API line item dominates the bill.
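To make the ratio concrete, a back-of-envelope comparison; every number below is an assumption chosen for illustration, not a published rate card, and the ratio moves with the teacher's actual pricing and the student's real throughput.

```python
# Illustrative per-call cost: frontier teacher (metered tokens) vs. a
# self-hosted distilled 8B (amortized GPU time). All figures are assumptions.
TEACHER_IN, TEACHER_OUT = 3.00, 15.00     # assumed $/M input and output tokens
GPU_PER_HR, CALLS_PER_HR = 1.50, 3000     # assumed GPU rate and 8B serving throughput

tokens_in, tokens_out = 1500, 300         # a typical call in this sketch

teacher_cost = tokens_in / 1e6 * TEACHER_IN + tokens_out / 1e6 * TEACHER_OUT
student_cost = GPU_PER_HR / CALLS_PER_HR

print(f"teacher ${teacher_cost:.4f}  student ${student_cost:.4f}  "
      f"ratio {teacher_cost / student_cost:.0f}x")   # ~18x under these assumptions
```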
Latency. Frontier APIs are 200–800 ms round-trip. A local distilled model in your VPC is 20–80 ms. Critical for interactive surfaces.
Portability. The artifact is one signed file. It runs in your VPC, at the edge, or on a hosted bridge. Recipe and gate travel with it; the teacher API does not.
Prompt cache, automatic. When the compiled artifact replays the same prelude across calls, kolm passes Anthropic cache_control and OpenAI auto-cache through the bridge. Cache reads cost about 10% of the full input rate. Hits surface as cache_hit and cache_savings_usd on every receipt.
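What the pass-through amounts to on the Anthropic side, sketched with the official anthropic SDK; the prelude text, model id, and user turn are illustrative, and in production the bridge does this marking for you.

```python
# Marking a shared prelude as cacheable with Anthropic prompt caching.
import anthropic

client = anthropic.Anthropic()
SHARED_PRELUDE = "...the compiled prelude the artifact replays on every call..."

resp = client.messages.create(
    model="claude-sonnet-4-5",                       # illustrative model id
    max_tokens=512,
    system=[{
        "type": "text",
        "text": SHARED_PRELUDE,
        "cache_control": {"type": "ephemeral"},      # cacheable prefix
    }],
    messages=[{"role": "user", "content": "Route this email: ..."}],
)

# The usage block separates cache writes from the cheaper cache reads.
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)
```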
Auditable. Each call yields a 4-ring HMAC receipt. The chain pre-image → derived → execution → seal is signed with a key on file in your compliance package.
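A minimal sketch of a 4-ring chain in this shape, assuming each ring's payload is a JSON-serializable dict; the real receipt schema and key handling are defined by kolm's compliance package, not by this example.

```python
# Each ring's tag covers its payload plus the previous ring's tag, so the
# final seal authenticates the whole pre-image -> derived -> execution chain.
import hashlib
import hmac
import json

def ring(key: bytes, prev_tag: bytes, payload: dict) -> bytes:
    msg = prev_tag + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

key = b"placeholder-key"   # the real key is the one on file in the compliance package

pre_image = ring(key, b"", {"artifact": "router.kolm", "input_hash": "…"})
derived   = ring(key, pre_image, {"resolved_params": {"temperature": 0.0}})
execution = ring(key, derived, {"output_hash": "…", "cache_hit": True})
seal      = ring(key, execution, {"ts": "2025-01-01T00:00:00Z"})

# Verification recomputes the chain and compares tags in constant time.
assert hmac.compare_digest(seal, ring(key, execution, {"ts": "2025-01-01T00:00:00Z"}))
```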
Reversible. If the K-score gate fails on a new domain shift, the recipe rebuilds the artifact against a fresh teacher draw. The recipe is the source of truth; the weights are an artifact of it.