kolm / distill
Pick a frontier teacher. Point it at your task. Get a smaller, faster, cheaper model tuned for it. The whole thing ships as one signed .kolm. Every call produces a cryptographic receipt. Frontier prompt cache is on by default.
How distillation works in kolm
Same process band as the homepage. See every technique we use →
Three runs we have shipped. Numbers are reproducible from the recipe and the K-score gate; see the per-run receipt for the exact prompt-cache attribution.
Distill Claude Sonnet 4.6 into Llama 3.1 8B for SQL-from-English on a domain schema. 4k synthetic pairs, SFT plus DPO; the data step is sketched after this list.
Distill GPT-4o into Phi-3.5 Mini for a 6-class email router. Synthetic data from 60 seed examples, ORPO alignment.
Distill Gemini 2.0 Pro into Qwen 2.5 7B for grounded policy lookup. Recall-grounded distill, judge-as-recipe verification.
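A hedged sketch of the data step for the first run: the schema file, seed questions, and model id below are illustrative stand-ins, and the call goes straight through the official anthropic Python SDK rather than kolm's bridge.

```python
# Sketch of drawing SQL-from-English pairs from the teacher (the run used ~4k).
# SCHEMA_DDL, the seed questions, and TEACHER_MODEL are illustrative placeholders.
import json
import anthropic

client = anthropic.Anthropic()
TEACHER_MODEL = "claude-sonnet-4-6"        # per the run description; verify against the live model list
SCHEMA_DDL = open("schema.sql").read()     # the domain schema the student is tuned for

def teacher_sql(question: str) -> str:
    resp = client.messages.create(
        model=TEACHER_MODEL,
        max_tokens=512,
        system=f"Translate the question into SQL for this schema:\n{SCHEMA_DDL}",
        messages=[{"role": "user", "content": question}],
    )
    return resp.content[0].text

questions = [
    "How many open tickets per account this week?",
    "Which accounts churned last quarter?",
]  # illustrative; a real run draws a few thousand domain questions

with open("sft_pairs.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps({"prompt": q, "completion": teacher_sql(q)}) + "\n")

# SFT fine-tunes Llama 3.1 8B on these pairs; DPO then prefers the teacher's
# SQL over the student's own earlier drafts on the same prompts.
```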
Cost. A distilled 8B model serving the same task as Claude Opus is roughly 10–30× cheaper per call. For any production workload past the demo phase, the teacher API line item dominates the bill.
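To make the ratio concrete, a back-of-envelope comparison; every number below is an assumption chosen for illustration, not a published rate card, and the ratio moves with the teacher's actual pricing and the student's real throughput.

```python
# Illustrative per-call cost: frontier teacher (metered tokens) vs. a
# self-hosted distilled 8B (amortized GPU time). All figures are assumptions.
TEACHER_IN, TEACHER_OUT = 3.00, 15.00     # assumed $/M input and output tokens
GPU_PER_HR, CALLS_PER_HR = 1.50, 3000     # assumed GPU rate and 8B serving throughput

tokens_in, tokens_out = 1500, 300         # a typical call in this sketch

teacher_cost = tokens_in / 1e6 * TEACHER_IN + tokens_out / 1e6 * TEACHER_OUT
student_cost = GPU_PER_HR / CALLS_PER_HR

print(f"teacher ${teacher_cost:.4f}  student ${student_cost:.4f}  "
      f"ratio {teacher_cost / student_cost:.0f}x")   # ~18x under these assumptions
```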
Latency. Frontier APIs are 200–800 ms round-trip. A local distilled model in your VPC is 20–80 ms. Critical for interactive surfaces.
Portability. The artifact is one signed file. It runs in your VPC, at the edge, or on a hosted bridge. Recipe and gate travel with it; the teacher API does not.
Prompt cache, automatic. When the compiled artifact replays the same prelude across calls, kolm passes Anthropic cache_control and OpenAI auto-cache through the bridge. Cache reads cost about 10% of the full input rate. Hits surface as cache_hit and cache_savings_usd on every receipt.
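What the pass-through amounts to on the Anthropic side, sketched with the official anthropic SDK; the prelude text, model id, and user turn are illustrative, and in production the bridge does this marking for you.

```python
# Marking a shared prelude as cacheable with Anthropic prompt caching.
import anthropic

client = anthropic.Anthropic()
SHARED_PRELUDE = "...the compiled prelude the artifact replays on every call..."

resp = client.messages.create(
    model="claude-sonnet-4-5",                       # illustrative model id
    max_tokens=512,
    system=[{
        "type": "text",
        "text": SHARED_PRELUDE,
        "cache_control": {"type": "ephemeral"},      # cacheable prefix
    }],
    messages=[{"role": "user", "content": "Route this email: ..."}],
)

# The usage block separates cache writes from the cheaper cache reads.
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)
```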
Auditable. Each call yields a 4-ring HMAC receipt. The chain pre-image → derived → execution → seal is signed with a key on file in your compliance package.
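A minimal sketch of a 4-ring chain in this shape, assuming each ring's payload is a JSON-serializable dict; the real receipt schema and key handling are defined by kolm's compliance package, not by this example.

```python
# Each ring's tag covers its payload plus the previous ring's tag, so the
# final seal authenticates the whole pre-image -> derived -> execution chain.
import hashlib
import hmac
import json

def ring(key: bytes, prev_tag: bytes, payload: dict) -> bytes:
    msg = prev_tag + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

key = b"placeholder-key"   # the real key is the one on file in the compliance package

pre_image = ring(key, b"", {"artifact": "router.kolm", "input_hash": "…"})
derived   = ring(key, pre_image, {"resolved_params": {"temperature": 0.0}})
execution = ring(key, derived, {"output_hash": "…", "cache_hit": True})
seal      = ring(key, execution, {"ts": "2025-01-01T00:00:00Z"})

# Verification recomputes the chain and compares tags in constant time.
assert hmac.compare_digest(seal, ring(key, execution, {"ts": "2025-01-01T00:00:00Z"}))
```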
Reversible. If the K-score gate fails on a new domain shift, the recipe rebuilds the artifact against a fresh teacher draw. The recipe is the source of truth; the weights are an artifact of it.