Point your existing OpenAI / Anthropic SDK at kolm capture and the proxy records every (input, output) pair your team already paid for. Once the corpus crosses the threshold, it compiles into a signed local .kolm: a LoRA on an open base model, K-score gated, running on your hardware. The same dollar of frontier spend now also pays for an asset you own.
Every other large recurring line in software (cloud hosting, observability, payments) is either a fungible commodity or accrues real switching cost. Frontier API spend does neither. Cancel today and you have nothing but log entries. The fix is the inversion: capture the verified pairs your team is already buying, and turn them into a local artifact you own.
1,000 verified pairs by default, configurable per namespace. Below the threshold, the proxy keeps capturing and the LoRA stays uncompiled. The compiler refuses to ship without a meaningful holdout.
The artifact only releases when the held-out eval clears the gate. If the captured corpus is too noisy or the base model is wrong for the task, the compile fails loud, not silent.
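A minimal sketch of what that gate behavior amounts to, assuming a K-score in [0, 1] and configurable minimums; the names and defaults here are illustrative, not kolm's real configuration:

```ts
// Illustrative gate check: refuse to ship the artifact unless the holdout is
// large enough to mean something and the held-out score clears the bar.
interface CompileReport {
  kScore: number;      // held-out eval score, 0..1
  holdoutSize: number; // pairs withheld from training
  trainSize: number;   // pairs used to train the LoRA
}

function assertGate(report: CompileReport, minKScore = 0.8, minHoldout = 200): void {
  if (report.holdoutSize < minHoldout) {
    throw new Error(`compile refused: holdout of ${report.holdoutSize} pairs is too small`);
  }
  if (report.kScore < minKScore) {
    throw new Error(`compile refused: K-score ${report.kScore.toFixed(2)} is below the ${minKScore} gate`);
  }
  // Only past this point does the signed .kolm get written.
}
```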
Every call routed to the local artifact stops at your perimeter. The frontier provider, the kolm proxy, and the rest of the network are not in the call path. The latency floor is your own hardware, not the network.
The capture surface is one base-URL change in your existing codebase. The export, distill, and verify endpoints below are everything else you need to turn the captured corpus into a signed local artifact.
Same body shape as the upstream. Forwards to OpenAI / Anthropic with your customer key (passed in a header, never written to kolm's storage). Records (input, output, latency_us, model, namespace) on the way back. One base-URL change in your codebase.
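What the base-URL change might look like with the official SDKs. The Anthropic capture URL is the one the CLI walkthrough below sets; the OpenAI path is an assumption following the same pattern:

```ts
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";

// Anthropic: the capture base URL the CLI sets below.
const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,           // your key, same header as before
  baseURL: "https://kolm.ai/v1/capture/anthropic", // the only change
});

// OpenAI: assumed to follow the same pattern (this path is an assumption).
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://kolm.ai/v1/capture/openai",
});

// Request and response keep the upstream body shape; the proxy records
// (input, output, latency_us, model, namespace) on the way back.
const reply = await anthropic.messages.create({
  model: "claude-3-opus-20240229",
  max_tokens: 512,
  messages: [{ role: "user", content: "Draft a reply to ticket #4821." }],
});
```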
Returns captured pairs as JSONL or parquet, scoped by tenant + namespace + optional verifier-confidence floor. You can take the corpus to any trainer — you are not locked into ours.
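A sketch of pulling the corpus out over HTTP. The endpoint path, auth scheme, and query parameters here are assumptions; only the JSONL/parquet formats and the tenant + namespace + confidence scoping come from the description above:

```ts
// Hypothetical export call: JSONL of captured pairs for one namespace,
// filtered to a verifier-confidence floor. Path and params are assumptions.
const res = await fetch(
  "https://kolm.ai/v1/export?namespace=prod-tickets&format=jsonl&min_confidence=0.8",
  { headers: { Authorization: `Bearer ${process.env.KOLM_API_KEY}` } }
);
if (!res.ok) throw new Error(`export failed: ${res.status}`);

// One pair per line, plus capture metadata. Take it to any trainer you like.
const lines = (await res.text()).trim().split("\n");
for (const line of lines.slice(0, 3)) {
  const pair = JSON.parse(line);
  console.log(pair.model, pair.namespace, pair.latency_us);
}
```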
Takes {namespace, base_model, target_size}. Fires LoRA training when the namespace clears the threshold, runs the K-score gate on a held-out slice, ships back the signed .kolm. One CLI call, one download.
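The same operation over HTTP might look like the sketch below. The body shape {namespace, base_model, target_size} is the one stated above; the path and the response fields are assumptions:

```ts
// Hypothetical distill kickoff; only the body shape is taken from the description above.
const job = await fetch("https://kolm.ai/v1/distill", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.KOLM_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    namespace: "prod-tickets",
    base_model: "phi-3-mini",
    target_size: "3.8b",
  }),
}).then((r) => r.json());

// Assumed response fields: the gated K-score and a download URL for the signed .kolm.
console.log(job.k_score, job.artifact_url);
```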
No SDK fork. No agent harness. The CLI wraps the four endpoints. Pointing your existing OpenAI / Anthropic client at the capture proxy is one base-URL change.
    # 1. install + auth
    $ npm i -g github:sneaky-hippo/kolmogorov-stack
    $ kolm config base https://kolm.ai
    $ kolm login

    # 2. point your existing client at the capture endpoint
    $ kolm capture --provider anthropic --as support-replies --namespace prod-tickets
    ✓ wrote ~/.kolm/capture/support-replies.json
    ✓ set ANTHROPIC_BASE_URL=https://kolm.ai/v1/capture/anthropic in your shell

    # 3. inspect what has accumulated (run this whenever)
    $ kolm capture status
    namespace=prod-tickets  pairs=412  verified=326 (79%)
    namespace=prod-tickets  ready_to_distill_at=1000  eta_days=4

    # 4. at threshold, compile to a signed local artifact
    $ kolm distill --namespace prod-tickets --base-model phi-3-mini
    ✓ trained LoRA on 1000 verified pairs (240 holdout)
    ✓ K-score 0.86, signature hmac-sha256
    ✓ artifact ~/.kolm/artifacts/prod-tickets.kolm

    $ kolm inspect ~/.kolm/artifacts/prod-tickets.kolm
    ✓ manifest signed, receipt chain walks, corpus hash matches
Worked example: 50-engineer org running an internal customer-support copilot on Claude Opus, ~80,000 calls/month. Capture turned on in month one; the first compile that clears the K-score gate ships in month two. The local LoRA serves the easy slice; the long tail keeps escalating to Opus. The local-served fraction grows quarter over quarter.
| Month | Frontier calls / month | Verified pairs (cumulative; + = added that month) | Local LoRA status | Frontier spend / month |
|---|---|---|---|---|
| 0 | 80,000 | 0 | None | $12,000 |
| 1 | 80,000 | 62k | Below threshold | $12,000 |
| 2 | 80,000 | 125k | Compiled. K=0.86. ~78% Opus quality. | $12,000 |
| 3 | ~32,000 | +25k | 60% local. Frontier slice is long tail. | $4,800 |
| 4 | ~16,000 | +13k | 80% local. LoRA retrained on tail. | $2,400 |
| 12 | ~12,000 | +10k | 85% local steady-state. | $1,800 |
12-month total: ~$48k frontier spend (down from the $144k a flat month-0 run rate would have cost), one signed local artifact owned in perpetuity, sub-100ms p50 latency on the local-served slice, and no customer data leaving the perimeter on that slice.
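The table is just a blended rate. A short sketch that reproduces the month-3 row, assuming the month-0 figures imply roughly $0.15 per frontier call:

```ts
// Month-0 run rate: 80,000 frontier calls for ~$12,000/month => ~$0.15 per call.
const callsPerMonth = 80_000;
const frontierCostPerCall = 12_000 / callsPerMonth; // 0.15

// Month 3: 60% of calls served locally, 40% still escalate to the frontier.
const localFraction = 0.6;
const frontierCalls = callsPerMonth * (1 - localFraction); // 32,000
const frontierSpend = frontierCalls * frontierCostPerCall; // 4,800

console.log({ frontierCalls, frontierSpend }); // matches the month-3 row above
```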
Capture-and-distill is one shape, not the only shape. It earns its keep on stable, high-volume tasks. It is the wrong tool for fast-moving knowledge or pure reasoning. Below is the honest map.
Stable, high-volume tasks repeated thousands of times per month: support replies, classification, extraction, summarization, structured-output generation. Captured corpus converges fast; LoRA gets a real lift. Latency, cost, privacy all improve.
Tasks where retrieval against fresh documents is the binding constraint (RAG territory). Capture-and-distill composes with RAG: local LoRA for the routine slice, frontier plus RAG for the long tail. The right pattern is both, not either-or; a minimal routing sketch follows the map below.
Free-form, low-volume creative work where every call is unique. The captured corpus does not converge. Use the frontier model directly. We are honest about this; we ship a verifier that refuses to compile when the corpus is not coherent enough.
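The routing sketch referenced in the RAG note above. This is illustrative: the serve interface, the confidence score, and the 0.8 floor are assumptions, not part of the .kolm format:

```ts
// Illustrative router: the routine slice goes to the local .kolm, the long tail
// (low confidence, or fresh-document retrieval required) escalates to the frontier.
interface LocalArtifact {
  // Assumed interface for a loaded .kolm: an answer plus a confidence score.
  serve(prompt: string): Promise<{ text: string; confidence: number }>;
}

async function route(
  prompt: string,
  local: LocalArtifact,
  frontier: (prompt: string) => Promise<string>, // frontier call, with RAG where it applies
  needsFreshDocs: boolean,
  minConfidence = 0.8
): Promise<string> {
  if (!needsFreshDocs) {
    const out = await local.serve(prompt);
    if (out.confidence >= minConfidence) return out.text; // never leaves the perimeter
  }
  return frontier(prompt); // long tail: escalate upstream
}
```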
The legal frame is the most-asked question in design-partner calls. Three sentences resolve it: provider TOS grants you ownership of outputs you paid for; we train a task-specific LoRA on your prompts and those outputs (not a competing general model); the artifact ships to you and we hold no copy. The privacy frame is parallel: TLS end-to-end through the proxy, tenant-scoped storage, no cross-tenant join path, and a bring-your-own-VPC docker bundle for regulated buyers.
No plaintext on disk; TLS terminates only long enough to capture, then re-establishes upstream. Captured pairs are stored in a row keyed by your tenant id. We do not retrain on captured data without an explicit per-namespace consent flag flipped on your account. For HIPAA / GDPR / ITAR, the proxy ships as a docker image you run inside your VPC.
Your client: the existing OpenAI / Anthropic SDK with one base-URL change. The customer API key travels in a header, exactly as it already does.
The kolm proxy: forwards the request upstream with your customer key, records (input, output, latency) on the way back, and never writes the key to storage. TLS in, TLS out.
The provider: sees a normal request arriving from kolm's infrastructure, bills the customer key, returns the response.
Storage: the pair lands in an observations row keyed by your tenant id and namespace. Nobody else sees it.
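An illustrative sketch of the proxy hop described above, not the production implementation: the tenant and namespace headers, the storage helper, and the route shape are all assumptions; the key rides the request upstream and is never persisted:

```ts
import express from "express";

const app = express();
app.use(express.json());

// Illustrative capture hop for the Anthropic messages endpoint.
app.post("/v1/capture/anthropic/v1/messages", async (req, res) => {
  const customerKey = req.header("x-api-key") ?? ""; // Anthropic-style auth header
  const started = process.hrtime.bigint();

  const upstream = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": customerKey, // the provider bills the customer key
      "anthropic-version": req.header("anthropic-version") ?? "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify(req.body),
  });
  const output = await upstream.json();
  const latencyUs = Number((process.hrtime.bigint() - started) / 1000n);

  // Tenant-scoped storage; the key itself is never written.
  await storePair({
    tenantId: req.header("x-kolm-tenant") ?? "unknown",     // hypothetical header
    namespace: req.header("x-kolm-namespace") ?? "default", // hypothetical header
    input: req.body,
    output,
    latencyUs,
    model: req.body?.model,
  });

  res.status(upstream.status).json(output);
});

// Stand-in for the observations table keyed by tenant id + namespace.
async function storePair(row: Record<string, unknown>): Promise<void> {
  void row;
}
```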
The first call you proxy through capture is the first dollar that goes from rent to deposit. Once the corpus crosses the threshold, the compile takes care of itself.