use cases / UC-06 · capture & distill

Every frontier call trains a local LoRA you keep.

Point your existing OpenAI / Anthropic SDK at kolm capture and the proxy records every (input, output) pair your team already paid for. At threshold, the captured corpus compiles into a signed local .kolm: a LoRA on an open base model, K-score gated, running on your hardware. The same dollar of frontier spend now also pays for an asset you own.

01 · the gap

Frontier spend goes up. No asset accumulates.

Every other large recurring line item in software (cloud hosting, observability, payments) is either a fungible commodity or accrues real switching cost. Frontier API spend accrues neither: cancel today and you have nothing but log entries. The fix is the inversion: capture the verified pairs your team is already buying, and turn them into a local artifact you own.

Threshold to compile
1,000 verified pairs

Default. Configurable per namespace. Below the threshold, the proxy keeps capturing and the LoRA stays uncompiled. The compiler refuses to ship without a meaningful holdout.

K-score gate
≥ 0.85 held out

The artifact only releases when the held-out eval clears the gate. If the captured corpus is too noisy or the base model is wrong for the task, the compile fails loud, not silent.
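The two gates above can be sketched as a single refusal path. This is a hypothetical illustration: the constants mirror the defaults described on this page, and every name in it is an assumption, not kolm internals.

```python
# Hypothetical sketch of the compile gate. Constants mirror the defaults
# described above; names are illustrative, not the actual kolm internals.
from typing import Optional

MIN_VERIFIED_PAIRS = 1000  # default compile threshold per namespace
K_THRESHOLD = 0.85         # held-out K-score the artifact must clear

class CompileRefused(Exception):
    """Raised loudly when the corpus or eval does not clear a gate."""

def gate(verified_pairs: int, holdout_k_score: Optional[float]) -> str:
    # Below threshold: keep capturing, ship nothing.
    if verified_pairs < MIN_VERIFIED_PAIRS:
        raise CompileRefused(
            f"only {verified_pairs} verified pairs; need {MIN_VERIFIED_PAIRS}")
    # No meaningful holdout eval: refuse rather than ship blind.
    if holdout_k_score is None:
        raise CompileRefused("no held-out eval available")
    # A noisy corpus or wrong base model shows up here as a low K-score.
    if holdout_k_score < K_THRESHOLD:
        raise CompileRefused(f"K-score {holdout_k_score:.2f} < {K_THRESHOLD}")
    return "release signed .kolm"
```

The point of the shape: every failure mode is an exception, never a silently shipped artifact.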

Egress on local-served call
0 bytes

Every call routed to the local artifact stops at your perimeter. The frontier provider, the kolm proxy, and the rest of the network are not in the call path. Latency floor is electricity.

02 · the four endpoints

Drop-in proxy. Three more endpoints close the loop.

The capture surface is one base-URL change in your existing codebase. The label, distill, and verify endpoints below are everything else you need to turn the captured corpus into a signed local artifact.

CAP

POST /v1/capture/<provider>

Same body shape as the upstream. Forwards to OpenAI / Anthropic with your customer key (passed in a header, stripped before forward). Records (input, output, latency_us, model, namespace) on the way back. One base-URL change in your codebase.
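What the proxy does per request can be sketched in a few lines. The x-kolm-* header names here are hypothetical assumptions for illustration; the real wire format may differ.

```python
# Minimal sketch of the per-request capture path. The x-kolm-* header names
# are illustrative assumptions, not the documented wire protocol.
import time

def handle_capture(provider: str, headers: dict, body: dict, forward) -> dict:
    # The customer key rides in a dedicated header and is popped here,
    # so it never lands in the stored record.
    customer_key = headers.pop("x-kolm-customer-key")
    t0 = time.monotonic()
    output = forward(provider, customer_key, body)  # TLS out to the upstream
    latency_us = int((time.monotonic() - t0) * 1_000_000)
    # This record is what accumulates toward the distill threshold.
    return {
        "input": body,
        "output": output,
        "latency_us": latency_us,
        "model": body.get("model"),
        "namespace": headers.get("x-kolm-namespace", "default"),
    }
```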

LBL

GET /v1/labels/synthesize-corpus

Returns captured pairs as JSONL or parquet, scoped by tenant + namespace + optional verifier-confidence floor. You can take the corpus to any trainer — you are not locked into ours.
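Consuming the JSONL export client-side might look like the sketch below, assuming a verifier_confidence field on each row (an illustrative field name, not the documented schema).

```python
# Sketch of consuming the corpus export: parse JSONL lines and apply a
# verifier-confidence floor client-side. Field names are assumptions.
import json

def load_corpus(jsonl_text: str, min_confidence: float = 0.0) -> list[dict]:
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        pair = json.loads(line)
        # Keep only pairs the verifier scored at or above the floor.
        if pair.get("verifier_confidence", 0.0) >= min_confidence:
            pairs.append(pair)
    return pairs
```

The returned list is plain dicts; hand it to any trainer you like.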

DST

POST /v1/specialists/auto-distill

Takes {namespace, base_model, target_size}. Fires LoRA training when the namespace clears the threshold, runs the K-score gate on a held-out slice, ships back the signed .kolm. One CLI call, one download.
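The request body is small enough to sketch. The three field names come from the endpoint description above; the validation and the target_size example value are assumptions.

```python
# Sketch of the auto-distill request body. Field names come from the endpoint
# description; the validation rule and example size class are assumptions.
from dataclasses import dataclass

@dataclass
class DistillRequest:
    namespace: str
    base_model: str
    target_size: str  # e.g. "3b" — assumed size class of the LoRA's base

    def to_body(self) -> dict:
        if not self.namespace:
            raise ValueError("namespace is required")
        return {"namespace": self.namespace,
                "base_model": self.base_model,
                "target_size": self.target_size}
```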

03 · install

Five-line CLI surface. Capture, status, labels, distill, verify.

No SDK fork. No agent harness. The CLI wraps the four endpoints. Pointing your existing OpenAI / Anthropic client at the capture proxy is one base-URL change.

~/your-app
# 1. install + auth
$ npm i -g github:sneaky-hippo/kolmogorov-stack
$ kolm config base https://kolm.ai
$ kolm login

# 2. point your existing client at the capture endpoint
$ kolm capture --provider anthropic --as support-replies --namespace prod-tickets
 wrote ~/.kolm/capture/support-replies.json
 set ANTHROPIC_BASE_URL=https://kolm.ai/v1/capture/anthropic in your shell

# 3. inspect what has accumulated (run this whenever)
$ kolm capture status
namespace=prod-tickets pairs=412 verified=326 (79%)
namespace=prod-tickets ready_to_distill_at=1000  eta_days=4

# 4. at threshold, compile to a signed local artifact
$ kolm distill --namespace prod-tickets --base-model phi-3-mini
 trained LoRA on 1000 verified pairs (240 holdout)
 K-score 0.86, signature hmac-sha256
 artifact ~/.kolm/artifacts/prod-tickets.kolm

$ kolm inspect ~/.kolm/artifacts/prod-tickets.kolm
 manifest signed, receipt chain walks, corpus hash matches
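The checks the inspect step reports — signature and corpus hash — could look like this sketch. The manifest layout and key handling are illustrative assumptions, not the .kolm format.

```python
# Sketch of two of the inspect checks: an HMAC-SHA256 signature over the
# manifest and a corpus hash. Manifest layout and key handling here are
# illustrative assumptions, not the actual .kolm format.
import hashlib
import hmac
import json

def sign_manifest(manifest: dict, key: bytes) -> bytes:
    # Canonical JSON so signer and verifier hash identical bytes.
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, canonical, hashlib.sha256).digest()

def verify_artifact(manifest: dict, signature: bytes,
                    corpus: bytes, key: bytes) -> bool:
    sig_ok = hmac.compare_digest(sign_manifest(manifest, key), signature)
    corpus_ok = hashlib.sha256(corpus).hexdigest() == manifest.get("corpus_sha256")
    return sig_ok and corpus_ok
```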

04 · what 12 months looks like

The frontier bill pays for the artifact.

Worked example: a 50-engineer org running an internal customer-support copilot on Claude Opus at ~80,000 calls/month. Capture turned on in month one; distill fires once the namespace clears its configured threshold (set well above the 1,000-pair default here). The local LoRA serves the easy slice; the long tail keeps escalating to Opus. The local-served fraction grows quarter over quarter.

Month | Frontier calls | Verified pairs | Local LoRA status                        | Frontier spend
0     | 80,000         | 0              | None                                     | $12,000
1     | 80,000         | 62k            | Below threshold                          | $12,000
2     | 80,000         | 125k           | Compiled. K=0.86. ~78% Opus quality.     | $12,000
3     | ~32,000        | +25k           | 60% local. Frontier slice is long tail.  | $4,800
4     | ~16,000        | +13k           | 80% local. LoRA retrained on tail.       | $2,400
12    | ~12,000        | +10k           | 85% local steady-state.                  | $1,800

12-month total: ~$48k frontier spend (down from $144k at the flat run rate), one signed local artifact owned in perpetuity, sub-100 ms p50 latency on the local-served slice, and no customer data leaving the perimeter on that slice.
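The spend column is plain arithmetic off the $12,000/month all-frontier baseline: frontier spend scales with the slice that still escalates.

```python
# Arithmetic behind the spend column: frontier spend scales with the slice
# that still escalates, off the $12,000/month all-frontier baseline.
BASELINE = 12_000

def monthly_spend(local_fraction: float) -> int:
    return round(BASELINE * (1 - local_fraction))
```

At 60% local that is $4,800; at 80%, $2,400; at the 85% steady state, $1,800.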

05 · what we do not claim

Where it pays off, where it does not.

Capture-and-distill is one shape, not the only shape. It earns its keep on stable, high-volume tasks. It is the wrong tool for fast-moving knowledge or pure reasoning. Below is the honest map.

+

Where it pays off.

Stable, high-volume tasks repeated thousands of times per month: support replies, classification, extraction, summarization, structured-output generation. Captured corpus converges fast; LoRA gets a real lift. Latency, cost, privacy all improve.

~

Where it is neutral.

Tasks where retrieval against fresh documents is the binding constraint (RAG territory). Capture-and-distill composes with RAG: local LoRA for the routine slice, frontier-RAG for the long tail. The right pattern is both, not either-or.
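The both-not-either-or pattern reduces to a one-line router. The confidence floor here is an assumed knob for illustration, not a documented default.

```python
# Sketch of the compose-with-RAG routing pattern: the local LoRA serves calls
# it is confident on; everything else escalates to frontier + retrieval.
# The 0.85 confidence floor is an assumed knob, not a documented default.
def route(local_confidence: float, floor: float = 0.85) -> str:
    return "local-lora" if local_confidence >= floor else "frontier-rag"

def local_fraction(confidences: list[float], floor: float = 0.85) -> float:
    # Share of traffic the local artifact would serve at this floor.
    routed = [route(c, floor) for c in confidences]
    return routed.count("local-lora") / len(routed)
```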

!

Where it is the wrong shape.

Free-form, low-volume creative work where every call is unique. The captured corpus does not converge. Use the frontier model directly. We are honest about this; we ship a verifier that refuses to compile when the corpus is not coherent enough.

06 · legal & privacy

Your prompts. Your outputs. Your artifact.

The legal frame is the most-asked question in design-partner calls. Three sentences resolve it: provider TOS grants you ownership of outputs you paid for; we train a task-specific LoRA on your prompts and those outputs (not a competing general model); the artifact ships to you and we hold no copy. The privacy frame is parallel: TLS end-to-end through the proxy, tenant-scoped storage, no cross-tenant join path, and a bring-your-own-VPC docker bundle for regulated buyers.

capture path · what crosses what perimeter

The proxy is a passthrough.

No plaintext on disk; TLS terminates only long enough to capture, then re-establishes upstream. Captured pairs are stored in a row keyed by your tenant id. We do not retrain on captured data without an explicit per-namespace consent flag flipped on your account. For HIPAA / GDPR / ITAR, the proxy ships as a docker image you run inside your VPC.

step 1

your client

Existing OpenAI / Anthropic SDK with one base-URL change. Customer API key in a header.

step 2

kolm capture proxy

Strips the customer key, forwards to upstream. Records (input, output, latency) on the way back. TLS in, TLS out.

step 3

upstream provider

Sees a normal request from the kolm tenant. Bills the customer key. Returns the response.

step 4

your tenant store

The pair lands in the observations store, keyed by your tenant id and namespace. Nobody else sees it.

Stop renting the same model a thousand times a day.

The first call you proxy through capture is the first dollar that goes from rent to deposit. The threshold compiles itself.