Engineering · 2026-05-09 · 12 min read

Rent vs. buy compute. Keep the artifact.

Every dollar you spend on Anthropic or OpenAI is a rental fee for a model you do not own and cannot run offline. Proxy those calls through kolm capture and the same dollar also pays for a verified (input, output) dataset. At the threshold you choose, that dataset compiles into a signed local .kolm: a LoRA on an open base model that runs on your hardware, on your network, in perpetuity. The frontier API bill becomes a deposit account.

By Kolmogorov · Tags: capture · distill · on-device

The problem with renting forever.

If you ship anything that calls a frontier model in production, the bill grows linearly with usage. Every prompt is a network round trip and a row in someone else's log. The model can be deprecated on fourteen days' notice. The data leaves your perimeter on every call. The latency floor is two hundred to eight hundred milliseconds even on the best path. None of these are bugs in your stack; they are properties of a rental architecture.

The line item this creates on a finance team's monthly close is unusual: spend goes up and no asset accumulates. Every other large recurring line in a software company, whether cloud hosting, observability, or payments, is either a fungible commodity or accrues real switching cost in the form of integration. Frontier API spend accrues neither. Cancel today and you have nothing to show for it but log entries.

This is the wrong shape for spend that is forecast to keep doubling. The rational move is to convert it.

Most of what your team is paying Anthropic or OpenAI to do is a small set of tasks repeated thousands of times. Each repetition produces a verified label. Save the labels and you can train a model that performs the same tasks for free.

The thesis in one paragraph.

You are already paying the frontier model to do work for you. The output it returns is, under every major provider's terms of service, yours to use. What we add is a thin proxy that captures the verified (input, output, latency) tuple for each call and a compiler that turns the captured corpus into a signed LoRA on an open base. When the LoRA is good enough to retire the proxy on a slice of traffic, you stop paying for that slice. The local artifact compounds in coverage as your traffic does. Every dollar of frontier spend you pass through the proxy is, mechanically, also a deposit into the local model. That is the rent-to-own moment for AI compute.

How capture-and-distill actually works.

There are four moving parts. None of them are exotic.

1. The capture proxy.

Two endpoints, POST /v1/capture/anthropic and POST /v1/capture/openai, accept the same body shape as the upstream provider. You send your customer's API key in a header that we strip before forwarding. The upstream call is unmodified. The response is unmodified. Before returning, we record a tuple of (input, output, model, latency_us, namespace, tenant) to your tenant's observations table. Surface area for the caller is one base-URL change in their codebase.
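In code, the base-URL change is the whole integration. A minimal sketch using the Anthropic Python SDK; the capture base URL shown here is an assumption for illustration, and the request and response bodies are exactly what you would send upstream:

# Minimal sketch: route existing Anthropic SDK traffic through the capture proxy.
# The base_url below is illustrative; use the capture URL issued for your tenant.
# The call body and the response are untouched; the proxy records the
# (input, output, model, latency_us, namespace, tenant) tuple before returning.
import os
from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],         # forwarded upstream, stripped before logging
    base_url="https://kolm.ai/v1/capture/anthropic",  # hypothetical capture base URL
)

reply = client.messages.create(
    model="claude-opus-4-0",   # illustrative model id
    max_tokens=400,
    messages=[{"role": "user", "content": "Summarize this support ticket for the on-call engineer."}],
)
print(reply.content[0].text)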

2. The verifier sample.

By default, a configurable fraction of captures is run through k-sample verification (the same primitive that powers verified inference on the compile path) to attach a confidence score to each label. Captures that fail verification are still recorded but tagged so the distiller can quarantine them. The verifier is the brake that stops you from training on outputs the frontier model itself was uncertain about.
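You do not implement the verifier yourself, but the idea is worth making concrete: sample the upstream model k times on the same input and use agreement with the captured output as its confidence score. A toy sketch of that idea, not the production primitive:

# Toy sketch of k-sample verification: re-sample the prompt k times and score
# the captured output by how often the re-samples agree with it. normalize()
# is a crude stand-in; the real comparison is task-aware.
def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def k_sample_confidence(captured_output: str, resamples: list[str]) -> float:
    """Fraction of re-samples that agree with the captured output."""
    agree = sum(1 for s in resamples if normalize(s) == normalize(captured_output))
    return agree / len(resamples)

# Low-confidence captures are tagged and quarantined from the training corpus, not deleted.
print(k_sample_confidence("Refund approved.", ["Refund approved.", "Refund approved.", "Refund denied."]))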

3. The label endpoint.

GET /v1/labels/synthesize-corpus?namespace=&format=jsonl returns the captured pairs as JSONL or parquet, ready to feed to any training framework. Tenant-scoped. Filtered by namespace, optional date range, and optional minimum verifier confidence. This is the format used by the next stage, but the export path lets you take the data anywhere — you are not locked into our trainer.
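A minimal export sketch against that endpoint. The namespace and format parameters are the documented ones; the auth header and the minimum-confidence parameter name are assumptions:

# Minimal sketch: export the captured corpus as JSONL for any trainer.
# The Authorization scheme and the min_confidence parameter name are assumed;
# namespace and format are the query parameters described above.
import os
import requests

resp = requests.get(
    "https://kolm.ai/v1/labels/synthesize-corpus",
    headers={"Authorization": f"Bearer {os.environ['KOLM_API_KEY']}"},
    params={"namespace": "prod-tickets", "format": "jsonl", "min_confidence": 0.8},
    timeout=60,
)
resp.raise_for_status()

with open("corpus.jsonl", "w") as f:
    f.write(resp.text)
print(f"exported {len(resp.text.splitlines())} pairs")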

4. The distill bridge.

POST /v1/specialists/auto-distill takes a namespace, a base model, and a target footprint. At a configurable threshold (default one thousand verified pairs) the bridge fires a LoRA training job, runs the K-score gate on a held-out evaluation slice, and ships back a signed .kolm artifact when the gate passes. From the operator's view it is a single CLI call and a download.
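The same trigger over HTTP, for teams that drive it from CI instead of the CLI. The request field names and the auth header are assumptions based on the description above:

# Minimal sketch: fire a distill run over HTTP rather than the CLI.
# Field names (namespace, base_model, target_footprint) and the auth scheme
# are assumptions; the endpoint and gating behaviour are as described above.
import os
import requests

resp = requests.post(
    "https://kolm.ai/v1/specialists/auto-distill",
    headers={"Authorization": f"Bearer {os.environ['KOLM_API_KEY']}"},
    json={
        "namespace": "prod-tickets",
        "base_model": "phi-3-mini",
        "target_footprint": "single-gpu",   # hypothetical footprint label
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # job reference; the signed .kolm ships back once the K-score gate passes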

The five-line CLI surface that wraps these endpoints:

# point your existing OpenAI / Anthropic SDK at our capture endpoint
kolm capture --provider anthropic --as support-replies --namespace prod-tickets

# inspect what has been captured so far
kolm capture status

# export the corpus for any trainer (returns JSONL)
kolm labels --namespace prod-tickets --format jsonl > corpus.jsonl

# or compile straight to a signed local artifact
kolm distill --namespace prod-tickets --base-model phi-3-mini
kolm inspect ~/.kolm/artifacts/prod-tickets.kolm

A worked example: 50 engineers on Anthropic.

The math only matters at scale. Here is one team-shaped example to make it concrete.

Shape: A fifty-person engineering org running an internal customer-support copilot on Claude Opus through Anthropic's API. Volume: approximately eighty thousand calls per month, average two thousand input tokens and four hundred output tokens per call. Bill: roughly twelve thousand dollars per month at current Opus pricing, climbing as the team grows.

Month | Frontier calls | Captured pairs | Verified pairs | Local LoRA status | Frontier spend
Month 0 | 80,000 | 0 | 0 | None | $12,000
Month 1 | 80,000 | 80,000 | 62,400 (78%) | Below threshold | $12,000
Month 2 | 80,000 | 160,000 | ~125k | Compiled. K=0.86 on 240-pair holdout. 78% Opus quality. | $12,000
Month 3 | ~32,000 | ~32,000 | ~25k | 60% of traffic served locally. Frontier slice is the long tail. | $4,800
Month 4 | ~16,000 | ~16,000 | ~13k | 80% local. LoRA retrained on tail traffic. | $2,400
Month 12 | ~12,000 | ~12,000 | ~10k | 85% local steady-state. Tail is novel cases & escalations. | $1,800

The interesting line is month two. After two months of capture and one distill cycle, this team has a local Phi-3-mini LoRA that scores K=0.86 on a held-out 240-pair eval: a signed gate that says, in plain numbers, "this artifact replicates seventy-eight percent of Opus's behavior on this team's prompts." The artifact runs on a single RTX 5090 at sixty milliseconds per call. The marginal cost per call is electricity.

From month three onward the proxy routes the easy slice of traffic to the local artifact and only escalates the tail to Opus. The frontier bill compresses to the slice that the local model cannot yet handle. The team keeps capturing on that escalated slice, retrains the LoRA on a quarterly cadence, and the local-served fraction grows.
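The routing policy behind those months is simple enough to sketch. A hedged illustration of the escalate-on-low-confidence shape; the two callables stand in for your local .kolm runtime and your proxied frontier client:

# Illustrative routing shape: serve the easy slice locally, escalate the tail.
# `local` and `frontier` are hypothetical callables standing in for the local
# .kolm runtime and the proxied frontier client; escalated calls keep flowing
# through the capture proxy so the next retrain covers them.
from typing import Callable, Tuple

def route(prompt: str,
          local: Callable[[str], Tuple[str, float]],
          frontier: Callable[[str], str],
          confidence_floor: float = 0.8) -> str:
    """Serve locally when the LoRA is confident enough; otherwise escalate."""
    draft, confidence = local(prompt)
    if confidence >= confidence_floor:
        return draft           # electricity-priced path
    return frontier(prompt)    # long-tail path: proxied, captured, billed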

Twelve-month total: ~$48k in frontier spend (down from $144k linear), one signed local artifact owned in perpetuity, sub-hundred-millisecond p50 latency on most calls, no PHI or customer data leaving the perimeter on the local-served slice. The frontier bill paid for the artifact. That is the rent-to-own moment.

The legal question.

The first question every counsel asks is whether training a model on frontier-model outputs is permitted. Three things resolve it cleanly:

Counsel still has to read the relevant TOS for the customer's specific contract tier. We are not a substitute for that. But the structure of capture-and-distill is the structure that the TOS authors clearly contemplated, not a loophole.

The privacy frame: who sees what.

The capture proxy is a network hop. It sees the prompt and the response in transit. Three things make this safer than it sounds:

The privacy model is straightforward: we are a passthrough, the data is yours, the artifact is yours, the receipts let you prove it.

Why the signed receipt matters.

When the artifact ships, it carries a receipt. The receipt is an HMAC-SHA256 chain over (a) the captured corpus content hash, (b) the verifier configuration, (c) the holdout eval set hash, (d) the resulting K-score, (e) the base model identity and the LoRA delta hash, and (f) the build environment manifest. The chain can optionally be anchored to Sigstore Rekor for cross-organization audit.
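Mechanically, a chained HMAC means each field is folded into the tag in order, so altering any one of them changes the final value. A hedged sketch of that shape; the field names, serialization, and keying here are assumptions, not the actual wire format:

# Illustrative HMAC-SHA256 chain over the six receipt fields listed above.
# Serialization, keying and field names are assumptions for illustration only.
import hmac
import hashlib

RECEIPT_FIELDS = [
    "corpus_content_hash",
    "verifier_config",
    "holdout_eval_hash",
    "k_score",
    "base_model_and_lora_delta_hash",
    "build_env_manifest",
]

def receipt_chain(receipt: dict, key: bytes) -> str:
    """Fold each field into the chain; changing any field changes the final tag."""
    tag = b""
    for field in RECEIPT_FIELDS:
        tag = hmac.new(key, tag + field.encode() + str(receipt[field]).encode(), hashlib.sha256).digest()
    return tag.hex()

def verify_receipt(receipt: dict, key: bytes, expected_tag: str) -> bool:
    return hmac.compare_digest(receipt_chain(receipt, key), expected_tag)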

What this buys you in practice: a counsel review can reproduce the exact dataset that produced the exact LoRA that produced the exact eval score. "Did we train on PII" becomes a query against the receipt, not a forensic exercise. "Has this model drifted" becomes a comparison of two receipts. "Is this the model we deployed" becomes a verify call.

The same receipt is the answer to the auditor question that kills most internal-AI projects: what is in this thing. The receipt is the bill of materials. The signature says we did not change it after the fact.

If your local LoRA can answer the question "what data trained you and on what date" with a cryptographic proof, your security review goes from a quarter's worth of meetings to a fifteen-minute conversation.

vs. fine-tuning, vs. observability, vs. RAG.

Three categories of tool look adjacent on the surface. They are not the same product.

vs. OpenAI / Anthropic fine-tuning.

Fine-tuning hosted by the same provider you are renting from gives you a lower per-call rate but no local artifact. The model still lives in their data center, the data still leaves your perimeter, the latency floor is unchanged, and you are still on the deprecation treadmill. We ship a file. They ship a charge code.

vs. observability tools (LangSmith, Langfuse).

Observability captures for diagnostics: trace, debug, replay. The capture is in the right place but the next move is not "train a local model from these traces." There is no verifier on the captures, no distiller, no signed artifact. We are downstream of where they stop.

vs. RAG.

RAG is the right pattern for situations where the relevant knowledge changes on a daily cadence and you want the frontier model's reasoning over fresh documents. Capture-and-distill is the right pattern for tasks that are stable over months and where the marginal call cost is the binding constraint. They compose. A common deployment shape is a local LoRA for the high-volume routine slice and a frontier-RAG fallback for the long tail and novel queries.

capture surface: drop-in proxy
training data: verified pairs (yours)
output artifact: signed .kolm LoRA
where it runs: your hardware
what it costs to call: electricity
what stays at the frontier: long-tail escalations

capture-and-distill, end to end. The proxy compounds. The artifact persists.

Start here.

Two ways in. Smallest commitment: install the CLI, point one of your existing services at the capture endpoint for a week, run kolm capture status to see what your verified pair count looks like. There is no charge for capture under a hundred thousand pairs per month. Larger commitment: book a thirty-minute design-partner call and we will scope a namespace plan with your team, including the bring-your-own-VPC option if your data residency requires it.

npm i -g github:sneaky-hippo/kolmogorov-stack
kolm config base https://kolm.ai
kolm login

# 30 seconds in:
kolm capture --provider anthropic --as first-run

The capture endpoint is what makes the rest possible. Once your existing traffic is flowing through it, every other piece (the labels endpoint, the distill bridge, the receipt, the K-score) is one CLI call away. The first call you proxy through the capture endpoint is the first dollar that goes from rent to deposit.