The frozen-model problem.
A frontier model is trained on a snapshot of the web ending some months ago. Its weights are fixed; its tokenizer is fixed; the policy that ranks responses is fixed. For most consumer chat this is fine. For a buyer with an industry-specific tail (clinical workflows, regulatory deadlines, internal jargon, customer-name vocabulary), the tail does not exist in the training distribution. The model is competent on the long shoulder and merely plausible on the tail.
The conventional answer is RAG: paste the buyer's documents into the prompt at inference time and hope the model uses them. RAG works for a class of retrieval-shaped queries and fails for anything that needs the model to have internalized a vocabulary or a policy. The model never learns. Every new shipment of buyer-specific knowledge is a context-window expense, not an asset.
The right answer is a distill that converts the buyer's traffic into weights. The traffic already exists: it is the captures the buyer is sending to a frontier API today. The distill is small and fast: a LoRA adapter over a small base model. The risk is regression: the distill could be worse than the model it replaces. The fix is a gate.
The four-step loop.
The kolm loop is four CLI verbs and one assertion. The buyer runs each step manually, on a schedule (nightly, weekly), or on an event (volume threshold, semantic drift detected).
```sh
# 1. Capture: proxy live traffic, tag, redact, store.
$ kolm capture --tenant acme --tag clinical-intake

# 2. Train: distill the captures into a new .kolm adapter.
$ kolm compile --task "clinical intake triage" --from-captures acme/clinical-intake

# 3. Score: K-score the new artifact against the held-out evals.
$ kolm eval --artifact ./out/intake_v2.kolm --suite clinical-intake-evals

# 4. Swap: hot-swap the adapter only if it K-scores higher than the current one.
$ kolm swap --artifact ./out/intake_v2.kolm --strategy higher-k-wins
```
The four verbs map to four files in the source tree: src/capture.js, src/spec-compile.js, src/benchmark.js, src/serve.js. The connecting state is the receipt chain: every step writes a receipt that references the previous step's CID. A swap that ships also writes a registry row with the from-CID and the to-CID so a deployer can roll back the adapter on the fly.
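The connecting invariant, each receipt naming the CID of the receipt before it, is small enough to sketch in JavaScript. The field names (cid, prevCid, step) and the CID strings here are illustrative, not kolm's actual receipt schema:

```javascript
// Sketch: walk a receipt chain and confirm each step references the CID
// of the previous step, so a broken link is detectable without any
// external service. Field names are illustrative.
function verifyReceiptChain(receipts) {
  for (let i = 1; i < receipts.length; i++) {
    if (receipts[i].prevCid !== receipts[i - 1].cid) {
      return { ok: false, brokenAt: receipts[i].step };
    }
  }
  return { ok: true };
}

const chain = [
  { step: "capture", cid: "cid-cap", prevCid: null },
  { step: "compile", cid: "cid-cmp", prevCid: "cid-cap" },
  { step: "eval",    cid: "cid-evl", prevCid: "cid-cmp" },
  { step: "swap",    cid: "cid-swp", prevCid: "cid-evl" },
];
console.log(verifyReceiptChain(chain)); // { ok: true }
```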
The K-score gate.
The K-score is a weighted aggregate of five components: accuracy (A), size (S), latency (L), cost (C), and coverage (V). K = 0.40·A + 0.15·S + 0.15·L + 0.15·C + 0.15·V. The default ship threshold is 0.85, but in the continual-learning loop the threshold is dynamic: the new artifact must beat the current artifact's K-score, plus a margin.
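As a sketch, the aggregate is one weighted sum. The assumption here, implied by the formula but not stated, is that each component is already normalized to [0, 1]:

```javascript
// Sketch of the weighted K-score aggregate. Assumes each component has
// been normalized to [0, 1] before weighting; the weights sum to 1.00,
// so a perfect artifact scores 1.0 against the 0.85 ship threshold.
function kScore({ accuracy, size, latency, cost, coverage }) {
  return 0.40 * accuracy
       + 0.15 * size
       + 0.15 * latency
       + 0.15 * cost
       + 0.15 * coverage;
}
```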
| Current K | New K | Margin | Action |
|---|---|---|---|
| 0.871 | 0.889 | +0.018 | swap |
| 0.871 | 0.872 | +0.001 | refuse (within noise) |
| 0.871 | 0.850 | -0.021 | refuse (regression) |
| 0.871 | 0.823 | -0.048 | refuse + alert |
The margin defaults to +0.01 so noise alone does not flip the adapter. The alert threshold defaults to -0.03 so a meaningful regression pages the engineer who owns the recipe.
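The gate rule is small enough to state as code. A sketch, using the defaults from above:

```javascript
// Sketch of the dynamic gate: swap only if the new K beats the current K
// by at least the margin; a regression past the alert threshold pages the
// recipe owner. Defaults mirror the text: +0.01 margin, -0.03 alert.
function gate(currentK, newK, margin = 0.01, alertAt = -0.03) {
  const delta = newK - currentK;
  if (delta >= margin) return "swap";
  if (delta <= alertAt) return "refuse+alert";
  return "refuse";
}

console.log(gate(0.871, 0.889)); // "swap"          (+0.018)
console.log(gate(0.871, 0.872)); // "refuse"        (+0.001, within noise)
console.log(gate(0.871, 0.850)); // "refuse"        (-0.021, regression)
console.log(gate(0.871, 0.823)); // "refuse+alert"  (-0.048)
```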
Hot-swap mechanics.
The serving runtime loads .kolm artifacts via a versioned adapter registry. A hot-swap is two file renames and one HUP signal. There is no model restart, no service interruption, no cold cache.
```sh
# The current adapter symlink points at v1.
adapters/clinical-intake -> adapters/clinical-intake_v1.kolm

# The swap atomically repoints the symlink.
ln -sf adapters/clinical-intake_v2.kolm adapters/clinical-intake.swap
mv -f adapters/clinical-intake.swap adapters/clinical-intake

# The runtime polls the symlink every second; on change, it loads the new adapter.
# In-flight requests finish on the old one; new requests use the new one.
```
The trade is that two adapters live in memory briefly. For LoRA adapters at r=16 on a 7B base, that is two adapters of about 30 MB each, which is negligible against the 14 GB base weights. The window is bounded by the longest in-flight request; on a typical chat path with max_tokens=512 at 200 tok/s that is 2.5 seconds.
Provenance, every swap.
Every artifact carries a receipt chain over task → seeds → recipes → evals → package, signed under the tenant key, with the CID embedded. Every swap writes a registry row that names the from-CID and the to-CID. The deployer can answer five questions without leaving the registry:
- What task is this model trained for? The task field in the manifest.
- What data trained it? The capture namespace and the seed range.
- What code produced it? The deterministic recipe bytes and the trainer version.
- What does it score? The K-score from the evals.json block.
- What did it replace? The from-CID column in the swap log.
A regulator who asks "show me the trail" gets all five from one query. None of those answers depend on us being alive. The verifier is Rust, forbid(unsafe_code), dependency-pinned, and ships as a 4 MB binary the deployer can vendor.
Cadence and cost.
Three cadences cover most production patterns.
| Cadence | Trigger | Typical cost | Suits |
|---|---|---|---|
| Nightly | cron at 02:00 local | $0.05-$0.50 | chat, support, internal tools |
| Volume-gated | 10k new captures | $0.10-$1.00 | variable workloads, seasonal traffic |
| Drift-detected | embedding distance > threshold | $0.10-$2.00 | regulated workflows, compliance-sensitive |
The dollar figures are the median LoRA-distill cost for a Qwen2.5-3B target on a rented A100 via the kolm compute picker (see /compute for the per-backend rates). The total is dominated by the cold-start of the rental container; the actual training pass usually takes 30-90 seconds.
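The drift-detected trigger is stated above only as "embedding distance > threshold". One plausible reading, sketched here as an assumption, is cosine distance between the mean embedding of a baseline window and the mean embedding of recent captures:

```javascript
// Sketch of one possible drift detector. The windowing, the use of mean
// embeddings, and the 0.15 default threshold are all assumptions.
function mean(vectors) {
  const m = new Array(vectors[0].length).fill(0);
  for (const v of vectors) v.forEach((x, i) => (m[i] += x / vectors.length));
  return m;
}

function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function driftDetected(baseline, recent, threshold = 0.15) {
  return cosineDistance(mean(baseline), mean(recent)) > threshold;
}
```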
Failure modes.
The honest list of ways this loop breaks.
- Catastrophic forgetting. A LoRA distill that overweights the new captures forgets the long shoulder. The mitigation is the held-out evals: if the new artifact scores worse on the broad rubric, the K-score gate refuses.
- Eval-set drift. If the held-out evals were authored months ago, they may no longer reflect what the buyer cares about. The mitigation is to mark a fraction of captures as "evaluator" examples in the capture step (`kolm capture --tag eval-candidate`); the trainer rotates them into the held-out set on the next compile.
- Capture pollution. A user types nonsense; the captures absorb it; the next distill regresses on a clean prompt. The mitigation is the receipt chain plus the swap log: roll back to a known-good CID.
- Provider deprecation. The frontier API the captures came from raises rates or sunsets. The mitigation is that the captures already exist in your registry; the next distill does not need the API.
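The roll-back mitigation for capture pollution falls out of the swap log: the last known-good adapter is the from-CID of the swap that shipped the bad artifact. A sketch, with an illustrative row shape:

```javascript
// Sketch: find the roll-back target for a bad artifact by walking the
// swap log newest-first. The { fromCid, toCid } row shape is illustrative.
function rollbackTarget(swapLog, badCid) {
  for (let i = swapLog.length - 1; i >= 0; i--) {
    if (swapLog[i].toCid === badCid) return swapLog[i].fromCid;
  }
  return null; // badCid was never shipped by a logged swap
}

const log = [
  { fromCid: "cid-v1", toCid: "cid-v2" },
  { fromCid: "cid-v2", toCid: "cid-v3" },
];
console.log(rollbackTarget(log, "cid-v3")); // "cid-v2"
```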
The loop is not a silver bullet. It is the only structural answer to the frozen-model failure mode that does not require trusting a vendor to retrain on your schedule. kolm ships the loop; you own the captures; the receipt chain proves both.