Research · Applied · 2026-05-14 · 11 min read

Continual learning · capture, train, swap, sign.

Frontier models are frozen at their training cutoff, and a buyer with an industry-specific tail gets no coverage for it. kolm closes the loop with captures: live traffic feeds a LoRA distill, the K-score gate refuses to ship a worse adapter, the receipt chain signs the swap. Four commands, no platform lock-in.

By kolm · Tags: captures · lora · online · applied

The frozen-model problem.

A frontier model is trained on a snapshot of the web ending some months ago. Its weights are fixed; its tokenizer is fixed; the policy that ranks responses is fixed. For most consumer chat this is fine. For a buyer with an industry-specific tail (clinical workflows, regulatory deadlines, internal jargon, customer-name vocabulary), the tail does not exist in the training distribution. The model is competent on the long shoulder and merely plausible on the tail.

The conventional answer is RAG: paste the buyer's documents into the prompt at inference time and hope the model uses them. RAG works for a class of retrieval-shaped queries and fails for anything that needs the model to have internalized a vocabulary or a policy. The model never learns. Every new shipment of buyer-specific knowledge is a context-window expense, not an asset.

The right answer is a distill that converts the buyer's traffic into weights. The traffic exists already: it is the captures the buyer is sending to a frontier API today. The distill is short: a LoRA over a small base. The risk is regression: the distill is worse than the model it replaces. The fix is a gate.

The four-step loop.

The kolm loop is four CLI verbs and one assertion. The buyer runs each step manually, on a schedule (nightly, weekly), or on an event (volume threshold, semantic drift detected).

# 1. Capture: proxy live traffic, tag, redact, store.
$ kolm capture --tenant acme --tag clinical-intake

# 2. Train: distill the captures into a new .kolm adapter.
$ kolm compile --task "clinical intake triage" --from-captures acme/clinical-intake

# 3. Score: K-score the new artifact against the held-out evals.
$ kolm eval --artifact ./out/intake_v2.kolm --suite clinical-intake-evals

# 4. Swap: hot-swap the adapter only if it K-scores higher than the current one.
$ kolm swap --artifact ./out/intake_v2.kolm --strategy higher-k-wins

The four verbs map to four files in the source tree: src/capture.js, src/spec-compile.js, src/benchmark.js, src/serve.js. The connecting state is the receipt chain: every step writes a receipt that references the previous step's CID. A swap that ships also writes a registry row with the from-CID and the to-CID so a deployer can roll back the adapter on the fly.
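If the loop runs on a schedule instead of by hand, the wiring is thin. A minimal sketch in Python of a nightly run over three of the four verbs above (capture is assumed to run continuously as a proxy); the flags are copied from the block above, while the exit-code convention, the script, and its abort behavior are our assumptions, not documented kolm behavior.

#!/usr/bin/env python3
"""Nightly continual-learning run: compile -> eval -> swap.

Sketch only. Flags are copied from the CLI block above; the assumption
that each verb exits nonzero on failure (including a refused swap) is
ours, not documented kolm behavior.
"""
import subprocess
import sys

STEPS = [
    # Capture is assumed to run continuously as a proxy; here we only
    # compile what has accumulated since the last run.
    ["kolm", "compile",
     "--task", "clinical intake triage",
     "--from-captures", "acme/clinical-intake"],
    # K-score the fresh artifact against the held-out suite.
    ["kolm", "eval",
     "--artifact", "./out/intake_v2.kolm",
     "--suite", "clinical-intake-evals"],
    # Swap only if the new K beats the current K plus the margin.
    ["kolm", "swap",
     "--artifact", "./out/intake_v2.kolm",
     "--strategy", "higher-k-wins"],
]

for cmd in STEPS:
    if subprocess.run(cmd).returncode != 0:
        # A refused swap is a normal outcome, not an error: the gate
        # held, and nothing changed in production.
        sys.exit(f"stopped at: {' '.join(cmd)}")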

The K-score gate.

The K-score is a weighted aggregate of accuracy (A), size (S), latency (L), cost (C), and coverage (V): K = 0.40·A + 0.15·S + 0.15·L + 0.15·C + 0.15·V. The default ship threshold is 0.85, but in the continual-learning loop the threshold is dynamic: the new artifact must beat the current artifact's K-score, plus a margin.

Current K   New K   Margin   Action
0.871       0.889   +0.018   swap
0.871       0.872   +0.001   refuse (within noise)
0.871       0.850   -0.021   refuse (regression)
0.871       0.823   -0.048   refuse + alert

The margin defaults to +0.01 so noise alone does not flip the adapter. The alert threshold defaults to -0.03 so a meaningful regression pages the engineer who owns the recipe.
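The gate itself is small enough to show. A sketch in Python: the weights, the +0.01 margin, and the -0.03 alert threshold are the defaults quoted above; the function names, and the assumption that component scores arrive normalized to [0, 1], are ours.

# Weighted K-score and the higher-k-wins gate. Weights, margin, and
# alert threshold are the defaults quoted above; the rest is a sketch.

WEIGHTS = {"accuracy": 0.40, "size": 0.15, "latency": 0.15,
           "cost": 0.15, "coverage": 0.15}

def k_score(components: dict[str, float]) -> float:
    """K = 0.40*A + 0.15*S + 0.15*L + 0.15*C + 0.15*V, components in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def gate(current_k: float, new_k: float,
         margin: float = 0.01, alert: float = -0.03) -> str:
    """Decide swap / refuse / alert, mirroring the table above."""
    delta = new_k - current_k
    if delta >= margin:
        return "swap"                   # clear win over the margin
    if delta <= alert:
        return "refuse + alert"         # meaningful regression, page someone
    if delta < 0:
        return "refuse (regression)"
    return "refuse (within noise)"      # positive but inside the noise band

# Worked against the table: gate(0.871, 0.889) -> "swap",
# gate(0.871, 0.872) -> "refuse (within noise)",
# gate(0.871, 0.850) -> "refuse (regression)",
# gate(0.871, 0.823) -> "refuse + alert".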

Hot-swap mechanics.

The serving runtime loads .kolm artifacts via a versioned adapter registry. A hot-swap is one symlink create and one atomic rename; the runtime picks up the change on its next poll. There is no model restart, no service interruption, no cold cache.

# The current adapter symlink points at v1. The target is relative to
# the adapters/ directory, where the symlink lives.
adapters/clinical-intake -> clinical-intake_v1.kolm

# The swap atomically repoints the symlink.
ln -sf clinical-intake_v2.kolm adapters/clinical-intake.swap
mv -f  adapters/clinical-intake.swap   adapters/clinical-intake

# The runtime polls the symlink every second; on change, it loads the new adapter.
# In-flight requests finish on the old one; new requests use the new one.

The trade is that two adapters live in memory briefly. For LoRA adapters at r=16 on a 7B base, that is two adapters of about 30 MB each, negligible against the 14 GB of base weights. The window is bounded by the longest in-flight request; on a typical chat path with max_tokens=512 at 200 tok/s, that is about 2.6 seconds.
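The poller side is equally small. A sketch, assuming the runtime exposes a load_adapter call that returns a handle requests can pin; the one-second interval comes from the comment in the block above, everything else here is illustrative.

import os
import time

ADAPTER_LINK = "adapters/clinical-intake"

def watch_adapter(load_adapter, interval: float = 1.0):
    """Poll the symlink once per second; on change, load the new adapter.

    load_adapter is assumed to return a handle the serving path pins
    per-request, so in-flight requests finish on the adapter they
    started with while new requests pick up the fresh handle.
    """
    current_target = os.readlink(ADAPTER_LINK)
    while True:
        time.sleep(interval)
        target = os.readlink(ADAPTER_LINK)
        if target != current_target:
            # Load the new adapter *before* publishing it, so no
            # request ever sees a window with no adapter resident.
            # The old one is freed once its last request finishes.
            load_adapter(target)
            current_target = target

# Run in a daemon thread alongside the serving loop, e.g.:
#   threading.Thread(target=watch_adapter,
#                    args=(runtime.load_adapter,), daemon=True).start()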

Provenance, every swap.

Every artifact carries a receipt chain over task → seeds → recipes → evals → package, signed under the tenant key, with the CID embedded. Every swap writes a registry row that names the from-CID and the to-CID. The deployer can answer five questions without leaving the registry: what task the artifact was compiled for, what seed captures fed it, what recipe trained it, what evals it passed, and what it replaced.

A regulator who asks "show me the trail" gets all five from one query. None of those answers depend on us being alive. The verifier is Rust, forbid(unsafe_code), dependency-pinned, and ships as a 4 MB binary the deployer can vendor.
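For concreteness, here is what a registry row and a rollback lookup could look like, assuming a SQLite registry; every table and column name here is illustrative, not the actual kolm schema.

import sqlite3

conn = sqlite3.connect("registry.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS swaps (
    swapped_at  TEXT,   -- ISO-8601 timestamp of the swap
    tenant      TEXT,   -- e.g. 'acme'
    adapter     TEXT,   -- e.g. 'clinical-intake'
    from_cid    TEXT,   -- CID of the artifact that was replaced
    to_cid      TEXT,   -- CID of the artifact that shipped
    receipt_cid TEXT    -- head of the signed receipt chain
)""")

# Rollback on the fly = read the last row for the adapter and repoint
# the symlink at the artifact from_cid names.
row = conn.execute(
    "SELECT from_cid, to_cid FROM swaps "
    "WHERE adapter = ? ORDER BY swapped_at DESC LIMIT 1",
    ("clinical-intake",),
).fetchone()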

Cadence and cost.

Three cadences cover most production patterns.

Cadence          Trigger                          Typical cost   Suits
Nightly          cron at 02:00 local              $0.05-$0.50    chat, support, internal tools
Volume-gated     10k new captures                 $0.10-$1.00    variable workloads, seasonal traffic
Drift-detected   embedding distance > threshold   $0.10-$2.00    regulated workflows, compliance-sensitive

The dollar figures are the median LoRA-distill cost for a Qwen2.5-3B target on a rented A100 via the kolm compute picker (see /compute for the per-backend rates). The total is dominated by the cold-start of the rental container; the actual training pass usually takes 30-90 seconds.
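The drift-detected trigger is the only one that needs code beyond cron. A sketch, assuming an embed function from a capture to a vector; the table above specifies only "embedding distance > threshold", so the centroid cosine distance and the 0.15 default here are illustrative choices, not kolm's.

import numpy as np

def drift(recent: np.ndarray, baseline: np.ndarray,
          threshold: float = 0.15) -> bool:
    """Fire the loop when the centroid of recent capture embeddings
    moves away from the centroid of the captures the current adapter
    was trained on. Cosine distance between centroids; both the
    measure and the 0.15 default are illustrative assumptions."""
    a = recent.mean(axis=0)
    b = baseline.mean(axis=0)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return (1.0 - cos) > threshold

# recent   = np.stack([embed(c) for c in captures_since_last_run])
# baseline = np.stack([embed(c) for c in training_captures])
# if drift(recent, baseline):
#     run_loop()  # compile -> eval -> swap, gated as above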

Failure modes.

The honest list is short but concrete: the distill regresses and the gate refuses it, so the loop stalls until someone fixes the recipe; the new score lands inside the noise margin and nothing ships; the held-out eval suite drifts away from live traffic, so the gate scores the wrong distribution.

The loop is not a silver bullet. It is the only structural answer to the frozen-model failure mode that does not require trusting a vendor to retrain on your schedule. kolm ships the loop; you own the captures; the receipt chain proves both.