04 · The gate

K-score is one number. Ship or don’t.

Every .kolm artifact carries a single decimal, K, between 0 and 1. Below the gate, kolm refuses to write the file. No score, no eval, no ship.

Definition: K = w_a·accuracy + w_s·size_inv + w_l·latency_inv + w_c·cost_inv + w_v·coverage
where Σw = 1

Each component is normalized to [0,1] against the frontier baseline the artifact was distilled from. K = 1.0 means the artifact ties or beats frontier on every component. K = 0.0 means it doesn’t exist.
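The definition above is a plain weighted sum, which can be sketched in a few lines. This is a minimal illustration, not kolm's implementation: the function name `k_score` and the dictionary keys are hypothetical, and the weights are the defaults listed on this page (0.40 for accuracy, 0.15 for each of the rest).

```python
import math

# Default weights as listed on this page; the key names are illustrative.
DEFAULT_WEIGHTS = {
    "accuracy": 0.40,
    "size_inv": 0.15,
    "latency_inv": 0.15,
    "cost_inv": 0.15,
    "coverage": 0.15,
}


def k_score(components, weights=DEFAULT_WEIGHTS):
    """Weighted sum of normalized components (hypothetical helper, not kolm's API)."""
    if set(components) != set(weights):
        raise ValueError("components and weights must use the same keys")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {value}")
    return sum(weights[name] * components[name] for name in weights)


# The two endpoints described above: all components at 1.0, and all at 0.0.
perfect = {name: 1.0 for name in DEFAULT_WEIGHTS}
empty = {name: 0.0 for name in DEFAULT_WEIGHTS}
print(round(k_score(perfect), 6))  # 1.0
print(round(k_score(empty), 6))    # 0.0
```

Because the weights sum to 1 and every component lies in [0,1], K itself always lands in [0,1].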

01 · Components

Five inputs. One number.

No mystery. Every component is reproducible from the artifact and the tests it was sealed with. Anyone with the file can re-run the math.

w_a · 0.40 · Accuracy
Pass-rate on tests.jsonl, the eval suite stored in the artifact. The dominant term.

w_s · 0.15 · Size
Inverse-normalized against the base model. Smaller is closer to 1. A 38 MB artifact against an 80 GB frontier scores 0.95.

w_l · 0.15 · Latency
p50 milliseconds per token, measured locally. 80 ms against a frontier 1240 ms scores 0.94.

w_c · 0.15 · Cost
$/1k runs. Local-only artifacts hit 1.0; cloud-pointer hybrids drop proportionally.

w_v · 0.15 · Coverage
Share of expected outputs covered by the recipe pack. Drives speculative-decoding speedup.

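The page does not spell out the normalization formulas, so the sketch below is an assumption: a simple `1 - value / baseline` ratio, clamped to [0,1]. It reproduces the latency figure (80 ms vs 1240 ms) and the local-cost figure ($0 vs $0.018), but it does not reproduce the 0.95 size figure, so treat it as illustrative only. The helper names `inverse_normalize` and `pass_rate` are hypothetical.

```python
def inverse_normalize(value, baseline):
    """Map a lower-is-better measurement to [0, 1] as 1 - value/baseline, clamped.

    ASSUMPTION: inferred from the latency example on this page,
    not taken from kolm's documentation.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return min(1.0, max(0.0, 1.0 - value / baseline))


def pass_rate(passed, total):
    """Accuracy as the fraction of eval cases that pass."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


print(round(inverse_normalize(80, 1240), 2))   # 0.94  (latency example)
print(round(inverse_normalize(0.0, 0.018), 2)) # 1.0   (local-only cost)
print(round(pass_rate(380, 390), 3))           # 0.974 (380 of 390 cases)
```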
02 · The gate

0.70 ships. Anything below fails closed.

≥ 0.70
Default ship gate. Compile fails closed if K < 0.70: the artifact never gets a signature. Tune with kolm compile --gate 0.85 for high-stakes tasks. The threshold is part of the manifest, anchored to the receipt, and immutable after signing.
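Fail-closed behavior can be sketched as a check that raises unless K clears the threshold, so nothing downstream runs by accident. This is a sketch under stated assumptions: `check_gate`, `GateError`, and the returned dictionary are hypothetical names, not kolm's API. Only the 0.70 default and the 0.85 override come from this page.

```python
DEFAULT_GATE = 0.70  # default ship gate stated on this page


class GateError(Exception):
    """Raised when an artifact's K falls below the gate (hypothetical type)."""


def check_gate(k, gate=DEFAULT_GATE):
    """Return manifest-style fields if K clears the gate; otherwise raise."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must be in [0, 1]")
    if k < gate:
        # Fail closed: raising means no caller can proceed to signing.
        raise GateError(f"K={k:.2f} is below gate {gate:.2f}; refusing to sign")
    return {"k_score": k, "gate": gate}


for k, gate in [(0.94, DEFAULT_GATE), (0.80, 0.85)]:
    try:
        check_gate(k, gate)
        print(f"K={k:.2f} gate={gate:.2f}: ship")
    except GateError as err:
        print(f"blocked: {err}")
```

Raising rather than returning `False` is the design choice that makes the check fail closed: a caller that forgets to inspect the result still cannot continue.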
# inspect any .kolm
$ kolm inspect support-triage.kolm
  task            : "triage support email, output {label,severity}"
  base_model      : qwen2.5-coder-7b-instruct-q4_0
  k_score         : 0.96
  k.accuracy      : 0.974   (380 / 390 cases)
  k.size          : 0.95    (38 MB vs 80 GB frontier)
  k.latency       : 0.94    (80 ms vs 1240 ms)
  k.cost          : 1.00    ($0 local vs $0.018 frontier)
  k.coverage      : 0.92    (recipe pack hit-rate)
  signature       : ok      (hmac chain verified, 4 layers)
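Since every component is printed, the weighted sum can be recomputed from the inspect output alone. The sketch below assumes the `k.<name> : <value>` line format shown in the sample and the default weights from the Components section. The parser is a hypothetical helper, not part of kolm.

```python
import re

# Default weights from the Components section, keyed by the k.<name> suffix.
WEIGHTS = {
    "accuracy": 0.40,
    "size": 0.15,
    "latency": 0.15,
    "cost": 0.15,
    "coverage": 0.15,
}

# Component lines copied from the sample inspect output.
INSPECT_OUTPUT = """\
  k.accuracy      : 0.974   (380 / 390 cases)
  k.size          : 0.95    (38 MB vs 80 GB frontier)
  k.latency       : 0.94    (80 ms vs 1240 ms)
  k.cost          : 1.00    ($0 local vs $0.018 frontier)
  k.coverage      : 0.92    (recipe pack hit-rate)
"""


def parse_components(text):
    """Extract {name: value} from lines shaped like 'k.name : 0.95 (...)'."""
    pattern = re.compile(r"^\s*k\.(\w+)\s*:\s*([0-9.]+)", re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(text)}


components = parse_components(INSPECT_OUTPUT)
k = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
print(f"{k:.3f}")  # 0.961
```

With these five values the weighted sum is 0.3896 + 0.1425 + 0.1410 + 0.1500 + 0.1380 = 0.9611.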
03 · Why one number

Because shipping is binary.

Every model card in the wild today is a tasteful menu of cherry-picked plots. You read down the page and never decide. K is the decision.

Every component is recoverable, but the gate hides the wiring. A green K means the artifact passed its own tests, fits on the device, runs faster than frontier, costs nothing to invoke, and the recipe pack covers most of the outputs it’ll be asked for. A single decimal lets a CI pipeline gate a deploy. Lets a registry rank artifacts. Lets a phone decide whether to download.

If you don’t like 0.70, override it. The number is the contract; the threshold is the policy. They’re separate on purpose.