Every .kolm artifact carries a single decimal, K, between 0 and 1.
Below the gate, kolm refuses to write the file. No score, no eval, no ship.
Components are normalized to [0,1] against the frontier baseline
the artifact was distilled from. 1.0 means the artifact ties or beats
frontier on every component. 0.0 means it doesn’t exist.
No mystery. Every component is reproducible from the artifact and the tests it was sealed with. Anyone with the file can re-run the math.
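As a rough illustration of re-running the math, the accuracy component could be recomputed by replaying tests.jsonl through the artifact and counting passes. This is a sketch, not kolm's implementation. The record fields `input` and `expected`, the exact-match comparison, and the `predict` callable standing in for the artifact are all assumptions.

```python
import json
from typing import Callable, Iterable


def accuracy(lines: Iterable[str], predict: Callable[[str], object]) -> float:
    """Return the fraction of JSONL test cases where predict() matches.

    Assumed (hypothetical) record shape: {"input": ..., "expected": ...}.
    `predict` is a stand-in for invoking the artifact on one input.
    """
    total = 0
    passed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        case = json.loads(line)
        total += 1
        if predict(case["input"]) == case["expected"]:
            passed += 1
    # An empty suite scores 0.0 here rather than dividing by zero.
    return passed / total if total else 0.0
```

With 380 of 390 cases passing, this returns 380 / 390, which rounds to 0.974.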
Accuracy is measured on tests.jsonl, the eval suite stored in the artifact. It is the dominant term.
Use kolm compile --gate 0.85 for high-stakes tasks.
The threshold is part of the manifest, anchored to the receipt, immutable post-sign.

    # inspect any .kolm
    $ kolm inspect support-triage.kolm
    task        : "triage support email, output {label,severity}"
    base_model  : qwen2.5-coder-7b-instruct-q4_0
    k_score     : 0.94
    k.accuracy  : 0.974  (380 / 390 cases)
    k.size      : 0.95   (38 MB vs 80 GB frontier)
    k.latency   : 0.94   (80 ms vs 1240 ms)
    k.cost      : 1.00   ($0 local vs $0.018 frontier)
    k.coverage  : 0.92   (recipe pack hit-rate)
    signature   : ok     (hmac chain verified, 4 layers)
Every model card in the wild today is a tasteful menu of cherry-picked plots. You read down the page and never decide. K is the decision.
Every component is recoverable, but the gate hides the wiring. A green K means the artifact passed its own tests, fits on the device, runs faster than frontier, costs nothing to invoke, and the recipe pack covers most of the outputs it’ll be asked for. A single decimal lets a CI pipeline gate a deploy. Lets a registry rank artifacts. Lets a phone decide whether to download.
If you don’t like 0.70, override it. The number is the contract; the threshold is the policy. They’re separate on purpose.
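A CI gate of the kind described above might look like the following sketch. It assumes the text printed by `kolm inspect` has been captured and has one `key : value` field per line, as in the sample output. The helper names `parse_inspect` and `passes_gate` are hypothetical, and the 0.70 default simply mirrors the threshold mentioned above. The score is read from the artifact and the threshold is supplied by the caller, which keeps the contract and the policy separate.

```python
def parse_inspect(text: str) -> dict[str, str]:
    """Parse 'key : value' lines from captured inspect output (assumed layout)."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the comment line, the shell prompt, and anything without a colon.
        if not line or line.startswith(("#", "$")) or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


def passes_gate(text: str, threshold: float = 0.70) -> bool:
    """True when k_score is at or above the caller's threshold."""
    fields = parse_inspect(text)
    if "k_score" not in fields:
        raise ValueError("no k_score field found in inspect output")
    return float(fields["k_score"]) >= threshold


# Trimmed copy of the sample output, used only for illustration.
SAMPLE = """\
# inspect any .kolm
$ kolm inspect support-triage.kolm
task : "triage support email, output {label,severity}"
k_score : 0.94
k.accuracy : 0.974 (380 / 390 cases)
signature : ok (hmac chain verified, 4 layers)
"""
```

In a pipeline, a wrapper script would call `passes_gate` on the captured output and exit non-zero when it returns `False`, so the deploy step never runs for an artifact below the chosen threshold.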