Why kolm.
A short, honest read on where kolm wins, where it loses, and what it costs to put in your stack. A five-row receipts table at the bottom compares kolm against the surfaces you are likely choosing between.
The problem with cloud-only AI.
Your prompts and your customer data go to the vendor's hardware. Your model upgrades happen on the vendor's schedule. The prompt that worked yesterday behaves differently today because the model under the same name was changed. There is no signed proof of what was running at 04:12 UTC two months ago. If the vendor goes down, your product goes down.
The shape of the lock-in.
The closed-weights frontier API is a contract you cannot replay. If your customer's lawyer asks "what model produced this output," the honest answer is "the one the vendor was hosting under that name, until they changed it." That is not an answer your regulator accepts.
The problem with fine-tuning APIs.
You upload your training data to a vendor's cluster. You get back a model ID that runs only on their hardware. You cannot inspect the resulting weights, you cannot pin a version, and you cannot move it to an air-gapped environment. The cost per token is higher than the base model's, and the base model itself drifts under you.
The data path.
Your sensitive examples were on your laptop, then on the vendor's GPU, then in the vendor's training-data lake, then maybe in the vendor's next foundation-model corpus. The contract says they will not use it that way, but the data path was open. Your security team will tell you the open path is what matters.
The problem with RAG.
Retrieval is the right answer for "your data is in 200,000 documents and the model has to cite the source." But the rest of the pipeline is fragile: embeddings drift when the embedder is updated, the retriever's idea of similarity is a black box, and the final output can cite a document that was never in the top-k. The receipt of "what got retrieved, when, from where" is rarely persisted.
Citation drift.
The same query with the same prompt returns different chunks next quarter because the embedder, the index, or the chunker was changed. You cannot replay the original answer for an auditor. /use-cases/enterprise-search is the kolm answer to this specifically.
The problem with self-hosting a frontier model.
You can run Llama or Qwen or DeepSeek on your hardware. That gets you ownership of weights and air-gap capability. It does not give you a governance layer: no K-score gate, no compliance pack, no signed receipts, no replay. You have a model. You do not have a system your auditor will sign off on.
Missing the system around the model.
Ollama runs the weights. vLLM serves them fast. Neither produces a manifest, a binder PDF, or a receipt your CISO can verify offline. That is the gap kolm fills.
What kolm adds.
Kolm sits on top of those tools. It does not replace your serving engine. It produces a signed artifact that wraps your task description, your evaluator, your compliance pack, and a measured K-score into one file your auditor and your CI can both read.
01 . ownership
One signed artifact you own.
A .kolm is a zip with a canonical manifest, a CID, a signature, and an evaluator. Copy it, archive it, replay it five years from now.
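Because it is just a zip, you can open one with nothing but the standard library. A minimal sketch; the filename and the member and field names here (manifest.json, cid, compliance_pack) are illustrative assumptions, not a published layout.

```python
import json
import zipfile

# Hypothetical member and field names -- the real .kolm layout may differ.
with zipfile.ZipFile("phi-redactor.kolm") as artifact:
    print(artifact.namelist())              # manifest, signature, evaluator, ...
    manifest = json.loads(artifact.read("manifest.json"))
    print(manifest.get("cid"))              # content ID pinning this exact build
    print(manifest.get("compliance_pack"))  # e.g. "hipaa", read from the artifact
```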
02 . gate
Compile-time K-score gate.
You set k_min in the recipe. The compile fails if the artifact scores below the floor. Your CI fails before merge.
03 . receipts
HMAC receipt over every run.
Each kolm run emits a receipt over (cid, input_hash, output_hash, ts, k_score). Replay-verifiable. Auditor-readable.
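Because the receipt is a flat tuple under an HMAC, offline verification needs nothing beyond the standard library. A minimal sketch, assuming sorted-key JSON canonicalization, HMAC-SHA256, and a hex-encoded tag field; the actual kolm wire format may differ.

```python
import hashlib
import hmac
import json

def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the HMAC over the receipt fields and compare tags.

    Assumes sorted-key JSON canonicalization and a hex HMAC-SHA256 tag;
    the real kolm wire format may differ.
    """
    body = {k: receipt[k] for k in ("cid", "input_hash", "output_hash", "ts", "k_score")}
    msg = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["tag"])
```

That recomputation is what replay-verifiable means in practice: anyone holding the key and the receipt can check it with no network access.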
04 . compliance
Compliance pack pre-baked.
HIPAA, SR 11-7, NIST AI RMF, EU AI Act profiles ship as named packs. The pack is in the manifest. The auditor reads it from the artifact.
05 . dispatch
18-backend dispatch.
Six local (CPU, CUDA, MPS, MLX, ROCm, DirectML), eight remote (Modal, RunPod, Together, Vast, Lambda, Replicate, fal, SSH), four serving engines (vLLM, SGLang, TGI, TRT-LLM). The CLI picks the cheapest available compute that meets your air-gap constraint.
06 . portability
Runs anywhere a .kolm runs.
Laptop, BYOC, hospital workstation, behind-the-firewall jump host, GitLab runner. Same artifact, same receipt shape.
Where kolm is the wrong choice.
We say this on purpose. Three workloads where kolm is overkill or actively a worse fit.
- Sub-100ms toy chat with no proof requirement. If you want a freeform chatbot and you do not care about replay, an OpenAI or Anthropic key + a streaming UI is shorter to build and cheaper to run. Use them.
- Raw frontier-model chat with no specific task. Kolm is a task compiler. If your job description is "be a generalist assistant," there is no task to evaluate against, no K-score to gate on, no compliance pack to bind. Use the frontier API directly.
- No compliance need and no audit need. If your workload is a side-project, an internal hackathon, or a marketing demo, the ceremony around manifest + receipt + binder is friction you do not benefit from.
The integration calculus.
Three numbers and one fact that matter when you are evaluating whether to put kolm in your stack.
- 1 line to install. npm install -g kolm on macOS, Linux, Windows. Or Homebrew, winget, apt.
- ~5 minutes to first signed artifact. kolm login, write a one-line task, kolm compile, watch the gate pass, ship the .kolm.
- 0 dependencies on the runtime side. The verifier SDK is pure-stdlib. The TS SDK has no third-party deps. The Python SDK has none either.
- Drops next to your existing serving stack. Already running vLLM or TGI? Point a backend at it. Already using OpenAI? kolm serve --openai-compat exposes a drop-in shape (client sketch below).
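If your code already speaks the OpenAI Python client, the swap is a base URL. A sketch assuming the server listens on localhost:8080 and exposes the compiled task under its artifact name; host, port, and model name are illustrative, not documented values.

```python
from openai import OpenAI

# Hypothetical host, port, and model name; kolm serve --openai-compat
# exposes an OpenAI-shaped endpoint, so only base_url changes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused-locally")

resp = client.chat.completions.create(
    model="phi-redactor.kolm",  # illustrative: the compiled artifact
    messages=[{"role": "user", "content": "Redact PHI from: ..."}],
)
print(resp.choices[0].message.content)
```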
The honest cost is the recipe: writing a 12-line JSON that names your task, your objective, your evaluator, and your k_min. That part is on you. Everything around it is one command.
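For scale, a recipe in that shape could look like the following. The task, objective, evaluator, and k_min fields come from the description above; the exact key names and the compliance_pack entry are illustrative assumptions.

```json
{
  "task": "redact PHI from clinical notes",
  "objective": "remove all 18 HIPAA identifier types without altering clinical content",
  "evaluator": "evaluators/phi_redaction.py",
  "k_min": 0.92,
  "compliance_pack": "hipaa"
}
```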
Receipts table.
Compared like-for-like against five alternatives most teams evaluate alongside kolm. yes means the feature ships as part of the product. no means it does not. partial means there is some answer but it is not the same shape as kolm's.
| capability | kolm | OpenAI fine-tune | Together | Predibase | OpenPipe | RAG (LangChain etc.) |
|---|---|---|---|---|---|---|
| ownership of the artifact | yes | no | partial | partial | no | no |
| compile-time K-score gate | yes | no | no | no | partial | no |
| HMAC receipt on every run | yes | no | no | no | no | no |
| compliance pack pre-baked | yes | no | no | partial | no | no |
| air-gap capable | yes | no | no | partial | no | partial |
Per-vendor side-by-sides at /compare. Every comparison there includes at least one tie row, because ties exist.
Next.
The fastest way to evaluate kolm is to compile one artifact for a task you actually have. The PHI redactor tutorial takes 12 minutes end-to-end and ends with a verified .kolm on your disk.