The dirty secret of cloud AI.
Every powerful AI product today rests on a hidden assumption: that to be useful, it has to read everything you give it, on a server you don't control, run by a company you have no leverage over. Your prompt becomes a row in their log. Your photo becomes a tensor on their GPU. Your medical record, your inbox, your location, your calendar: all of it ends up in a third-party perimeter, even when the privacy policy says otherwise.
That's not a bug of the architecture, it's the architecture. The model lives where the GPUs are. The data has to come to the model. Privacy and personalization are presented as a tradeoff. The more your AI knows about you, the more your data has to leave.
The inversion.
The premise of compiled AI is that this tradeoff is an artifact of how the technology was first deployed, not a law of physics. Frontier models, in 2026, are good enough teachers that their intelligence can be distilled into a small open-weight student. Phones shipped since 2021 have NPUs powerful enough to run a 3B-parameter model at interactive latency. The compile step closes the gap.
You bring the frontier model API key you already pay for. kolm compile takes a task description, your seed examples, and your evals, and runs the frontier as a teacher inside your VPC. It distills a personal LoRA, extracts a deterministic recipe pack, builds a multimodal recall index of your data, runs the verifier, and seals everything into a single signed file: .kolm.
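As a sketch only, the stage sequence above can be written as stub functions. Everything here is hypothetical: the function name, spec fields, and layer names are illustrative, not the kolm API, and the real compiler runs a live frontier teacher inside your VPC rather than stubs.

```python
import hashlib
import json

def compile_artifact(task_spec, seed_examples, evals):
    """Hypothetical sketch of the compile stages described above.
    Each stage is a stub that records what it would produce."""
    adapter = {"type": "lora", "distilled_from": task_spec["teacher"]}  # distill a personal LoRA
    recipes = {"deterministic_steps": len(seed_examples)}               # extract the recipe pack
    recall_index = {"modalities": ["text", "image"], "entries": 0}      # multimodal recall index
    verifier = {"evals": [e["name"] for e in evals]}                    # verifier set
    layers = {"adapter": adapter, "recipes": recipes,
              "recall_index": recall_index, "verifier": verifier}
    # Seal: the manifest pins a hash of every layer, ready for signing.
    manifest = {name: hashlib.sha256(
                    json.dumps(layer, sort_keys=True).encode()).hexdigest()
                for name, layer in layers.items()}
    return {"manifest": manifest, **layers}
```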
The artifact ships to the device. The data already on the device is what the artifact reads. Frontier-class intelligence runs locally, on the user's actual emails, the user's actual health record, the user's actual photos, with zero runtime egress. Privacy enables a depth of personalization that cloud AI structurally cannot match.
The more private it is, the more personal it can be. The artifact reads what the user actually has, because the artifact is where the user already is.
Why deploying raw LLMs is broken.
The status quo for "AI features" in any privacy-sensitive app is one of three options, each broken:
- Call a cloud API. Fast to ship, slow to legal-approve. Every prompt is a BAA-scoped event in healthcare, an examinable record in finance, a discoverable artifact in litigation. Vendors deprecate models on 14-day notice. The cost grows linearly with usage forever. Data leaves on every call.
- Self-host a runtime. Pull a quantized open-weight model into llama.cpp or MLC LLM. That solves data egress, but the model is generic; it doesn't know the user. You still need a way to teach it the task, ground it in the user's data, evaluate it, sign it, and ship it across iOS and Android. That's an ML team you don't have.
- Skip AI entirely. What most regulated teams quietly do. The compliance officer wins, the user loses, the competitor with looser scruples ships first.
Compiled AI is the fourth option. Cloud at compile time, edge at run time. You get the frontier as a teacher in a controlled environment, you get a personal artifact your users own, and you get a runtime that needs no network link.
What we mean by compiler.
The word matters. We don't mean wrapper, we don't mean platform, we don't mean framework. We mean compiler in the same sense that gcc is a compiler: a deterministic process that turns a higher-level description (a task spec, examples, evals, a frontier model API key) into a lower-level executable artifact (a signed file with a model, an adapter, a recipe pack, a recall index, and a verifier).
Compilation is not training. Training builds a model from scratch. Compilation takes a model that already exists and produces something narrower, faster, and personal. The frontier did the hard part. The compile step makes it portable.
K-score: the cover number.
Every compiled artifact ships with a single number on the cover and five you can defend. The K-score is the harmonic mean of five normalized scores: size, accuracy, latency, cost, and coverage, each scaled against the frontier baseline measured at compile time so that higher is better on every axis. Below the gate, the artifact does not ship. The gate is configurable; the default is 0.70.
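A minimal sketch of the gate check, assuming each dimension has already been normalized to (0, 1] against the frontier baseline, with latency and cost inverted so that higher is better everywhere. The function name and metric keys are illustrative, not the kolm API.

```python
from statistics import harmonic_mean

DIMS = ("size", "accuracy", "latency", "cost", "coverage")

def k_score(metrics, gate=0.70):
    """metrics: dimension -> score in (0, 1], normalized against the
    frontier baseline so that higher is better on every axis."""
    k = harmonic_mean([metrics[d] for d in DIMS])
    return k, k >= gate  # below the gate, the artifact does not ship
```

The harmonic mean is the natural choice here because it punishes a weak axis harder than an arithmetic mean would: an artifact that is tiny and fast but inaccurate cannot buy its way past the gate.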
This matters because "my model is good enough" has been a feeling, not a number, for the entire history of applied ML. Compilation lets us measure whether the artifact is worth shipping the way we measure whether a build is worth releasing. If the test suite passes, you ship. If the K-score passes, the artifact ships.
Receipts: the new audit trail.
Every .kolm file carries an HMAC chain over its layers: manifest hash, model hash, adapter delta, recipe pack, recall index, verifier set. Tamper with any layer and the chain breaks. Verifiable offline. Anchored to the public Kolmogorov registry. Useful at audit, defensible at deposition, accepted by App Store review.
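A toy sketch of one way such a chain can be computed; the keying scheme and layer order here are assumptions, not the .kolm wire format. Each layer's digest is folded into a running HMAC, so altering any layer changes every tag after it and the final tag no longer verifies.

```python
import hashlib
import hmac

LAYERS = ("manifest", "model", "adapter", "recipes", "recall_index", "verifier")

def chain_receipts(key: bytes, layer_bytes: dict) -> str:
    """Fold each layer's SHA-256 digest into a running HMAC tag.
    Tampering with any single layer changes the final tag."""
    tag = b""
    for name in LAYERS:
        digest = hashlib.sha256(layer_bytes[name]).digest()
        tag = hmac.new(key, tag + digest, hashlib.sha256).digest()
    return tag.hex()
```

Because verification only needs the key, the layer bytes, and the expected tag, the check runs entirely offline, which is the property the registry anchoring relies on.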
The receipts are not paperwork. They are the new way regulated software gets approved. A compliance officer can look at the manifest, see exactly which base model the artifact was distilled from, see exactly which examples it was trained on, see exactly which evals it passed, and verify all of it cryptographically. No black box, no "trust us," no third-party promise. The thing in the file is the thing on the device.
Why this category, why now.
Three forces are converging in 2026.
- Regulation is closing the cloud-AI window. The EU AI Act enters enforcement on systems touching personal or biometric data. HIPAA enforcement on AI-driven workflows in healthcare is escalating. SR 11-7 model risk requirements at banks now extend to LLMs. Cloud calls are increasingly an audit row, not an architectural detail.
- Hardware is no longer the blocker. The NPUs in phones shipped since 2021 (Apple's Neural Engine, Qualcomm's Hexagon, MediaTek's APU) run a 3B-class model at sub-100 ms latency. WebGPU is shipping in browsers. Edge boxes are cheap. The runtime layer is solved.
- Open weights match yesterday's frontier. Qwen 2.5, Llama 3, Phi-3, Hermes-3 in the 3B-7B range now match GPT-4 of two years ago on narrow tasks. Distillation closes the rest. The student is good enough; only the compile step was missing.
What we're building.
kolm is the build system for private AI. It is not a runtime, not a vendor SDK, not an ML platform. It is the deterministic compiler that turns a task description into a portable signed file, and the registry that makes those files discoverable, and the receipts chain that makes them defensible.
The runtime is whatever you want: llama.cpp on a server, MLC LLM on a phone, ExecuTorch on iOS, the kolm WASM runtime in a browser. The artifact is the same. The compile step is the new layer.
If Webpack made web apps shippable, Docker made services shippable, and the App Store made mobile apps shippable, kolm makes private AI shippable. The unit is one signed file.
An open invitation.
We are publishing the spec (RS-1, MIT). We are open-sourcing the runtime. We are running the public registry as a neutral commons. The compiler itself is the proprietary part, because the compiler is the moat: every compile teaches the compiler which task descriptions, model architectures, and recipe patterns produce high K-scores, and that flywheel is what compounds. The output of a kolm compile is yours. The compiler stays a service.
If you build software in healthcare, finance, defense, or for end-user privacy, this is the architecture you've been wanting. If you build infrastructure, the artifact format is the new portable unit. If you research deployable AI, the K-score and the receipt chain are open standards that need your eyes.
One paragraph.
Cloud AI got smart by sending your users' data away. Compiled AI gets smart by leaving it there. The frontier teaches the artifact in your VPC. The artifact ships to the device. The data the user trusts your app with stays where they trust it. The K-score on the cover tells you whether the artifact is worth shipping. The receipt chain tells you whether the artifact is the one you signed. The runtime is whatever runtime you want. The artifact is one signed file. That's the inversion. That's the category. That's kolm.
kolm compile →
Ten lines of CLI. A signed artifact in five minutes.
read What's inside a .kolm →
The seven components and the receipt chain that ties them together.
measure How K-score is computed →
One number on the cover. Five you can defend at review.
audit Security & compliance posture →
HIPAA path, SOC 2 in flight, EU AI Act conformity, the receipt chain in detail.