Runs anywhere it fits

Run your model on the hardware you already have.

Your behavior compiles into one signed .kolm file - model, recipe, evals, and receipt together. Kolm ranks the runtimes the model fits, from a laptop to your private cloud to a GPU fleet, and ships the file there with the launch recipe and proof in the box. Pick where it runs based on where it actually fits.


6 runtime families · 18 named runtimes · One file, recipe and receipt included · Blocked paths tell you why

TGT-01 · claims-redactor.kolm · RANKED · v3.3

Best fit: vLLM · hosted GPU
Targets: 18 ranked
Blocked: llama.cpp · context
Receipt: sha256:a1f0…

One file, every runtime

6 runtime families

18 named runtimes and platforms

Receipts ship with every target

No rewrite to get here, no lock-in to leave

01 · Runtime families

Six places your model can run. Pick the one that fits.

Kolm doesn't replace your serving engine, GPU cloud, or device framework - it ships your model to them, with the launch recipe and proof attached. Each family gets a real fit signal, so you deploy where the behavior actually runs, not where a logo promises it might.

Reg · recipe + receipt

Serving engine

vLLM, SGLang, TensorRT-LLM, NVIDIA Triton, Hugging Face TGI. For high-throughput, OpenAI-compatible serving at scale - batching, tensor parallelism, and GPU-tuned execution when your traffic demands it.

Reg · target manifest

Hosted GPU

Baseten, Modal, Runpod, Replicate, Hugging Face Inference Endpoints. The fastest path from a compiled model to a live endpoint - managed scaling, no infrastructure to stand up yourself.

Reg · fit caveats

Local runner

llama.cpp, Ollama, LM Studio, MLX, MLC LLM. Run it on the laptop you already have - ideal for small, quantized models, demos, support tooling, and local workflows that fit your machine's limits.

Reg · files + hashes

Portable inference

ONNX Runtime, OpenVINO, TVM, Triton model repository. Move one model across CPU, GPU, and accelerators - standard formats and repository packaging that drop into the stack you run.

Reg · target constraints

Device edge

Core ML, LiteRT, ExecuTorch. Ship to the device when the model format, memory ceiling, energy budget, and OS version all line up - Kolm checks the fit before you commit.

Reg · promotion gate

Your own fleet

KServe, Ray Serve, BentoML, Kubernetes, your private cloud, restricted networks. For platform teams that need controlled rollout, liveness checks, audit logs, canaries, and an internal registry - running inside your own perimeter.

02 · Knowing it's ready

Know it runs before you ship it.

No logo list, no guesswork. You see exactly where your model stands for each runtime - and the same model can be ready for one target and blocked for another, with the reason spelled out either way.

State · candidate

Fit candidate

The inputs are in, the runtime looks plausible, and Kolm can write a recipe - but some runtime files or proof may still be missing.

State · recipe

Recipe ready

Everything you need to launch is assembled: manifest, model hints, eval gates, environment, start command, health check, and rollback.

State · receipt

Receipt verified

The package carries hashes, verifier output, eval results, and the identity of who shipped it - proof you can hand to anyone.

State · blocked

Blocked, with the reason

If it can't run somewhere, Kolm tells you why - context length, unsupported tools, missing quantization, data boundary, or a runtime mismatch. Never a silent failure.
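The four states above can be read as a simple progression with one exit for blocked targets. The sketch below is one way to model that, not Kolm's actual implementation: the `State` enum, the `target_state` function, and the idea that the states are checked in this order are all assumptions made for illustration. The part names are taken from the descriptions above.

```python
from enum import Enum
from typing import Optional, Set, Tuple


class State(Enum):
    CANDIDATE = "fit candidate"
    RECIPE_READY = "recipe ready"
    RECEIPT_VERIFIED = "receipt verified"
    BLOCKED = "blocked"


# Part names follow the page text; treating them as string labels is hypothetical.
RECIPE_PARTS = {
    "manifest", "model hints", "eval gates", "environment",
    "start command", "health check", "rollback",
}
RECEIPT_PARTS = {"hashes", "verifier output", "eval results", "shipper identity"}


def target_state(
    present: Set[str], block_reason: Optional[str] = None
) -> Tuple[State, Optional[str]]:
    """Classify one target from the parts present and an optional block reason."""
    if block_reason is not None:
        # A blocked target always carries its reason.
        return State.BLOCKED, block_reason
    if not RECIPE_PARTS <= present:
        # Some launch parts are still missing.
        return State.CANDIDATE, None
    if not RECEIPT_PARTS <= present:
        # Launchable, but the proof is incomplete.
        return State.RECIPE_READY, None
    return State.RECEIPT_VERIFIED, None


print(target_state({"manifest", "environment"}))
print(target_state(set(RECIPE_PARTS), block_reason="context length"))
```

Because the state is computed per target, the same model can come back as recipe ready for one runtime and blocked for another.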

03 · Best fit, ranked

Kolm tells you the best place to run it.

No more guessing whether your model fits the box. Kolm scores it against each runtime's model format, memory ceiling, context length, and data boundary, then ranks the targets it fits and names the reason for any it blocks. The ramp below shows each runtime's cost / 1k and its p50 latency, so you choose on facts.

"cost / 1k" is the runtime cost, in dollars, to serve 1,000 calls at the median (p50) speed. On-device runtimes read $0.00 because you run on hardware you already own. Multiply by your monthly call volume, counted in thousands of calls, to get a monthly figure.

Model size and quantization checked against each target's ceiling
Context length, tool shape, and data boundary enforced per target
If a path is blocked, you get the reason - never a silent failure
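To make the checks above concrete, here is a minimal sketch of a fit check that returns a reason for every blocked target. It is an illustration under stated assumptions, not Kolm's scoring code: the `Model` and `Target` types, the field names, the sample numbers other than the 142 MB model size, and the choice to order fitting targets by cost are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Model:
    size_mb: int              # packaged model size
    fmt: str                  # model format
    context_tokens: int       # context length the behavior needs
    data_boundary: str        # where the data is allowed to go


@dataclass(frozen=True)
class Target:
    name: str
    formats: FrozenSet[str]   # formats the runtime accepts
    memory_ceiling_mb: int
    max_context_tokens: int
    boundaries: FrozenSet[str]
    cost_per_1k: float        # dollars per 1,000 calls


def block_reason(model: Model, target: Target) -> Optional[str]:
    """Return why the model cannot run on the target, or None if it fits."""
    if model.fmt not in target.formats:
        return "runtime mismatch"
    if model.size_mb > target.memory_ceiling_mb:
        return "memory ceiling"
    if model.context_tokens > target.max_context_tokens:
        return "context length"
    if model.data_boundary not in target.boundaries:
        return "data boundary"
    return None


def rank(model: Model, targets: List[Target]) -> Tuple[List[str], Dict[str, str]]:
    """Return fitting target names (cheapest first) and a name -> reason map."""
    fits, blocked = [], {}
    for t in targets:
        reason = block_reason(model, t)
        if reason is None:
            fits.append(t)
        else:
            blocked[t.name] = reason
    fits.sort(key=lambda t: t.cost_per_1k)  # ordering by cost is an assumption
    return [t.name for t in fits], blocked


# Hypothetical targets and limits, for illustration only.
model = Model(size_mb=142, fmt="gguf", context_tokens=16384, data_boundary="on-prem")
targets = [
    Target("local", frozenset({"gguf"}), 8000, 8192, frozenset({"on-prem"}), 0.04),
    Target("fleet", frozenset({"gguf"}), 40000, 32768, frozenset({"on-prem"}), 0.21),
    Target("hosted", frozenset({"gguf", "safetensors"}), 80000, 32768,
           frozenset({"on-prem", "vendor-cloud"}), 0.90),
]
fits, blocked = rank(model, targets)
print(fits, blocked)
```

Every target ends up in exactly one of the two results, so a blocked path always has a named reason.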

Worked example · 2,000,000 calls / month

Runtime	cost / 1k	x 2,000 (1k blocks)	per month
phone (on-device)	$0.00	2,000	$0
laptop (local)	$0.04	2,000	$80
edge (regional)	$0.21	2,000	$420
server (hosted)	$0.90	2,000	$1,800

2,000,000 calls is 2,000 blocks of 1,000. Cost per month = cost / 1k x 2,000. Figures use the same per-1k numbers shown on the ramp; faster runtimes cost more per call but cut latency.
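The same arithmetic as a short script, in case you want to plug in your own volume. The per-1k rates are the ones in the table above; the `monthly_cost` function name is only illustrative.

```python
from decimal import Decimal

# Per-1k rates copied from the worked example table.
COST_PER_1K = {
    "phone (on-device)": Decimal("0.00"),
    "laptop (local)": Decimal("0.04"),
    "edge (regional)": Decimal("0.21"),
    "server (hosted)": Decimal("0.90"),
}


def monthly_cost(cost_per_1k: Decimal, calls_per_month: int) -> Decimal:
    """Cost per month = cost / 1k x (calls per month / 1,000)."""
    return cost_per_1k * Decimal(calls_per_month) / Decimal(1000)


for runtime, rate in COST_PER_1K.items():
    print(f"{runtime}: ${monthly_cost(rate, 2_000_000):,.0f} per month")
```

Decimal is used instead of float so small per-call rates do not pick up rounding noise when multiplied by large volumes.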

Best fit at this volume: phone (on-device), $0 per month, because the same 142 MB model already fits hardware you own. Need lower latency for live paths? The laptop runtime holds the knee at $80 per month and 71 ms.

runtime · claims-redactor.kolm · fit

The tradeoff

Cheaper or faster - read the curve, then pick.

The same numbers from the ramp, plotted as cost against latency. Moving down and right buys speed for money; moving up and left saves money for speed. The laptop runtime sits at the knee: near-zero cost with latency most workloads can live with.

COST / LATENCY · claims-redactor.kolm · live

BEST FIT · laptop holds the knee · $0.04 / 1k · 71 ms

laptop holds the knee: almost free per call, fast enough for most paths.

04 · Proof in the box

Every model ships with proof of where it runs.

The deployment decision lives inside the file, not in our dashboard. Open it and see exactly what was compiled, what was tested, why this runtime, and what still blocks release - proof anyone can check, no trust required.

Artifact manifest · recipe, examples, evaluators, tokenizer metadata, policy scope, dependency hashes, and compatible runtimes
Eval gate · regression threshold, failure set, human review state, replay window, fallback rule, and release decision
Target recipe · runtime family, launch shape, package files, required environment, health check, logging path, and incompatible features
Verifier receipt · signature, manifest hash, build identity, who shipped it, timestamp, export destination, and the full audit reference
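As a rough picture of what "proof anyone can check" can mean for the dependency and manifest hashes, the sketch below recomputes SHA-256 digests and compares them with the values a manifest records. The layout is an assumption: the real .kolm file structure is not described here, so `files` and `manifest` are hypothetical in-memory stand-ins, and checking the signature itself is not covered.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_hashes(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return the names that are missing or whose digest differs from the manifest."""
    problems = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or sha256_hex(data) != expected:
            problems.append(name)
    return sorted(problems)


# Hypothetical package contents, for illustration only.
files = {"model.bin": b"weights", "recipe.json": b"{}"}
manifest = {name: sha256_hex(data) for name, data in files.items()}

print(verify_hashes(files, manifest))                    # nothing to report
tampered = dict(files, **{"model.bin": b"changed"})
print(verify_hashes(tampered, manifest))                 # flags the altered file
```

An empty result means every recorded file is present and unchanged; anything else names exactly which entries fail.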

INSIDE A .KOLM / EXPLODED VIEW · live

FOUR LAYERS · model + recipe + evals + receipt · Ed25519-sealed

Compile it once. Run it anywhere it fits.

Point your existing calls at Kolm, compile one repeated behavior into a signed model, and see every runtime it can run on - laptop, your private cloud, the edge, or a GPU fleet. No rewrite to get here, no lock-in once you own it.
