Rent compute. Train anywhere.
kolm dispatches training and inference to the box you choose. Local CUDA on your workstation, MPS on a MacBook, Modal for a five-minute serverless burst, Vast for a six-hour H100 rental, or your own SSH host. One CLI surface, one quote-first flow, one receipt that pins the backend it ran on.
The 14 backends
Every backend ships as a uniform adapter: detect(), quote(), run(), teardown(). The CLI verb kolm compute list prints the same shape at the terminal.
| Backend | Kind | Train | Infer | Airgap | $ / hr | Cold start | VRAM cap | Auth |
|---|---|---|---|---|---|---|---|---|
| local-cpu | local | yes | yes | yes | $0.00 | 0s | n/a | none |
| local-cuda | local | yes | yes | yes | $0.00 | 0s | own | none |
| local-mps | local | yes | yes | yes | $0.00 | 0s | shared | none |
| local-mlx | local | yes | yes | yes | $0.00 | 0s | shared | none |
| local-rocm | local | yes | yes | yes | $0.00 | 0s | own | none |
| local-directml | local | yes | yes | yes | $0.00 | 0s | own | none |
| modal | cloud-serverless | yes | yes | no | $2.50 | 5s | 80 GB | KOLM_MODAL_TOKEN |
| runpod | cloud-serverless | yes | yes | no | $1.20 | 60s | 80 GB | KOLM_RUNPOD_TOKEN |
| together | cloud-managed | yes | yes | no | per-token | 0s | n/a | KOLM_TOGETHER_TOKEN |
| vast | cloud-marketplace | yes | yes | no | $0.50 | 60s | 80 GB | KOLM_VAST_TOKEN + SSH |
| lambda | cloud-marketplace | yes | yes | no | $2.00 | 90s | 80 GB | KOLM_LAMBDA_TOKEN |
| replicate | cloud-serverless | yes | yes | no | per-sec | 60s | 80 GB | KOLM_REPLICATE_TOKEN |
| fal | cloud-serverless | no | yes | no | per-call | 5s | n/a | KOLM_FAL_TOKEN |
| remote-ssh | self-hosted | yes | yes | depends | $0.00 | 0s | own | SSH key + host |
$ / hr is a reference, not a quote. The picker uses it to break ties and to gate the --budget flag. The actual invoice comes from the provider on settlement.
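The four-method contract above can be sketched as a plain object. A minimal local-cpu adapter follows; only the method names detect/quote/run/teardown come from the text, and every field and return shape is an assumption for illustration, not kolm's actual API.

```javascript
// Hypothetical sketch of the uniform adapter shape. Only the four method
// names are documented; the return shapes here are assumed.
const localCpuAdapter = {
  name: "local-cpu",
  kind: "local",

  // detect(): is this backend usable on the current host? A CPU always is.
  async detect() {
    return true;
  },

  // quote(): estimate duration and cost without running any code.
  async quote(spec) {
    return { backend: "local-cpu", duration_s: 942, cost_usd: 0, basis: "own hardware" };
  },

  // run(): execute the spec and return an artifact path.
  async run(spec) {
    return { artifact: `./artifacts/${spec.job_id}.kolm` };
  },

  // teardown(): nothing to release on local hardware.
  async teardown() {},
};
```

A cloud adapter fills the same four slots with different bodies, which is what lets one CLI print one table across all fourteen backends.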
Quote before you spend
Every rentable backend goes through a two-step gate: kolm compute quote first, kolm compute rent --confirm second. The quote step is free, runs no code, and prints duration and cost across every backend so you can compare.
    $ kolm compute quote --spec demo-phi-redactor.spec.json

    BACKEND      DURATION   COST    BASIS
    local-cuda   11s        $0.00   own hardware
    local-cpu    15m 42s    $0.00   own hardware
    modal        38s        $0.03   $2.50/hr x cold-start + train
    runpod       1m 14s     $0.02   $1.20/hr x cold-start + train
    vast         1m 16s     $0.01   $0.50/hr x cold-start + train
    lambda       1m 47s     $0.06   $2.00/hr x cold-start + train

    $ kolm compute rent --spec demo-phi-redactor.spec.json --backend vast --confirm
    quote: $0.01 (1m 16s on vast)
    provisioning: vast bundle 8x A100-80GB ssh ready
    training: 100% K=0.94
    signing receipt: HMAC-SHA256
    teardown: release vast instance
    wrote: ./artifacts/job_phi_redactor_v1.kolm
Budget gate
Pass --budget <usd> to refuse any rent whose quote exceeds the cap. The flag composes with --confirm: the budget gate is checked first, then the confirmation prompt.
    $ kolm compute rent --spec long-training.spec.json --backend modal --budget 0.50 --confirm
    quote: $1.83 (estimated 44m on modal)
    refused: estimate $1.83 exceeds budget $0.50
    exit 2

    $ kolm compute rent --spec long-training.spec.json --backend vast --budget 0.50 --confirm
    quote: $0.37 (estimated 44m on vast)
    ok: under budget, proceeding
Budget refusal is a hard exit, not a warning. The flag is for unattended jobs, cron, CI, anything that calls kolm without a human at the prompt.
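The gate's logic is simple enough to sketch. The function name and return shape below are hypothetical; only the behavior and the exit code come from the transcript.

```javascript
// Hypothetical budget gate: refuse before the confirmation prompt ever runs.
// Exit code 2 on refusal matches the CLI's documented behavior.
function budgetGate(quoteUsd, budgetUsd) {
  if (budgetUsd !== undefined && quoteUsd > budgetUsd) {
    return {
      ok: false,
      exitCode: 2,
      reason: `estimate $${quoteUsd.toFixed(2)} exceeds budget $${budgetUsd.toFixed(2)}`,
    };
  }
  return { ok: true, exitCode: 0 };
}
```

Because the gate runs before the prompt, an unattended job that is over budget never blocks waiting for input; it exits immediately.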
Picker scoring
kolm compute pick ranks every detected backend and writes the winner to the spec. The scoring function:
    S = 0.35 · availability
      + 0.20 · cost_inv        (1 - cost / max_cost)
      + 0.15 · latency_inv     (1 - cold_start / max_cold)
      + 0.15 · reproducibility (1 if airgap or pinned wheel, else 0.5)
      + 0.15 · perf_bias       (local-cuda = 1, modal = 0.9, runpod = 0.85, vast = 0.8, ...)
The perf bias is the column that breaks the tie between a free local-cpu and a paid local-cuda: CPU scores 1 on availability and 1 on cost-inverse, but 0.1 on perf-bias, which is enough to flip the pick to the GPU even when GPU costs energy.
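As a sketch in code, with the weights from the formula above; the per-backend inputs are illustrative, not the registry's real values.

```javascript
// Score S as a weighted sum of five columns (weights from the formula above).
function score(b) {
  const costInv = 1 - b.cost / b.maxCost;          // free hardware scores 1
  const latencyInv = 1 - b.coldStart / b.maxCold;  // instant start scores 1
  return 0.35 * b.availability
       + 0.20 * costInv
       + 0.15 * latencyInv
       + 0.15 * b.reproducibility
       + 0.15 * b.perfBias;
}

// Illustrative inputs: identical on every column except perf bias.
const cpu  = { availability: 1, cost: 0, maxCost: 2.5, coldStart: 0, maxCold: 90, reproducibility: 1, perfBias: 0.1 };
const cuda = { availability: 1, cost: 0, maxCost: 2.5, coldStart: 0, maxCold: 90, reproducibility: 1, perfBias: 1 };
```

With these inputs the GPU scores 1.0 to the CPU's 0.865; perf bias alone flips the pick.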
Three rental categories
Not every backend rents the same way. The CLI hides the difference, but the contract differs underneath.
Submit-and-wait (modal, runpod, replicate, together, fal). The provider owns the container lifecycle. kolm calls a submit-and-wait API; teardown is implicit when the function returns. Billing is metered by the provider.

Provision-and-release (vast, lambda). kolm posts a bid or a launch request, SSHs into the instance, runs the trainer, then explicitly terminates the rental in a try/finally. The teardown contract is in our code.

No rental at all (all six local-* backends plus remote-ssh). There is nothing to provision; rent on these falls back to run. Useful as a uniform CLI surface; no money changes hands.

remote-ssh is the odd one out: the trainer ships itself to $KOLM_REMOTE_HOST over SSH, runs there, and pulls the artifact back. Useful for an off-rack workstation, a home lab, or a leased bare-metal box.
Teardown contract
The two backends kolm provisions itself (vast, lambda) follow the same shape:
    async function run(spec, opts) {
      const handle = await provision(spec);        // bid + boot
      try {
        await waitForSSH(handle);                  // poll until reachable
        await uploadSpec(handle, spec);
        const result = await runTrainer(handle, spec, opts.on_progress);
        return result;
      } finally {
        await release(handle);                     // DELETE the rental
      }
    }
If the trainer crashes, the network drops, or Ctrl+C is pressed, the finally block still runs and the instance is destroyed. There is no path where kolm leaves a paid box running. The smoke test at scripts/test-rent-teardown.mjs proves that all four crash cases release the instance.
Receipt provenance
Every artifact carries a compute block in its manifest. The receipt chain signs that block alongside the K-score and the file hashes, so the same artifact can never be claimed to have trained on a different backend after the fact.
    {
      "compute": {
        "backend": "vast",
        "device": "a100-80gb",
        "duration_seconds": 76.4,
        "cost_usd": 0.011,
        "provenance": "kolm-provisioned vast bundle 88471203; sha256 of trainer image: ..."
      }
    }
The provenance string includes the bundle id, the docker image hash, and a timestamp. Reproducers can verify the image even after the rental is gone.
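A sketch of the signing step. The HMAC-SHA256 primitive comes from the transcript; the canonicalization-by-stringify and the key handling below are assumptions, not kolm's confirmed scheme.

```javascript
import { createHmac } from "node:crypto";

// Sign the compute block alongside the K-score and file hashes, so none of
// the three can be swapped after the fact without breaking the signature.
function signReceipt(manifest, secretKey) {
  const payload = JSON.stringify({
    compute: manifest.compute,
    k_score: manifest.k_score,
    file_hashes: manifest.file_hashes,
  });
  return createHmac("sha256", secretKey).update(payload).digest("hex");
}
```

Changing any signed field, the backend name included, produces a different digest, which is what binds "trained on vast" to the artifact.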
CLI
Five verbs cover the surface.
    $ kolm compute list
    14 backends: local-cpu, local-cuda, local-mps, local-mlx, local-rocm,
    local-directml, modal, runpod, together, vast, lambda, replicate, fal, remote-ssh

    $ kolm compute detect
    local-cuda   ok (torch.cuda.is_available)
    local-cpu    ok
    modal        no token (set KOLM_MODAL_TOKEN to enable)
    vast         no token (set KOLM_VAST_TOKEN to enable)

    $ kolm compute pick --spec my.spec.json
    local-cuda wins (S=0.92)

    $ kolm compute quote --spec my.spec.json
    (table above)

    $ kolm compute rent --spec my.spec.json --backend vast --budget 0.50 --confirm
    provisioning, training, teardown
Full verb tables at /docs. The decision matrix that picks defaults is at /spec under "compute-dispatch".
Measured throughput
Most kolm artifacts are pattern-match recipes that run as compiled JS in V8. The bench script at scripts/bench-tps.mjs exercises the recipe directly, no model load. On a Windows x64 box (16 logical CPUs, node v24.14.0) against examples/demo-phi-redactor.spec.json:
    $ node scripts/bench-tps.mjs --spec examples/demo-phi-redactor.spec.json --n 5000 --warmup 200
    {
      "mode": "pattern",
      "job_id": "job_phi_redactor_v1",
      "n": 5000,
      "latency": {
        "min_us": 0,
        "p50_us": 1,
        "p95_us": 1,
        "p99_us": 2,
        "max_us": 258,
        "mean_us": 1,
        "calls_per_sec": 1165990
      },
      "note": "Pattern-match recipes run as compiled JS in V8. Numbers are end-to-end (input parse + match + output)."
    }
~1.17M calls/sec, p50 of 1µs, p99 of 2µs for the four-pattern PHI redactor. Those are measured numbers from one box; expect variance with V8 JIT warm-up, recipe complexity, and host load. Generative artifacts (those that bundle a LoRA pack and call vLLM or llama.cpp) trade microseconds for tokens-per-second on the same axis. Run node scripts/bench-tps.mjs --generative --url http://localhost:8765 --n 32 against a serve endpoint to get that path; numbers depend on the device the serve loop ran on, which the receipt pins.
Both modes write a JSON record suitable for committing to docs/bench/ and publishing on this page. The raw record from the run above lives at docs/bench/tps-phi-redactor.json.
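The latency summary in that record is straightforward to reproduce. A sketch of the summarizing step, assuming nearest-rank percentiles over microsecond samples; the real script's percentile method is not confirmed here.

```javascript
// Summarize microsecond latency samples into the bench record's shape.
// Percentiles use the nearest-rank method on the sorted samples (assumed).
function summarize(samplesUs) {
  const s = [...samplesUs].sort((a, b) => a - b);
  const pick = (p) => s[Math.min(s.length - 1, Math.ceil((p / 100) * s.length) - 1)];
  const mean = s.reduce((a, b) => a + b, 0) / s.length;
  return {
    min_us: s[0],
    p50_us: pick(50),
    p95_us: pick(95),
    p99_us: pick(99),
    max_us: s[s.length - 1],
    mean_us: Math.round(mean),
    calls_per_sec: Math.round(1e6 / Math.max(mean, 1e-6)),
  };
}
```

Note that calls_per_sec derives from the mean, not the median, so one 258µs outlier drags throughput more than it moves p50.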
Why renting
- Your laptop is enough for a redactor. A four-pattern PHI redactor takes 11 seconds on local-cuda and 16 minutes on local-cpu. Both produce the same artifact with the same K-score.
- Your laptop is not enough for a 14B distill. Train on Vast for $0.50/hr; the artifact runs on the laptop afterward. The rental is for compile-time, not serve-time.
- Your buyer might require airgap. Keep local-cpu, local-cuda, local-mps, local-mlx, local-rocm, local-directml, or remote-ssh with airgap enabled and the rental backends fail closed.
- Your CI runs on a free runner. Quote-only mode (kolm compute quote) costs nothing and writes a deterministic table that CI can diff against a budget file.
What we are not promising
Three things, explicit.
We do not resell compute. Your provider token bills your provider account. kolm sits in front of the SDK; it does not proxy the API call, hold the funds, or take a margin.
Quotes are estimates. Cold-start, network, and a 2x slack factor are folded in; the real invoice can drift. The picker uses quotes as ordering, not as billing.
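The estimate arithmetic can be sketched as follows. How cold-start and the slack factor actually combine inside kolm is not documented here, so the composition below is an assumption.

```javascript
// Hypothetical quote math: fold cold-start plus a 2x slack on the training
// time into the duration, then price it at the reference hourly rate.
function estimate({ coldStartS, trainS, usdPerHr, slack = 2 }) {
  const durationS = coldStartS + trainS * slack;
  return {
    duration_s: durationS,
    cost_usd: Math.round(((usdPerHr * durationS) / 3600) * 100) / 100,
  };
}
```

Rounding to cents is good enough for ordering backends and gating a budget; it is not an invoice.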
Backends churn. Provider APIs ship breaking changes; we re-pin the registry on every adapter version bump. The version in the top-left corner of kolm compute list is the registry version, not the kolm version.