Rent compute. Train anywhere.
kolm dispatches training and inference to the box you choose. Local CUDA on your workstation, MPS on a MacBook, Modal for a five-minute serverless burst, Vast for a six-hour H100 rental, or your own SSH host. One CLI surface, one quote-first flow, one receipt that pins the backend it ran on.
The 14 backends
Every backend ships as a uniform adapter: detect(), quote(), run(), teardown(). The CLI verb kolm compute list prints the same shape at the terminal.
| Backend | Kind | Train | Infer | Airgap | $ / hr | Cold start | VRAM cap | Auth |
|---|---|---|---|---|---|---|---|---|
| local-cpu | local | yes | yes | yes | $0.00 | 0s | n/a | none |
| local-cuda | local | yes | yes | yes | $0.00 | 0s | own | none |
| local-mps | local | yes | yes | yes | $0.00 | 0s | shared | none |
| local-mlx | local | yes | yes | yes | $0.00 | 0s | shared | none |
| local-rocm | local | yes | yes | yes | $0.00 | 0s | own | none |
| local-directml | local | yes | yes | yes | $0.00 | 0s | own | none |
| modal | cloud-serverless | yes | yes | no | $2.50 | 5s | 80 GB | KOLM_MODAL_TOKEN |
| runpod | cloud-serverless | yes | yes | no | $1.20 | 60s | 80 GB | KOLM_RUNPOD_TOKEN |
| together | cloud-managed | yes | yes | no | per-token | 0s | n/a | KOLM_TOGETHER_TOKEN |
| vast | cloud-marketplace | yes | yes | no | $0.50 | 60s | 80 GB | KOLM_VAST_TOKEN + SSH |
| lambda | cloud-marketplace | yes | yes | no | $2.00 | 90s | 80 GB | KOLM_LAMBDA_TOKEN |
| replicate | cloud-serverless | yes | yes | no | per-sec | 60s | 80 GB | KOLM_REPLICATE_TOKEN |
| fal | cloud-serverless | no | yes | no | per-call | 5s | n/a | KOLM_FAL_TOKEN |
| remote-ssh | self-hosted | yes | yes | depends | $0.00 | 0s | own | SSH key + host |
$ / hr is a reference, not a quote. The picker uses it to break ties and to gate the --budget flag. The actual invoice comes from the provider on settlement.
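The four-method contract above can be sketched as a plain object. A minimal local-cpu adapter follows; only the method names detect/quote/run/teardown come from the text, and every field and return shape is an assumption for illustration, not kolm's actual API.

```javascript
// Hypothetical sketch of the uniform adapter shape. Only the four method
// names are documented; the return shapes here are assumed.
const localCpuAdapter = {
  name: "local-cpu",
  kind: "local",

  // detect(): is this backend usable on the current host? A CPU always is.
  async detect() {
    return true;
  },

  // quote(): estimate duration and cost without running any code.
  async quote(spec) {
    return { backend: "local-cpu", duration_s: 942, cost_usd: 0, basis: "own hardware" };
  },

  // run(): execute the spec and return an artifact path.
  async run(spec) {
    return { artifact: `./artifacts/${spec.job_id}.kolm` };
  },

  // teardown(): nothing to release on local hardware.
  async teardown() {},
};
```

A cloud adapter fills the same four slots with different bodies, which is what lets one CLI print one table across all fourteen backends.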
Quote before you spend
Every rentable backend goes through a two-step gate: kolm compute quote first, kolm compute rent --confirm second. The quote step is free, runs no code, and prints duration and cost across every backend so you can compare.
    $ kolm compute quote --spec demo-phi-redactor.spec.json

    BACKEND      DURATION   COST    BASIS
    local-cuda   11s        $0.00   own hardware
    local-cpu    15m 42s    $0.00   own hardware
    modal        38s        $0.03   $2.50/hr x cold-start + train
    runpod       1m 14s     $0.02   $1.20/hr x cold-start + train
    vast         1m 16s     $0.01   $0.50/hr x cold-start + train
    lambda       1m 47s     $0.06   $2.00/hr x cold-start + train

    $ kolm compute rent --spec demo-phi-redactor.spec.json --backend vast --confirm
    quote: $0.01 (1m 16s on vast)
    provisioning: vast bundle 8x A100-80GB ssh ready
    training: 100% K=0.94
    signing receipt: HMAC-SHA256
    teardown: release vast instance
    wrote: ./artifacts/job_phi_redactor_v1.kolm
Budget gate
Pass --budget <usd> to refuse any rent whose quote exceeds the cap. The flag composes with --confirm: the budget gate is checked first, then the confirmation prompt.
    $ kolm compute rent --spec long-training.spec.json --backend modal --budget 0.50 --confirm
    quote: $1.83 (estimated 44m on modal)
    refused: estimate $1.83 exceeds budget $0.50
    exit 2

    $ kolm compute rent --spec long-training.spec.json --backend vast --budget 0.50 --confirm
    quote: $0.37 (estimated 44m on vast)
    ok: under budget, proceeding
Budget refusal is a hard exit, not a warning. The flag is for unattended jobs, cron, CI, anything that calls kolm without a human at the prompt.
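The gate's logic is simple enough to sketch. The function name and return shape below are hypothetical; only the behavior and the exit code come from the transcript.

```javascript
// Hypothetical budget gate: refuse before the confirmation prompt ever runs.
// Exit code 2 on refusal matches the CLI's documented behavior.
function budgetGate(quoteUsd, budgetUsd) {
  if (budgetUsd !== undefined && quoteUsd > budgetUsd) {
    return {
      ok: false,
      exitCode: 2,
      reason: `estimate $${quoteUsd.toFixed(2)} exceeds budget $${budgetUsd.toFixed(2)}`,
    };
  }
  return { ok: true, exitCode: 0 };
}
```

Because the gate runs before the prompt, an unattended job that is over budget never blocks waiting for input; it exits immediately.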
Picker scoring
kolm compute pick ranks every detected backend and writes the winner to the spec. The scoring function:
    S = 0.35 · availability
      + 0.20 · cost_inv        (1 - cost / max_cost)
      + 0.15 · latency_inv     (1 - cold_start / max_cold)
      + 0.15 · reproducibility (1 if airgap or pinned wheel, else 0.5)
      + 0.15 · perf_bias       (local-cuda = 1, modal = 0.9, runpod = 0.85, vast = 0.8, ...)
The perf bias is the column that breaks the tie between a free local-cpu and a paid local-cuda: CPU scores 1 on availability and 1 on cost-inverse, but 0.1 on perf-bias, which is enough to flip the pick to the GPU even when GPU costs energy.
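As a sketch in code, with the weights from the formula above; the per-backend inputs are illustrative, not the registry's real values.

```javascript
// Score S as a weighted sum of five columns (weights from the formula above).
function score(b) {
  const costInv = 1 - b.cost / b.maxCost;          // free hardware scores 1
  const latencyInv = 1 - b.coldStart / b.maxCold;  // instant start scores 1
  return 0.35 * b.availability
       + 0.20 * costInv
       + 0.15 * latencyInv
       + 0.15 * b.reproducibility
       + 0.15 * b.perfBias;
}

// Illustrative inputs: identical on every column except perf bias.
const cpu  = { availability: 1, cost: 0, maxCost: 2.5, coldStart: 0, maxCold: 90, reproducibility: 1, perfBias: 0.1 };
const cuda = { availability: 1, cost: 0, maxCost: 2.5, coldStart: 0, maxCold: 90, reproducibility: 1, perfBias: 1 };
```

With these inputs the GPU scores 1.0 to the CPU's 0.865; perf bias alone flips the pick.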
Three rental categories
Not every backend rents the same way. The CLI hides the difference, but the contract differs underneath.
Submit-and-wait (modal, runpod, replicate, together, fal). The provider owns the container lifecycle. kolm calls a submit-and-wait API; teardown is implicit when the function returns. Billing is metered by the provider.

Provision-and-release (vast, lambda). kolm posts a bid or a launch request, SSHs into the instance, runs the trainer, then explicitly terminates the rental in a try/finally. The teardown contract is in our code.

No rental at all (all six local-* backends plus remote-ssh). There is nothing to provision; rent on these falls back to run. Useful as a uniform CLI surface; no money changes hands.

remote-ssh is the odd one out: the trainer ships itself to $KOLM_REMOTE_HOST over SSH, runs there, and pulls the artifact back. Useful for an off-rack workstation, a home lab, or a leased bare-metal box.
Teardown contract
The two backends kolm provisions itself (vast, lambda) follow the same shape:
    async function run(spec, opts) {
      const handle = await provision(spec);        // bid + boot
      try {
        await waitForSSH(handle);                  // poll until reachable
        await uploadSpec(handle, spec);
        const result = await runTrainer(handle, spec, opts.on_progress);
        return result;
      } finally {
        await release(handle);                     // DELETE the rental
      }
    }
If the trainer crashes, the network drops, or Ctrl+C is pressed, the finally block still runs and the instance is destroyed. There is no path where kolm leaves a paid box running. The smoke test at scripts/test-rent-teardown.mjs proves that all four crash cases release the instance.
Receipt provenance
Every artifact carries a compute block in its manifest. The receipt chain signs that block alongside the K-score and the file hashes, so the same artifact can never be claimed to have trained on a different backend after the fact.
    {
      "compute": {
        "backend": "vast",
        "device": "a100-80gb",
        "duration_seconds": 76.4,
        "cost_usd": 0.011,
        "provenance": "kolm-provisioned vast bundle 88471203; sha256 of trainer image: ..."
      }
    }
The provenance string includes the bundle id, the docker image hash, and a timestamp. Reproducers can verify the image even after the rental is gone.
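A sketch of the signing step. The HMAC-SHA256 primitive comes from the transcript; the canonicalization-by-stringify and the key handling below are assumptions, not kolm's confirmed scheme.

```javascript
import { createHmac } from "node:crypto";

// Sign the compute block alongside the K-score and file hashes, so none of
// the three can be swapped after the fact without breaking the signature.
function signReceipt(manifest, secretKey) {
  const payload = JSON.stringify({
    compute: manifest.compute,
    k_score: manifest.k_score,
    file_hashes: manifest.file_hashes,
  });
  return createHmac("sha256", secretKey).update(payload).digest("hex");
}
```

Changing any signed field, the backend name included, produces a different digest, which is what binds "trained on vast" to the artifact.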
CLI
Five verbs cover the surface.
    $ kolm compute list
    14 backends: local-cpu, local-cuda, local-mps, local-mlx, local-rocm,
    local-directml, modal, runpod, together, vast, lambda, replicate, fal, remote-ssh

    $ kolm compute detect
    local-cuda   ok (torch.cuda.is_available)
    local-cpu    ok
    modal        no token (set KOLM_MODAL_TOKEN to enable)
    vast         no token (set KOLM_VAST_TOKEN to enable)

    $ kolm compute pick --spec my.spec.json
    local-cuda wins (S=0.92)

    $ kolm compute quote --spec my.spec.json
    (table above)

    $ kolm compute rent --spec my.spec.json --backend vast --budget 0.50 --confirm
    provisioning, training, teardown
Full verb tables at /docs. The decision matrix that picks defaults is at /spec under "compute-dispatch".
Measured throughput
Most kolm artifacts are pattern-match recipes that run as compiled JS in V8. The bench script at scripts/bench-tps.mjs exercises the recipe directly, no model load. On a Windows x64 box (16 logical CPUs, node v24.14.0) against examples/demo-phi-redactor.spec.json:
    $ node scripts/bench-tps.mjs --spec examples/demo-phi-redactor.spec.json --n 5000 --warmup 200
    {
      "mode": "pattern",
      "job_id": "job_phi_redactor_v1",
      "n": 5000,
      "latency": {
        "min_us": 0,
        "p50_us": 1,
        "p95_us": 1,
        "p99_us": 2,
        "max_us": 258,
        "mean_us": 1,
        "calls_per_sec": 1165990
      },
      "note": "Pattern-match recipes run as compiled JS in V8. Numbers are end-to-end (input parse + match + output)."
    }
~1.17M calls/sec, p50 of 1µs, p99 of 2µs for the four-pattern PHI redactor. Those are measured numbers from one box; expect variance with V8 JIT warm-up, recipe complexity, and host load. Generative artifacts (those that bundle a LoRA pack and call vLLM or llama.cpp) trade microseconds for tokens-per-second on the same axis. Run node scripts/bench-tps.mjs --generative --url http://localhost:8765 --n 32 against a serve endpoint to get that path; numbers depend on the device the serve loop ran on, which the receipt pins.
Both modes write a JSON record suitable for committing to docs/bench/ and publishing on this page. The raw record from the run above lives at docs/bench/tps-phi-redactor.json.
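The latency summary in that record is straightforward to reproduce. A sketch of the summarizing step, assuming nearest-rank percentiles over microsecond samples; the real script's percentile method is not confirmed here.

```javascript
// Summarize microsecond latency samples into the bench record's shape.
// Percentiles use the nearest-rank method on the sorted samples (assumed).
function summarize(samplesUs) {
  const s = [...samplesUs].sort((a, b) => a - b);
  const pick = (p) => s[Math.min(s.length - 1, Math.ceil((p / 100) * s.length) - 1)];
  const mean = s.reduce((a, b) => a + b, 0) / s.length;
  return {
    min_us: s[0],
    p50_us: pick(50),
    p95_us: pick(95),
    p99_us: pick(99),
    max_us: s[s.length - 1],
    mean_us: Math.round(mean),
    calls_per_sec: Math.round(1e6 / Math.max(mean, 1e-6)),
  };
}
```

Note that calls_per_sec derives from the mean, not the median, so one 258µs outlier drags throughput more than it moves p50.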
Why renting
- Your laptop is enough for a redactor. A four-pattern PHI redactor takes 11 seconds on local-cuda and 16 minutes on local-cpu. Both produce the same artifact with the same K-score.
- Your laptop is not enough for a 14B distill. Train on Vast for $0.50/hr; the artifact runs on the laptop afterward. The rental is for compile-time, not serve-time.
- Your buyer might require airgap. Keep local-cpu, local-cuda, local-mps, local-mlx, local-rocm, local-directml, or remote-ssh with airgap enabled and the rental backends fail closed.
- Your CI runs on a free runner. Quote-only mode (kolm compute quote) costs nothing and writes a deterministic table that CI can diff against a budget file.
What we are not promising
Three things, explicit.
We do not resell compute. Your provider token bills your provider account. kolm sits in front of the SDK; it does not proxy the API call, hold the funds, or take a margin.
Quotes are estimates. Cold-start, network, and a 2x slack factor are folded in; the real invoice can drift. The picker uses quotes as ordering, not as billing.
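The estimate arithmetic can be sketched as follows. How cold-start and the slack factor actually combine inside kolm is not documented here, so the composition below is an assumption.

```javascript
// Hypothetical quote math: fold cold-start plus a 2x slack on the training
// time into the duration, then price it at the reference hourly rate.
function estimate({ coldStartS, trainS, usdPerHr, slack = 2 }) {
  const durationS = coldStartS + trainS * slack;
  return {
    duration_s: durationS,
    cost_usd: Math.round(((usdPerHr * durationS) / 3600) * 100) / 100,
  };
}
```

Rounding to cents is good enough for ordering backends and gating a budget; it is not an invoice.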
Backends churn. Provider APIs ship breaking changes; we re-pin the registry on every adapter version bump. The version in the top-left corner of kolm compute list is the registry version, not the kolm version.