A .kolm boots cold on Jetson Orin, Coral TPU, Raspberry Pi 5 + Hailo, IL5 boxes, and any x86 with 8GB+ RAM. Air-gapped by design. Updates ship the way firmware ships: signed, atomic, with a rollback path. Latency is bounded; the receipt chain proves what ran.
The current edge stack is held together with shell scripts and hope: containerized inference servers, hand-rolled model loaders, ad-hoc validation. .kolm collapses that to one artifact you serve, sign, ship, and verify like any other piece of firmware.
500MB-class for narrow tasks (intent classification, OCR + extraction). 2-3GB for general copilots. Both fit inside an OTA delta-update channel.
Load path: mmap the base weights, apply the LoRA, attach the recipe pack. Cold start is bounded by I/O, not model warm-up. Streaming begins on the first token.
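A minimal sketch of why an mmap-based load path bounds cold start by I/O rather than warm-up: mapping makes the whole file addressable without a bulk read, and pages fault in lazily as layers are first touched. The helper name and stand-in file here are illustrative, not the runtime's actual API.

```python
import mmap
import os
import tempfile

def map_segment(path):
    """Map a file read-only; the OS pages bytes in on first access."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping survives the fd being closed

# Demo with a stand-in "weights" file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    path = f.name

buf = map_segment(path)
mapped_len = len(buf)   # whole file is visible, yet nothing was bulk-read
first_byte = buf[0]     # this access is what actually faults a page in
buf.close()
os.unlink(path)
```

The same property is what lets serving start streaming before the full artifact has been read from disk.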
The runtime never reaches a network. Verification happens against a public-key chain shipped in the artifact. Phone-home is not a feature you have to disable; it isn’t there.
Tested on each platform per release. Larger artifacts may be gated on the smaller targets; the `kolm doctor` command tells you what fits before you ship.
| Target | Acceleration | Max base model (INT4) | Tok/s band | Status |
|---|---|---|---|---|
| Jetson Orin Nano (8GB) | Ampere GPU + 40 TOPS | 3B | 22-38 | supported |
| Jetson Orin NX / AGX | Ampere GPU + 70-275 TOPS | 7B | 40-95 | supported |
| Raspberry Pi 5 + Hailo-8L | 13 TOPS NPU | 3B (offload) | 14-22 | supported |
| Coral Dev Board Mini | Edge TPU 4 TOPS | 1.5B (recipe-heavy) | 8-14 | narrow tasks |
| Intel NUC + Arc A380 | Arc dGPU INT4 | 7B | 34-52 | supported |
| x86 + 8GB RAM (no GPU) | AVX-512 / AVX2 | 3B | 4-9 | degraded |
| IL5 GovCloud edge box | per-customer | per-customer | per-customer | design partner |
If you’ve ever shipped firmware to a fleet, the operational shape will be familiar. The .kolm ships through your existing OTA pipeline. The runtime verifies signatures before it loads; rollback is automatic; the in-field state matches the build artifact.
Every .kolm carries a SHA-256 hash chain over the manifest, model weights, LoRA, recipes, and recall index, with the chain root covered by the artifact signature. Tampering breaks the chain; the runtime refuses to load.
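A minimal sketch of segment-chain verification, using plain SHA-256 for illustration. The segment names, ordering, and genesis value are assumptions, not the real .kolm format; the point is only that flipping one byte anywhere changes the root, so a runtime comparing roots refuses to load.

```python
import hashlib

SEGMENTS = ["manifest", "model", "lora", "recipes", "recall_index"]

def chain_digest(segments):
    """Fold each segment's hash into a running chain value."""
    chain = b"\x00" * 32                  # genesis value (assumed)
    for name, payload in segments:
        h = hashlib.sha256()
        h.update(chain)                   # link to everything before it
        h.update(name.encode())
        h.update(payload)
        chain = h.digest()
    return chain

good = [(n, f"bytes of {n}".encode()) for n in SEGMENTS]
root = chain_digest(good)

# Tamper with one byte of the model segment: the recomputed root no longer
# matches, so a verifying runtime would stay on the prior artifact.
bad = list(good)
bad[1] = ("model", b"bytes of modeX")
tampered_root = chain_digest(bad)
print(tampered_root != root)  # True: tampering detected
```

In a real design the root would additionally be signed, so the verifier only needs the public key shipped with the artifact, never a network.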
The runtime swaps artifacts only after verifying the new chain. If verify fails, it stays on the prior artifact. There is no half-installed state.
`kolm runtime rollback <sha>` reverts to any previously installed artifact still on disk. K-score regressions trigger automatic rollback if you opt in.
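The "no half-installed state" guarantee above is the classic stage-verify-swap pattern: the candidate is written beside the target, verified in full, then moved into place with an atomic rename. A sketch under assumed paths and a stand-in `verify()`; the real runtime's layout and checks differ.

```python
import os
import tempfile

def verify(path):
    """Stand-in check; the real runtime would walk the segment chain."""
    with open(path, "rb") as f:
        return f.read().startswith(b"KOLM")

def install(new_bytes, active_path):
    """Stage the candidate next to the target, verify, then atomically swap."""
    d = os.path.dirname(active_path) or "."
    fd, staged = tempfile.mkstemp(dir=d, suffix=".staging")
    with os.fdopen(fd, "wb") as f:
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())          # candidate fully on disk before the swap
    if not verify(staged):
        os.unlink(staged)             # reject: the active copy is untouched
        return False
    os.replace(staged, active_path)   # atomic on the same filesystem
    return True

workdir = tempfile.mkdtemp()
active = os.path.join(workdir, "current.kolm")
install(b"KOLM v1", active)           # initial good install
ok_bad = install(b"JUNK", active)     # corrupt candidate: rejected
ok_new = install(b"KOLM v2", active)  # good candidate: swapped atomically
```

Rollback falls out of the same shape: keep the prior file on disk and `os.replace` it back.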
No CUDA juggling, no Python venv, no container daemon. The runtime is one binary; the artifact is one file; the API is one HTTP call.
```shell
# 1. install runtime (one binary, signed)
$ curl -fsSL https://kolm.ai/install.sh | sh -s -- --target=jetson-orin
✓ verified signature: AAC9D680
✓ kolm 1.4.0 installed at /usr/local/bin/kolm

# 2. drop the artifact
$ scp ops-incident-triage-2.1.0.kolm jetson:/var/lib/kolm/

# 3. serve it (systemd unit auto-generated)
$ sudo kolm install ops-incident-triage-2.1.0.kolm
✓ verified chain: 11 segments, all good
✓ K-score on holdout: 91.8 (T 93 / C 89 / L 96)
✓ systemd unit: kolm@ops-incident-triage.service
▸ serving on http://127.0.0.1:8000
▸ first token in 0.7s, sustained 31 tok/s

# 4. watch it without a network
$ sudo journalctl -u kolm@ops-incident-triage -f
```
Edge AI buyers are not looking for a model API. They’re looking for an artifact that cleanly fits the way devices already work: signed, versioned, observable, replaceable.
Manufacturing-floor robots, oilfield inference, ag drones, classified ground stations. Inference must keep working offline indefinitely; for .kolm, offline-first is a property, not a mode.
Industrial loops require p99 SLOs, not p50 averages. Recipe-drafted decoding makes the structured-output paths fast and predictable; the runtime measures and reports.
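Why p99 rather than p50: a loop that is fast on average can still blow its deadline regularly. A synthetic illustration with made-up latency numbers, using a nearest-rank percentile:

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

random.seed(7)
lat = [5.0 + random.random() for _ in range(980)]   # typical: ~5-6 ms
lat += [80.0 + random.random() for _ in range(20)]  # rare stalls: ~80 ms

p50 = percentile(lat, 50)
p99 = percentile(lat, 99)
print(f"p50={p50:.1f}ms p99={p99:.1f}ms")
# p50 sits comfortably inside a 10 ms budget; p99 shows the SLO is broken.
```

This is why the runtime reports tail percentiles rather than averages: the 2% stall case is invisible at p50.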
What model ran, on what input, with what prompt, at what time. Every output carries a receipt any auditor can verify after the fact, offline.
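A sketch of what "verifiable after the fact, offline" can mean mechanically: the receipt commits to digests of the model, prompt, input, and output plus a timestamp, so an auditor can recompute the commitment from the receipt alone. Field names and the digest-only scheme here are hypothetical, not the real receipt format (which, per the above, an auditor would also check against the signature chain).

```python
import hashlib
import json

def make_receipt(model_sha, prompt, user_input, output, ts):
    """Build a receipt whose final field commits to all the others."""
    body = {
        "model": model_sha,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "input_sha256": hashlib.sha256(user_input.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "ts": ts,
    }
    canon = json.dumps(body, sort_keys=True, separators=(",", ":"))
    body["receipt_sha256"] = hashlib.sha256(canon.encode()).hexdigest()
    return body

def audit(receipt):
    """Recompute the commitment from the receipt's own fields; no network."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    canon = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode()).hexdigest() == receipt["receipt_sha256"]

r = make_receipt("sha256:ab12", "triage prompt v2", "pump 3 vibration alarm",
                 "severity: high", "2025-06-01T12:00:00Z")
print(audit(r))                     # True: receipt is internally consistent
r["ts"] = "2025-06-01T13:00:00Z"    # any edit after the fact fails the audit
print(audit(r))                     # False
```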
It needs an artifact that ships the way every other piece of edge software ships. That’s what a .kolm is.