use cases / UC-04 · embedded & edge

The closest thing AI has to a signed firmware image.

A .kolm boots cold on Jetson Orin, Coral TPU, Raspberry Pi 5 + Hailo, IL5 boxes, and any x86 with 8GB+ RAM. Air-gapped by design. Updates ship the way firmware ships: signed, applied atomically, with rollback. Latency is bounded; the receipt chain proves what ran.

01 · the operational shape

Edge AI as a single signed file.

The current edge stack is held together with shell scripts and hope: containerized inference servers, hand-rolled model loaders, ad-hoc validation. .kolm collapses that to one artifact you sign, ship, verify, and serve like any other piece of firmware.

Artifact size band
50MB-3GB

500MB-class for narrow tasks (intent classifier, OCR + extract). 2-3GB for general copilots. Fits inside an OTA delta-update channel.

Cold-boot latency
0.4-2.0s

mmap base + LoRA + recipe pack. Bounded by I/O, not by model warm-up. Streaming begins on the first token.

Air-gap default
100% offline

The runtime never reaches a network. Verification happens against a public-key chain shipped in the artifact. Phone-home is not a feature you have to disable; it isn’t there.
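
If you want to prove the offline property instead of taking it on faith, run the runtime inside a network namespace that has only loopback; anything that tried to phone home would fail immediately. A sketch using standard Linux tooling, where kolm serve is a hypothetical one-off stand-in for however you launch the runtime (the shipped path is the systemd unit in section 04):

# loopback-only namespace: outbound traffic has nowhere to go
$ sudo unshare -n sh -c 'ip link set lo up && kolm serve /var/lib/kolm/your-artifact.kolm'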

02 · supported targets

Where .kolm runs at the edge.

Tested on each platform per release. Larger artifacts may be gated on the smaller targets; the kolm doctor command tells you what fits before you ship (a sketch follows the table).

Target | Acceleration | Max base size (INT4) | Tok/s band | Status
Jetson Orin Nano (8GB) | Ampere GPU + 40 TOPS | 3B | 22-38 | supported
Jetson Orin NX / AGX | Ampere GPU + 70-275 TOPS | 7B | 40-95 | supported
Raspberry Pi 5 + Hailo-8L | 13 TOPS NPU | 3B (offload) | 14-22 | supported
Coral Dev Board Mini | Edge TPU 4 TOPS | 1.5B (recipe-heavy) | 8-14 | narrow tasks
Intel NUC + Arc A380 | Arc dGPU INT4 | 7B | 34-52 | supported
x86 + 8GB RAM (no GPU) | AVX-512 / AVX2 | 3B | 4-9 | degraded
IL5 GovCloud edge box | per-customer | per-customer | per-customer | design partner
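
A sketch of the kolm doctor preflight mentioned above. The --target flag and the output shape are illustrative assumptions; only the command name and the support-matrix numbers come from this page.

build-host ~
$ kolm doctor --target=jetson-orin-nano ops-incident-triage-2.1.0.kolm
 artifact: base 3B INT4 + 2 LoRAs + recipe pack + recall index
 target budget: max base 3B, 8GB RAM, 40 TOPS
 verdict: fits (expect 22-38 tok/s per the support matrix)
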
03 · the deployment model

Like firmware. Not like a container.

If you’ve ever shipped firmware to a fleet, the operational shape will be familiar. The .kolm ships through your existing OTA pipeline. The runtime verifies signatures before it loads; rollback is automatic; the in-field state matches the build artifact.

SI

Signed at compile time.

Every .kolm carries an HMAC-SHA256 chain over the manifest, model, LoRA, recipes, and recall index. Tampering with any segment breaks the chain; the runtime refuses to load.
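
To make "chain" concrete, here is a minimal sketch of chained-HMAC verification with openssl, assuming one file per segment and a verifier-held key. The segment filenames, the chaining rule, and both variables are assumptions for illustration, not the on-disk format.

# each link covers the previous digest plus the next segment, so
# tampering with any segment changes every digest downstream of it
prev=""
for seg in manifest model.bin lora.bin recipes.pack recall.idx; do
  prev=$(printf '%s' "$prev" | cat - "$seg" \
    | openssl dgst -sha256 -hmac "$KOLM_VERIFY_KEY" -r | cut -d' ' -f1)
done
# load only if the final link matches the value signed into the manifest
[ "$prev" = "$EXPECTED_CHAIN_DIGEST" ] || { echo "chain broken; refusing to load" >&2; exit 1; }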

AT

Atomic upgrade.

The runtime swaps artifacts only after verifying the new chain. If verify fails, it stays on the prior artifact. There is no half-installed state.
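
How the runtime tracks the live artifact is internal to kolm, but the guarantee rests on the standard mechanism: stage, verify, then repoint with a single rename(2), which the kernel applies atomically. A sketch of that pattern with a symlink; the paths are illustrative:

# new chain verified first; on failure, "current" is never touched
$ ln -s /var/lib/kolm/ops-incident-triage-2.2.0.kolm /var/lib/kolm/current.new
$ mv -T /var/lib/kolm/current.new /var/lib/kolm/current   # one atomic rename(2)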

RB

Rollback by version.

kolm runtime rollback <sha> reverts to any previously installed artifact still on disk. K-score regressions trigger automatic rollback if you opt in.
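
On the box that is one command. The sha, version, and output lines below are illustrative; only the subcommand comes from the line above.

jetson-orin-nano ~
$ sudo kolm runtime rollback 3f9c2a1e
 verified chain on 3f9c2a1e: ok
 now serving ops-incident-triage-2.0.3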

04 · one-line install on Jetson

From cold OS image to first inference: 90 seconds.

No CUDA juggling, no Python venv, no container daemon. The runtime is one binary; the artifact is one file; the API is one HTTP call.

jetson-orin-nano ~
# 1. install runtime (one binary, signed)
$ curl -fsSL https://kolm.ai/install.sh | sh -s -- --target=jetson-orin
 verified signature: AAC9D680
 kolm 1.4.0 installed at /usr/local/bin/kolm

# 2. drop the artifact
$ scp ops-incident-triage-2.1.0.kolm jetson:/var/lib/kolm/

# 3. serve it (systemd unit auto-generated)
$ sudo kolm install ops-incident-triage-2.1.0.kolm
 verified chain: 11 segments, all good
 K-score on holdout: 91.8 (T 93 / C 89 / L 96)
 systemd unit: kolm@ops-incident-triage.service
 serving on http://127.0.0.1:8000
 first token in 0.7s, sustained 31 tok/s

# 4. watch it without a network
$ sudo journalctl -u kolm@ops-incident-triage -f
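
# 5. call it (endpoint path and JSON shape below are illustrative
#    assumptions, not a documented API)
$ curl -s http://127.0.0.1:8000/v1/generate \
    -H 'Content-Type: application/json' \
    -d '{"input": "triage: pump 4 pressure spiked at 03:12"}'
 {"output": "...", "receipt": "..."}
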
05 · what edge teams actually buy this for

The three problems .kolm solves on day one.

Edge AI buyers are not looking for a model API. They’re looking for an artifact that cleanly fits the way devices already work: signed, versioned, observable, replaceable.

1

No reachable network.

Manufacturing-floor robots, oilfield inference, ag drones, classified ground stations. Inference must work offline forever; kolm is offline-first as a property.

2

Bounded latency.

Industrial loops require p99 SLOs, not p50 averages. Recipe-drafted decoding makes the structured-output paths fast and predictable; the runtime measures and reports (a crude external probe is sketched after this list).

3

Auditable provenance.

What model ran, on what input, with what prompt, at what time. Every output carries a receipt any auditor can verify after the fact, offline.
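
The external probe referenced in item 2, pointed at the same hypothetical endpoint as in section 04: crude, but enough to sanity-check a p99 claim without trusting the runtime's own reporting.

# 200 sequential requests; p99 is the 198th value sorted ascending
$ for i in $(seq 200); do \
    curl -s -o /dev/null -w '%{time_total}\n' \
      -H 'Content-Type: application/json' \
      -d '{"input": "classify: valve fault"}' \
      http://127.0.0.1:8000/v1/generate; \
  done | sort -n | tail -n 3 | head -n 1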

The edge does not need another inference server.

It needs an artifact that ships the way every other piece of edge software ships. That’s what a .kolm is.