A .kolm boots cold on Jetson Orin, Coral TPU, Raspberry Pi 5 + Hailo, IL5 boxes, and any x86 with 8GB+ RAM. Air-gapped by design. Updates ship the way firmware ships: signed, atomic, with a rollback path. Latency is bounded; the receipt chain proves what ran.
The current edge stack is held together with shell scripts and hope: containerized inference servers, hand-rolled model loaders, ad-hoc validation. .kolm collapses that to one artifact you serve, sign, ship, and verify like any other piece of firmware.
500MB-class for narrow tasks (intent classification, OCR + extraction). 2-3GB for general copilots. Both fit inside an OTA delta-update channel.
Load path: mmap the base weights, apply the LoRA, attach the recipe pack. Cold start is bounded by I/O, not model warm-up. Streaming begins on the first token.
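A minimal sketch of why an mmap-based load path bounds cold start by I/O rather than warm-up: mapping makes the whole file addressable without a bulk read, and pages fault in lazily as layers are first touched. The helper name and stand-in file here are illustrative, not the runtime's actual API.

```python
import mmap
import os
import tempfile

def map_segment(path):
    """Map a file read-only; the OS pages bytes in on first access."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping survives the fd being closed

# Demo with a stand-in "weights" file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    path = f.name

buf = map_segment(path)
mapped_len = len(buf)   # whole file is visible, yet nothing was bulk-read
first_byte = buf[0]     # this access is what actually faults a page in
buf.close()
os.unlink(path)
```

The same property is what lets serving start streaming before the full artifact has been read from disk.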
The runtime never reaches a network. Verification happens against a public-key chain shipped in the artifact. Phone-home is not a feature you have to disable; it isn’t there.
Tested on each platform per release. Larger artifacts may be gated on the smaller targets; the `kolm doctor` command tells you what fits before you ship.
| Target | Acceleration | Max base model (INT4) | Tok/s band | Status |
|---|---|---|---|---|
| Jetson Orin Nano (8GB) | Ampere GPU + 40 TOPS | 3B | 22-38 | supported |
| Jetson Orin NX / AGX | Ampere GPU + 70-275 TOPS | 7B | 40-95 | supported |
| Raspberry Pi 5 + Hailo-8L | 13 TOPS NPU | 3B (offload) | 14-22 | supported |
| Coral Dev Board Mini | Edge TPU 4 TOPS | 1.5B (recipe-heavy) | 8-14 | narrow tasks |
| Intel NUC + Arc A380 | Arc dGPU INT4 | 7B | 34-52 | supported |
| x86 + 8GB RAM (no GPU) | AVX-512 / AVX2 | 3B | 4-9 | degraded |
| IL5 GovCloud edge box | per-customer | per-customer | per-customer | design partner |
If you’ve ever shipped firmware to a fleet, the operational shape will be familiar. The .kolm ships through your existing OTA pipeline. The runtime verifies signatures before it loads; rollback is automatic; the in-field state matches the build artifact.
Every .kolm carries a SHA-256 hash chain over the manifest, model weights, LoRA, recipes, and recall index, with the chain root covered by the artifact signature. Tampering breaks the chain; the runtime refuses to load.
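A minimal sketch of segment-chain verification, using plain SHA-256 for illustration. The segment names, ordering, and genesis value are assumptions, not the real .kolm format; the point is only that flipping one byte anywhere changes the root, so a runtime comparing roots refuses to load.

```python
import hashlib

SEGMENTS = ["manifest", "model", "lora", "recipes", "recall_index"]

def chain_digest(segments):
    """Fold each segment's hash into a running chain value."""
    chain = b"\x00" * 32                  # genesis value (assumed)
    for name, payload in segments:
        h = hashlib.sha256()
        h.update(chain)                   # link to everything before it
        h.update(name.encode())
        h.update(payload)
        chain = h.digest()
    return chain

good = [(n, f"bytes of {n}".encode()) for n in SEGMENTS]
root = chain_digest(good)

# Tamper with one byte of the model segment: the recomputed root no longer
# matches, so a verifying runtime would stay on the prior artifact.
bad = list(good)
bad[1] = ("model", b"bytes of modeX")
tampered_root = chain_digest(bad)
print(tampered_root != root)  # True: tampering detected
```

In a real design the root would additionally be signed, so the verifier only needs the public key shipped with the artifact, never a network.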
The runtime swaps artifacts only after verifying the new chain. If verify fails, it stays on the prior artifact. There is no half-installed state.
`kolm runtime rollback <sha>` reverts to any previously installed artifact still on disk. K-score regressions trigger automatic rollback if you opt in.
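The "no half-installed state" guarantee above is the classic stage-verify-swap pattern: the candidate is written beside the target, verified in full, then moved into place with an atomic rename. A sketch under assumed paths and a stand-in `verify()`; the real runtime's layout and checks differ.

```python
import os
import tempfile

def verify(path):
    """Stand-in check; the real runtime would walk the segment chain."""
    with open(path, "rb") as f:
        return f.read().startswith(b"KOLM")

def install(new_bytes, active_path):
    """Stage the candidate next to the target, verify, then atomically swap."""
    d = os.path.dirname(active_path) or "."
    fd, staged = tempfile.mkstemp(dir=d, suffix=".staging")
    with os.fdopen(fd, "wb") as f:
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())          # candidate fully on disk before the swap
    if not verify(staged):
        os.unlink(staged)             # reject: the active copy is untouched
        return False
    os.replace(staged, active_path)   # atomic on the same filesystem
    return True

workdir = tempfile.mkdtemp()
active = os.path.join(workdir, "current.kolm")
install(b"KOLM v1", active)           # initial good install
ok_bad = install(b"JUNK", active)     # corrupt candidate: rejected
ok_new = install(b"KOLM v2", active)  # good candidate: swapped atomically
```

Rollback falls out of the same shape: keep the prior file on disk and `os.replace` it back.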
No CUDA juggling, no Python venv, no container daemon. The runtime is one binary; the artifact is one file; the API is one HTTP call.
```shell
# 1. install runtime (one binary, signed)
$ curl -fsSL https://kolm.ai/install.sh | sh -s -- --target=jetson-orin
✓ verified signature: AAC9D680
✓ kolm 1.4.0 installed at /usr/local/bin/kolm

# 2. drop the artifact
$ scp ops-incident-triage-2.1.0.kolm jetson:/var/lib/kolm/

# 3. serve it (systemd unit auto-generated)
$ sudo kolm install ops-incident-triage-2.1.0.kolm
✓ verified chain: 11 segments, all good
✓ K-score on holdout: 91.8 (T 93 / C 89 / L 96)
✓ systemd unit: kolm@ops-incident-triage.service
▸ serving on http://127.0.0.1:8000
▸ first token in 0.7s, sustained 31 tok/s

# 4. watch it without a network
$ sudo journalctl -u kolm@ops-incident-triage -f
```
Edge AI buyers are not looking for a model API. They’re looking for an artifact that cleanly fits the way devices already work: signed, versioned, observable, replaceable.
Manufacturing-floor robots, oilfield inference, ag drones, classified ground stations. Inference must keep working offline indefinitely; for .kolm, offline-first is a property, not a mode.
Industrial loops require p99 SLOs, not p50 averages. Recipe-drafted decoding makes the structured-output paths fast and predictable; the runtime measures and reports.
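Why p99 rather than p50: a loop that is fast on average can still blow its deadline regularly. A synthetic illustration with made-up latency numbers, using a nearest-rank percentile:

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

random.seed(7)
lat = [5.0 + random.random() for _ in range(980)]   # typical: ~5-6 ms
lat += [80.0 + random.random() for _ in range(20)]  # rare stalls: ~80 ms

p50 = percentile(lat, 50)
p99 = percentile(lat, 99)
print(f"p50={p50:.1f}ms p99={p99:.1f}ms")
# p50 sits comfortably inside a 10 ms budget; p99 shows the SLO is broken.
```

This is why the runtime reports tail percentiles rather than averages: the 2% stall case is invisible at p50.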
What model ran, on what input, with what prompt, at what time. Every output carries a receipt any auditor can verify after the fact, offline.
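A sketch of what "verifiable after the fact, offline" can mean mechanically: the receipt commits to digests of the model, prompt, input, and output plus a timestamp, so an auditor can recompute the commitment from the receipt alone. Field names and the digest-only scheme here are hypothetical, not the real receipt format (which, per the above, an auditor would also check against the signature chain).

```python
import hashlib
import json

def make_receipt(model_sha, prompt, user_input, output, ts):
    """Build a receipt whose final field commits to all the others."""
    body = {
        "model": model_sha,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "input_sha256": hashlib.sha256(user_input.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "ts": ts,
    }
    canon = json.dumps(body, sort_keys=True, separators=(",", ":"))
    body["receipt_sha256"] = hashlib.sha256(canon.encode()).hexdigest()
    return body

def audit(receipt):
    """Recompute the commitment from the receipt's own fields; no network."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    canon = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode()).hexdigest() == receipt["receipt_sha256"]

r = make_receipt("sha256:ab12", "triage prompt v2", "pump 3 vibration alarm",
                 "severity: high", "2025-06-01T12:00:00Z")
print(audit(r))                     # True: receipt is internally consistent
r["ts"] = "2025-06-01T13:00:00Z"    # any edit after the fact fails the audit
print(audit(r))                     # False
```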
It needs an artifact that ships the way every other piece of edge software ships. That’s what a .kolm is.