One artifact for every edge box.
Edge AI usually means three runtimes (CoreML, ExecuTorch, TFLite), three quantization passes, and a six-month integration before the first device ships. kolm collapses the stack: compile once on your gold examples, then ship the same signed file to ARM, x86, or RISC-V hardware. The runtime is offline, deterministic, and verifiable. The .kolm doesn't care whether you're on an NVIDIA Jetson, a Raspberry Pi, or a fleet of Linux x86 kiosks.
Where this shape wins.
Each is a single .kolm. Compile once, distribute via your existing OTA pipeline, run with zero network dependency.
| use case | what it does |
| --- | --- |
| industrial fault triage | Takes a vibration / thermal / current sensor stream and labels the operator's recommended next action. Compile from 60–90 days of plant-floor sensor and maintenance logs. |
| retail kiosk routing | Takes a customer voice prompt or text input and routes it to the right knowledge base or staff escalation. Compile from your existing call-center transcripts. |
| in-vehicle assistant | Takes cabin voice and returns navigation / climate / media intents bound to your HMI's API surface. Compile from logged dialogues plus the OEM's intent schema. |
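The compile step below takes its gold examples as JSONL (--examples ./plant-logs.jsonl). The source doesn't specify the record schema, so here is a minimal sketch of what one fault-triage example might look like, with hypothetical field names:

```python
import json

# Hypothetical shape of one training example in plant-logs.jsonl.
# The field names ("input", "label") are illustrative assumptions;
# the actual schema is not documented in this page.
record = {
    "input": "vib0: rms 4.2 mm/s, bearing temp 81C, current draw +12% vs baseline",
    "label": "schedule bearing inspection within 24h",
}

with open("plant-logs.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")

# Round-trip check: each line is one self-contained JSON object.
with open("plant-logs.jsonl") as f:
    loaded = [json.loads(line) for line in f]
print(loaded[0]["label"])
```

The one-object-per-line shape is what makes 60–90 days of logs easy to append to and stream through without loading the whole file.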
Compile for a target class.
$ kolm compile fault-triage \
--base qwen2.5-3b-instruct \
--quantize int4 \
--target-ram 4G \
--recipe-pack-depth 64 \
--examples ./plant-logs.jsonl
K-score 0.79 ok
size 1.84 GB (fits ARM Cortex-A78 4GB SKU)
p50_latency 312us (target: 800us on Jetson Orin Nano)
Same .kolm, three architectures.
# Jetson Orin Nano (ARM, CUDA)
$ kolm run fault-triage.kolm --in /dev/sensors/vib0
runtime: arm64-cuda, p50 287us, 0 network calls

# Raspberry Pi 5 (ARM, CPU)
$ kolm run fault-triage.kolm --in /dev/sensors/vib0
runtime: arm64-cpu, p50 4.2ms, 0 network calls

# Industrial x86 mini-PC (Intel N100)
$ kolm run fault-triage.kolm --in /dev/sensors/vib0
runtime: x86_64-cpu, p50 3.8ms, 0 network calls
No retargeting. No re-quantization. The compiler chose the right primitives at compile time; the runtime adapts to the host.
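The "runtime adapts to the host" step can be pictured as a dispatch table keyed on the host architecture. This sketch is an illustration of that idea only; the runtime labels (arm64-cuda, etc.) come from the output above, but the selection logic is an assumption, not kolm's implementation:

```python
import platform

# Illustrative host-adaptive dispatch: the same artifact, a different
# execution path per host. Label strings mirror the runtime output
# shown above; the branching itself is a sketch.
def pick_runtime(machine: str, has_cuda: bool) -> str:
    if machine in ("arm64", "aarch64"):
        return "arm64-cuda" if has_cuda else "arm64-cpu"
    if machine in ("x86_64", "AMD64"):
        return "x86_64-cpu"
    return "generic-cpu"  # hypothetical fallback for other ISAs

# On the actual host, the choice is made at load time, not compile time.
print(pick_runtime(platform.machine(), has_cuda=False))
```

The point of the shape: because dispatch happens at load time, pushing one artifact to a mixed Jetson/Pi/x86 fleet needs no per-device build.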
Fits your existing OTA pipeline.
A .kolm is a regular zip with a sha256. Push it through whatever you already use (Mender, Balena, Azure IoT Hub, AWS IoT, your own apt repo). The runtime caches by hash and verifies the signature on every cold load.
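Because a .kolm is just a zip addressed by its sha256, the cache-by-hash behavior is easy to reason about. A minimal sketch, assuming the hash covers the whole zip file (function names are illustrative, not kolm's API):

```python
import hashlib
import zipfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    # Stream the file in 1 MiB chunks so large artifacts don't
    # need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def load_artifact(path: Path, cache_dir: Path) -> Path:
    # Cache key is the content hash: identical artifacts dedupe,
    # modified artifacts get a new entry.
    digest = sha256_of(path)
    cached = cache_dir / f"{digest}.kolm"
    if not cached.exists():
        # Cold load: sanity-check the container before caching.
        if not zipfile.is_zipfile(path):
            raise ValueError("not a valid .kolm (zip) artifact")
        cache_dir.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(path.read_bytes())
    return cached
```

Signature verification (sketched further below) would sit alongside the zip check on the cold-load path.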
| property | how it works |
| --- | --- |
| signed at compile | An HMAC chain anchors to your team registry. A field device only loads artifacts whose anchor matches the deployment's expected anchor. |
| delta-friendly | Recipe and LoRA changes ship as a small delta against the base-model pointer. No need to push 2 GB on every policy update. |
| rollback in seconds | The old .kolm stays in the cache. kolm pin v2026-04-12 reverts the active artifact instantly. |
| offline-first | The runtime never phones home. Receipts are local; you mirror them upstream on your own schedule. |
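The anchor check in the first row can be sketched with the standard library. This is an illustration of the idea only; the real key handling, chain structure, and signature placement are not specified here:

```python
import hashlib
import hmac

# Sketch of the anchor check: a device refuses any artifact whose
# HMAC (keyed by the deployment's expected anchor) doesn't match.
def verify_anchor(artifact_bytes: bytes, signature: str, anchor_key: bytes) -> bool:
    expected = hmac.new(anchor_key, artifact_bytes, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking match position.
    return hmac.compare_digest(expected, signature)

key = b"deployment-anchor-key"          # hypothetical registry anchor
blob = b"...kolm artifact bytes..."     # stand-in for the artifact
sig = hmac.new(key, blob, hashlib.sha256).hexdigest()

print(verify_anchor(blob, sig, key))           # matching anchor
print(verify_anchor(blob, sig, b"wrong-key"))  # mismatched anchor: rejected
```

Keying the check to the deployment's expected anchor is what lets a field device reject an otherwise well-formed artifact signed by the wrong team.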
What we say. What we don't.
| claim | status |
| --- | --- |
| same artifact across architectures | Yes. The compiler emits target-class metadata; the runtime picks the right path. No retargeting required. |
| deterministic at run time | Yes for recipe-mode tasks. LoRA-mode tasks include a deterministic seed in the manifest; same input + same artifact = same output. |
| tiny-target SKUs (<1GB RAM) | Recipe-only mode, on the Pro tier. The default compile targets 2 GB RAM minimum; sub-1GB SKUs use the recipe-only flag and skip the LoRA layer. |
| real-time guarantees | No. p50 latency is reported; p99 / hard real-time is your integrator's responsibility, not the artifact's. |
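The "deterministic at run time" row hinges on a manifest seed. One way to picture how same input + same artifact can yield the same output is to derive the seed from both; this derivation is an assumption for illustration, not kolm's actual scheme:

```python
import hashlib
import random

# Sketch: derive a sampling seed from the artifact hash plus the
# input, so repeated runs of the same pair draw identically.
# The derivation (sha256 of concatenation, first 8 bytes) is a
# hypothetical stand-in for the manifest's seed.
def deterministic_seed(artifact_sha256: str, user_input: str) -> int:
    digest = hashlib.sha256((artifact_sha256 + user_input).encode()).digest()
    return int.from_bytes(digest[:8], "big")

rng_a = random.Random(deterministic_seed("abc123", "motor 4 vibration spike"))
rng_b = random.Random(deterministic_seed("abc123", "motor 4 vibration spike"))
print(rng_a.random() == rng_b.random())  # same pair, identical draws
```

A new artifact hash (e.g. after a delta update) changes the seed, which is why determinism is scoped to "same input + same artifact", not across versions.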