Air-gapped deployment: when the network is the threat model

What air-gap means here
The offline switches
Pre-flight: prime the cache
Verifying integrity offline
TEE attestation in an air-gap
Where this lands in practice
Honest limits

What air-gap means here.

Air-gap is an architectural property, not a checkbox. The receiving network does not route packets to the public internet, and the rules that enforce that are part of the regulated boundary the system sits inside. A vendor binary that calls home, a runtime that downloads a missing tokenizer on first run, a verifier that wants to hit a certificate authority for a freshness check: each of these is a violation, each fails closed inside the boundary, and each is a recurring incident the operator has to write a memo about.

The kolm posture for this case is the inverse of the cloud-runtime default. The compile step is allowed to fetch base weights, build the recall index, and write the artifact. The deploy step writes one file to disk and assumes nothing about the network it lands on. The runtime opens that file, verifies the receipt chain against an HMAC key that is in operator possession, loads the base into the in-process accelerator, and answers requests. No DNS lookup. No HTTPS handshake. No background heartbeat. If the box has no network interface configured, the artifact still runs.

The cost of getting there is a pre-flight discipline. Anything the runtime might want at first boot has to be on local disk before the deploy machine is moved inside the boundary. The kolm airgap verbs exist to make that discipline a workflow, not an audit checklist.

The offline switches.

Three environment variables, one config file, and one CLI verb. The switches are designed so that an air-gap-violating call fails loudly at startup, not silently on the seventh user request.

# the three switches kolm respects at process start
$ export KOLM_AIRGAP=1
$ export HF_HUB_OFFLINE=1
$ export TRANSFORMERS_OFFLINE=1

# or write them once to ~/.kolm/airgap.env
$ kolm airgap enable
[airgap]     wrote ~/.kolm/airgap.env
[airgap]     KOLM_AIRGAP=1
[airgap]     HF_HUB_OFFLINE=1
[airgap]     TRANSFORMERS_OFFLINE=1
[airgap]     subsequent kolm invocations will source this file

KOLM_AIRGAP=1 is the kolm-side switch. It disables the registry sync (kolm registry pull exits non-zero with a message naming the variable), it forbids the compute layer from picking a cloud backend even if a token is set, and it forbids the runtime from issuing any outbound request. The two HuggingFace switches are upstream library flags that have the same effect on the Transformers stack: any code path that would normally call hf_hub_download raises a local-cache miss instead.

The disable path is symmetric. kolm airgap disable removes the env file and unsets the variables in the current shell. The status verb shows where each switch lives:

$ kolm airgap status
airgap mode:         enabled
env file:            ~/.kolm/airgap.env
KOLM_AIRGAP:         1   (set)
HF_HUB_OFFLINE:      1   (set)
TRANSFORMERS_OFFLINE: 1   (set)
on-disk artifacts:   4 in ~/.kolm/artifacts
hf cache:            ~/.cache/huggingface (3 base models, 18.4 GB)

The status output is the first thing an operator checks before moving a workstation into the regulated network. If any of the three switches says (unset), the pre-flight is incomplete.
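The same fail-loud check can be scripted into a wrapper that runs before anything else. The following is a minimal sketch, not kolm's implementation: the function names are hypothetical, and it assumes a switch counts as set only when its value is exactly "1".

```python
import os

# The three switches named above.
REQUIRED_SWITCHES = ("KOLM_AIRGAP", "HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")


def missing_switches(env=None):
    """Return the switch names that are not set to "1" (assumed convention)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SWITCHES if env.get(name) != "1"]


def require_airgap(env=None):
    """Raise at startup, naming every missing variable, instead of failing later."""
    missing = missing_switches(env)
    if missing:
        raise RuntimeError("air-gap switches not set: " + ", ".join(missing))
```

Calling require_airgap() at the top of a launch script turns an incomplete pre-flight into an immediate, named error rather than a cache miss on a later request.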

Pre-flight: prime the cache.

The hardest mistake to recover from is a runtime that gets to the bounded network and then realizes it needs a tokenizer file. The pre-flight sequence is the same five steps in every environment we have shipped into.

Step one: list every weight the artifact will reach for. The artifact manifest names a base model, an adapter, and a recall index. The manifest does not embed the base weights; it points at a HuggingFace repo and a revision SHA. The kolm inspect verb produces a flat list of every base the runtime will load.

$ kolm inspect support_ticket_router-1.0.0.kolm
base:     Qwen/Qwen2.5-3B-Instruct  rev=5d6f9...   approx 1.84 GB int4
adapter:  lora rank 16 on q_proj, v_proj, o_proj
recall:   bge-small-en-v1.5  rev=4d0a2...   approx 0.13 GB fp16
eval:     embedded (no external dependency)
receipt:  chain of 5 HMAC-SHA256 steps

Step two: pre-download each base into the HuggingFace cache. Run a regular huggingface-cli download call against each repo, with the revision pinned to the SHA above. The cache directory (~/.cache/huggingface by default) is now a copyable artifact. The recall encoder is included; long-context tasks pull a second encoder, and the same rule applies.
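Step two is easy to script so that no repo is fetched without a pinned revision. The sketch below only builds the huggingface-cli download command lines; it does not run them. The helper name is hypothetical, and the revision strings are whatever full SHAs kolm inspect reports (the truncated values shown above are not filled in here).

```python
def download_commands(pins):
    """Build one `huggingface-cli download` argv per (repo_id, revision) pin.

    `pins` is a list of (repo_id, revision) tuples. A missing revision is
    treated as an error, because an unpinned download defeats the pre-flight.
    """
    commands = []
    for repo_id, revision in pins:
        if not revision:
            raise ValueError(f"no revision pinned for {repo_id}")
        commands.append(
            ["huggingface-cli", "download", repo_id, "--revision", revision]
        )
    return commands
```

Each returned list can be passed to subprocess.run on the connected side; the point of the helper is that a repo without a pinned SHA stops the script before any download starts.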

Step three: copy the cache plus the artifact to removable media. The exact media is regulator-defined: write-once optical, a hardware-attested USB device, an offline relay server. The two things that have to move are the artifact file and the HuggingFace cache directory. Everything else (Python, the kolm CLI, the runtime binary) is part of the base image of the bounded workstation and follows that image's own provenance chain.

Step four: import on the bounded side. The HuggingFace cache is unpacked into ~/.cache/huggingface on the destination. The artifact is unpacked into ~/.kolm/artifacts. kolm airgap enable writes the env file. kolm run opens a server.

Step five: smoke-test offline. Run the embedded eval pack against the loaded artifact. The K-score should match the score recorded at compile time, byte for byte. A mismatch here is the integrity signal: either the base weights drifted (impossible if you pinned the revision SHA), or the artifact bytes are not the bytes you compiled.

$ kolm airgap verify ~/.kolm/artifacts/support_ticket_router-1.0.0.kolm
[1/5]     manifest schema                  ok
[2/5]     base + adapter present in cache  ok (1.84 GB + 0.04 GB)
[3/5]     HMAC receipt chain (5 steps)      ok
[4/5]     CID over canonical-JSON           ok (cidv1:sha256:7a3b...)
[5/5]     embedded eval pack                K=0.917 (gate 0.85)
PASS     artifact verifies entirely offline

Verifying integrity offline.

The receipt chain, the content identifier, and the embedded eval pack are the three things an air-gapped operator can check without any network. None of them require a certificate authority, a clock-skew oracle, or a remote attestation server.

The HMAC chain. Five steps over canonical JSON, keyed by an HMAC-SHA256 secret that the operator possesses. The chain binds: the task description, the base model SHA, the adapter weights hash, the embedded eval pack, and the final artifact manifest. Each step's output is the next step's salt. The verifier walks the chain start to finish; one byte off the canonical JSON serialization fails the step.
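A chained HMAC of this shape can be sketched in a few lines. This is an illustration under stated assumptions, not the kolm receipt format: canonical JSON is assumed to mean sorted keys with no insignificant whitespace, "salt" is assumed to mean the previous tag is prepended to the next step's message, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_json(obj):
    # Assumed canonical form: sorted keys, compact separators, UTF-8 bytes.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def chain_tags(key, steps):
    """Return one HMAC-SHA256 hex tag per step; each tag feeds the next step."""
    tags = []
    prev = b""  # the first step has no predecessor
    for step in steps:
        tag = hmac.new(key, prev + canonical_json(step), hashlib.sha256).digest()
        tags.append(tag.hex())
        prev = tag
    return tags


def verify_chain(key, steps, recorded_tags):
    """Recompute the chain and compare every tag in constant time."""
    expected = chain_tags(key, steps)
    return len(expected) == len(recorded_tags) and all(
        hmac.compare_digest(a, b) for a, b in zip(expected, recorded_tags)
    )
```

Because each tag depends on the one before it, changing any earlier step invalidates every later tag, which is what lets the verifier treat a single bad step as a failure of the whole receipt.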

The content identifier. A cidv1:sha256:<64-hex> string over the canonical-JSON form of the manifest's hash table. The CID is embedded in three places: the manifest itself, the receipt, and the audit log. They all have to agree, and they all have to match a recomputation against the bytes on disk. The CID is also the offline key the operator uses to refer to the artifact in incident logs (it's stable across moves, since it's content-derived).
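The recomputation and the three-way agreement check can be sketched as follows. The canonicalization (sorted keys, compact separators) and the function names are assumptions for illustration; only the cidv1:sha256:<64-hex> shape comes from the description above.

```python
import hashlib
import json


def compute_cid(hash_table):
    """SHA-256 over an assumed canonical-JSON form, in cidv1:sha256:<hex> shape."""
    canonical = json.dumps(
        hash_table, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return "cidv1:sha256:" + hashlib.sha256(canonical).hexdigest()


def cids_agree(hash_table, manifest_cid, receipt_cid, audit_cid):
    """True only if all three recorded CIDs equal the recomputed one."""
    recomputed = compute_cid(hash_table)
    return manifest_cid == receipt_cid == audit_cid == recomputed
```

The check is deliberately all-or-nothing: one disagreeing copy is enough to fail, since the operator cannot tell offline which of the three records is the wrong one.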

The embedded eval pack. A small set of prompt-and-expected-output pairs that the compile step locked in. The runtime can replay the pack against the loaded artifact and produce the same K-score the compile produced, deterministically. A K-score that has drifted is a strong signal that the artifact has been substituted or corrupted on the way through the airlock.
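The replay itself is a short loop. The sketch below assumes, purely for illustration, that the score is the fraction of pairs reproduced exactly; how kolm actually computes the K-score is not specified here, and generate stands in for whatever callable the loaded artifact exposes.

```python
def replay_eval_pack(pack, generate):
    """Fraction of (prompt, expected) pairs the model reproduces exactly.

    Exact-match scoring is an assumption, not the K-score definition.
    """
    if not pack:
        raise ValueError("empty eval pack")
    hits = sum(1 for prompt, expected in pack if generate(prompt) == expected)
    return hits / len(pack)


def smoke_test(pack, generate, recorded_score, gate):
    """Pass only if the replayed score equals the recorded one and clears the gate."""
    score = replay_eval_pack(pack, generate)
    return score == recorded_score and score >= gate
```

Comparing for equality rather than "close enough" is the design choice that matters: with a deterministic replay, any difference from the recorded score is a finding, not noise.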

The point is not that an air-gapped operator trusts the kolm runtime. The point is that the verification path is small enough to audit, deterministic enough to re-run on demand, and self-contained enough to do without any network at all.

The Rust verifier crate at packages/runtime-rs is built with forbid(unsafe_code), pure-Rust dependencies, and no network surface. It compiles to a static binary that can verify any .kolm file. An auditor working under a regulator who does not trust the JavaScript runtime can build the Rust binary from source on their own infrastructure and use it as the second pair of eyes.

TEE attestation in an air-gap.

The TEE attestation story changes shape inside an air-gap. A normal cloud deploy validates an attestation document by fetching the vendor's root certificate, then verifying the COSE_Sign1 envelope (Nitro), the SEV-SNP report, the TDX quote, or the Azure MAA JWT against that root. An air-gapped operator cannot fetch the root, because the root lives at an AWS, GCP, Azure, AMD, or Intel endpoint.

The pragmatic answer is trust-on-first-use. The first attestation document the operator sees, on the bounded side, gets pinned. Subsequent boots produce documents that have to match the pinned record on the fields that should be stable: the PCR0 measurement (Nitro), the launch digest (SEV-SNP), the MRTD (TDX), the image hash (Docker). A mismatch is flagged. The operator decides whether to roll forward (an authorized image update) or to halt (a substitution).

TOFU is not cryptographic certainty. It is adequate for the cases where an air-gap is the right answer: the bounded environment is already trusted under a different controls regime, and the attestation is one more verifiable signal, not the sole control. The audit log records every pinning event, every mismatch, and every operator override. The regulator's review is over those records, not over the attestation root.
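The pin-then-compare logic is small enough to sketch. This is an illustration, not the kolm implementation: the pin file location, the field names, and the returned status strings are all hypothetical, and parsing the attestation document into its stable fields is assumed to have happened already.

```python
import json
from pathlib import Path


def check_attestation(pin_path, stable_fields):
    """Pin on first use; afterwards report which stable fields differ.

    Returns (status, changed_fields), where status is "pinned", "match",
    or "mismatch". The caller is expected to write each result to the
    audit log and leave the roll-forward-or-halt decision to the operator.
    """
    path = Path(pin_path)
    if not path.exists():
        # First sighting on the bounded side: record it as the reference.
        path.write_text(json.dumps(stable_fields, sort_keys=True))
        return "pinned", []
    pinned = json.loads(path.read_text())
    changed = sorted(
        name
        for name in set(pinned) | set(stable_fields)
        if pinned.get(name) != stable_fields.get(name)
    )
    return ("mismatch", changed) if changed else ("match", [])
```

The function never overwrites an existing pin on its own. Rolling forward to a new measurement is an explicit operator action, which keeps an override from happening silently.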

Where this lands in practice.

Four sectors where the air-gapped path is the deployment, not an edge case.

Defense. The classified enclave is the bounded network. The artifact lands on a workstation cleared for the same classification level. The model can answer questions about classified material because the material never leaves the enclave. The compile happens on the unclassified side from de-identified examples; the artifact's K-score on the embedded eval pack is what the program office reviews.

Healthcare under HIPAA. The regulated boundary is the hospital's clinical network. A compiled note-summarizer runs against the EHR's PHI inside that boundary; no outbound call carries text, no inbound call delivers a model update. The Business Associate Agreement work disappears because there is no third party in the data path. The compile uses de-identified prior notes; the K-score gate is what the privacy officer signs against.

Industrial control. Plant-floor networks for utilities, refining, manufacturing. The bounded side is the OT network; the compile happens on the IT side. A vibration-anomaly classifier or a maintenance-log triage artifact lands on a workstation that already lives behind the OT firewall. The artifact is static; the model never proposes an action it was not trained to propose, because there is no live channel to a frontier API.

Regulated finance. A market-making firm's trading network. Surveillance, supervision, and regulator-bounded analytics live on the inside. A compiled artifact for a specific surveillance task answers questions on the bounded side; the receipt chain is what the firm produces under an SR 11-7 model-risk review.

Honest limits.

Three things an air-gap cannot give you.

Model updates require manual sync. A static artifact is static. The same compile job, run six months later against a refreshed corpus, produces a new artifact with a new CID and a new receipt. The pre-flight sequence above runs again. There is no auto-update path that survives the airlock; that is by construction. The operational cadence is whatever the regulated environment's change-management process supports: weekly, monthly, quarterly. The artifact format is small enough that a quarterly refresh is reasonable, but it is a human-driven cycle.

The capture-distill loop is broken at runtime. kolm's normal lifecycle is: capture frontier traffic, distill it into a recipe pack, recompile. None of that runs on the bounded side, because by definition nothing leaves. Air-gapped artifacts are read-only at deploy time. The corpus refresh, if there is one, happens on a separate ingestion path that the regulator has independently approved (typically a one-way diode or a periodic export from the bounded environment back to the unclassified side, both subject to the same controls regime that the rest of the data follows).

The software-update channel still exists. The kolm runtime itself, the Python interpreter, the operating system, and the CUDA stack all need patching against CVEs. The air-gap does not abolish those updates; it pushes them through a signed sneakernet pattern. Most regulated environments already have this discipline (CVE feeds, vendor advisories, mirrored package repositories on the bounded side). The kolm-specific addition is that artifact updates ride the same channel as runtime updates. A signed artifact bundle is one more tarball on the patching mirror; the operator signs receipts on each install.

The point of the static-artifact model is that the failure modes are visible. A compiled .kolm that is six months stale is observably six months stale; the manifest date and the receipt date both say so. An air-gapped operator who has a stale artifact knows. An operator running a frontier API against the same task has no comparable visibility, which is why the bounded environment does not allow it in the first place.
