Engineering essays

Articles, written for builders.

Five- to twelve-minute reads on what the AI compiler does, why it does it, and how the pieces fit together. Updated as the product ships.

01 · 11 min

How to compile GPT-5 into a 4GB local model that runs offline.

The missing build step between a frontier API and a local artifact. What the compiler does, why the unit economics flip, and how to compile your first .kolm in five minutes.

02 · 9 min

K-sample verified inference: a practical alternative to zk-ML.

Zero-knowledge proofs of inference are technically beautiful and economically dead. Verified inference takes a different route: sample k times, score deterministically, sign the receipt. The mechanism behind every label inside a .kolm.
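A minimal sketch of the sample, score, sign loop described above. The scoring rule (majority vote with a lexicographic tie-break), the receipt fields, and the use of HMAC-SHA256 as the signature are all assumptions made for illustration; `sample_fn`, `verified_label`, and `verify_receipt` are hypothetical names, not the product's API.

```python
import hashlib
import hmac
import json
from collections import Counter


def verified_label(sample_fn, prompt, k, key):
    """Sample k times, score deterministically, and sign the receipt."""
    # sample_fn(prompt, i) is a hypothetical hook that returns the i-th sample.
    samples = [sample_fn(prompt, i) for i in range(k)]

    # Deterministic scoring (assumption): most votes wins, ties broken
    # lexicographically, so the same samples always give the same label.
    counts = Counter(samples)
    label, votes = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    receipt = {"prompt": prompt, "k": k, "label": label, "votes": votes}
    payload = json.dumps(receipt, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return receipt, signature


def verify_receipt(receipt, signature, key):
    """Recompute the signature over the canonical JSON and compare."""
    payload = json.dumps(receipt, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Anyone holding the key can recheck a receipt without rerunning the model, and editing any field of the receipt invalidates the signature.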

03 · 12 min

HIPAA-safe AI on a laptop: PHI never leaves the device.

Every BAA you sign with a frontier model vendor is a ticking compliance bomb. Compile in a clean room, run on the device, and the disclosure event goes away. A practical playbook.

04 · 10 min

The .kolm file format, component by component.

One signed zip with seven internal components and a manifest. Why everything is co-versioned, how the HMAC chain works, and what the file looks like on disk.
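As a rough illustration of what an HMAC chain over co-versioned components can look like, here is a sketch in which each component's tag also covers the previous tag. The component names, the all-zero starting value, and the message layout are assumptions; the actual .kolm layout may differ.

```python
import hashlib
import hmac


def chain_tags(components, key):
    """Return one hex tag per component, each bound to the tag before it."""
    tags = []
    prev = b"\x00" * 32  # fixed starting value (assumption)
    for name, data in components:
        # Each tag covers the previous tag, the component name, and its bytes.
        message = prev + name.encode("utf-8") + b"\x00" + data
        prev = hmac.new(key, message, hashlib.sha256).digest()
        tags.append(prev.hex())
    return tags


def verify_chain(components, tags, key):
    """Recompute the chain and compare it tag by tag in constant time."""
    expected = chain_tags(components, key)
    return len(expected) == len(tags) and all(
        hmac.compare_digest(e, t) for e, t in zip(expected, tags)
    )
```

Because every tag feeds into the next, changing one component changes its own tag and every tag after it, so components cannot be swapped or reordered independently.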

05 · 10 min

Speculative decoding with deterministic drafts.

A draft model is the standard way to propose tokens in speculative decoding. A recipe pack is faster, smaller, free at runtime, and verifiably correct on the patterns it covers. How RSD cuts the local inference bill to zero on structured tokens.
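To make the idea concrete, here is a toy sketch of speculative decoding in which the draft comes from a lookup table instead of a draft model. It uses the conventional scheme where the target model still checks every drafted token, so the output matches plain greedy decoding; the essay's RSD may work differently. The `recipes` table, `target_next`, and both function names are hypothetical.

```python
def greedy_decode(target_next, prompt, max_new):
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out


def speculative_decode(target_next, recipes, prompt, max_new):
    """Greedy decoding with drafts looked up from a table keyed on the last token."""
    out = list(prompt)
    limit = len(prompt) + max_new
    passes = 0
    while len(out) < limit:
        draft = recipes.get(out[-1], ()) if out else ()
        # One verification pass. A real system would score all draft
        # positions in a single batched forward pass; this toy calls
        # target_next once per position for clarity.
        passes += 1
        for i in range(len(draft) + 1):
            if len(out) >= limit:
                break
            token = target_next(out)
            out.append(token)  # always the target's own token
            if i == len(draft) or draft[i] != token:
                break  # draft exhausted or rejected: end this pass
    return out, passes
```

When the table's drafts match what the target would have produced, several tokens are confirmed per pass; when they miss, the loop falls back to one token per pass, and the output is unchanged either way.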