kolm.app · the consumer surface (sprint 3)

The biggest model on your phone, honestly.

Not a quantized 70B. A compiled personal model that behaves like a Hermes-class model on the tasks you compiled it for: distilled on those exact tasks, its structured outputs drafted by deterministic recipes, and grounded in your own corpus. 3B base + LoRA + recipe pack + sqlite-vec, all running offline.

SPRINT 3 PREVIEW

kolm.app is in build.

The CLI ships first; the cloud ships second; the boxed mobile app is the consumer endpoint.
Below is what it will look like when it lands.

Three Specialists out of the box.

k=389.4 · 2.1GB · signed

Personal Assistant

Schedule, recall, summarize, draft. Grounded in your photos, voice memos, calendar, and inbox via on-device sqlite-vec.

k=412.7 · 2.0GB · signed

Email Reply

Drafts replies in your voice. Distilled from your sent folder. Never auto-sends. Receipts visible on every draft.

k=355.9 · 1.9GB · signed

Daily Recap

One paragraph at end of day, grounded in everything captured. Optional voice readout. Stays on device.

Or compile your own from your phone.

The "Compile a new one" button drives the same cloud pipeline as the CLI, using your on-device corpus (photos, voice memos, screen captures) as the data source. The resulting .kolm downloads back onto the phone and runs locally, no further API calls.
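The internal layout of a .kolm artifact isn't specified here. As a sketch, assuming it ships with a small manifest naming its four parts (base model, LoRA adapter, recipe pack, recall index), the app's loader could sanity-check a download before running anything. Field and part names below are illustrative, not the real format:

```typescript
// Hypothetical .kolm manifest shape -- every field name here is an
// assumption for illustration, not the actual artifact format.
interface KolmManifest {
  name: string;
  k: number;          // the compile score shown on each specialist card
  sizeBytes: number;
  signed: boolean;
  parts: string[];    // component filenames bundled in the artifact
}

// The four components the post names: base model, LoRA, recipes, index.
const REQUIRED_PARTS = ["base.int4", "adapter.lora", "recipes.pack", "index.sqlite"];

// Reject artifacts that are unsigned or missing a component, so nothing
// half-compiled ever reaches the on-device runtime.
function validateManifest(m: KolmManifest): string[] {
  const errors: string[] = [];
  if (!m.signed) errors.push("artifact is not signed");
  for (const part of REQUIRED_PARTS) {
    if (!m.parts.includes(part)) errors.push(`missing part: ${part}`);
  }
  return errors;
}
```

A loader like this fails closed: an empty error list is the only state in which the artifact gets mounted.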

# under the hood, the app does what you'd do at the CLI
1. capture phone-side → embed locally via wllama.wasm + sqlite-vec.wasm
2. user types a task → POST /v1/compile (cloud)
3. cloud runs the four engines + signs
4. phone downloads .kolm → loads model + LoRA + recipes + index
5. all subsequent calls run on-device
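The five steps above can be sketched as a small pipeline. Each stage is injected as a plain function so the flow can be exercised without a phone, a network, or a model; the stage names are hypothetical, not the app's real API:

```typescript
// Illustrative orchestration of the compile flow. A Stage takes the
// previous step's output and returns the next artifact identifier.
type Stage = (input: string) => string;

interface Pipeline {
  embed: Stage;     // 1. capture phone-side, embed locally
  compile: Stage;   // 2-3. POST /v1/compile; cloud runs the four engines + signs
  download: Stage;  // 4. pull the signed .kolm back onto the phone
  load: Stage;      // 5. load model + LoRA + recipes + index
}

function compileSpecialist(task: string, p: Pipeline): string {
  const corpus = p.embed(task);
  const artifact = p.download(p.compile(corpus));
  return p.load(artifact); // everything after this point runs on-device
}
```

The point of the shape: the cloud appears exactly once, between `embed` and `download`; every call after `load` stays local.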

The runtime.

# in-browser stack, same on PWA, Android (NPU), iOS (Neural Engine)
wllama.wasm                  # INT4 inference
sqlite-vec.wasm              # offline recall index
recipe-worker.js (sandbox)   # deterministic draft execution
service-worker.js            # offline-first artifact cache

# Sprint 4 adds executorch bindings
# for hardware-accelerated NPU paths.
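The service worker's offline-first artifact cache boils down to a cache-first lookup: serve local bytes when present, hit the network at most once, remember the result. A simplified model, using a plain Map in place of the browser Cache API and a hypothetical `fetchRemote` standing in for the real network call:

```typescript
// Cache-first fetch, simplified. After the first successful fetch, the
// artifact is served entirely from the local cache -- the offline path.
function cacheFirst(
  cache: Map<string, Uint8Array>,
  url: string,
  fetchRemote: (url: string) => Uint8Array
): Uint8Array {
  const hit = cache.get(url);
  if (hit !== undefined) return hit; // offline path: no network touched
  const body = fetchRemote(url);     // first (and only) network fetch
  cache.set(url, body);              // subsequent loads work offline
  return body;
}
```

This is why a downloaded .kolm keeps working in airplane mode: once cached, the network function is never called again for that URL.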