Step 1 · 90 seconds
Pick a task in your codebase that calls OpenAI.
Grep your repo for `chat.completions.create`. Pick the simplest one for the first compile: a classifier, a summarizer, a tagger.
```
$ grep -nR "chat.completions.create" src/
src/triage.py:34:    resp = client.chat.completions.create(
src/triage.py:35:        model="gpt-4o",
src/triage.py:36:        messages=[{"role":"system","content":"You triage support tickets..."}, ...])
```
Step 2 · 60 seconds
Describe the task to kolm.
```
$ kolm compile "triage support tickets by urgency; output low/normal/high/urgent" \
    --base qwen2.5-7b --target-k 0.95

Compile plan: SFT + DPO + constrained-decoder
K target:       0.95
estimated cost: $1.20
time:           8 min
```
Step 3 · 8 minutes
Compile, then serve as OpenAI-compatible.
The CLI ships an OpenAI-compatible server. Same wire format.
```
$ kolm serve --http triage.kolm

  k o l m
  ─────── the private AI compiler

serving triage.kolm
endpoint: http://127.0.0.1:8080/v1/chat/completions
K-score:  0.961
CID:      cidv1:sha256:8a3...
```
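Since the server speaks the standard OpenAI chat-completions wire format, you can sanity-check what a request body looks like before touching any application code. A minimal sketch (the ticket text is made up):

```python
import json

# The request body the local endpoint accepts: plain OpenAI chat-completions
# JSON. The "model" field is carried for wire compatibility but unused; the
# loaded artifact is the model.
payload = {
    "model": "ignored",
    "messages": [
        {"role": "system", "content": "You triage support tickets..."},
        {"role": "user", "content": "Checkout page returns 500 for every customer"},
    ],
}
body = json.dumps(payload)
```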
Step 4 · 60 seconds
Change one line in your code.
The OpenAI SDK lets you point at any base_url. That's the whole switch.
```python
# before
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# after
client = OpenAI(
    api_key="local",
    base_url="http://127.0.0.1:8080/v1",
)
```
**Checkpoint:** Your code now calls the local artifact. No other changes. The `model` field is ignored; the loaded artifact is the model.
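If you want the switch to stay reversible, one pattern is to key it off an environment variable instead of editing the constructor call. A sketch, assuming a hypothetical `KOLM_BASE_URL` variable (my name, not part of kolm):

```python
import os

def client_kwargs() -> dict:
    """Constructor kwargs for OpenAI(): the local kolm endpoint if
    KOLM_BASE_URL is set, otherwise the hosted API."""
    base_url = os.environ.get("KOLM_BASE_URL")
    if base_url:
        # The local server ignores the key, but the SDK requires a non-empty one.
        return {"api_key": "local", "base_url": base_url}
    return {"api_key": os.environ["OPENAI_API_KEY"]}

os.environ["KOLM_BASE_URL"] = "http://127.0.0.1:8080/v1"
kwargs = client_kwargs()
```

Unsetting the variable sends traffic back to OpenAI with zero code changes.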
Step 5 · 60 seconds
Measure the delta.
The CLI ships a `bench` verb that compares a kolm endpoint to any OpenAI-compatible endpoint over the same 200 inputs.
```
$ kolm bench triage.kolm \
    --against openai \
    --inputs ./fixtures/tickets.jsonl

              kolm      openai-gpt-4o   delta
p50 (median)  9.3 ms    78 ms           -88%
p99           14.1 ms   310 ms          -95%
$ / 1M tok    $0.13     $2.50           -95%
accuracy      0.961     0.954           +0.7%
```
| What changed | Before | After |
|---|---|---|
| Lines of code modified | — | 2 |
| Token bill per month (10M tok) | $25.00 | $1.30 |
| Median latency | 78 ms | 9.3 ms |
| Audit trail | opaque | receipt JSON |
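The token-bill row is just the per-million bench prices scaled to 10M tokens a month:

```python
def monthly_bill(tokens_millions: float, usd_per_million: float) -> float:
    """Monthly spend for a given token volume at a per-1M-token price."""
    return round(tokens_millions * usd_per_million, 2)

assert monthly_bill(10, 2.50) == 25.00  # before: hosted gpt-4o
assert monthly_bill(10, 0.13) == 1.30   # after: local artifact
```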
Step 6 · optional
Add receipts to your audit log.
Every response from the kolm endpoint includes a receipt CID. Log it next to your existing request log and you can re-verify the model output months later.
```python
resp = client.chat.completions.create(...)
log.info("triage", request_id=req_id, receipt_cid=resp.kolm["receipt_cid"])
```
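If your log sink is line-oriented JSON, a record shaped like this (a hypothetical layout of mine, not a kolm format) keeps the CID queryable next to the request id:

```python
import json
from datetime import datetime, timezone

def audit_record(req_id: str, receipt_cid: str, label: str) -> str:
    """One JSON line pairing a request with its receipt CID,
    so the output can be re-verified against the artifact later."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": req_id,
        "receipt_cid": receipt_cid,
        "label": label,
    })

line = audit_record("req-001", "cidv1:sha256:8a3...", "urgent")
```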