vs OpenAI Fine-Tuning

Their model. Or your model.

OpenAI fine-tuning trains a model that only runs on their servers, billed per call, accessible via API key. kolm trains a model and gives you the file - signed, portable, runs on hardware you own. Same training step underneath; the difference is who owns the deliverable.

OpenAI Fine-Tuning

A hosted training service. Upload pairs, OpenAI trains a private variant of GPT-4o or GPT-4o-mini, you call it through their API. The model lives on OpenAI infrastructure forever.

vs

kolm

A compiler. Capture pairs, run them through a verifier, distill into a smaller open-base model, sign the result. The deliverable is a .kolm file you can read, ship, fork, and run.

The same first step, different last step

Both products start from the same place: (input, expected output) pairs. The user uploads a JSONL file or proxies traffic through a capture endpoint. Both run a supervised training loop on those pairs.

The difference is the deliverable. OpenAI returns a model name like ft:gpt-4o-mini-2026-08-06:org:project:abc123. You get a per-token bill in exchange. kolm returns a .kolm file with weights, recipes, and a signed manifest. You get a model you can copy onto a server, a phone, or an air-gapped network.
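The pairs themselves can be sketched concretely. Below is a minimal script that writes two hypothetical (input, expected output) pairs in the JSONL chat shape OpenAI's fine-tuning API accepts; kolm's exact capture schema is not specified here, so treat the field layout as illustrative only.

```python
import json

# Two hypothetical training pairs in OpenAI's chat fine-tuning
# JSONL shape. kolm's capture schema may differ; this is a sketch.
pairs = [
    {"messages": [
        {"role": "user", "content": "Reset my password"},
        {"role": "assistant", "content": "Use the reset link on the login page."},
    ]},
    {"messages": [
        {"role": "user", "content": "Cancel my subscription"},
        {"role": "assistant", "content": "Done. You keep access until the billing period ends."},
    ]},
]

# One JSON object per line - the JSONL format both upload paths expect.
with open("pairs.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

The same file can then feed either pipeline: uploaded to OpenAI as a fine-tuning file, or fed to a compile step.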

Where OpenAI fine-tuning wins

Honest concession. OpenAI's tooling is more polished. The dashboard handles eval reports, hyperparameter sweeps, version pinning, and the resulting model speaks the OpenAI API exactly. If your stack is already on the OpenAI SDK and your bottleneck is "I need a slightly better GPT-4o-mini for this task this week," there is no faster path.

You also benefit from frontier upgrades. When OpenAI ships a new base model, your fine-tune can be re-run against it with a click. kolm targets specific open bases (Qwen, Llama, Phi). When a new base comes out, you re-compile.

Bigger context windows. A fine-tuned GPT-4o variant inherits the full 128K context window. A 7B-base .kolm typically gets 32K or 128K, depending on the base. If your task needs 200K+ context, frontier fine-tuning is the right answer today.

Where kolm wins

You own the file. The whole point. The .kolm file lands on your hard drive. You can run it offline, ship it to a customer, deploy it on a phone, fork it, audit it, anchor its hash on-chain. None of that is possible with a hosted fine-tune.

$0 marginal inference. After the compile, you are not paying per call. You bought the artifact; running it is your hardware cost only. For high-volume tasks this crosses a break-even point fast.
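The break-even arithmetic is easy to sketch. Every number below is an assumption for illustration (the per-token rate, compile price, and hardware cost are placeholders, not quoted prices from either vendor):

```python
# Hypothetical break-even: hosted per-token billing vs a one-time
# compile plus self-hosted hardware. All dollar figures are
# illustrative assumptions, not real pricing.
HOSTED_RATE_PER_1M = 1.20       # $/1M tokens on the hosted fine-tune
COMPILE_COST = 500.0            # $ one-time cost of producing the artifact
HARDWARE_PER_MONTH = 60.0       # $/month to run the artifact yourself

def monthly_hosted(tokens_per_month: float) -> float:
    """Monthly bill if every token goes through the hosted model."""
    return tokens_per_month / 1_000_000 * HOSTED_RATE_PER_1M

def breakeven_months(tokens_per_month: float) -> float:
    """Months until the one-time compile pays for itself."""
    monthly_saving = monthly_hosted(tokens_per_month) - HARDWARE_PER_MONTH
    if monthly_saving <= 0:
        return float("inf")     # low volume: hosted stays cheaper
    return COMPILE_COST / monthly_saving

# At 200M tokens/month, hosted costs $240/month; the saving over
# hardware is $180/month, so the compile pays back in under 3 months.
print(breakeven_months(200_000_000))
```

The crossover is volume-dependent: below the point where the hosted bill exceeds your hardware cost, the hosted fine-tune never pays back, which is consistent with the "high-volume tasks" framing above.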

Privacy and sovereignty. The captured pairs and the resulting model never leave your control once compiled. For HIPAA, finance, defense, or any data-residency regime, this is the difference between "we can ship" and "we can't ship."

Receipts. Every output a .kolm produces ships with an HMAC-SHA256 receipt chain. OpenAI does not sign individual outputs.

Portability across stacks. The .kolm file runs through llama.cpp, vLLM, MLX, or our own runtime. It is not bound to any vendor.

Side-by-side

                     OpenAI Fine-Tuning                   kolm
What it is           Hosted fine-tune of GPT-4o family    Compile to portable signed artifact
Output               A model name on OpenAI servers       A .kolm file (≤3 GB)
Where it runs        OpenAI servers only                  Anywhere - server, phone, air-gapped
Per-call cost        ~3-8x base GPT-4o-mini per token     $0 marginal (your hardware)
Base model           GPT-4o-mini, GPT-4o (closed)         Qwen, Llama, Phi (open weights)
Context window       128K inherited                       32K-128K depending on base
Receipts / signing   None                                 HMAC-SHA256 chain on every output
Data residency       Pairs uploaded to OpenAI             Pairs stay in your namespace; artifact is yours
Vendor lock          Model dies if account dies           File is yours forever
Eval tooling         First-class - dashboard + sweeps     K-score gate on held-out test set

When to use OpenAI fine-tuning

Use OpenAI fine-tuning when the task needs frontier-grade reasoning, the data residency story is "OpenAI is fine," and the per-call cost is acceptable for your volume. The 128K context, the polished dashboard, and the API-compatibility-by-default are all real advantages.

# classic OpenAI fine-tune flow:
from openai import OpenAI

client = OpenAI()
f = client.files.create(file=open("pairs.jsonl", "rb"),
                        purpose="fine-tune")
client.fine_tuning.jobs.create(training_file=f.id,
                               model="gpt-4o-mini-2026-08-06")

When to use kolm

Use kolm when the task needs a portable model you own. Privacy regulations, on-device deployment, air-gapped networks, $0-marginal-cost inference, or simply not wanting your model to live on someone else's servers - all good reasons to compile.

# point traffic at the kolm capture proxy:
OPENAI_BASE_URL=https://kolm.ai/v1/capture/openai

# once enough pairs accumulate, compile:
kolm compile "answer support tickets" \
  --namespace support \
  --base qwen2.5-7b

# compiler output:
ok wrote support.kolm  k_score=0.89  signature=hmac-sha256

Can I use both?

Yes. Many teams will keep an OpenAI fine-tune for the long-tail or top-of-funnel reasoning while compiling specific high-volume sub-tasks into .kolm files. The kolm capture proxy preserves the upstream call, so there's no compromise on the OpenAI side until you flip to the local artifact.

Verdict

If your task needs frontier reasoning at 128K and you're fine on data residency, use OpenAI fine-tuning. The dashboard is polished, the upgrade path is clean.

If your task can fit in a 7B-class model and you want the file, use kolm. You will pay once, run forever, and own the deliverable.

Adjacent comparisons: vs fine-tuning (general) · vs Together · vs LangSmith · full comparison table