use cases / UC-02 · AI-native SaaS

The 90%-deterministic features paying frontier prices.

Every AI-native SaaS has a handful of features that do the same shape of thing every time (classify, extract, reformat, score). They make 60% of the calls and burn 80% of the model bill. kolm compiles each one once and ships you a .kolm artifact you serve from your own infra at fixed cost.

01 · the gross-margin tax

Where SaaS gross margin goes to die.

Diligence usually exposes this at the fundraise: AI features that should be 80%-margin SaaS plumbing land at 30% because every customer call hits a frontier API. The fix is not a smaller model. The fix is the right artifact.

Cost per task
$0.04 / call

Typical Sonnet/GPT-class call for a 1.5k-input / 600-output extraction task. Pure variable cost, charged on every customer click.

Marginal kolm cost
$0.0003 / call

3B-class Specialist serving the same task. ~130× cheaper on a $99/mo GPU sliver after the one-time compile fee. Latency drops 3-8× alongside.

Compile cost (once)
$200–$2k

The teacher-model bill to k-sample & verify a labeled set big enough to LoRA-tune the student. Amortizes over the next million calls.
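The amortization claim is easy to check. A back-of-envelope sketch, using the illustrative per-call figures from the cards above and a mid-range $700 compile fee (assumed, not a quote):

```python
# Payback math for compiling one feature, using this page's example figures.
api_cost_per_call = 0.04     # frontier API, per call
kolm_cost_per_call = 0.0003  # compiled Specialist, per call
compile_fee = 700            # one-time, mid-range of the $200-$2k band

saved_per_call = api_cost_per_call - kolm_cost_per_call
breakeven_calls = compile_fee / saved_per_call  # ~17,632 calls

print(f"break-even after ~{breakeven_calls:,.0f} calls")
```

At a few hundred thousand calls a month, the compile fee pays for itself in the first days of serving.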

02 · what compiles cleanly

The features built for .kolm

Not every feature should be compiled. The criterion is honest: does the task have a verifier? If a deterministic check (regex, JSON schema, AST diff, exact match, BLEU > cutoff) can distinguish a good output from a bad one, kolm can compile it.
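A minimal sketch of what such a deterministic verifier can look like, for the ticket-routing task used later on this page. The field name ("queue") and the 12-queue label set are illustrative, not kolm's actual schema:

```python
import json

# Illustrative label set: 12 routing queues.
QUEUES = {f"queue-{i}" for i in range(12)}

def verify(raw_output: str, queues=QUEUES) -> bool:
    """Deterministic pass/fail: well-formed JSON with exactly one known label."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == {"queue"} and obj["queue"] in queues
```

No model in the loop: the check runs in microseconds and gives the same verdict every time, which is exactly what makes the task compilable.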

EX

Extraction.

Resume parsing, invoice line-items, contract clauses, support-ticket field-tagging. JSON-schema verifiers run in microseconds; the verifier is the spec.

CL

Classification + routing.

Intent detection, ticket routing, content moderation, lead scoring. Confusion-matrix verifiers over a held-out set; the K-score gates production.

RW

Rewrite + reformat.

Tone shifting, fixed-length summarization, translation for the language pairs you ship, code-style normalization. Reference-output verifiers via BLEU/ROUGE thresholds.
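The classification gate mentioned above can be sketched in a few lines. Plain held-out accuracy stands in for the real K-score composition (the T/C/L breakdown is kolm-internal and not reproduced here); the 0.95 threshold is illustrative:

```python
# Score the compiled Specialist on a held-out split and refuse to deploy
# below a threshold. Accuracy is a stand-in for the K-score.
def passes_gate(predictions, labels, threshold=0.95):
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return accuracy >= threshold
```

The point of the gate is that a regression is a build failure, not a production incident.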

03 · the migration

Compile a feature in one command.

Bring 50-200 (input, expected) pairs from your prod logs. The compiler synthesizes the verifier, k-samples the teacher, fits a LoRA, runs the K-score gate, and signs the artifact.

~/myco-saas
# 1. drop your eval pairs in
$ ls examples/
ticket-router-train.jsonl    # 200 pairs from prod logs
ticket-router-eval.jsonl     # 50 holdout

# 2. compile
$ kolm compile "route a support ticket to one of 12 queues" \
    --examples examples/ticket-router-train.jsonl \
    --eval examples/ticket-router-eval.jsonl \
    --base qwen2.5-3b-instruct
 verifier synthesized: schema + label-set match (47 lines)
 k-sampled teacher (claude-opus-4-7) on 200 pairs
 LoRA fit: 14 min, 3 epochs, loss 0.18 → 0.04
 K-score: 94.2 (T 95.8 / C 91.1 / L 99.7)
 signed: ticket-router-1.0.0.kolm (1.4GB)

# 3. ship it
$ kolm serve ticket-router-1.0.0.kolm --port 8000 --mcp
 serving on http://localhost:8000  (cold start: 1.2s)

04 · the model, before vs after

One feature. Eight months of margin.

Worked example: a Series A company processing 4M ticket-routing calls / year. Before kolm, that’s a $160k/yr line item with growth-elastic exposure. After: a fixed $99/mo GPU and a one-time compile.

Before kolm · SonnetX API
$160,000 / year

4M calls × $0.04/call. Variable. Grows with usage. Vendor pricing changes outside your control. Per-call latency 1.4s p50.

After kolm · compiled Specialist
$1,888 / first year

$99/mo GPU + $700 one-time compile. Fixed. Latency 180ms p50. K-score 94.2 monitored continuously, regressions block deploys.
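The two cards reduce to one line of arithmetic each, using the illustrative figures above:

```python
# Year-one economics for the worked example on this page.
calls_per_year = 4_000_000
before = calls_per_year * 0.04  # frontier API: $160,000
after = 99 * 12 + 700           # GPU sliver + one-time compile: $1,888

print(f"before ${before:,.0f}  after ${after:,}  saved ${before - after:,.0f}")
```

Year two drops further: the $700 compile is gone and only the $1,188 GPU line remains.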

05 · how it ships

Three ways to serve a .kolm.

Same artifact, three places it can run. The CI pipeline produces the artifact; what changes is where the inference happens.

DC

Your own GPU.

Serve .kolm via kolm serve on a single H100 sliver, A10, L40, or 5090. OpenAI-compatible /v1/chat/completions + MCP. A drop-in replacement for your existing API client.

EG

Edge / per-customer.

Ship the artifact to the customer’s VPC for their compliance posture. Same artifact, different infra. The receipt chain proves it’s the same model.

CL

kolm Cloud.

Don’t want to operate GPUs at all? Cloud serves your artifacts on managed infra, billed by request, with the same receipt chain. Same gross margin shape.
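Because the endpoint is OpenAI-compatible, swapping the client means changing a base URL. A hedged sketch, assuming the standard chat-completions request body; the model id ("ticket-router-1.0.0") and helper names are illustrative:

```python
import json
import urllib.request

def build_payload(ticket_text: str) -> dict:
    """Standard OpenAI-style chat body; the model id is illustrative."""
    return {
        "model": "ticket-router-1.0.0",
        "messages": [{"role": "user", "content": ticket_text}],
        "temperature": 0,
    }

def route_ticket(ticket_text: str, base_url="http://localhost:8000"):
    """POST to the locally served Specialist and return its reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(ticket_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The same function works against any of the three deployment targets; only base_url changes.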

Stop renting your gross margin from a closed model API.

Every shipped Specialist is margin you get back. Compile your three biggest cost-center features and watch the unit economics shift in a quarter.