
Triage PagerDuty alerts in 15 minutes.

By the end of this walkthrough you will have a 4.8 KB .kolm artifact that reads an incoming PagerDuty webhook and returns a severity (sev1, sev2, sev3) and a routing team (api, ingest, db, ml-infra, frontend), with an HMAC receipt on every call. It runs on a single CPU, and no alert text leaves your VPC.

Runtime: 15 min | Output: triage.kolm | Base: qwen2.5-3b | K-target: 0.88 | Size: 4.8 KB

Step 1 . 90 seconds

Install and log in.

You will need a kolm account so the compile job has somewhere to charge against. Free tier covers the first three artifacts.

# Install
$ npm install -g kolm

# Open the browser sign-in flow
$ kolm login

# Sanity check
$ kolm version
  k o l m
  ------- the private AI compiler
kolm cli v0.1.0
spec rs-1
Checkpoint: If kolm version prints the brand mark and spec rs-1, you are ready.
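For scripted setups, the same check can be automated. A minimal sketch in Node, assuming kolm version prints the two lines shown above ("kolm cli v..." and "spec ..."); the helper name parseKolmVersion is hypothetical and not part of the kolm CLI.

```javascript
// Hypothetical helper: extract the CLI version and spec from `kolm version` output.
// Assumes the output contains lines shaped like "kolm cli v0.1.0" and "spec rs-1".
function parseKolmVersion(text) {
  const cli = /^kolm cli v(\S+)\s*$/m.exec(text);
  const spec = /^spec (\S+)\s*$/m.exec(text);
  return {
    cli: cli ? cli[1] : null,
    spec: spec ? spec[1] : null,
  };
}

// Sample text copied from the transcript above.
const versionSample = [
  '  k o l m',
  '  ------- the private AI compiler',
  'kolm cli v0.1.0',
  'spec rs-1',
].join('\n');

const versionInfo = parseKolmVersion(versionSample);
console.log(versionInfo.spec === 'rs-1' ? 'ready' : 'unexpected spec');
```

In a real script you would feed it the captured stdout of kolm version instead of the sample string.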

Step 2 . 2 minutes

Write the recipe.

The recipe names the task. It binds an input schema (raw alert text), an output schema (severity, team, and a short summary), an evaluator (multi-class F1 over a labeled set), and a K-score floor. Kolm picks the base model and the adapter unless you override them, as this recipe does with base and adapter.

$ cat > triage.recipe.json <<'JSON'
{
  "task": "classify PagerDuty alert text into severity and routing team",
  "base": "qwen2.5-3b-instruct",
  "objective": "multiclass-f1",
  "adapter": "lora",
  "output_schema": {
    "severity": { "enum": ["sev1", "sev2", "sev3"] },
    "team":     { "enum": ["api", "ingest", "db", "ml-infra", "frontend"] },
    "summary":  { "type": "string", "max_chars": 140 }
  },
  "target_k": 0.92,
  "min_k":    0.88,
  "eval_pack": "on-call-triage-v1"
}
JSON
Checkpoint: min_k is the hard floor. The compile step exits non-zero if the artifact misses it. Use it as your CI gate.
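Because a compile takes minutes, it can help to catch recipe mistakes before submitting. The sketch below is a hypothetical pre-flight check, not a kolm feature. It uses only the field names from the recipe above and assumes K-scores lie between 0 and 1, which the tutorial does not state explicitly.

```javascript
// Hypothetical pre-flight check for a recipe object; not part of the kolm CLI.
// Assumption: K-scores are numbers in the range [0, 1].
function checkRecipe(recipe) {
  const problems = [];
  for (const key of ['target_k', 'min_k']) {
    const v = recipe[key];
    if (typeof v !== 'number' || v < 0 || v > 1) {
      problems.push(`${key} must be a number between 0 and 1`);
    }
  }
  // Only compare the two scores when both are well-formed.
  if (problems.length === 0 && recipe.min_k > recipe.target_k) {
    problems.push('min_k must not exceed target_k');
  }
  for (const [field, spec] of Object.entries(recipe.output_schema || {})) {
    if ('enum' in spec && (!Array.isArray(spec.enum) || spec.enum.length === 0)) {
      problems.push(`output_schema.${field}.enum must be a non-empty array`);
    }
  }
  return problems;
}

// Trimmed copy of the recipe above.
const recipeSample = {
  output_schema: {
    severity: { enum: ['sev1', 'sev2', 'sev3'] },
    team: { enum: ['api', 'ingest', 'db', 'ml-infra', 'frontend'] },
    summary: { type: 'string', max_chars: 140 },
  },
  target_k: 0.92,
  min_k: 0.88,
};

console.log(checkRecipe(recipeSample)); // an empty array means no problems found
```

To check the file on disk, pass it JSON.parse(fs.readFileSync('triage.recipe.json', 'utf8')).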

Step 3 . 1 minute

Drop in labeled examples.

You need a small labeled set. 60 examples is enough to bootstrap. Format is JSONL: one record per line, each with input and expected. Pull a quarter of your past PagerDuty incidents and label them by team and severity.

$ head -3 triage.examples.jsonl
{"input":"DB connection pool exhausted on primary; 2k 503s in last 5min","expected":{"severity":"sev1","team":"db"}}
{"input":"Login button mis-aligned on mobile Safari, 3 user reports","expected":{"severity":"sev3","team":"frontend"}}
{"input":"Ingest worker queue depth growing 200/min, no consumer errors","expected":{"severity":"sev2","team":"ingest"}}

Synthetic augmentation runs as part of the compile, so 60 labels typically expand to ~1,200 training pairs.
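Label typos are easy to make when you hand-label 60 incidents. A minimal linter sketch for the JSONL format described above; the function name lintExamples is hypothetical, and the allowed values are copied from the recipe's output_schema.

```javascript
const SEVERITIES = ['sev1', 'sev2', 'sev3'];
const TEAMS = ['api', 'ingest', 'db', 'ml-infra', 'frontend'];

// Hypothetical linter for the examples file; returns one message per problem.
function lintExamples(jsonl) {
  const errors = [];
  jsonl.split('\n').forEach((line, i) => {
    if (line.trim() === '') return; // tolerate blank lines
    let rec;
    try {
      rec = JSON.parse(line);
    } catch {
      errors.push(`line ${i + 1}: not valid JSON`);
      return;
    }
    if (rec === null || typeof rec !== 'object') {
      errors.push(`line ${i + 1}: record must be an object`);
      return;
    }
    if (typeof rec.input !== 'string' || rec.input.length === 0) {
      errors.push(`line ${i + 1}: missing input`);
    }
    const expected = rec.expected || {};
    if (!SEVERITIES.includes(expected.severity)) {
      errors.push(`line ${i + 1}: unknown severity`);
    }
    if (!TEAMS.includes(expected.team)) {
      errors.push(`line ${i + 1}: unknown team`);
    }
  });
  return errors;
}

// Two records from the tutorial plus one deliberately bad label.
const examplesSample = [
  '{"input":"DB connection pool exhausted on primary; 2k 503s in last 5min","expected":{"severity":"sev1","team":"db"}}',
  '{"input":"Login button mis-aligned on mobile Safari, 3 user reports","expected":{"severity":"sev3","team":"frontend"}}',
  '{"input":"disk full on worker","expected":{"severity":"sev9","team":"ingest"}}',
].join('\n');

console.log(lintExamples(examplesSample));
```

Run it over fs.readFileSync('triage.examples.jsonl', 'utf8') before compiling; an empty array means every record has an input and labels from the declared enums.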

Step 4 . 8 minutes

Compile.

One command. Step out for coffee while the compile streams its progress.

$ kolm compile --from triage.recipe.json --examples triage.examples.jsonl --out triage.kolm

[1/6] synthesizing (Magpie + Evol-Instruct) ........ 1,184 pairs 38s
[2/6] dedup + label-noise filter ................. 1,042 pairs 6s
[3/6] LoRA fit ................................... 3 epochs 5m 04s
[4/6] constrained-decoder fit (JSON schema) ...... 24s
[5/6] K-score gate ............................... K = 0.913 > 0.88 floor
[6/6] sign + package ............................. 2s

  artifact: ./triage.kolm  (4.8 KB)
  receipt:  ./triage.receipt.json  (3.0 KB)
  CID:      cidv1:sha256:8c92f1...
Checkpoint: Total compile cost on our account: $0.62. The K-score gate blocks if the artifact misses the floor; you do not have to enforce it in CI separately.
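If you want the score itself in a dashboard or a build summary, you can read it from the log. This is a sketch only: it assumes the gate line keeps the shape shown above ("K = score, comparator, floor value, floor"), and both function names are hypothetical.

```javascript
// Hypothetical parser for the K-score gate line in the compile log.
// Assumes the line contains "K = <score> <comparator> <floor> floor".
function parseKScoreGate(log) {
  const m = /K = (\d+(?:\.\d+)?)\s*\S+\s*(\d+(?:\.\d+)?) floor/.exec(log);
  if (!m) return null;
  return { k: Number(m[1]), floor: Number(m[2]) };
}

// Compare a parsed score against the recipe's min_k and target_k.
function summarizeGate(gate, recipe) {
  return {
    aboveFloor: gate.k >= recipe.min_k,
    atTarget: gate.k >= recipe.target_k,
    // Rounded to three decimals to avoid floating-point noise.
    headroom: Number((gate.k - recipe.min_k).toFixed(3)),
  };
}

const gateLine = '[5/6] K-score gate ............................... K = 0.913 > 0.88 floor';
const gate = parseKScoreGate(gateLine);
console.log(summarizeGate(gate, { min_k: 0.88, target_k: 0.92 }));
// prints { aboveFloor: true, atTarget: false, headroom: 0.033 }
```

For the run above, the score clears min_k by 0.033 but sits below target_k; only min_k is described as a hard gate.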

Step 5 . 30 seconds

Run it locally.

The artifact runs in-process on a single CPU. No network call.

$ kolm run triage.kolm "DB connection pool exhausted on primary; 2k 503s in last 5min"

{
  "severity": "sev1",
  "team": "db",
  "summary": "Primary database connection pool exhausted; 2,000 503 errors in 5 minutes.",
  "latency_ms": 18.4,
  "receipt_cid": "cidv1:sha256:8c92f1..."
}

Step 6 . 2 minutes

Wire it into PagerDuty.

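Point PagerDuty's webhook at a small handler that runs kolm run on the alert text and returns the structured output. The handler below logs the result and sends it back in the HTTP response; posting it to PagerDuty as an incident note is a separate API call that it does not make. The artifact lives on your handler host; no alert text leaves your network.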

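$ cat > handler.js <<'JS'
const { spawn } = require('node:child_process');
const http = require('node:http');

// Send one JSON response; ignore later calls once a reply has gone out.
function reply(res, status, payload) {
  if (res.writableEnded) return;
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

http.createServer((req, res) => {
  let body = '';
  req.on('data', c => body += c);
  req.on('end', () => {
    let alert;
    try { alert = JSON.parse(body); } catch { alert = null; }
    if (!alert || typeof alert.summary !== 'string') {
      return reply(res, 400, { error: 'expected JSON with a string "summary"' });
    }
    // Arguments go to kolm as an array, so the alert text never passes through a shell.
    const p = spawn('kolm', ['run', './triage.kolm', alert.summary]);
    let out = '';
    p.stdout.on('data', c => out += c);
    // Fires if the kolm binary cannot be started (for example, not on PATH).
    p.on('error', err => reply(res, 500, { error: err.message }));
    p.on('close', code => {
      if (code !== 0) return reply(res, 502, { error: `kolm run exited with code ${code}` });
      let r;
      try { r = JSON.parse(out); } catch {
        return reply(res, 502, { error: 'kolm run did not return JSON' });
      }
      console.log('[triage]', alert.id, r.severity, r.team);
      reply(res, 200, r);
    });
  });
}).listen(8088);
JS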

$ node handler.js
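To post the result back as an incident note, the handler needs one more HTTP call. The sketch below only builds that request. It assumes the PagerDuty REST API v2 notes endpoint, an API key, and the email of a valid PagerDuty user for the From header; none of these come from this tutorial, so check them against your account before relying on it. The function name buildNoteRequest is hypothetical.

```javascript
// Builds the request for posting a triage result as a PagerDuty incident note.
// Assumptions (not from the tutorial): the REST API v2 endpoint
// POST /incidents/{id}/notes, token auth, and a From header with a user email.
function buildNoteRequest(incidentId, triage, apiKey, fromEmail) {
  const content = `[kolm triage] ${triage.severity} -> ${triage.team}: ${triage.summary}`;
  return {
    url: `https://api.pagerduty.com/incidents/${encodeURIComponent(incidentId)}/notes`,
    options: {
      method: 'POST',
      headers: {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': `Token token=${apiKey}`,
        'Content-Type': 'application/json',
        'From': fromEmail,
      },
      body: JSON.stringify({ note: { content } }),
    },
  };
}

// Example with placeholder values; nothing is sent here.
const noteRequest = buildNoteRequest(
  'PINCIDENT1',
  { severity: 'sev1', team: 'db', summary: 'Primary database connection pool exhausted.' },
  'YOUR_API_KEY',
  'oncall@example.com'
);
console.log(noteRequest.url);
```

On Node 18 or later you could send it with fetch(noteRequest.url, noteRequest.options). Keeping the request builder separate from the network call makes it easy to test without touching PagerDuty.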
Checkpoint: The receipt CID is your audit handle. Six months from now you can rerun the exact same alert against the exact same .kolm and prove the routing decision.
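To make that rerun possible you have to keep the CID next to the alert it belongs to. A minimal sketch of an audit record, assuming the alert object has id and summary fields as in handler.js and the result has the receipt_cid field shown in Step 5. The record layout and the name auditRecord are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical audit record: ties an alert to the receipt CID returned by `kolm run`.
// Stores a SHA-256 of the alert text rather than the text itself.
function auditRecord(alert, result, now = new Date()) {
  return {
    at: now.toISOString(),
    alert_id: alert.id,
    input_sha256: createHash('sha256').update(alert.summary, 'utf8').digest('hex'),
    severity: result.severity,
    team: result.team,
    receipt_cid: result.receipt_cid,
  };
}

const auditSample = auditRecord(
  { id: 'A1', summary: 'DB connection pool exhausted on primary; 2k 503s in last 5min' },
  { severity: 'sev1', team: 'db', receipt_cid: 'cidv1:sha256:8c92f1...' },
  new Date('2024-01-01T00:00:00Z')
);
console.log(JSON.stringify(auditSample));
```

Appending one such JSON line per call gives you a log you can search by CID. The hash lets you confirm that a stored alert is the one that was triaged, but you still need the original alert text, kept inside your network, to rerun it.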

Step 7 . optional

Verify and ship.

The verifier confirms the manifest, the receipt HMAC, and the K-score gate. Hand the .kolm and the binder PDF to your on-call lead.

$ kolm verify triage.kolm --binder triage.binder.pdf

  ✓ manifest CID matches canonical hash
  ✓ all 9 entries hashed and verified
  ✓ receipt HMAC valid
  ✓ K-score 0.913 (above declared gate 0.88)
  artifact is valid.

# one-page PDF lists CID + base + recipe + K-score
# auditors sign the PDF, not the JSON
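For readers who want to see what an HMAC check involves, here is a generic HMAC-SHA256 sketch using Node's crypto module. It is not kolm's receipt format: the key, the payload layout, and the hex encoding are all assumptions made for illustration, and kolm verify remains the way to check a real receipt.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Compute an HMAC-SHA256 tag over a payload, hex-encoded.
function hmacHex(key, payload) {
  return createHmac('sha256', key).update(payload).digest('hex');
}

// Check a payload against an expected hex tag in constant time.
function verifyHmac(key, payload, expectedHex) {
  const actual = Buffer.from(hmacHex(key, payload), 'hex');
  const expected = Buffer.from(expectedHex, 'hex');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Illustrative key and payload only; a real receipt defines its own fields.
const demoKey = 'demo-secret';
const demoPayload = JSON.stringify({ severity: 'sev1', team: 'db' });
const demoTag = hmacHex(demoKey, demoPayload);

console.log(verifyHmac(demoKey, demoPayload, demoTag));       // true
console.log(verifyHmac(demoKey, demoPayload + ' ', demoTag)); // false
```

The constant-time comparison matters because a plain string comparison can leak how many leading characters of a forged tag were correct.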