
Compile a PR review bot in 25 minutes.

By the end of this walkthrough you will have an 8.2 KB .kolm artifact that reads a unified diff and emits a list of structured review comments. It covers three issue classes only: missing tests, unbounded recursion, and hardcoded secrets. The scope is tight on purpose; the K-score floor is 0.87 against a held-out set of 600 PRs.

Runtime: 25 min · Output: code-review.kolm · Base: deepseek-coder-6.7b-instruct · K floor: 0.87 (target 0.90) · Size: 8.2 KB

Step 1 · 60 seconds

Install and authenticate.

$ npm install -g kolm
$ kolm login
$ kolm version
  k o l m
  ------- the private AI compiler
kolm cli v0.1.0
spec rs-1

Step 2 · 3 minutes

Write the recipe.

Keep the scope tight: three issue classes, an output schema that names file, line, class, severity, message, and an optional suggestion, and a K-score floor of 0.87 (with a target of 0.90). The eval pack pr-review-3class-v1 supplies the held-out test set.

$ cat > code-review.recipe.json <<'JSON'
{
  "task": "review a unified diff for three issue classes: missing_tests, unbounded_recursion, hardcoded_secrets",
  "base": "deepseek-coder-6.7b-instruct",
  "objective": "per-class-precision-at-r80",
  "adapter": "lora + dpo",
  "context_window": 16384,
  "output_schema": {
    "comments": [
      {
        "file":        "string",
        "line":        "int",
        "class":       { "enum": ["missing_tests", "unbounded_recursion", "hardcoded_secrets"] },
        "severity":    { "enum": ["block", "suggest"] },
        "message":     { "type": "string", "max_chars": 240 },
        "suggestion":  { "type": "string", "max_chars": 400, "required": false }
      }
    ]
  },
  "target_k": 0.90,
  "min_k":    0.87,
  "eval_pack": "pr-review-3class-v1"
}
JSON
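If you want to check kolm's output in your own tooling, the constraints in output_schema translate directly into a validator. A minimal sketch in JavaScript (the language the GitHub Actions step uses), assuming line numbers are 1-based; validateComment and validateReview are hypothetical helpers, not part of the kolm CLI:

```javascript
// Hypothetical helpers: check review comments against the constraints declared
// in code-review.recipe.json. Not part of kolm; a sketch for your own tooling.
const CLASSES = ['missing_tests', 'unbounded_recursion', 'hardcoded_secrets'];
const SEVERITIES = ['block', 'suggest'];

function validateComment(c) {
  const errors = [];
  if (typeof c.file !== 'string' || c.file.length === 0) errors.push('file must be a non-empty string');
  // Assumption: "int" line numbers are 1-based file lines.
  if (!Number.isInteger(c.line) || c.line < 1) errors.push('line must be a positive integer');
  if (!CLASSES.includes(c.class)) errors.push(`unknown class: ${c.class}`);
  if (!SEVERITIES.includes(c.severity)) errors.push(`unknown severity: ${c.severity}`);
  // .length counts UTF-16 code units, a close approximation of max_chars.
  if (typeof c.message !== 'string' || c.message.length > 240) {
    errors.push('message must be a string of at most 240 chars');
  }
  // suggestion is optional ("required": false) but capped at 400 chars when present.
  if (c.suggestion !== undefined && (typeof c.suggestion !== 'string' || c.suggestion.length > 400)) {
    errors.push('suggestion must be a string of at most 400 chars');
  }
  return errors;
}

function validateReview(review) {
  if (!Array.isArray(review.comments)) return ['comments must be an array'];
  // Prefix each error with the index of the offending comment.
  return review.comments.flatMap((c, i) => validateComment(c).map((e) => `comments[${i}]: ${e}`));
}
```

Run validateReview on the parsed output of kolm run before posting anything; an empty array means every comment fits the schema.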

Checkpoint: Tight scope is the trick. A bot with three classes and a 0.87 floor is more useful than one with fifteen classes and a 0.62 floor, because your reviewers know which comments to trust.

Step 3 · 4 minutes

Seed with examples.

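Pull 80 real diffs from your repo history and label each one by issue class, saving them to code-review.examples.jsonl with one example per line. The synthesis stage of the compile step expands them to roughly 2,500 training pairs.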
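With only 80 seed diffs, a class with a handful of examples is easy to miss. A small counter over the JSONL file helps, assuming every record has the input/expected shape shown in the next command; classCounts is a hypothetical helper:

```javascript
// Hypothetical helper: count labeled comments per issue class in a JSONL seed
// file. Assumes each record looks like {"input": ..., "expected": {"comments": [...]}}.
const SEED_CLASSES = ['missing_tests', 'unbounded_recursion', 'hardcoded_secrets'];

function classCounts(jsonlText) {
  const counts = Object.fromEntries(SEED_CLASSES.map((k) => [k, 0]));
  // Blank lines are ignored; every other line must be one JSON record.
  const records = jsonlText.split('\n').filter((l) => l.trim() !== '');
  records.forEach((line, i) => {
    let row;
    try {
      row = JSON.parse(line);
    } catch (err) {
      throw new Error(`record ${i + 1}: invalid JSON (${err.message})`);
    }
    for (const c of row?.expected?.comments ?? []) {
      // A typo in a class label would silently become a fourth class; fail instead.
      if (!SEED_CLASSES.includes(c.class)) throw new Error(`record ${i + 1}: unknown class ${c.class}`);
      counts[c.class] += 1;
    }
  });
  return { examples: records.length, counts };
}
```

In Node, pass it the file contents from `fs.readFileSync('code-review.examples.jsonl', 'utf8')`.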

$ head -1 code-review.examples.jsonl | jq .
{
  "input": "diff --git a/auth/login.py b/auth/login.py\n--- a/auth/login.py\n+++ b/auth/login.py\n@@ -12,0 +13,3 @@ def login(email, password):\n+    API_KEY = 'sk-prod-9af28e...'\n+    r = requests.post(url, headers={'Authorization': API_KEY})\n+    return r.json()",
  "expected": {
    "comments": [
      {
        "file": "auth/login.py",
        "line": 13,
        "class": "hardcoded_secrets",
        "severity": "block",
        "message": "Live API key committed in source. Move to env or secret manager before merge."
      }
    ]
  }
}
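Labeling by hand means getting line right. Assuming line is the line number in the post-change file (which is how the GitHub Actions step in Step 6 passes it to GitHub), here is a sketch that lists each added line of a unified diff with its new-file number. addedLines is a hypothetical helper; it handles plain git output and skips renames, binary files, and non-git header timestamps:

```javascript
// Hypothetical helper: list added lines of a unified diff with their line
// numbers in the post-change file. Hunk line counts are tracked so that a
// removed line whose text starts with "--" is not mistaken for a file header.
function addedLines(diffText) {
  const out = [];
  let file = null;
  let newLine = 0;
  let oldLeft = 0; // hunk lines still expected on the old side
  let newLeft = 0; // hunk lines still expected on the new side
  for (const raw of diffText.split('\n')) {
    if (oldLeft > 0 || newLeft > 0) {
      // Inside a hunk: the first character says which side the line belongs to.
      if (raw.startsWith('+')) {
        out.push({ file, line: newLine, text: raw.slice(1) });
        newLine += 1;
        newLeft -= 1;
      } else if (raw.startsWith('-')) {
        oldLeft -= 1;
      } else if (raw.startsWith(' ') || raw === '') {
        // Context line: present on both sides.
        newLine += 1;
        oldLeft -= 1;
        newLeft -= 1;
      }
      // "\ No newline at end of file" falls through and is ignored.
      continue;
    }
    if (raw.startsWith('+++ ')) {
      // "+++ b/path" names the post-change file; "+++ /dev/null" means deletion.
      const target = raw.slice(4).trim();
      file = target === '/dev/null' ? null : target.replace(/^b\//, '');
    } else {
      // "@@ -oldStart[,oldLen] +newStart[,newLen] @@"; a missing length means 1.
      const m = /^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(raw);
      if (m) {
        oldLeft = m[1] === undefined ? 1 : Number(m[1]);
        newLine = Number(m[2]);
        newLeft = m[3] === undefined ? 1 : Number(m[3]);
      }
    }
  }
  return out;
}
```

Feeding a labeled example's input through addedLines and checking that each expected comment's line appears in the result catches off-by-one labels before they reach training.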

Step 4 · 14 minutes

Compile.

$ kolm compile --from code-review.recipe.json --examples code-review.examples.jsonl --out code-review.kolm

[1/6] synthesizing pairs (Magpie + CodeSearchNet seed) ... 2,562 pairs 1m 24s
[2/6] dedup + filter (per-language MinHash) .............. 2,318 pairs 11s
[3/6] LoRA + DPO (preference over reviewer votes) ........ 4 epochs 10m 02s
[4/6] constrained-decoder fit (comment schema) ........... 42s
[5/6] K-score gate (pr-review-3class-v1) ................. K = 0.891 > 0.87 floor
[6/6] sign + package ..................................... 3s

  artifact: ./code-review.kolm  (8.2 KB)
  receipt:  ./code-review.receipt.json  (3.4 KB)
  CID:      cidv1:sha256:a31e7c...
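Stage 2 deduplicates with MinHash. kolm's shingle size, hash family, and threshold aren't given here, so this is only a sketch of the technique itself: estimate the Jaccard similarity of two snippets from the fraction of signature slots whose minimum hashes agree. The function names and parameters (5-token shingles, 64 seeded FNV-1a hashes) are illustrative assumptions:

```javascript
// Illustrative MinHash sketch; parameters are assumptions, not kolm's settings.
// Split text into overlapping k-token shingles.
function shingles(text, k = 5) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const out = new Set();
  for (let i = 0; i + k <= tokens.length; i++) out.add(tokens.slice(i, i + k).join(' '));
  // Very short snippets become a single shingle instead of an empty set.
  if (out.size === 0 && tokens.length > 0) out.add(tokens.join(' '));
  return out;
}

// 32-bit FNV-1a, seeded so each signature slot behaves like a different hash function.
function fnv1a(str, seed) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Signature slot j keeps the minimum of hash function j over all shingles.
function minhashSignature(set, numHashes = 64) {
  const sig = new Array(numHashes).fill(Infinity);
  for (const s of set) {
    for (let j = 0; j < numHashes; j++) {
      const h = fnv1a(s, j * 0x9e3779b1);
      if (h < sig[j]) sig[j] = h;
    }
  }
  return sig;
}

// The fraction of slots where the minimums agree estimates the Jaccard similarity.
function estimateJaccard(a, b, k = 5, numHashes = 64) {
  const sa = minhashSignature(shingles(a, k), numHashes);
  const sb = minhashSignature(shingles(b, k), numHashes);
  let same = 0;
  for (let j = 0; j < numHashes; j++) if (sa[j] === sb[j]) same++;
  return same / numHashes;
}
```

Pairs whose estimate exceeds a threshold you pick (for example 0.8) would be treated as near-duplicates, keeping one of each pair. Production dedup adds locality-sensitive banding so it never compares every pair directly.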

Checkpoint: Compile cost on our account was $2.28. K = 0.891 is precision at 80% recall on the held-out set; the per-class precision breakdown is in the receipt.
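Precision at 80% recall means: rank the model's comments by confidence, cut the list at the first point where it has found 80% of the labeled issues, and report the fraction of comments above the cut that were correct. Here is a sketch of that textbook metric, assuming each prediction carries a score and a correctness flag. It does not reproduce kolm's K-score computation, and precisionAtRecall is a hypothetical name:

```javascript
// Hypothetical helper: precision at the first cut-off in the ranking where
// recall reaches targetRecall. predictions: [{ score, correct }], where correct
// means the comment matches a labeled issue; totalPositives: labeled issues in
// the eval set. Tied scores are broken by sort order.
function precisionAtRecall(predictions, totalPositives, targetRecall = 0.8) {
  if (totalPositives <= 0) throw new Error('need at least one labeled issue');
  const ranked = [...predictions].sort((a, b) => b.score - a.score);
  let tp = 0;
  for (let i = 0; i < ranked.length; i++) {
    if (ranked[i].correct) tp++;
    // Recall so far is tp / totalPositives; precision is tp over comments kept.
    if (tp / totalPositives >= targetRecall) return tp / (i + 1);
  }
  return null; // the model never reaches the target recall
}
```

A null result is itself informative: the model misses too many labeled issues for the metric to be defined at that recall level.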

Step 5 · 1 minute

Run on a real diff.

$ git diff main...HEAD | kolm run code-review.kolm --stdin

{
  "comments": [
    {
      "file": "ingest/walker.py",
      "line": 42,
      "class": "unbounded_recursion",
      "severity": "block",
      "message": "walk() recurses on subdir without a depth cap; nested-symlink loop will OOM the worker.",
      "suggestion": "Add max_depth=64 with a counter; raise WalkLimit if hit."
    },
    {
      "file": "auth/session.py",
      "line": 88,
      "class": "missing_tests",
      "severity": "suggest",
      "message": "New SessionStore.invalidate() path has no test; the rollover bug we fixed in #1422 would regress silently."
    }
  ],
  "latency_ms": 2104.8,
  "receipt_cid": "cidv1:sha256:a31e7c..."
}
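To fail a CI job when the bot flags a block-severity issue, a short pass over this output is enough. A sketch assuming the comments shape above; summarize is a hypothetical helper:

```javascript
// Hypothetical helper: summarize kolm run output and decide whether CI should fail.
// Assumes the { comments: [{ file, line, class, severity, ... }] } shape shown above.
function summarize(review) {
  const comments = Array.isArray(review.comments) ? review.comments : [];
  const byClass = {};
  for (const c of comments) byClass[c.class] = (byClass[c.class] || 0) + 1;
  // Only "block" findings gate the build; "suggest" findings are informational.
  const blocking = comments.filter((c) => c.severity === 'block');
  return {
    total: comments.length,
    byClass,
    blocking: blocking.map((c) => `${c.file}:${c.line} ${c.class}`),
    shouldFail: blocking.length > 0,
  };
}
```

In a Node step: `const s = summarize(JSON.parse(fs.readFileSync('review.json', 'utf8'))); if (s.shouldFail) process.exitCode = 1;`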

Step 6 · 2 minutes

Wire into GitHub Actions.

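Save this as .github/workflows/kolm-review.yml and commit code-review.kolm to the repository root, where the workflow expects it. With KOLM_BACKEND=local_cpu the artifact runs on the Actions runner itself, so the diff is not sent to a remote backend; use a self-hosted runner if PR diffs must stay inside your own network.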

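name: kolm review
on:
  pull_request: {}

# createReviewComment needs write access to pull requests.
permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          # review the PR head commit so comment line numbers match commit_id below
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm install -g kolm
      - name: review
        env:
          KOLM_BACKEND: local_cpu
        run: |
          git diff origin/${{ github.base_ref }}...HEAD \
            | kolm run ./code-review.kolm --stdin > review.json
      - uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const r = JSON.parse(fs.readFileSync('review.json', 'utf8'));
            for (const c of r.comments ?? []) {
              const body = `**${c.class}** [${c.severity}] ${c.message}` +
                (c.suggestion ? `\n\nSuggestion: ${c.suggestion}` : '');
              try {
                await github.rest.pulls.createReviewComment({
                  ...context.repo,
                  pull_number: context.issue.number,
                  commit_id: context.payload.pull_request.head.sha,
                  path: c.file,
                  line: c.line,
                  side: 'RIGHT',
                  body,
                });
              } catch (err) {
                // GitHub rejects comments on lines outside the diff; log and continue.
                core.warning(`skipped ${c.file}:${c.line}: ${err.message}`);
              }
            }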
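The workflow makes one createReviewComment request per finding. An alternative batches all findings into a single review through Octokit's github.rest.pulls.createReview. The sketch below assumes that swap; toReviewPayload is a hypothetical helper that only builds the request object:

```javascript
// Hypothetical helper: fold kolm findings into one createReview request.
// Field names follow the GitHub REST "create a review" endpoint.
function toReviewPayload(review, { owner, repo, pullNumber, commitId }) {
  const comments = (review.comments || []).map((c) => ({
    path: c.file,
    line: c.line,
    side: 'RIGHT', // comment on the post-change version of the file
    body: `**${c.class}** [${c.severity}] ${c.message}` +
      (c.suggestion ? `\n\nSuggestion: ${c.suggestion}` : ''),
  }));
  return {
    owner,
    repo,
    pull_number: pullNumber,
    commit_id: commitId,
    event: 'COMMENT', // post without approving or requesting changes
    // A body is required when event is COMMENT.
    body: comments.length ? `kolm review: ${comments.length} comment(s)` : 'kolm review: no findings',
    comments,
  };
}
```

Inside the github-script step: `const p = toReviewPayload(r, { ...context.repo, pullNumber: context.issue.number, commitId: context.payload.pull_request.head.sha }); if (p.comments.length) await github.rest.pulls.createReview(p);`. The trade-off is that GitHub rejects the whole review if any one comment targets a line outside the diff, so filter findings against the diff first.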

Checkpoint: The bot posts inline comments only for the three classes it was compiled for. Reviewers learn quickly that when the bot speaks, it is on one of the classes behind its held-out K = 0.891, so its comments carry weight.

Step 7 · optional

Verify and ship.

$ kolm verify code-review.kolm

  ✓ manifest CID matches canonical hash
  ✓ all 12 entries hashed and verified
  ✓ receipt HMAC valid
  ✓ K-score 0.891 (above declared gate 0.87)
  artifact is valid.
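Two of these checks rest on standard building blocks: recompute a SHA-256 digest and compare it with the recorded one, and recompute an HMAC and compare it in constant time. kolm's canonical manifest encoding, CID layout, receipt key, and HMAC hash function are not documented in this tutorial, so this sketch assumes SHA-256 throughout and takes the key and expected values as plain inputs:

```javascript
// Generic sketch of digest and HMAC checks; not kolm's verifier. SHA-256 and
// the input formats are assumptions.
const crypto = require('crypto');

function sha256Hex(bytes) {
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

// Compare a recomputed digest with an expected hex string (case-insensitive).
function digestMatches(bytes, expectedHex) {
  return sha256Hex(bytes) === expectedHex.toLowerCase();
}

// Recompute an HMAC-SHA256 tag and compare it without leaking timing information.
function hmacValid(payload, key, expectedHex) {
  const actual = crypto.createHmac('sha256', key).update(payload).digest();
  const expected = Buffer.from(expectedHex, 'hex');
  // timingSafeEqual throws on length mismatch, so check length first.
  return expected.length === actual.length && crypto.timingSafeEqual(actual, expected);
}
```

The constant-time comparison matters for the HMAC and not for the public digest: a naive string comparison can leak how many leading bytes of a forged tag were correct.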
