Bug spotter recipe

What this recipe does

Single-function bug detection. Where PR review answers "what's wrong with this whole change?" the bug spotter zooms in on a hot function and asks "what's about to break inside this 30-line scope?"

Use it on its own as a pre-commit hook, or as a second pass after PR review. The spec is intentionally narrow: one function in, one list of flags out, each with a line number, a kind, and a short explanation.

The spec

{
  "type": "object",
  "required": ["flags"],
  "properties": {
    "flags": {
      "type": "array", "maxItems": 8,
      "items": {
        "type": "object",
        "required": ["line", "kind", "detail"],
        "properties": {
          "line": { "type": "integer", "minimum": 1 },
          "kind": { "enum": ["off-by-one", "null-deref", "race", "leak", "flow", "unsafe-input"] },
          "detail": { "type": "string", "maxLength": 240 }
        }
      }
    }
  }
}
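
To make the constraints in the spec concrete, here is a minimal sketch of the same checks written by hand in Python. It is not the recipe's verifier; the function name validate_output and the wording of the error messages are assumptions for illustration only.

```python
# Kinds copied from the "kind" enum in the spec above.
ALLOWED_KINDS = ("off-by-one", "null-deref", "race", "leak", "flow", "unsafe-input")


def validate_output(obj):
    """Return a list of problems; an empty list means obj matches the spec."""
    if not isinstance(obj, dict):
        return ["top level must be an object"]
    if "flags" not in obj:
        return ["missing required key 'flags'"]
    flags = obj["flags"]
    if not isinstance(flags, list):
        return ["'flags' must be an array"]

    problems = []
    if len(flags) > 8:
        problems.append("more than 8 flags")
    for i, flag in enumerate(flags):
        if not isinstance(flag, dict):
            problems.append(f"flags[{i}] must be an object")
            continue
        for key in ("line", "kind", "detail"):
            if key not in flag:
                problems.append(f"flags[{i}] missing '{key}'")
        if "line" in flag:
            line = flag["line"]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(line, bool) or not isinstance(line, int) or line < 1:
                problems.append(f"flags[{i}].line must be an integer >= 1")
        if "kind" in flag and flag["kind"] not in ALLOWED_KINDS:
            problems.append(f"flags[{i}].kind is outside the enum")
        if "detail" in flag:
            detail = flag["detail"]
            if not isinstance(detail, str) or len(detail) > 240:
                problems.append(f"flags[{i}].detail must be a string of at most 240 characters")
    return problems
```

An empty flags list is valid under this spec, which is what lets the recipe stay silent on clean code.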

Gold pairs (3 of 80 shown)

input - off-by-one

def last_n(xs, n):
    out = []
    for i in range(len(xs) - n, len(xs) + 1):
        out.append(xs[i])
    return out

output

{
  "flags": [
    {
      "line": 3, "kind": "off-by-one",
      "detail": "range upper bound is len(xs)+1; xs[len(xs)] is out-of-bounds. Use range(len(xs)-n, len(xs))."
    }
  ]
}
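
For reference, the fix suggested in the detail string could look like the sketch below. The max(0, ...) guard is an addition that is not part of the gold pair; it only covers the case where n is larger than the list.

```python
def last_n(xs, n):
    """Return the last n items of xs (all of xs if it has fewer than n items)."""
    out = []
    # The upper bound is len(xs), so the largest index visited is len(xs) - 1.
    # max(0, ...) is an extra guard (not in the gold pair) for n > len(xs).
    for i in range(max(0, len(xs) - n), len(xs)):
        out.append(xs[i])
    return out
```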

input - leaked resource

def read_first_line(path):
    f = open(path)
    return f.readline()

output

{
  "flags": [
    {
      "line": 2, "kind": "leak",
      "detail": "file handle never closed; use 'with open(path) as f:' to scope the resource."
    }
  ]
}
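
The repair named in the detail string can be sketched as follows; the with block closes the handle on every exit path, including when readline raises.

```python
def read_first_line(path):
    """Return the first line of the file at path, closing the handle afterwards."""
    with open(path) as f:
        return f.readline()
```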

input - clean code

def clamp(x, lo, hi):
    return max(lo, min(x, hi))

output

{ "flags": [] }
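
The page does not show the layout of the pairs file, so the following is only a sketch of how gold pairs might be serialized to JSONL. The key names "input" and "output" and the helper to_jsonl are assumptions, not kolm's documented format; check the tool's own documentation before relying on them.

```python
import json


def to_jsonl(pairs):
    """Serialize (source, expected_output) pairs as one JSON object per line."""
    lines = []
    for source, expected in pairs:
        # "input" and "output" are assumed key names, not confirmed by the source.
        lines.append(json.dumps({"input": source, "output": expected}))
    return "\n".join(lines) + "\n"


CLAMP = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
print(to_jsonl([(CLAMP, {"flags": []})]), end="")
```

json.dumps escapes the newlines inside the function source, so each pair stays on a single line.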

Compile

kolm compile "flag suspicious patterns in a single function" \
  --base qwen2.5-coder-3b \
  --pairs bugs.jsonl \
  --spec spec.json \
  --k-floor 0.82 \
  --output bug-spotter.kolm

ok wrote bug-spotter.kolm
   k_score=0.86  signature=hmac-sha256

K-score gate

K-score 0.86 (held-out: 40 pairs) · verifier-pass 90% · recipe-coverage 84% · latency-ratio 0.86

The verifier rejects empty objects, missing flags, and any kind outside the enum. False-positive rate on the held-out clean snippets was 7.5% (3/40). The recipe is tuned to be quiet: we'd rather miss a flag than spam a clean function.
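
The arithmetic behind these numbers is simple enough to check by hand. The sketch below assumes that the gate passes when the score is at or above the --k-floor value; that reading matches the compile output above (0.86 against a floor of 0.82) but is not spelled out on this page. Both helper names are hypothetical.

```python
def false_positive_rate(flagged_clean, total_clean):
    """Fraction of clean snippets that received at least one flag."""
    return flagged_clean / total_clean


def passes_gate(k_score, k_floor):
    """Assumed gate rule: pass when the score is at or above the floor."""
    return k_score >= k_floor


print(false_positive_rate(3, 40))  # 0.075
print(passes_gate(0.86, 0.82))     # True
```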

Run-time profile

Device              p50 latency
M2 MacBook          620 ms
RTX 5090            180 ms
iPhone 15 Pro       2.1 s
CPU x86 (server)    2.8 s

Figures are p50 latency on a 25-line function. At well under a second per function on an M2 MacBook, the recipe is compact enough to run as a pre-commit hook on a developer laptop.
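
For a rough sense of what a commit costs, the per-function figures can be multiplied by the number of functions checked. This is a back-of-the-envelope sketch: it assumes calls run one after another, that each costs about the p50 figure, and that model load time is negligible, none of which this page states.

```python
# p50 latencies from the profile above, in seconds.
P50_SECONDS = {
    "M2 MacBook": 0.620,
    "RTX 5090": 0.180,
    "iPhone 15 Pro": 2.1,
    "CPU x86 (server)": 2.8,
}


def estimated_hook_seconds(device, n_functions):
    """Estimate total time for n_functions sequential calls at p50 latency."""
    return P50_SECONDS[device] * n_functions
```

Under those assumptions, ten functions on an M2 MacBook come to about 6.2 seconds, so the cost depends mostly on how many functions the staged files contain.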

Deploy

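#!/bin/sh
# pre-commit hook: save as .git/hooks/pre-commit and make it executable.
# Blocks the commit as soon as any function in a staged file gets a flag.
# Both loops rely on word splitting, so names containing whitespace will break.
for fn in $(git diff --cached --name-only --diff-filter=AM); do
  for func in $(kolm extract-functions "$fn"); do
    flags=$(kolm run bug-spotter.kolm --input "$func")
    # jq -e exits 0 only when the expression is true; discard its true/false output.
    if echo "$flags" | jq -e '.flags | length > 0' >/dev/null; then
      echo "$flags"
      exit 1
    fi
  done
done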
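
The hook prints the raw JSON when it blocks a commit. If a more readable report is wanted, the output could be post-processed along these lines. The helper format_flags and the message layout are hypothetical; only the flags, line, kind, and detail fields come from the spec. Line numbers are relative to the function that was passed in, not to the file.

```python
import json


def format_flags(path, raw_output):
    """Turn the recipe's JSON output into one readable line per flag."""
    flags = json.loads(raw_output).get("flags", [])
    return [
        f"{path} (function line {flag['line']}) [{flag['kind']}] {flag['detail']}"
        for flag in flags
    ]


sample = '{"flags": [{"line": 2, "kind": "leak", "detail": "file handle never closed"}]}'
for message in format_flags("io_utils.py", sample):
    print(message)
```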
