Refactor recipe

What this recipe does

This recipe looks at a function and asks "is there a smaller, clearer, or more idiomatic shape for this same behavior?" It outputs a proposed diff plus a single-line reason. The verifier walks the AST before and after the change and rejects anything that introduces a new side effect, changes the function's signature, or removes a branch the caller might depend on.
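To make the signature check concrete, here is a minimal sketch using Python's standard `ast` module. It is not the recipe's actual verifier: the helper names `signature_of` and `signature_preserved` are hypothetical, and the sketch covers only the signature rule, not side effects or branches.

```python
import ast


def signature_of(source: str, name: str) -> str:
    """Return a canonical dump of the arguments and return annotation of `name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            returns = ast.dump(node.returns) if node.returns else ""
            # ast.dump omits line/column attributes by default, so two
            # identical argument lists produce identical strings.
            return ast.dump(node.args) + "->" + returns
    raise ValueError(f"no function named {name!r}")


def signature_preserved(before: str, after: str, name: str) -> bool:
    """True when `name` has the same parameters and return annotation in both sources."""
    return signature_of(before, name) == signature_of(after, name)


BEFORE = """
def find_user(users, target):
    result = None
    for u in users:
        if u.email == target:
            result = u
            break
    return result
"""

AFTER = """
def find_user(users, target):
    return next((u for u in users if u.email == target), None)
"""

# Hypothetical rewrite that adds a parameter; a signature check should reject it.
CHANGED = """
def find_user(users, target, default=None):
    return next((u for u in users if u.email == target), default)
"""

print(signature_preserved(BEFORE, AFTER, "find_user"))
print(signature_preserved(BEFORE, CHANGED, "find_user"))
```

Comparing `ast.dump` output is deliberately strict: renaming a parameter or adding a default counts as a change, which matches the idea that callers may depend on either.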

The spec

{
  "type": "object",
  "required": ["diff", "reason"],
  "properties": {
    "diff": { "type": "string", "description": "unified diff against the input" },
    "reason": { "type": "string", "maxLength": 200 }
  },
  "verifier": {
    "signature_preserved": true,
    "side_effects_unchanged": true,
    "branches_preserved": true
  }
}
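As an illustration of what the spec requires of a single output, the sketch below checks the two required string fields and the 200-character limit on `reason` by hand. The function name `validate_output` is hypothetical, and the sketch ignores the `verifier` block, which describes checks on the code rather than on the JSON shape.

```python
import json

MAX_REASON_LENGTH = 200  # mirrors "maxLength" for "reason" in the spec


def validate_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output fits the spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for field in ("diff", "reason"):
        if field not in data:
            problems.append(f"missing required field: {field}")
        elif not isinstance(data[field], str):
            problems.append(f"{field} must be a string")

    reason = data.get("reason")
    if isinstance(reason, str) and len(reason) > MAX_REASON_LENGTH:
        problems.append("reason exceeds 200 characters")
    return problems


print(validate_output('{"diff": "", "reason": "Already clear and idiomatic."}'))
print(validate_output('{"diff": "@@ ..."}'))
```

An empty `diff` is valid under this check, which is what lets the "already clean" case pass through unchanged.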

Gold pairs (3 of 90 shown)

input

def find_user(users, target):
    result = None
    for u in users:
        if u.email == target:
            result = u
            break
    return result

output

{
  "diff": "@@\n-    result = None\n-    for u in users:\n-        if u.email == target:\n-            result = u\n-            break\n-    return result\n+    return next((u for u in users if u.email == target), None)",
  "reason": "Loop is just a search; next() with default expresses it directly and avoids the temp variable."
}
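A quick way to convince yourself that this pair preserves behavior is to run both versions side by side. The sketch below does that with a hypothetical `User` dataclass and made-up addresses; it is a spot check on a few inputs, not a proof.

```python
from dataclasses import dataclass


@dataclass
class User:
    email: str


def find_user_before(users, target):
    result = None
    for u in users:
        if u.email == target:
            result = u
            break
    return result


def find_user_after(users, target):
    return next((u for u in users if u.email == target), None)


# The duplicate address checks that both versions return the first match.
users = [User("a@example.com"), User("b@example.com"), User("a@example.com")]

for target in ("a@example.com", "b@example.com", "missing@example.com"):
    # `is` compares identity, so both must return the very same object (or None).
    assert find_user_before(users, target) is find_user_after(users, target)
```

Both versions stop at the first match and fall back to `None`, so the refactor changes the shape of the code without changing what callers observe.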

input

def is_eligible(user):
    if user.age >= 18:
        if user.country == 'US':
            if user.verified:
                return True
            else:
                return False
        else:
            return False
    else:
        return False

output

{
  "diff": "@@\n-    if user.age >= 18:\n-        if user.country == 'US':\n-            if user.verified:\n-                return True\n-            else:\n-                return False\n-        else:\n-            return False\n-    else:\n-        return False\n+    return user.age >= 18 and user.country == 'US' and user.verified",
  "reason": "All branches return a single boolean expression; collapse to one line."
}
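One caveat worth checking for this pair: Python's `and` returns its last operand rather than a strict `True`/`False`, so the collapsed form matches the original exactly only when `user.verified` is a real `bool`. The sketch below assumes a stand-in user object built with `SimpleNamespace` and shows both the matching case and the edge case.

```python
from itertools import product
from types import SimpleNamespace


def is_eligible_before(user):
    if user.age >= 18:
        if user.country == 'US':
            if user.verified:
                return True
            else:
                return False
        else:
            return False
    else:
        return False


def is_eligible_after(user):
    return user.age >= 18 and user.country == 'US' and user.verified


# With a boolean `verified`, both versions agree on every combination tried.
for age, country, verified in product((17, 18), ("US", "CA"), (True, False)):
    user = SimpleNamespace(age=age, country=country, verified=verified)
    assert is_eligible_before(user) is is_eligible_after(user)

# With a truthy non-bool flag, the collapsed form returns the flag itself.
flag_user = SimpleNamespace(age=30, country="US", verified=1)
print(is_eligible_before(flag_user))  # True
print(is_eligible_after(flag_user))   # 1
```

If `verified` might hold something other than a `bool`, wrapping the collapsed expression in `bool(...)` restores the original return type.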

input - already clean

def hex_color_to_rgb(s: str) -> tuple[int, int, int]:
    s = s.lstrip("#")
    return int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16)

output

{
  "diff": "",
  "reason": "Already clear and idiomatic; no refactor needed."
}

Compile

kolm compile "propose a refactor with rationale" \
  --base qwen2.5-coder-7b \
  --pairs pairs.jsonl \
  --verifier ast:behavior-preserving \
  --k-floor 0.82 \
  --output refactor.kolm

ok wrote refactor.kolm
   k_score=0.85  signature=hmac-sha256

K-score gate

K-score: 0.85 (45 held-out pairs)
Behavior preserved: 96%
Signature preserved: 100%
"No refactor" path: 22%

The recipe is tuned to recognize when no refactor is needed. 22% of held-out inputs were already clean, and for those the model returned an empty diff with a one-line acknowledgement, which is exactly the right answer.

Run-time profile

Device              Run time
M2 MacBook          1.6s
RTX 5090            380ms
iPhone 15 Pro       4.2s
CPU x86 (server)    5.5s

Deploy

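# Weekly nudge in CI: gather refactor suggestions for a PR.
# Scans Python files modified in the last 7 days (needs bash for `read -d`).
find src -name "*.py" -mtime -7 -print0 | while IFS= read -r -d '' file; do
  out=$(kolm run refactor.kolm --input-file "$file")
  # An empty diff means "no refactor needed", so skip those outputs.
  if [ -n "$(printf '%s' "$out" | jq -r '.diff')" ]; then
    printf '%s\n' "$out" >> suggestions.txt
  fi
done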
