What this recipe does
Looks at a function and asks, "Is there a smaller, clearer, or more idiomatic shape for this same behavior?" It outputs a proposed diff plus a one-line reason. The verifier walks the AST before and after the change and rejects any diff that introduces a new side effect, changes the function's signature, or removes a branch the caller might depend on.
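To make the signature check concrete, here is a minimal sketch using Python's standard `ast` module. It is not the recipe's actual verifier: the helper names `signature_of` and `signature_preserved` are hypothetical, and the side-effect and branch checks are left out because the text does not say how they work.

```python
import ast


def signature_of(source: str) -> str:
    """Canonical string for the first function's name, arguments, and return annotation."""
    tree = ast.parse(source)
    fn = next(
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    # ast.dump omits line/column attributes by default, so layout changes do not matter.
    returns = ast.dump(fn.returns) if fn.returns is not None else None
    return f"{fn.name}|{ast.dump(fn.args)}|{returns}"


def signature_preserved(before: str, after: str) -> bool:
    """True when both sources define a function with an identical signature."""
    return signature_of(before) == signature_of(after)


before = "def find_user(users, target):\n    return None\n"
same_signature = "def find_user(users, target):\n    return next(iter(users), None)\n"
new_parameter = "def find_user(users, target, strict=False):\n    return None\n"

assert signature_preserved(before, same_signature)
assert not signature_preserved(before, new_parameter)
```

Comparing the dumped `arguments` node also catches changed defaults and annotations, not just renamed or added parameters.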
The spec
{
  "type": "object",
  "required": ["diff", "reason"],
  "properties": {
    "diff": { "type": "string", "description": "unified diff against the input" },
    "reason": { "type": "string", "maxLength": 200 }
  },
  "verifier": {
    "signature_preserved": true,
    "side_effects_unchanged": true,
    "branches_preserved": true
  }
}
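As an illustration of what the schema part of the spec requires, the sketch below checks a raw model output with the standard library only. The function name `validate_output` is hypothetical, and the `verifier` block is not covered here, since it describes AST checks rather than the shape of the output.

```python
import json

MAX_REASON_LENGTH = 200  # mirrors "maxLength" for "reason" in the spec


def validate_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output matches the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for key in ("diff", "reason"):
        if key not in data:
            problems.append(f"missing required field: {key}")
        elif not isinstance(data[key], str):
            problems.append(f"{key} must be a string")

    reason = data.get("reason")
    if isinstance(reason, str) and len(reason) > MAX_REASON_LENGTH:
        problems.append("reason exceeds 200 characters")
    return problems


assert validate_output('{"diff": "", "reason": "Already clear."}') == []
assert validate_output('{"diff": ""}') == ["missing required field: reason"]
```

An empty `diff` string passes validation, which matters for the "no refactor needed" case in the gold pairs.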
Gold pairs (3 of 90 shown)
def find_user(users, target):
    result = None
    for u in users:
        if u.email == target:
            result = u
            break
    return result
{
  "diff": "@@\n-    result = None\n-    for u in users:\n-        if u.email == target:\n-            result = u\n-            break\n-    return result\n+    return next((u for u in users if u.email == target), None)",
  "reason": "Loop is just a search; next() with default expresses it directly and avoids the temp variable."
}
def is_eligible(user):
    if user.age >= 18:
        if user.country == 'US':
            if user.verified:
                return True
            else:
                return False
        else:
            return False
    else:
        return False
{
  "diff": "@@\n-    if user.age >= 18:\n-        if user.country == 'US':\n-            if user.verified:\n-                return True\n-            else:\n-                return False\n-        else:\n-            return False\n-    else:\n-        return False\n+    return user.age >= 18 and user.country == 'US' and user.verified",
  "reason": "All branches return a single boolean expression; collapse to one line."
}
def hex_color_to_rgb(s: str) -> tuple[int, int, int]:
    s = s.lstrip("#")
    return int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16)
{
  "diff": "",
  "reason": "Already clear and idiomatic; no refactor needed."
}
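One way to sanity-check that the first two pairs preserve behavior is to run the original and refactored versions on the same inputs. The sketch below does that with made-up test data. The `_before` and `_after` suffixes and the `SimpleNamespace` stand-ins for user objects are assumptions for illustration, not part of the recipe.

```python
from itertools import product
from types import SimpleNamespace


def find_user_before(users, target):
    result = None
    for u in users:
        if u.email == target:
            result = u
            break
    return result


def find_user_after(users, target):
    return next((u for u in users if u.email == target), None)


def is_eligible_before(user):
    if user.age >= 18:
        if user.country == 'US':
            if user.verified:
                return True
            else:
                return False
        else:
            return False
    else:
        return False


def is_eligible_after(user):
    return user.age >= 18 and user.country == 'US' and user.verified


# Duplicate email on purpose: both versions must return the first match.
users = [SimpleNamespace(email=e) for e in ("a@example.com", "b@example.com", "a@example.com")]
for target in ("a@example.com", "b@example.com", "missing@example.com"):
    assert find_user_before(users, target) is find_user_after(users, target)

# Every combination of the three conditions, including the age boundary.
for age, country, verified in product((17, 18, 40), ("US", "CA"), (True, False)):
    user = SimpleNamespace(age=age, country=country, verified=verified)
    assert is_eligible_before(user) == is_eligible_after(user)
```

The second check assumes `verified` is a real boolean. If it held some other truthy value, the collapsed expression would return that value rather than `True`, so a stricter variant could wrap the expression in `bool(...)`.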
Compile
kolm compile "propose a refactor with rationale" \
  --base qwen2.5-coder-7b \
  --pairs pairs.jsonl \
  --verifier ast:behavior-preserving \
  --k-floor 0.82 \
  --output refactor.kolm
ok wrote refactor.kolm k_score=0.85 signature=hmac-sha256
K-score gate
The recipe is tuned to recognize when no refactor is needed. 22% of held-out inputs were already clean, and for those the model returned an empty diff with a one-line acknowledgement, which is exactly the right answer.
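A rough sketch of how an empty-diff rate like this could be measured from a file of model outputs, one JSON object per line. The function name `empty_diff_rate` and the four sample lines are made up for illustration and are not the recipe's held-out data.

```python
import json


def empty_diff_rate(jsonl_lines):
    """Fraction of outputs whose "diff" field is the empty string."""
    outputs = [json.loads(line) for line in jsonl_lines if line.strip()]
    if not outputs:
        return 0.0
    empty = sum(1 for out in outputs if out["diff"] == "")
    return empty / len(outputs)


# Hypothetical outputs: one of four says "no refactor needed".
sample = [
    '{"diff": "@@\\n- a\\n+ b", "reason": "Simplify."}',
    '{"diff": "", "reason": "Already clear and idiomatic; no refactor needed."}',
    '{"diff": "@@\\n- c\\n+ d", "reason": "Remove temp variable."}',
    '{"diff": "@@\\n- e\\n+ f", "reason": "Collapse branches."}',
]
assert empty_diff_rate(sample) == 0.25
```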
Run-time profile
Deploy
# weekly nudge in CI: open a PR with refactor suggestions
for fn in $(find src -name "*.py" -mtime -7); do
  out=$(kolm run refactor.kolm --input-file "$fn")
  if [ -n "$(echo "$out" | jq -r '.diff')" ]; then
    echo "$out" >> suggestions.txt
  fi
done