What this recipe does
Replaces the "ask GPT to review this PR" prompt that lives in your CI with a signed local model. The deliverable is a .kolm file that runs offline, returns a JSON object you can render in a comment, and never leaks the diff to a third-party API.
The shape: diff in, structured-review out. Each issue carries severity (error / warning / info), category (correctness / style / security / perf), and a precise file:line pointer. The verifier rejects free-form prose and enforces the schema before any output ships.
The spec
{
"type": "object",
"required": ["summary", "issues"],
"properties": {
"summary": { "type": "string", "maxLength": 280 },
"issues": {
"type": "array",
"items": {
"type": "object",
"required": ["severity", "category", "file", "line", "detail"],
"properties": {
"severity": { "enum": ["error", "warning", "info"] },
"category": { "enum": ["correctness", "style", "security", "perf"] },
"file": { "type": "string" },
"line": { "type": "integer", "minimum": 1 },
"detail": { "type": "string", "maxLength": 480 }
}
}
}
}
}
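A minimal sketch of what enforcing this spec could look like, using only the Python standard library. It is a hand-rolled approximation of the schema above, not the kolm verifier itself; the function names (validate_review, _check_issue) are hypothetical. Like the spec, it allows extra keys, since the schema does not set additionalProperties.

```python
# Hand-rolled check mirroring the spec above (an illustration, not the kolm verifier).
SEVERITIES = ("error", "warning", "info")
CATEGORIES = ("correctness", "style", "security", "perf")
REQUIRED_ISSUE_KEYS = ("severity", "category", "file", "line", "detail")


def _check_issue(issue, idx):
    """Return the violations found in a single entry of the issues array."""
    where = f"issues[{idx}]"
    if not isinstance(issue, dict):
        return [f"{where} must be an object"]
    errors = [f"{where} missing required key: {k}"
              for k in REQUIRED_ISSUE_KEYS if k not in issue]
    if "severity" in issue and issue["severity"] not in SEVERITIES:
        errors.append(f"{where}.severity must be one of {SEVERITIES}")
    if "category" in issue and issue["category"] not in CATEGORIES:
        errors.append(f"{where}.category must be one of {CATEGORIES}")
    if "file" in issue and not isinstance(issue["file"], str):
        errors.append(f"{where}.file must be a string")
    if "line" in issue:
        line = issue["line"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(line, bool) or not isinstance(line, int):
            errors.append(f"{where}.line must be an integer")
        elif line < 1:
            errors.append(f"{where}.line must be >= 1")
    if "detail" in issue:
        detail = issue["detail"]
        if not isinstance(detail, str):
            errors.append(f"{where}.detail must be a string")
        elif len(detail) > 480:
            errors.append(f"{where}.detail exceeds 480 characters")
    return errors


def validate_review(review):
    """Return a list of violations; an empty list means the review conforms."""
    if not isinstance(review, dict):
        return ["review must be a JSON object"]
    errors = [f"missing required key: {k}"
              for k in ("summary", "issues") if k not in review]
    if "summary" in review:
        summary = review["summary"]
        if not isinstance(summary, str):
            errors.append("summary must be a string")
        elif len(summary) > 280:
            errors.append("summary exceeds 280 characters")
    if "issues" in review:
        issues = review["issues"]
        if not isinstance(issues, list):
            errors.append("issues must be an array")
        else:
            for idx, issue in enumerate(issues):
                errors.extend(_check_issue(issue, idx))
    return errors
```

Returning a list of violations rather than raising on the first one makes it easier to log everything wrong with a rejected output in one pass.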
Gold pairs (3 of 120 shown)
--- a/src/auth.js
+++ b/src/auth.js
@@ -42,8 +42,11 @@ function checkPassword(req, res) {
const stored = db.users[req.body.email];
- if (stored.password === req.body.password) {
+ if (stored.password == req.body.password) {
return res.json({ token: sign(stored) });
}
return res.status(401).end();
}
{
"summary": "Loose equality on credential check; missing bcrypt compare.",
"issues": [
{
"severity": "error", "category": "security",
"file": "src/auth.js", "line": 44,
"detail": "Plaintext password comparison with == coerces types and timing-leaks. Use bcrypt.compare or crypto.timingSafeEqual."
}
]
}
--- a/lib/sum.py
+++ b/lib/sum.py
@@ -1,3 +1,3 @@
def sum_pos(xs):
- return sum(x for x in xs if x > 0)
+ return sum([x for x in xs if x > 0])
{
"summary": "List comprehension is materialized unnecessarily; minor perf regression.",
"issues": [
{
"severity": "info", "category": "perf",
"file": "lib/sum.py", "line": 3,
"detail": "Generator already lazy; wrapping it in a list allocates an extra array. Drop the brackets."
}
]
}
--- a/api/users.go
+++ b/api/users.go
@@ -10,6 +10,9 @@ func ListUsers(w http.ResponseWriter, r *http.Request) {
page, _ := strconv.Atoi(r.URL.Query().Get("page"))
+ if page < 0 { page = 0 }
+ if page > 1000 { page = 1000 }
users := db.ListUsers(page * 50)
}
{
"summary": "Defensive bounds added; correct.",
"issues": []
}
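One way the gold pairs could be serialized into a JSONL file, sketched in Python. The record layout is an assumption: the field names "input" and "output", and storing the review as a nested object rather than a string, are hypothetical, and the format kolm compile actually expects is not shown here.

```python
import json


def to_jsonl_line(diff_text, review):
    """Serialize one (diff, review) pair as a single JSON line.

    The "input"/"output" field names are assumed for illustration.
    """
    return json.dumps({"input": diff_text, "output": review}, ensure_ascii=False)


def write_pairs(pairs, path):
    """Write an iterable of (diff_text, review_dict) pairs, one per line."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for diff_text, review in pairs:
            fh.write(to_jsonl_line(diff_text, review) + "\n")
            count += 1
    return count
```

json.dumps escapes the newlines inside the diff text, so a multi-line diff still occupies exactly one line of the file.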
Compile
# pairs.jsonl: 120 (input, output) pairs of the shape above
kolm compile "PR review with structured issues" \
  --base qwen2.5-coder-7b \
  --pairs pairs.jsonl \
  --spec spec.json \
  --k-floor 0.85 \
  --output pr-review.kolm

ok wrote pr-review.kolm k_score=0.89 signature=hmac-sha256 artifact_hash=sha256:8a4f...e612
K-score gate
Floor was 0.85. Run came in at 0.89. The verifier rejected three Pass-1 outputs that emitted free-form prose instead of the schema; those got recompiled and the gate held.
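A rough sketch of how free-form prose could be separated from schema-shaped output before anything ships. This is not the kolm verifier, and it says nothing about how the K-score itself is computed; parse_review and split_outputs are hypothetical helper names.

```python
import json


def parse_review(raw):
    """Parse raw model text as a review object, or raise ValueError."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Free-form prose such as "Looks good to me" fails here.
        raise ValueError(f"not JSON: {exc.msg}") from exc
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be an object")
    missing = [k for k in ("summary", "issues") if k not in obj]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return obj


def split_outputs(raw_outputs):
    """Partition raw outputs into (accepted reviews, rejected (raw, reason) pairs)."""
    accepted, rejected = [], []
    for raw in raw_outputs:
        try:
            accepted.append(parse_review(raw))
        except ValueError as exc:
            rejected.append((raw, str(exc)))
    return accepted, rejected
```

Keeping the rejection reason alongside the raw text gives a record of why each output was sent back.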
Run-time profile
Numbers are p50 on a 32-line diff. Long diffs (200+ lines) take roughly 2x as long. The artifact is 2.4 GB on disk; cold-load to first-token is dominated by the load step on phone-class hardware.
Deploy
# GitHub Action that posts a review on every push:
- uses: kolm-ai/run-action@v1
  with:
    artifact: pr-review.kolm
    input: ${{ github.event.pull_request.diff_url }}
    output: review.json
- uses: actions/github-script@v7
  with:
    script: |
      const r = require('./review.json');
      for (const i of r.issues) {
        await github.rest.pulls.createReviewComment({...});
      }
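The arguments to createReviewComment are elided in the workflow above. As an illustration only, here is one possible mapping from review issues to the body, path, line, side, and commit_id fields accepted by GitHub's "create a review comment for a pull request" endpoint, sketched in Python. The function name, the comment body format, and the choice of side "RIGHT" are assumptions, not the recipe's actual implementation.

```python
def build_review_comments(review, commit_sha):
    """Turn review["issues"] into one comment payload per issue.

    Hypothetical mapping: each payload carries the fields a pull request
    review comment needs besides owner, repo, and pull number.
    """
    payloads = []
    for issue in review.get("issues", []):
        payloads.append({
            "path": issue["file"],
            "line": issue["line"],
            # "RIGHT" targets the new version of the file in the diff.
            "side": "RIGHT",
            "commit_id": commit_sha,
            "body": f"**{issue['severity']}** ({issue['category']}): {issue['detail']}",
        })
    return payloads
```

A review with an empty issues array yields no payloads, so a clean diff produces no inline comments.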