What this recipe does
Replaces a "churn risk" rules engine that was 20+ if-statements deep and never quite right. Reads account telemetry (login frequency, feature breadth, seat utilization, support-ticket sentiment, billing health) and returns a risk probability, a risk band, the specific signals that drove it, and a suggested outreach. The verifier confirms that every cited signal actually appears in the input, so there are no hallucinated reasons.
The probability is calibrated on the held-out set: when the model says "0.7 risk," accounts in that bucket actually churn ~70% of the time within 90 days.
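To make the calibration claim concrete, here is a minimal Python sketch of the two checks it implies: a Brier score (mean squared error between predicted risk and the observed 0/1 churn outcome, lower is better) and a per-bucket comparison of predicted versus observed churn rate. The function names, the ten equal-width buckets, and the input format are assumptions for illustration; how kolm computes its own calibration metric is not documented here.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared error between predicted risk and the 0/1 churn outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_buckets=10):
    """Group predictions into equal-width buckets (hypothetical bucketing).

    Returns {bucket_index: (mean_predicted_risk, observed_churn_rate, count)}.
    """
    buckets = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # Clamp so that a prediction of exactly 1.0 lands in the last bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))

    table = {}
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        observed = sum(y for _, y in rows) / len(rows)
        table[idx] = (mean_pred, observed, len(rows))
    return table
```

A well-calibrated model shows `mean_predicted_risk` close to `observed_churn_rate` in every populated bucket. Outcomes here would be 1 for accounts that churned within the 90-day window and 0 otherwise.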
The spec
{
  "output_kind": "json",
  "schema": {
    "required": ["risk", "band", "reasons", "suggested_outreach"],
    "properties": {
      "risk": { "type": "number", "minimum": 0, "maximum": 1 },
      "band": { "enum": ["green", "yellow", "red"] },
      "reasons": {
        "type": "array",
        "items": {
          "required": ["signal", "value", "trend"],
          "properties": {
            "signal": { "type": "string" },
            "value": {},
            "trend": { "enum": ["up", "flat", "down"] }
          }
        }
      },
      "suggested_outreach": { "type": "string" }
    }
  },
  "verifier": {
    "reasons_must_match_input": true,
    "calibration_target_brier_score": 0.10
  }
}
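For readers who want to sanity-check model output outside of kolm, the schema above can be approximated with a small hand-written validator. This is a sketch using only the Python standard library, not kolm's own validation; the function name `validate_output` is hypothetical, and it is slightly stricter than the schema as written (for example, it insists that the output and each reason are objects).

```python
REQUIRED_TOP = ("risk", "band", "reasons", "suggested_outreach")
REQUIRED_REASON = ("signal", "value", "trend")
BANDS = ("green", "yellow", "red")
TRENDS = ("up", "flat", "down")


def validate_output(out):
    """Return a list of problems; an empty list means the output conforms."""
    if not isinstance(out, dict):
        return ["output must be a JSON object"]
    errors = [f"missing required field: {k}" for k in REQUIRED_TOP if k not in out]

    if "risk" in out:
        risk = out["risk"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(risk, (int, float)) and not isinstance(risk, bool)
        if not is_number or not 0 <= risk <= 1:
            errors.append("risk must be a number between 0 and 1")
    if "band" in out and out["band"] not in BANDS:
        errors.append("band must be one of green, yellow, red")
    if "suggested_outreach" in out and not isinstance(out["suggested_outreach"], str):
        errors.append("suggested_outreach must be a string")

    if "reasons" in out:
        reasons = out["reasons"]
        if not isinstance(reasons, list):
            errors.append("reasons must be an array")
        else:
            for i, reason in enumerate(reasons):
                if not isinstance(reason, dict):
                    errors.append(f"reasons[{i}] must be an object")
                    continue
                errors += [
                    f"reasons[{i}] missing required field: {k}"
                    for k in REQUIRED_REASON
                    if k not in reason
                ]
                if "signal" in reason and not isinstance(reason["signal"], str):
                    errors.append(f"reasons[{i}].signal must be a string")
                if "trend" in reason and reason["trend"] not in TRENDS:
                    errors.append(f"reasons[{i}].trend must be one of up, flat, down")
    return errors
```

Note that `value` is left unconstrained, mirroring the empty `{}` schema for that field.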
Gold pairs (1 of 600 shown)
{
"account": "acme-co", "plan": "team", "seats": 12,
"logins_30d": 84, "prev_logins_30d": 142,
"feature_breadth": 0.31, "prev_feature_breadth": 0.58,
"support_tickets_30d": 4, "prev_support_tickets_30d": 1,
"avg_support_sentiment": -0.35, "prev_avg_support_sentiment": 0.10,
"billing_health": "current", "renewal_in_days": 41
}
{
"risk": 0.72,
"band": "red",
"reasons": [
{"signal": "logins_30d", "value": 84, "trend": "down"},
{"signal": "feature_breadth", "value": 0.31, "trend": "down"},
{"signal": "support_tickets_30d", "value": 4, "trend": "up"},
{"signal": "avg_support_sentiment", "value": -0.35, "trend": "down"}
],
"suggested_outreach": "CSM check-in within 7 days; lead with the support ticket sentiment dip and ask which workflow regressed. Renewal is 41 days out — there's time to fix this."
}
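In the pair above, each reason's trend lines up with the direction of change between a signal and its `prev_` counterpart (for example, `logins_30d` fell from 142 to 84, so the trend is "down"). A minimal Python sketch of that relationship follows. The helper names and the 5% relative tolerance for "flat" are assumptions for illustration; the recipe does not state how trend labels were assigned when the gold pairs were written.

```python
def trend(current, previous, rel_tol=0.05):
    """Label the change from previous to current as up, flat, or down.

    rel_tol is a hypothetical threshold: changes smaller than this
    fraction of the larger magnitude count as flat.
    """
    delta = current - previous
    scale = max(abs(previous), abs(current))
    if scale == 0 or abs(delta) <= rel_tol * scale:
        return "flat"
    return "up" if delta > 0 else "down"


def trends_for(telemetry, rel_tol=0.05):
    """Compute a trend for every numeric signal that has a prev_ counterpart."""
    result = {}
    for key, prev in telemetry.items():
        if not key.startswith("prev_"):
            continue
        name = key[len("prev_"):]
        cur = telemetry.get(name)
        if isinstance(cur, (int, float)) and isinstance(prev, (int, float)):
            result[name] = trend(cur, prev, rel_tol)
    return result
```

A check like this can help when authoring or reviewing gold pairs, since a mislabeled trend in the training data would teach the model the wrong direction.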
Compile
kolm compile "churn risk classifier with grounded reasons" \
  --base qwen2.5-coder-3b \
  --pairs pairs.jsonl \
  --verifier reasons-grounded,calibrated-brier=0.10 \
  --k-floor 0.85 \
  --output churn-predict.kolm
ok wrote churn-predict.kolm k_score=0.87 signature=hmac-sha256
K-score gate
"Reasons grounded" means: every cited signal-value pair must appear in the input telemetry. The model can't make up a "feature_breadth = 0.31, trending down" reason if the input shows 0.58 stable. Verifier catches this at compile time.
Run-time profile
Deploy
# Nightly batch: score every active account and write red-band results to the CSM queue.
for acct in $(psql -t -c "SELECT id FROM accounts WHERE active"); do
  tel=$(./build-telemetry.py --account "$acct" --days 30)
  out=$(kolm run churn-predict.kolm --input "$tel")
  if [ "$(echo "$out" | jq -r .band)" = "red" ]; then
    queue write csm-priority "$out"
  fi
done