Churn-predict recipe

What this recipe does

Replaces a "churn risk" rules engine that was 20+ if-statements deep and never quite right. Reads telemetry (login frequency, feature breadth, seat utilization, support-ticket sentiment, billing health) and returns a risk score, a risk band, and the specific signals that drove them. The verifier confirms that every cited signal actually appears in the input, so the model cannot return hallucinated reasons.

The probability is calibrated on the held-out set: when the model says "0.7 risk," roughly 70% of the accounts in that bucket actually churn within 90 days.
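One way to check a calibration claim like this is to bucket the predicted risks and compare each bucket's mean prediction with the churn rate observed in it. The sketch below is illustrative only: the function name `calibration_table`, the ten equal-width buckets, and the toy data are assumptions, not part of the recipe or its verifier.

```python
from collections import defaultdict

def calibration_table(predictions, outcomes, n_buckets=10):
    """Group predicted risks into equal-width buckets and compare each
    bucket's mean prediction with its observed churn rate."""
    buckets = defaultdict(list)
    for p, churned in zip(predictions, outcomes):
        # Clamp the index so that p == 1.0 lands in the top bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, churned))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(c for _, c in pairs) / len(pairs)
        rows.append((idx / n_buckets, (idx + 1) / n_buckets,
                     len(pairs), mean_pred, observed))
    return rows

# Toy data, made up for illustration (not taken from the held-out set).
# outcomes: 1 = churned within 90 days, 0 = retained.
preds = [0.72, 0.74, 0.75, 0.71, 0.12, 0.15]
churned = [1, 1, 0, 1, 0, 0]

for lo, hi, n, mean_pred, observed in calibration_table(preds, churned):
    print(f"[{lo:.1f}, {hi:.1f})  n={n}  "
          f"predicted={mean_pred:.2f}  observed={observed:.2f}")
```

A well-calibrated model keeps the predicted and observed columns close in every bucket that has enough accounts to be meaningful.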

The spec

{
  "output_kind": "json",
  "schema": {
    "required": ["risk", "band", "reasons", "suggested_outreach"],
    "properties": {
      "risk": { "type": "number", "minimum": 0, "maximum": 1 },
      "band": { "enum": ["green", "yellow", "red"] },
      "reasons": { "type": "array", "items": {
        "required": ["signal", "value", "trend"],
        "properties": {
          "signal": { "type": "string" },
          "value": {},
          "trend": { "enum": ["up", "flat", "down"] }
        }
      } },
      "suggested_outreach": { "type": "string" }
    }
  },
  "verifier": {
    "reasons_must_match_input": true,
    "calibration_target_brier_score": 0.10
  }
}
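To make the schema concrete, here is a minimal hand-rolled check of one output against the fields listed in the spec. It is a sketch, not how kolm enforces the schema: `validate_output` is a hypothetical helper, and it is slightly stricter than the schema as written (it requires each reason to be an object).

```python
BANDS = ("green", "yellow", "red")
TRENDS = ("up", "flat", "down")

def validate_output(out):
    """Return a list of schema violations for one output (empty if valid)."""
    if not isinstance(out, dict):
        return ["output must be a JSON object"]
    errors = []
    for key in ("risk", "band", "reasons", "suggested_outreach"):
        if key not in out:
            errors.append(f"missing required field: {key}")
    if "risk" in out:
        risk = out["risk"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(risk, (int, float)) and not isinstance(risk, bool)
        if not is_number or not 0 <= risk <= 1:
            errors.append("risk must be a number between 0 and 1")
    if "band" in out and out["band"] not in BANDS:
        errors.append("band must be one of green, yellow, red")
    if "suggested_outreach" in out and not isinstance(out["suggested_outreach"], str):
        errors.append("suggested_outreach must be a string")
    reasons = out.get("reasons", [])
    if not isinstance(reasons, list):
        errors.append("reasons must be an array")
        reasons = []
    for i, reason in enumerate(reasons):
        if not isinstance(reason, dict):
            errors.append(f"reasons[{i}] must be an object")
            continue
        for key in ("signal", "value", "trend"):
            if key not in reason:
                errors.append(f"reasons[{i}] is missing {key}")
        if "signal" in reason and not isinstance(reason["signal"], str):
            errors.append(f"reasons[{i}].signal must be a string")
        if "trend" in reason and reason["trend"] not in TRENDS:
            errors.append(f"reasons[{i}].trend must be one of up, flat, down")
    return errors

# Example: an out-of-range risk, an unknown band, and an incomplete reason.
bad = {"risk": 1.4, "band": "orange",
       "reasons": [{"signal": "logins_30d", "trend": "sideways"}]}
for problem in validate_output(bad):
    print(problem)
```

Note that `value` is left untyped, matching the empty `{}` in the spec, since signals can be numbers or strings.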

Gold pairs (1 of 600 shown)

input - 30d telemetry

{
  "account": "acme-co", "plan": "team", "seats": 12,
  "logins_30d": 84,                "prev_logins_30d": 142,
  "feature_breadth": 0.31,        "prev_feature_breadth": 0.58,
  "support_tickets_30d": 4,       "prev_support_tickets_30d": 1,
  "avg_support_sentiment": -0.35, "prev_avg_support_sentiment": 0.10,
  "billing_health": "current", "renewal_in_days": 41
}

output

{
  "risk": 0.72,
  "band": "red",
  "reasons": [
    {"signal": "logins_30d", "value": 84, "trend": "down"},
    {"signal": "feature_breadth", "value": 0.31, "trend": "down"},
    {"signal": "support_tickets_30d", "value": 4, "trend": "up"},
    {"signal": "avg_support_sentiment", "value": -0.35, "trend": "down"}
  ],
  "suggested_outreach": "CSM check-in within 7 days; lead with the support ticket sentiment dip and ask which workflow regressed. Renewal is 41 days out — there's time to fix this."
}
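The `trend` labels in this pair line up with the `prev_` fields in the input. The sketch below shows one plausible way to derive such a label; the `trend` function and its 5% relative tolerance are assumptions made for illustration, and the recipe does not state how trends are actually computed.

```python
def trend(current, previous, tolerance=0.05):
    """Label the change from previous to current as up, flat, or down.

    tolerance is a hypothetical relative threshold: changes smaller than
    this fraction of the previous value count as flat.
    """
    if previous == 0:
        # No baseline to scale against, so fall back to the sign of current.
        if current == 0:
            return "flat"
        return "up" if current > 0 else "down"
    change = (current - previous) / abs(previous)
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"

# Current and previous values copied from the gold-pair input above.
telemetry = {
    "logins_30d": 84, "prev_logins_30d": 142,
    "feature_breadth": 0.31, "prev_feature_breadth": 0.58,
    "support_tickets_30d": 4, "prev_support_tickets_30d": 1,
    "avg_support_sentiment": -0.35, "prev_avg_support_sentiment": 0.10,
}

for name in ("logins_30d", "feature_breadth",
             "support_tickets_30d", "avg_support_sentiment"):
    print(name, trend(telemetry[name], telemetry["prev_" + name]))
```

Under these assumptions the four labels agree with the gold output: down, down, up, down.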

Compile

kolm compile "churn risk classifier with grounded reasons" \
  --base qwen2.5-coder-3b \
  --pairs pairs.jsonl \
  --verifier reasons-grounded,calibrated-brier=0.10 \
  --k-floor 0.85 \
  --output churn-predict.kolm

ok wrote churn-predict.kolm
   k_score=0.87  signature=hmac-sha256

K-score gate

K-score 0.87 · held-out 180 accounts · reasons-grounded 100% · band-accuracy 84% · Brier 0.092 (target 0.10)
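For reference, the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better and 0 is perfect. A minimal sketch follows; the `brier_score` helper and the toy data are illustrative assumptions, not the verifier's own code or the held-out results.

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted risk and the 0/1 churn outcome."""
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - y) ** 2 for p, y in zip(predictions, outcomes))
    return total / len(predictions)

# Toy data, made up for illustration. 1 = churned, 0 = retained.
preds = [0.72, 0.10, 0.90, 0.30]
actual = [1, 0, 1, 0]

score = brier_score(preds, actual)
target = 0.10  # the calibration target named in the spec
print(f"Brier {score:.4f}, meets target: {score <= target}")
```

A model that always predicts 0.5 scores 0.25, which gives a sense of how far below that baseline a 0.10 target sits.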

"Reasons grounded" means that every cited signal-value pair must appear in the input telemetry. The model can't make up a "feature_breadth = 0.31, trending down" reason if the input shows 0.58 and stable. The verifier catches this at compile time.
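The grounding rule can be sketched as a lookup of each cited signal-value pair in the input. This is an illustration under assumptions: `ungrounded_reasons` is a hypothetical helper, it uses exact equality because a grounded value should be copied verbatim from the telemetry, and it does not reflect how the kolm verifier is implemented.

```python
def ungrounded_reasons(telemetry, output):
    """Return the reasons whose signal-value pair is absent from the input."""
    bad = []
    for reason in output.get("reasons", []):
        signal = reason.get("signal")
        # Grounded means the signal exists and its value matches exactly.
        if signal not in telemetry or telemetry[signal] != reason.get("value"):
            bad.append(reason)
    return bad

# Input values copied from the gold pair above.
telemetry = {
    "logins_30d": 84,
    "feature_breadth": 0.31,
    "support_tickets_30d": 4,
    "avg_support_sentiment": -0.35,
}

grounded = {"reasons": [
    {"signal": "logins_30d", "value": 84, "trend": "down"},
    {"signal": "feature_breadth", "value": 0.31, "trend": "down"},
]}
invented = {"reasons": [
    {"signal": "feature_breadth", "value": 0.58, "trend": "flat"},  # wrong value
    {"signal": "nps_score", "value": 3, "trend": "down"},           # not in input
]}

print(len(ungrounded_reasons(telemetry, grounded)))
print(len(ungrounded_reasons(telemetry, invented)))
```

Running the check over every gold pair and requiring an empty result each time would correspond to the "reasons-grounded 100%" figure.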

Run-time profile

M2 MacBook          680 ms
RTX 5090            190 ms
iPhone 15 Pro       2.0 s
CPU x86 (server)    2.6 s

Deploy

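# nightly batch: score every active account, write red-band results to the CSM queue
# -A (unaligned) and -t (tuples only) make psql emit one bare id per line;
# the loop assumes account ids contain no whitespace
for acct in $(psql -At -c "SELECT id FROM accounts WHERE active"); do
  # skip this account if telemetry cannot be built or scoring fails
  tel=$(./build-telemetry.py --account "$acct" --days 30) || continue
  out=$(kolm run churn-predict.kolm --input "$tel") || continue
  if [ "$(printf '%s' "$out" | jq -r '.band')" = "red" ]; then
    queue write csm-priority "$out"
  fi
done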

Risk scores, calibrated.