kolm / case studies / finance
A mid-cap regional bank ran a claims-routing model in production for 14 months. When the OCC's targeted exam window opened, the SR 11-7 model-risk binder needed an evidence trail an examiner could replay. kolm had been emitting receipts all along.
Replay rate: 100% of 12k claims
Findings: 0 MRA / 0 MRIA
Time to comply: 9 days
External cost: $0 additional
$28B-asset bank, three-state footprint. Their property-and-casualty claims line routes incoming claims into one of five queues (standard, complex, fraud-flag, hurricane-special, customer-VIP). The router is a small classifier: tokenize the claim narrative + structured fields, predict the queue, fall back to human triage on low confidence. SR 11-7 puts this model squarely in the "tier 2" model-risk category — not core capital but customer-facing and consequential.
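The routing flow described above can be sketched as follows. This is a minimal illustration, not the bank's implementation: the `classify` stub, the confidence threshold, and all names are assumptions standing in for the real compiled classifier.

```python
import hashlib

QUEUES = ["standard", "complex", "fraud-flag", "hurricane-special", "customer-VIP"]
CONFIDENCE_FLOOR = 0.80  # illustrative threshold, not the bank's actual value

def classify(narrative: str, fields: dict) -> tuple[str, float]:
    """Stand-in for the compiled classifier: returns (queue, confidence).

    Deterministic toy scoring so the sketch runs without a real model.
    """
    digest = hashlib.sha256(
        (narrative + repr(sorted(fields.items()))).encode()
    ).hexdigest()
    h = int(digest, 16)
    return QUEUES[h % len(QUEUES)], (h % 100) / 100

def route(narrative: str, fields: dict) -> str:
    """Predict the queue; fall back to human triage on low confidence."""
    queue, confidence = classify(narrative, fields)
    if confidence < CONFIDENCE_FLOOR:
        return "human-triage"  # low-confidence fall-through
    return queue
```

The design point is the fall-through: the model never has final say on a low-confidence claim, which is part of what keeps this in SR 11-7's tier 2 rather than a higher-risk category.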
The bank's MRM (model-risk management) team had inherited the model from a prior vendor engagement. The vendor was no longer responsive. When the OCC examination window opened, the bank had ~10 weeks to assemble independent validation, performance monitoring, and ongoing-use evidence.
The router had already been compiled as a .kolm 14 months earlier — not for the exam, but because the bank's IT-security team had blocked the original vendor's cloud callback. As a side effect of compilation, every production inference had been emitting an HMAC-signed receipt with the artifact CID, input hash, output hash, and K-score.
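A minimal sketch of what emitting such a per-inference receipt could look like. Field names follow the ones listed above; the signing key, CID format, and `emit_receipt` function are placeholders, not kolm's actual API.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key"  # placeholder; production keys would live in an HSM/KMS

def emit_receipt(artifact_cid: str, input_bytes: bytes,
                 output_bytes: bytes, k_score: float) -> dict:
    """Build an HMAC-signed receipt binding artifact, input, output, and K-score."""
    body = {
        "artifact_cid": artifact_cid,
        "input_sha": hashlib.sha256(input_bytes).hexdigest(),
        "output_sha": hashlib.sha256(output_bytes).hexdigest(),
        "k_score": k_score,
        "ts": time.time(),
    }
    # Sign the canonical (sorted-key) JSON so verification is order-independent.
    payload = json.dumps(body, sort_keys=True).encode()
    body["hmac"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return body
```

One receipt per inference, appended to a JSONL log, is what makes the later exam steps a matter of querying rather than instrumenting.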
When the SR 11-7 evidence ask landed, the MRM team did not need to instrument anything. The receipts were sitting in the audit-log warehouse already. The exam preparation was about formatting the data the examiner expected, not collecting it.
The evidence package started with Recipe.yaml and the labeled training set, both pinned to a CID in the artifact's provenance block. The examiner could trace every training row back to a specific claim-system snapshot date.
Next came monthly K-scores, computed from the production receipts themselves. Because every receipt carries K, re-aggregating the prior 14 months took a single query:
SELECT date_trunc('month', ts) AS m,
AVG(k_score) AS k_mean,
COUNT(*) AS n
FROM model_receipts
WHERE artifact_cid = 'cidv1:sha256:9a3f4e1b…'
GROUP BY 1 ORDER BY 1;
The examiner sampled 12,000 historic claims at random and asked the bank to re-route them through the current production stack and prove the outputs matched what was recorded. Because every receipt carries input_sha and output_sha, the replay test reduced to a verifier run: 12,000/12,000 passed.
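The replay check itself is mechanical. A sketch, assuming the receipt layout described earlier (the function name and dict shape are illustrative, not kolm's verifier interface):

```python
import hashlib

def verify_replay(receipt: dict, replayed_input: bytes, replayed_output: bytes) -> bool:
    """True iff the re-run produced byte-identical input and output hashes."""
    return (
        hashlib.sha256(replayed_input).hexdigest() == receipt["input_sha"]
        and hashlib.sha256(replayed_output).hexdigest() == receipt["output_sha"]
    )
```

Because the comparison is on hashes recorded at inference time, a pass proves both that the artifact is unchanged and that the recorded output is what production actually emitted.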
The audit log shows every override (human reroute), every gate failure (K below 0.92), and every fall-through to human triage. The MRM team's existing dashboards now pull directly from receipts instead of vendor reports.
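Deriving those dashboard counts straight from receipts might look like the following. The `override` field and the `gate_stats` helper are assumptions for illustration; only `k_score` and the 0.92 gate come from the text above.

```python
K_GATE = 0.92  # gate threshold from the audit-log description

def gate_stats(receipts: list[dict]) -> dict:
    """Aggregate gate failures and human overrides from a batch of receipts."""
    fails = sum(1 for r in receipts if r["k_score"] < K_GATE)
    overrides = sum(1 for r in receipts if r.get("override", False))
    return {"n": len(receipts), "gate_fails": fails, "overrides": overrides}
```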
"The examiner accepted the receipt JSONL as system-of-record. We did not have to commission an independent validation engagement on the model itself — the receipts proved both the model behavior and the production conformity. The exam closed with no Matters Requiring Attention." — SVP, Model Risk Management (anonymized)
Total wall time from "examiner request received" to "evidence package delivered": 9 days. No additional external counsel, no validation vendor.
The finance compliance pack at /compliance-packs includes the SR 11-7 evidence schema referenced above. The full NIST AI RMF mapping is at /compliance/nist-ai-rmf — SR 11-7 and the AI RMF overlap on roughly 70% of their controls.