An OCC examiner asked for the model log. The receipts were already there.

A mid-cap regional bank ran a claims-routing model in production for 14 months. When the OCC's targeted exam window opened, the SR 11-7 model-risk binder needed an evidence trail an examiner could replay. The kolm receipts had been accumulating all along.

Replay rate: 100% of 12,000
Findings: 0 MRA / 0 MRIA
Time to comply: 9 days
External cost: $0 additional

The setup

$28B-asset bank, three-state footprint. Their property-and-casualty claims line routes incoming claims into one of five queues (standard, complex, fraud-flag, hurricane-special, customer-VIP). The router is a small classifier: tokenize the claim narrative + structured fields, predict the queue, fall back to human triage on low confidence. SR 11-7 puts this model squarely in the "tier 2" model-risk category — not core capital but customer-facing and consequential.
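
In Python, that routing contract is a few lines. Everything below is a hedged sketch: predict_queue is a stand-in stub, and the 0.80 triage floor is illustrative, since the bank's actual threshold is not stated.

from dataclasses import dataclass

QUEUES = ("standard", "complex", "fraud-flag", "hurricane-special", "customer-vip")
TRIAGE_FLOOR = 0.80  # illustrative; the bank's real threshold is not stated

@dataclass
class RouteDecision:
    queue: str | None  # None means fall through to human triage
    confidence: float

def predict_queue(narrative: str, fields: dict) -> list[float]:
    # Stand-in stub for the real classifier over tokenized narrative + fields.
    return [1.0 / len(QUEUES)] * len(QUEUES)

def route(narrative: str, fields: dict) -> RouteDecision:
    probs = predict_queue(narrative, fields)
    best = max(range(len(QUEUES)), key=probs.__getitem__)
    if probs[best] < TRIAGE_FLOOR:
        return RouteDecision(queue=None, confidence=probs[best])
    return RouteDecision(queue=QUEUES[best], confidence=probs[best])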

The bank's MRM (model-risk management) team had inherited the model from a prior vendor engagement. The vendor was no longer responsive. When the OCC examination window opened, the bank had ~10 weeks to assemble independent validation, performance monitoring, and ongoing-use evidence.

Why kolm fit

The router had already been compiled as a .kolm eight months earlier — not for the exam, but because the bank's IT-security team had blocked the original vendor's cloud callback. As a side effect of compilation, every production inference had been emitting an HMAC-signed receipt with the artifact CID, input hash, output hash, and K-score.
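
A sketch of what emitting one of those receipts can look like. The field names follow the ones cited in this case study (artifact_cid, input_sha, output_sha, k_score); the signing-key handling, the timestamp, and the exact JSONL layout are assumptions, not kolm's documented format.

import hashlib, hmac, json, time

SIGNING_KEY = b"per-deployment-secret"  # illustrative; real key management differs

def emit_receipt(artifact_cid: str, input_bytes: bytes,
                 output_bytes: bytes, k_score: float) -> dict:
    body = {
        "artifact_cid": artifact_cid,
        "input_sha": hashlib.sha256(input_bytes).hexdigest(),
        "output_sha": hashlib.sha256(output_bytes).hexdigest(),
        "k_score": k_score,
        "ts": time.time(),  # assumed field; the case study does not show one
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return body  # written as one JSONL line to the audit-log warehouse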

When the SR 11-7 evidence ask landed, the MRM team did not need to instrument anything. The receipts were sitting in the audit-log warehouse already. The exam preparation was about formatting the data the examiner expected, not collecting it.

What the examiner asked for, and what we showed

1. Conceptual soundness

The recipe.yaml and the labeled training set, each pinned to a CID in the artifact's provenance block. The examiner could trace every training row back to a specific claim-system snapshot date.
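
As a sketch, that trace-back reduces to re-hashing the pinned files and comparing against the provenance block. Real CIDv1 values involve multihash and multibase encoding; plain SHA-256 stands in for that here, and the block's shape is assumed.

import hashlib

def sha256_file(path: str) -> str:
    # Stream the file so a large training set does not load into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_provenance(provenance: dict) -> bool:
    # Assumed shape: {"recipe": {"path": ..., "sha256": ...},
    #                 "training_set": {"path": ..., "sha256": ...}}
    return all(sha256_file(e["path"]) == e["sha256"]
               for e in (provenance["recipe"], provenance["training_set"]))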

2. Performance monitoring

K-score per month, computed from the production receipts themselves. The receipts carry K, so re-aggregating the prior 14 months took a single query:

-- Monthly mean K-score and receipt volume for the router artifact
SELECT date_trunc('month', ts) AS month,
       AVG(k_score)            AS k_mean,
       COUNT(*)                AS n
FROM   model_receipts
WHERE  artifact_cid = 'cidv1:sha256:9a3f4e1b…'
GROUP  BY 1
ORDER  BY 1;

3. Replay

The examiner sampled 12,000 historic claims at random and asked the bank to re-route them through the current production stack and prove the outputs matched what was recorded. With the input_sha and output_sha in every receipt, the replay test reduced to a verifier run: 12,000 of 12,000 passed.
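
Per claim, the verifier run is two hash comparisons. A minimal sketch, assuming the receipt fields named above; route_claim stands in for the current production stack.

import hashlib

def replay_matches(receipt: dict, claim_bytes: bytes, route_claim) -> bool:
    # claim_bytes must be the exact input the receipt was emitted over.
    if hashlib.sha256(claim_bytes).hexdigest() != receipt["input_sha"]:
        return False
    # Re-route through the current production stack, compare output hashes.
    output_bytes = route_claim(claim_bytes)
    return hashlib.sha256(output_bytes).hexdigest() == receipt["output_sha"]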

4. Ongoing use

The audit log shows every override (human reroute), every gate fail (K below 0.92), and every fall-through to human triage. The MRM team's existing dashboards now pulled directly from receipts instead of vendor reports.
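A tally like those dashboards show can be computed straight off the receipt JSONL. Hedged sketch: the override and queue field names below are assumptions; only k_score and the 0.92 gate appear in this case study.

import json

def ongoing_use_counts(jsonl_path: str, k_gate: float = 0.92) -> dict:
    # "override" and "queue" are assumed field names; k_score is from the receipt.
    counts = {"override": 0, "gate_fail": 0, "human_triage": 0}
    with open(jsonl_path) as f:
        for line in f:
            r = json.loads(line)
            if r.get("override"):
                counts["override"] += 1
            if r["k_score"] < k_gate:
                counts["gate_fail"] += 1
            if r.get("queue") is None:  # no queue assigned: fell through to triage
                counts["human_triage"] += 1
    return counts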

The exam outcome

"The examiner accepted the receipt JSONL as system-of-record. We did not have to commission an independent validation engagement on the model itself — the receipts proved both the model behavior and the production conformity. The exam closed with no Matters Requiring Attention."

— SVP Model Risk Management (anonymized)

Total wall time from "examiner request received" to "evidence package delivered": 9 days. No additional external counsel, no validation vendor.

What we did not solve

Where to look next

The finance compliance pack at /compliance-packs includes the SR 11-7 evidence schema referenced above. The full NIST AI RMF mapping is at /compliance/nist-ai-rmf; SR 11-7 and the AI RMF share roughly 70% of their controls.