A regional health plan moved PHI redaction off the cloud and into a compiled artifact.

4 million quarterly call-center transcripts. HIPAA Safe Harbor identifiers. Previously routed to a vendor API. Now redacted by an 8B-param .kolm running on the plan's existing on-prem boxes. The receipts are the audit trail.

K-score: 0.982

Leak rate: 0 / 50k

Egress: 0 bytes

Cost: $1.8M→$0.04M/yr

The setup

A regional Medicare Advantage plan, ~340k members. The call-center workflow recorded calls, transcribed them with a vendor STT service, then called a separate vendor API to redact PHI before piping transcripts into an analytics warehouse. The redaction API cost about $1.8M annually. More importantly, every transcript transited a third party that the plan's Privacy Officer had to cover with a BAA, audit, and re-certify yearly.

The plan's data-science team was small (four people). They did not want to train a redactor from scratch and they did not want a fine-tuning project. They wanted the existing vendor's behavior, on their hardware, with paperwork that would survive an examiner from the HHS Office for Civil Rights (OCR).

The compile

Source: ~12,000 labeled transcripts the plan already had in their Snowflake account, plus the public n2c2 de-identification dataset. Base model: Llama-3.1-8B-Instruct, INT8. The team used kolm's HIPAA compliance pack as the harness: 18 Safe Harbor identifier classes, constrained-decoder verifier, per-class leak counters.

$ kolm compile recipe.yaml \
    --base meta-llama/Llama-3.1-8B-Instruct \
    --pack hipaa-safe-harbor \
    --gate K=0.95 \
    --gate leak_rate=0
…
[7/7] OK signature attached
artifact: build/phi-redactor.kolm
CID:      cidv1:sha256:1bcf2323…
K-score:  0.982 (gate 0.95) PASS
Leak:     0 / 50,000 (gate 0) PASS
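
The two gates in the log above can be read as a simple conjunction: the K-score must reach its threshold and the leak rate must not exceed its ceiling. The sketch below illustrates that logic in Python. The `EvalResult` type, the `passes_gates` function, and the choice of `>=` for the K gate are assumptions made for illustration, not kolm's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container for the numbers printed at the end of a compile."""
    k_score: float   # aggregate quality score from the eval
    leaks: int       # identifiers that survived redaction
    samples: int     # number of evaluation samples checked


def passes_gates(result: EvalResult, k_gate: float = 0.95,
                 max_leak_rate: float = 0.0) -> bool:
    """Return True only if both gates hold (assumed semantics)."""
    if result.samples <= 0:
        raise ValueError("samples must be positive")
    leak_rate = result.leaks / result.samples
    return result.k_score >= k_gate and leak_rate <= max_leak_rate


# Values taken from the compile log in this case study.
run = EvalResult(k_score=0.982, leaks=0, samples=50_000)
print(passes_gates(run))  # True
```

With a leak gate of 0, a single leaked identifier in 50,000 samples would fail the build regardless of the K-score.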

Compile time on a single A100: 4 hours 22 minutes. The artifact is 9.4 GB. It contains the base model digest, the LoRA adapter, the verifier schema, the eval results, and an HMAC-signed manifest.
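
For readers unfamiliar with HMAC-signed manifests, here is a minimal sketch of how such a check generally works, using Python's standard `hmac` and `hashlib` modules. The manifest fields, the canonical JSON serialization, the SHA-256 choice, and the demo key are all assumptions. The case study does not describe kolm's manifest format or key handling.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so the MAC is reproducible."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the canonical manifest bytes."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, tag: str, key: bytes) -> bool:
    """Constant-time comparison of the recomputed tag against the stored one."""
    return hmac.compare_digest(sign_manifest(manifest, key), tag)


# Hypothetical manifest fields and key, for illustration only.
manifest = {"artifact": "phi-redactor.kolm", "k_score": 0.982, "leaks": 0}
key = b"demo-key-not-a-real-secret"
tag = sign_manifest(manifest, key)

print(verify_manifest(manifest, tag, key))                  # True
print(verify_manifest({**manifest, "leaks": 1}, tag, key))  # False
```

Any change to a manifest field after signing changes the recomputed tag, which is what makes the manifest usable as tamper-evident paperwork.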

The deployment

The plan's existing call-center boxes are 2×A10 GPUs (commodity, ~$0.35/hr equivalent). The .kolm runs there. Throughput: ~1,800 tokens/sec per box, which clears the daily transcript backlog by 10am. Cold-start: 38 seconds from process boot to first inference.
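
The backlog claim can be sanity-checked with back-of-envelope arithmetic. The sketch below is one way to do that. The average tokens per transcript, the number of boxes, and the 91-day quarter are hypothetical inputs chosen for illustration, since the case study gives only the quarterly volume and the per-box throughput.

```python
def hours_to_clear(transcripts_per_day: float, tokens_per_transcript: float,
                   tokens_per_sec_per_box: float, boxes: int) -> float:
    """Hours needed to process one day's transcripts at a steady token rate."""
    if boxes <= 0 or tokens_per_sec_per_box <= 0:
        raise ValueError("boxes and throughput must be positive")
    total_tokens = transcripts_per_day * tokens_per_transcript
    seconds = total_tokens / (tokens_per_sec_per_box * boxes)
    return seconds / 3600.0


QUARTERLY_TRANSCRIPTS = 4_000_000      # from the case study
TOKENS_PER_SEC_PER_BOX = 1_800         # from the case study
DAYS_PER_QUARTER = 91                  # assumption
ASSUMED_TOKENS_PER_TRANSCRIPT = 1_500  # hypothetical average
ASSUMED_BOXES = 2                      # hypothetical fleet size

daily = QUARTERLY_TRANSCRIPTS / DAYS_PER_QUARTER
hours = hours_to_clear(daily, ASSUMED_TOKENS_PER_TRANSCRIPT,
                       TOKENS_PER_SEC_PER_BOX, ASSUMED_BOXES)
print(f"{daily:.0f} transcripts/day, about {hours:.1f} hours to clear")
```

Swapping in the real average transcript length and box count shows whether a 10am finish is plausible for a given start time.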

No new vendor onboarding. No new BAA. Privacy Officer's binder grew by one document — the manifest and a receipt sample — rather than another vendor row.

What the receipt records

{
  "artifact_cid":  "cidv1:sha256:1bcf2323…",
  "input_sha":     "sha256:8c2d1fa0…",
  "output_sha":    "sha256:4e1a90c3…",
  "k_score":       0.982,
  "leak_count":    0,
  "verifier_ok":   true,
  "ts":            "2026-05-09T14:18:32Z",
  "issuer_pubkey": "kolm-issuer-2026q2",
  "hmac":          "b7c41e87…"
}

The plan exports a JSONL of receipts nightly into the existing model-risk warehouse. The Privacy Officer can spot-check any transcript months later: the artifact CID resolves, the input hash matches what's in the warehouse, the output hash matches what the analytics layer received, the leak count is 0.
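
The spot-check described above could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes the receipt hashes are full SHA-256 hex digests of the raw transcript bytes with a `sha256:` prefix, and the `spot_check` helper, the demo transcript, and the placeholder CID are hypothetical. It does not verify the receipt's `hmac` field, which would require the issuer's key.

```python
import hashlib


def sha256_tag(data: bytes) -> str:
    """Hash bytes and prefix the hex digest the way the receipt fields appear."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def spot_check(receipt: dict, raw: bytes, redacted: bytes,
               expected_cid: str) -> list[str]:
    """Return a list of problems; an empty list means the receipt checks out."""
    problems = []
    if receipt.get("artifact_cid") != expected_cid:
        problems.append("artifact_cid mismatch")
    if receipt.get("input_sha") != sha256_tag(raw):
        problems.append("input_sha mismatch")
    if receipt.get("output_sha") != sha256_tag(redacted):
        problems.append("output_sha mismatch")
    if receipt.get("leak_count") != 0:
        problems.append("nonzero leak_count")
    if receipt.get("verifier_ok") is not True:
        problems.append("verifier did not pass")
    return problems


# Hypothetical transcript pair and placeholder CID, for illustration only.
raw = b"Caller: this is Jane Doe, member ID 12345."
redacted = b"Caller: this is [NAME], member ID [ID]."
cid = "cidv1:sha256:demo"
receipt = {
    "artifact_cid": cid,
    "input_sha": sha256_tag(raw),
    "output_sha": sha256_tag(redacted),
    "leak_count": 0,
    "verifier_ok": True,
}

print(spot_check(receipt, raw, redacted, cid))          # []
print(spot_check(receipt, raw, b"tampered text", cid))  # ['output_sha mismatch']
```

Returning a list of named problems instead of a single boolean lets a reviewer see which link in the chain broke.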

The audit result

The vendor API gave us a SOC 2 report and a BAA. The .kolm gives us the actual model, the actual numbers, and a receipt for every transcript. The compliance team prefers the receipt. — Director, Privacy & Compliance (anonymized)

Outcome at year-end audit: zero findings on PHI handling. The plan's external auditor (Big-4, name on request from carrier) accepted the manifest + receipt JSONL as in-scope evidence for the HIPAA Safe Harbor de-identification method, 45 CFR §164.514(b)(2).

What we did not solve

The STT vendor is still in the loop. Transcription accuracy gates downstream redaction. We did not replace STT.
The plan still pays for a second .kolm to run on their claims-narrative workload. The redactor is a single artifact, not a platform contract.
Compile cost ran ~$220 (single A100 for about 4.5 hr on a spot market). That cost recurs with every refresh; it is not a one-time expense.
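
To put the recurring compile cost in context with the headline figures, the arithmetic can be sketched as below. The refresh cadence is a hypothetical input, since the case study does not say how often the artifact is recompiled. The dollar figures are the ones quoted in this document.

```python
def annual_savings(before: float, after: float) -> float:
    """Absolute yearly savings in dollars."""
    return before - after


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction relative to the original yearly cost."""
    return 100.0 * (before - after) / before


def annual_compile_cost(cost_per_compile: float, refreshes_per_year: int) -> float:
    """Yearly spend on recompiles at a given refresh cadence."""
    return cost_per_compile * refreshes_per_year


VENDOR_COST = 1_800_000   # $/yr, from the case study
KOLM_COST = 40_000        # $/yr, from the case study
COMPILE_COST = 220        # $ per compile, from the case study
ASSUMED_REFRESHES = 4     # hypothetical: one refresh per quarter

print(annual_savings(VENDOR_COST, KOLM_COST))             # 1760000
print(round(reduction_pct(VENDOR_COST, KOLM_COST), 1))    # 97.8
print(annual_compile_cost(COMPILE_COST, ASSUMED_REFRESHES))  # 880
```

At a quarterly cadence the recompiles would add $880 a year, a small fraction of the $0.04M running cost, though the figure scales linearly with how often the artifact is refreshed.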

Reproducing this case

The artifact CID is public. The HIPAA compliance pack is at /compliance-packs. The recipe is in the docs HIPAA-onboarding section. Two hours from kolm init to a signed artifact you can curl.