kolm vs a RAG stack.
A generic RAG stack (vector DB + frontier API) is fast to demo and slow to audit. The matrix below is what changes when each retrieval carries an HMAC receipt over (source_uri, content_sha, retrieved_at) and the decoder is constrained to refuse if the answer isn't grounded.
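As a rough illustration of what a receipt over (source_uri, content_sha, retrieved_at) could look like, here is a minimal Python sketch using only the standard library. The function names, the JSON serialization, and the choice of SHA-256 are assumptions made for the example; they are not kolm's documented format.

```python
import hashlib
import hmac
import json

RECEIPT_FIELDS = ("source_uri", "content_sha", "retrieved_at")


def content_sha(content: bytes) -> str:
    """Hex SHA-256 of the retrieved bytes (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def _payload(fields: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(key: bytes, source_uri: str, content: bytes, retrieved_at: str) -> dict:
    """Build a receipt: the three fields plus an HMAC tag over them."""
    fields = {
        "source_uri": source_uri,
        "content_sha": content_sha(content),
        "retrieved_at": retrieved_at,
    }
    tag = hmac.new(key, _payload(fields), hashlib.sha256).hexdigest()
    return {**fields, "hmac": tag}


def verify_receipt(key: bytes, receipt: dict, content: bytes) -> bool:
    """True only if the tag is valid and the content still matches its hash."""
    fields = {name: receipt[name] for name in RECEIPT_FIELDS}
    expected = hmac.new(key, _payload(fields), hashlib.sha256).hexdigest()
    tag_ok = hmac.compare_digest(expected, receipt["hmac"])
    content_ok = hmac.compare_digest(fields["content_sha"], content_sha(content))
    return tag_ok and content_ok
```

With a receipt like this, a later check can tell whether the source changed after retrieval: `verify_receipt` fails if either the stored fields or the content bytes differ from what was signed.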
Ten axes. Reviewed 2026-05-15.
| Axis | kolm | RAG stack | Why it matters | Proof |
|---|---|---|---|---|
| Citation drift | HMAC receipt per retrieval, < 0.5% | ~14% measured | When the source moves or changes, generic RAG carries on; kolm refuses or replays the bundle as it was. | enterprise-search → |
| Hallucinated grounding | near 0 via constrained decoder | 3–8% measured | A confident citation that does not appear in the retrieved bundle is the bug regulators specifically test for. | constrained decoding → |
| Offline replay | months later, byte-identical | no | An audit question such as "what was retrieved on 2025-09-04" is unanswerable without receipts. | anatomy → |
| Cost | flat compile + cached embeddings | per-query LLM + vector DB | RAG cost scales with users, not knowledge size, so the two cost lines cross quickly. | /roi → |
| Latency | 0.6 ms local retrieval + 12 ms decode | 200–500 ms network + LLM | Helpdesk and field engineering UIs feel different at roughly 13 ms vs 400 ms. | /benchmarks → |
| Audit readability | receipt chain in JSON | spotty logs | An examiner needs a primitive that survives a 3-year audit window. | receipt JSON → |
| Privacy | on-prem, no third party | vector DB vendor + LLM vendor | Each external dependency is a perimeter to defend. | /airgap → |
| Determinism | seeded, reproducible | no | Reproducibility is the floor for any regulated workflow. | RS-1 → |
| Update cycle | re-compile, new CID | re-index in place | CIDs let you pin a known-good bundle and roll back without re-indexing. | K-score → |
| Refusal token | built-in, returns "I can't answer from this bundle" | prompt engineering, brittle | A constrained-decoder refusal is enforced at decode time, not by hoping the prompt holds. | constrained decoding → |
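The grounding rows above can be approximated with a simple post-hoc check. The sketch below is not a constrained decoder: it is a stand-in filter that runs after generation and returns the refusal string when a citation does not appear in the retrieved bundle. The `Citation` type, the `grounded_or_refuse` function, and the bundle layout (a mapping from content hash to passage text) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

REFUSAL = "I can't answer from this bundle"


@dataclass(frozen=True)
class Citation:
    content_sha: str  # hash of the passage the answer claims to cite
    quote: str        # text the answer claims the passage contains


def grounded_or_refuse(answer: str, citations: list[Citation], bundle: dict[str, str]) -> str:
    """Return the answer only if every citation is backed by the bundle.

    `bundle` maps content_sha -> passage text (an assumed layout).
    """
    if not citations:
        # An uncited answer cannot be checked against the bundle, so refuse.
        return REFUSAL
    for citation in citations:
        passage = bundle.get(citation.content_sha)
        if passage is None or citation.quote not in passage:
            # The cited passage is missing, or does not contain the quote.
            return REFUSAL
    return answer
```

A filter like this catches fabricated citations after the fact. The table's claim is stronger: a constrained decoder prevents the ungrounded output from being produced in the first place.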
When a generic RAG stack is the right answer.
You are building a consumer chatbot, the corpus is public knowledge, citation accuracy is nice-to-have, and you want to ship in a week. The vector-DB-plus-frontier-API pattern is the path of least resistance for that shape.
When kolm is the right answer.
Retrieval runs over a regulated corpus (legal, medical, financial), citations have to survive an audit, or per-query cost is a structural line item. The receipt chain is the artifact that every downstream procurement step ends up asking for.