What this recipe does
Most recall systems fail in one of two ways: they search every namespace (slow, expensive, and prone to leaks) or they pick one namespace and get it wrong (a miss). This recipe is a third option: a tiny model that picks the right namespace about 95% of the time and emits a confidence score. When confidence falls below 0.7, routing defaults to "search them all". So the slow fallback is rare, the right answer is fast, and you never accidentally search a namespace your tenant shouldn't see.
The verifier locks the namespace output to the closed namespace list and rejects any prediction outside the tenant's authorized set, so a cross-tenant leak is impossible by construction.
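The check described above can be sketched in a few lines. This is an illustration only, assuming the closed list and the tenant's authorized set are plain arrays of namespace names; `checkNamespace`, `ALL_NAMESPACES`, and every namespace name other than `runbooks` are hypothetical, and kolm's actual verifier may work differently.

```javascript
// Hypothetical closed vocabulary; in the recipe it lives in namespaces.json.
const ALL_NAMESPACES = ['runbooks', 'billing', 'hr-policies'];

// Returns { ok, reason } for a predicted namespace.
// A prediction passes only if it is in the closed list AND in the tenant's authorized set.
function checkNamespace(predicted, authorized, closedList = ALL_NAMESPACES) {
  if (!closedList.includes(predicted)) {
    return { ok: false, reason: 'not in closed namespace list' };
  }
  if (!authorized.includes(predicted)) {
    return { ok: false, reason: 'not authorized for tenant' };
  }
  return { ok: true, reason: null };
}

// Usage: a tenant that may read runbooks and billing only.
const tenantAuthorized = ['runbooks', 'billing'];
console.log(checkNamespace('runbooks', tenantAuthorized));    // passes
console.log(checkNamespace('hr-policies', tenantAuthorized)); // rejected: not authorized
console.log(checkNamespace('secrets', tenantAuthorized));     // rejected: not in closed list
```

Checking the closed list first keeps the two failure reasons distinct: an unknown name points at a model or vocabulary problem, while an unauthorized name points at a tenancy problem.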
The spec
{
  "output_kind": "json",
  "schema": {
    "required": ["namespace", "confidence", "fallback"],
    "properties": {
      "namespace": { "$ref": "namespaces.json" },
      "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
      "fallback": { "enum": ["narrow", "broad"] }
    }
  },
  "verifier": {
    "namespace_must_be_in_tenant_authorized_list": true,
    "low_confidence_fallback_must_be_broad": true,
    "calibration_target_brier_score": 0.10
  }
}
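The `calibration_target_brier_score` of 0.10 refers to the Brier score: the mean squared difference between each predicted confidence and the 0/1 outcome. A minimal sketch follows, assuming the score is computed on top-1 correctness (1 if the predicted namespace was right, 0 otherwise) over `{ confidence, correct }` records. The `brierScore` helper and the sample data are illustrative and not part of kolm.

```javascript
// Brier score: mean of (confidence - outcome)^2, where outcome is 1 for a
// correct prediction and 0 for a wrong one. Lower is better; 0 is perfect.
function brierScore(results) {
  if (results.length === 0) throw new Error('need at least one result');
  const total = results.reduce((sum, r) => {
    const outcome = r.correct ? 1 : 0;
    return sum + (r.confidence - outcome) ** 2;
  }, 0);
  return total / results.length;
}

// Illustrative data only, not measured results.
const sample = [
  { confidence: 0.9, correct: true },
  { confidence: 0.8, correct: true },
  { confidence: 0.6, correct: false },
  { confidence: 0.7, correct: true },
];
const score = brierScore(sample);
console.log(score.toFixed(3), score <= 0.10 ? 'meets target' : 'misses target');
```

A single confident miss dominates the sum here, which is the point of a calibration target: the model is penalized for being sure and wrong, not just for being wrong.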
Gold pair (1 of 4,000 shown)
"how do I rotate the API key for the staging instance?"
{
"namespace": "runbooks",
"confidence": 0.94,
"fallback": "narrow"
}
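The gold pair above can be checked against the spec's rules by hand. The sketch below is a rough illustration, assuming the 0.7 cutoff from the introduction is what `low_confidence_fallback_must_be_broad` enforces. `validateOutput` is a hypothetical helper, and it skips the `$ref` namespace lookup.

```javascript
const LOW_CONFIDENCE_THRESHOLD = 0.7; // assumed from the recipe's stated cutoff

// Returns a list of problems; an empty list means the record passes these checks.
function validateOutput(out) {
  const problems = [];
  for (const key of ['namespace', 'confidence', 'fallback']) {
    if (!(key in out)) problems.push(`missing required field: ${key}`);
  }
  if (!Number.isFinite(out.confidence) || out.confidence < 0 || out.confidence > 1) {
    problems.push('confidence must be a number in [0, 1]');
  }
  if (!['narrow', 'broad'].includes(out.fallback)) {
    problems.push('fallback must be "narrow" or "broad"');
  }
  if (out.confidence < LOW_CONFIDENCE_THRESHOLD && out.fallback !== 'broad') {
    problems.push('low-confidence output must use the broad fallback');
  }
  return problems;
}

const gold = { namespace: 'runbooks', confidence: 0.94, fallback: 'narrow' };
console.log(validateOutput(gold)); // no problems for the gold pair
```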
Compile
kolm compile "recall query namespace router" \
  --base qwen2.5-coder-3b \
  --pairs ./query-namespace-pairs.jsonl \
  --namespaces ./namespaces.json \
  --verifier closed-vocab,calibrated-brier=0.10 \
  --k-floor 0.85 \
  --output namespace-tagger.kolm
ok wrote namespace-tagger.kolm k_score=0.91 signature=hmac-sha256
K-score gate
Run-time profile
The 75 ms RTX number is the load-bearing one, because this recipe sits in the recall hot path on kolm serve. The per-request latency budget for routing is 100 ms; we land at 75 ms in the steady state, with the slow fallback ("search all") triggered about 5% of the time.
Deploy
// wired into the kolm serve recall pipeline:
serve.before_recall = (query, tenant) => {
  const r = kolm.run('namespace-tagger.kolm', query);
  // confident prediction: search only the predicted namespace
  if (r.confidence >= 0.7) return { namespace: r.namespace, mode: 'narrow' };
  // below 0.7: fall back to every namespace the tenant is authorized for
  return { namespaces: tenant.authorized_namespaces, mode: 'broad' };
};