Recall namespace tagger recipe

What this recipe does

Most recall systems have one of two failure modes: search every namespace (slow, expensive, leaks) or pick wrong (miss). This recipe is the third option: a tiny model that's right about which namespace to hit ~95% of the time, with a confidence score that defaults to "search them all" below 0.7. So the slow fallback is rare, the right answer is fast, and you never accidentally search a namespace your tenant shouldn't see.

The verifier locks the namespace output to the closed namespace list and rejects any prediction outside the tenant's authorized set — a cross-tenant leak is impossible by construction.

The spec

{
  "output_kind": "json",
  "schema": {
    "required": ["namespace", "confidence", "fallback"],
    "properties": {
      "namespace": { "$ref": "namespaces.json" },
      "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
      "fallback": { "enum": ["narrow", "broad"] }
    }
  },
  "verifier": {
    "namespace_must_be_in_tenant_authorized_list": true,
    "low_confidence_fallback_must_be_broad": true,
    "calibration_target_brier_score": 0.10
  }
}

Gold pair (1 of 4,000 shown)

input - query

"how do I rotate the API key for the staging instance?"

output

{
  "namespace": "runbooks",
  "confidence": 0.94,
  "fallback": "narrow"
}

Compile

kolm compile "recall query namespace router" \
  --base qwen2.5-coder-3b \
  --pairs ./query-namespace-pairs.jsonl \
  --namespaces ./namespaces.json \
  --verifier closed-vocab,calibrated-brier=0.10 \
  --k-floor 0.85 \
  --output namespace-tagger.kolm

ok wrote namespace-tagger.kolm
   k_score=0.91  signature=hmac-sha256

K-score gate

K-score 0.91 held-out 1,200 queries · namespace-correct 95% · Brier 0.087 (target 0.10) · cross-tenant leak rate 0% (verifier blocks unauthorized)

Run-time profile

M2 MacBook

280ms

RTX 5090

75ms

iPhone 15 Pro

820ms

CPU x86 (server)

1.1s

The 75ms RTX number is the load-bearing one — this recipe sits in the recall hot path on kolm serve. Per-request latency budget for routing is 100ms; we land at 75ms in the steady state, with the slow fallback ("search all") triggered ~5% of the time.

Deploy

# wired into kolm serve recall pipeline:
serve.before_recall = (query, tenant) => {
  const r = kolm.run('namespace-tagger.kolm', query);
  if (r.confidence > 0.7) return { namespace: r.namespace, mode: 'narrow' };
  return { namespaces: tenant.authorized_namespaces, mode: 'broad' };
};

Queries in, routed to the right namespace.