recipe-from-observations · meta recipe

Production traffic in, a recipe out.

A local .kolm file that takes a cluster of (input, verified_output) observations from /v1/bridges/observations (the "rent vs buy" capture loop) and synthesizes a complete kolm recipe — pairs, schema, verifier hooks, K-floor — ready to feed into kolm compile. The capture-to-distill bridge.

base model: qwen2.5-coder-7b
gold pairs: 340 (240 train / 100 eval)
k-score floor: 0.80
artifact size: 2.4 GB
compile time: ~42 min
spec source: spec-from-traffic

What this recipe does

Every API call you proxy through /v1/capture/anthropic or /v1/capture/openai writes an observation. Once you accumulate ≥1,000 observations clustered around a single intent, this recipe turns that cluster into a deployable recipe: it chooses the train/eval split, infers the JSON schema from the verified outputs, selects the appropriate verifier hooks (closed-vocab if the outputs are categorical, byte-grounded if they are extractive, and so on), and sets the K-floor based on how clean the cluster is.
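The hook-selection step can be sketched as a simple heuristic. This is an illustrative helper, not the shipped implementation — the function name, thresholds, and the `schema-validated` fallback are assumptions:

```javascript
// Hypothetical sketch of the verifier-hook heuristic described above.
// `observations` is an array of { input, verified_output } records.
function pickVerifierHook(observations) {
  const outputs = observations.map(o => o.verified_output);
  const distinct = new Set(outputs);

  // Few distinct outputs relative to cluster size → categorical labels.
  if (distinct.size <= Math.max(8, observations.length * 0.05)) {
    return 'closed-vocab';
  }
  // Every output appears verbatim in its own input → extractive task.
  if (observations.every(o => o.input.includes(o.verified_output))) {
    return 'byte-grounded';
  }
  // Otherwise fall back to plain JSON-schema checking (assumed hook name).
  return 'schema-validated';
}
```

A classification cluster (hundreds of observations, a handful of labels) lands in `closed-vocab`; a phone-number-extraction cluster, where each verified output is a substring of its input, lands in `byte-grounded`.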

The output is a complete recipe directory ready for kolm compile --recipe ./recipe-out. No copy-paste, no hand-tuning. The "rent vs buy" loop completes itself.

The spec

{
  "output_kind": "json",
  "schema": {
    "required": ["recipe_dir", "pairs_jsonl", "spec_json", "k_floor"],
    "properties": {
      "recipe_dir": { "type": "string" },
      "pairs_jsonl": { "type": "string" },
      "spec_json": { "$ref": "kolm-verifier-spec.schema.json" },
      "k_floor": { "type": "number", "minimum": 0.5, "maximum": 0.95 },
      "diagnostics": { "type": "object" }
    }
  },
  "verifier": {
    "output_must_compile_dry_run": true,
    "k_floor_must_match_cluster_purity": true,
    "pairs_count_minimum": 200
  }
}
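Two of the constraints above (the required keys and the `k_floor` bounds) are mechanical enough to check by hand. A minimal sketch — a hand-rolled validator for illustration, not kolm's actual spec checker:

```javascript
// Hypothetical sketch: validate a synthesized recipe object against the
// required-keys list and k_floor bounds from the spec above.
const REQUIRED = ['recipe_dir', 'pairs_jsonl', 'spec_json', 'k_floor'];

function validateRecipeOutput(out) {
  const errors = [];
  for (const key of REQUIRED) {
    if (!(key in out)) errors.push(`missing required key: ${key}`);
  }
  // Spec bounds: "minimum": 0.5, "maximum": 0.95.
  if (typeof out.k_floor === 'number' && (out.k_floor < 0.5 || out.k_floor > 0.95)) {
    errors.push(`k_floor ${out.k_floor} outside [0.5, 0.95]`);
  }
  return errors;
}
```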

Compile

kolm compile "recipe synthesizer from production observations" \
  --base qwen2.5-coder-7b \
  --pairs ./meta-pairs/recipe-from-traffic-pairs.jsonl \
  --verifier output-compiles-dry-run,k-floor-matches-purity \
  --k-floor 0.80 \
  --output recipe-from-observations.kolm

ok wrote recipe-from-observations.kolm
   k_score=0.84  signature=hmac-sha256

K-score gate

K-score: 0.84 (held-out, 100 clusters)
output-compiles-dry-run: 96%
k-floor within 0.05 of cluster purity: 91%
final-recipe-passes-K: 78%

The "final-recipe-passes-K 78%" number is the load-bearing one: when a customer feeds this recipe a cluster of 1,000 observations and runs the resulting recipe through kolm compile, 78 times out of 100 the compiled artifact passes its own K-floor on first try.
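One way the "K-floor matches cluster purity" relationship could work — an illustrative formula, assumed rather than documented: measure purity as the share of observations whose verified output has the cluster's majority JSON key set, then set the floor a small margin below purity, clamped to the spec's [0.5, 0.95] band. The 0.05 margin here mirrors the "within 0.05 of cluster purity" check above but is otherwise a guess:

```javascript
// Illustrative only: derive a K-floor from cluster purity. `outputs` is an
// array of verified_output JSON strings from one observation cluster.
function kFloorFromPurity(outputs) {
  // Signature = sorted key set of the parsed output.
  const keySig = o => Object.keys(JSON.parse(o)).sort().join(',');
  const counts = new Map();
  for (const o of outputs) {
    const sig = keySig(o);
    counts.set(sig, (counts.get(sig) ?? 0) + 1);
  }
  const majority = Math.max(...counts.values());
  const purity = majority / outputs.length;
  // Leave a 0.05 safety margin below purity, clamped to the spec's band.
  return Math.min(0.95, Math.max(0.5, purity - 0.05));
}
```

A 90%-pure cluster would get a floor of 0.85; a perfectly clean cluster pins at the 0.95 ceiling.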

Run-time profile

M2 MacBook: 3.8 s / cluster
RTX 5090: 920 ms / cluster
Mac Studio: 2.1 s / cluster
x86 server CPU: 5.4 s / cluster

Deploy

// the rent-vs-buy loop, fully closed:
import fs from 'node:fs';
import { execSync } from 'node:child_process';

const res = await fetch('/v1/bridges/observations?namespace=email-classify');
const obs = await res.json();
const recipe = kolm.run('recipe-from-observations.kolm', obs);
fs.mkdirSync('recipe-out', { recursive: true });
fs.writeFileSync('recipe-out/spec.json', JSON.stringify(recipe.spec_json, null, 2));
fs.writeFileSync('recipe-out/pairs.jsonl', recipe.pairs_jsonl);
execSync('kolm compile --recipe ./recipe-out --output ./email-classify.kolm');

Three steps: capture (proxy via /v1/capture), distill (this recipe), compile (kolm compile). What goes in is your frontier-API spend; what comes out is a signed .kolm you can run offline forever.