Compile a contract clause extractor in 20 minutes.
By the end of this walkthrough you will have a 5.4 KB .kolm artifact that reads a Master Services Agreement (MSA) and returns four named clauses: arbitration, IP assignment, payment terms, and non-compete. Each clause comes back with a character span pointing back at the source PDF. The receipt is HMAC-signed. The model runs offline.
Step 1 . 60 seconds
Install and authenticate.
$ npm install -g kolm
$ kolm login
$ kolm version
k o l m
------- the private AI compiler
kolm cli v0.1.0
spec rs-1
Step 2 . 3 minutes
Write the recipe.
Four clause types, each with a span pointer (start and end character offsets). The constrained decoder enforces the schema, so the output is always machine-readable. min_k is the floor: the compile fails if the clause F1 measured on the eval pack is below it.
$ cat > contract.recipe.json <<'JSON'
{
  "task": "extract named clauses from a Master Services Agreement",
  "base": "mistral-7b-instruct",
  "objective": "clause-f1",
  "adapter": "lora + dpo",
  "context_window": 32768,
  "output_schema": {
    "clauses": [
      { "type": "arbitration", "start": "int", "end": "int", "text": "string" },
      { "type": "ip_assignment", "start": "int", "end": "int", "text": "string" },
      { "type": "payment_terms", "start": "int", "end": "int", "text": "string" },
      { "type": "non_compete", "start": "int", "end": "int", "text": "string" }
    ]
  },
  "target_k": 0.94,
  "min_k": 0.91,
  "eval_pack": "msa-clause-extract-v1"
}
JSON
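Before spending compile time, it can help to sanity-check the two thresholds and the clause list. The sketch below is not part of kolm; `check_recipe` is a hypothetical helper, and it assumes only what the recipe above shows: `min_k` and `target_k` are scores between 0 and 1, and each clause type appears once. The inline recipe is trimmed to the fields the check reads.

```python
import json

# A trimmed copy of the recipe above; only the fields this check reads.
RECIPE = json.loads("""
{
  "target_k": 0.94,
  "min_k": 0.91,
  "output_schema": {
    "clauses": [
      {"type": "arbitration"},
      {"type": "ip_assignment"},
      {"type": "payment_terms"},
      {"type": "non_compete"}
    ]
  }
}
""")


def check_recipe(recipe: dict) -> list[str]:
    """Return human-readable problems; an empty list means the recipe looks consistent."""
    problems = []
    min_k, target_k = recipe.get("min_k"), recipe.get("target_k")
    for name, value in (("min_k", min_k), ("target_k", target_k)):
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be a number in [0, 1], got {value!r}")
    # Assumption: a floor above the target can never be satisfied sensibly.
    if not problems and min_k > target_k:
        problems.append("min_k (the floor) should not exceed target_k")
    clauses = recipe.get("output_schema", {}).get("clauses", [])
    types = [clause.get("type") for clause in clauses]
    if len(types) != len(set(types)):
        problems.append("duplicate clause type in output_schema")
    return problems


print(check_recipe(RECIPE) or "recipe looks consistent")
```

To check the real file, load it with `json.load(open("contract.recipe.json"))` and pass the result to `check_recipe`.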
msa-clause-extract-v1 ships with a held-out test set of 240 MSAs. You do not have to build it.
Step 3 . 2 minutes
Add 30 labeled examples.
The eval pack covers the test set. You still want a small in-domain training set so the LoRA fits your contract style. Pull 30 of your own MSAs and tag the four clauses with character spans.
$ head -1 contract.examples.jsonl | jq .
{
  "input": "... MASTER SERVICES AGREEMENT ... 12. ARBITRATION. Any dispute arising under this Agreement shall be resolved by binding arbitration administered by JAMS ...",
  "expected": {
    "clauses": [
      { "type": "arbitration", "start": 1842, "end": 2104,
        "text": "Any dispute arising under this Agreement shall be resolved by binding arbitration administered by JAMS ..." }
    ]
  }
}
If you already have a contract-extraction labeled set in CUAD or LexGLUE format, kolm reads both directly.
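Hand-tagged spans drift easily, so a quick consistency pass over the examples file is worth a minute. The sketch below assumes offsets are 0-based, end-exclusive, and index into the `input` string, and that `text` holds the full clause rather than an elided preview. The tutorial does not state the offset convention, so adjust if kolm counts differently. `check_example` and `make_record` are hypothetical helpers, and the record is a toy one.

```python
import json


def check_example(line: str) -> list[str]:
    """Return problems found in one JSONL record; an empty list means it passed."""
    record = json.loads(line)
    source = record["input"]
    problems = []
    for clause in record["expected"]["clauses"]:
        start, end = clause["start"], clause["end"]
        if not 0 <= start < end <= len(source):
            problems.append(f"{clause['type']}: span [{start}, {end}) is out of range")
        elif source[start:end] != clause["text"]:
            problems.append(f"{clause['type']}: text differs from input[{start}:{end}]")
    return problems


def make_record(source: str, start: int, end: int, text: str) -> str:
    """Build one JSONL line in the shape shown above (toy data only)."""
    clause = {"type": "arbitration", "start": start, "end": end, "text": text}
    return json.dumps({"input": source, "expected": {"clauses": [clause]}})


source = "12. ARBITRATION. Any dispute shall be resolved by binding arbitration."
start = source.index("Any dispute")

good = make_record(source, start, len(source), source[start:])
shifted = make_record(source, start + 1, len(source), source[start:])  # off by one

print(check_example(good))
print(check_example(shifted))
```

To scan the real file, call `check_example` on each line of `contract.examples.jsonl` and print any record that returns a non-empty list.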
Step 4 . 12 minutes
Compile.
$ kolm compile --from contract.recipe.json --examples contract.examples.jsonl --out contract.kolm
[1/6] synthesizing pairs (Magpie + LegalBench seed) .. 2,420 pairs 1m 12s
[2/6] dedup + filter (MinHash, legal-language) ........ 2,108 pairs 14s
[3/6] LoRA + DPO (long context) ....................... 3 epochs 8m 44s
[4/6] constrained-decoder fit (clause-span schema) .... 52s
[5/6] K-score gate (msa-clause-extract-v1) ............ K = 0.927 > 0.91 floor
[6/6] sign + package .................................. 3s
artifact: ./contract.kolm (5.4 KB)
receipt: ./contract.receipt.json (3.2 KB)
CID: cidv1:sha256:74a2c3...
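The gate in step 5 of the compile compares a score against the `min_k` floor. As a rough illustration of what a clause-level F1 and that comparison could look like, here is a sketch that counts a predicted clause as correct only when its type, start, and end all match the label exactly. That matching rule is an assumption; the tutorial does not say how kolm computes `clause-f1`, and `clause_f1` and `passes_gate` are hypothetical names.

```python
def clause_f1(predicted: list[dict], expected: list[dict]) -> float:
    """F1 over (type, start, end) triples, using exact-match scoring."""
    pred = {(c["type"], c["start"], c["end"]) for c in predicted}
    gold = {(c["type"], c["start"], c["end"]) for c in expected}
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def passes_gate(score: float, min_k: float) -> bool:
    """The compile fails when the score is below the floor."""
    return score >= min_k


# Toy data: three of four predicted spans match the labels exactly.
expected = [
    {"type": "arbitration", "start": 100, "end": 200},
    {"type": "ip_assignment", "start": 300, "end": 400},
    {"type": "payment_terms", "start": 500, "end": 600},
    {"type": "non_compete", "start": 700, "end": 800},
]
predicted = [dict(clause) for clause in expected]
predicted[3]["end"] = 790  # one span ends too early

score = clause_f1(predicted, expected)
print(score, passes_gate(score, 0.91))  # 0.75 False
print(passes_gate(0.927, 0.91))         # True, as in the compile log above
```

Exact matching is the strictest choice; a real scorer might give partial credit for overlapping spans, which would raise the score for near misses like the one here.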
Step 5 . 1 minute
Run it on a real MSA.
$ kolm run contract.kolm --file ./acme-msa.pdf
{
  "clauses": [
    {
      "type": "arbitration",
      "start": 18420, "end": 18762,
      "text": "Any dispute, claim or controversy arising out of or relating to this Agreement ..."
    },
    {
      "type": "ip_assignment",
      "start": 22104, "end": 22418,
      "text": "All inventions, discoveries, developments and improvements ..."
    },
    {
      "type": "payment_terms",
      "start": 8240, "end": 8512,
      "text": "Customer shall pay all Fees within thirty (30) days of receipt of invoice ..."
    },
    {
      "type": "non_compete",
      "start": 26120, "end": 26450,
      "text": "Provider agrees that, for a period of twelve (12) months ..."
    }
  ],
  "latency_ms": 1840.2,
  "receipt_cid": "cidv1:sha256:74a2c3..."
}
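The clauses come back in schema order, not document order. A downstream script will often want them sorted by position and will want to know if a clause type is missing. The sketch below does both on a trimmed copy of the output above (`text` fields omitted). `summarize` is a hypothetical helper, and whether kolm omits a clause that is absent from a contract is not stated here, so the missing-type check is a precaution.

```python
import json

REQUIRED = {"arbitration", "ip_assignment", "payment_terms", "non_compete"}

# Spans copied from the run above; "text" left out for brevity.
RESULT = json.loads("""
{
  "clauses": [
    {"type": "arbitration",   "start": 18420, "end": 18762},
    {"type": "ip_assignment", "start": 22104, "end": 22418},
    {"type": "payment_terms", "start": 8240,  "end": 8512},
    {"type": "non_compete",   "start": 26120, "end": 26450}
  ]
}
""")


def summarize(result: dict) -> tuple[list[tuple[str, int, int]], list[str]]:
    """Return clauses in document order plus any required types that are missing."""
    ordered = sorted(result["clauses"], key=lambda clause: clause["start"])
    missing = sorted(REQUIRED - {clause["type"] for clause in ordered})
    return [(c["type"], c["start"], c["end"]) for c in ordered], missing


ordered, missing = summarize(RESULT)
for clause_type, start, end in ordered:
    print(f"{start:>6}-{end:<6} {clause_type}")
print("missing:", missing)
```

In this run the payment terms appear first in the contract and the non-compete last, and no clause type is missing.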
Step 6 . optional
Verify and bind.
The verifier confirms the manifest, the receipt HMAC, and the K-score gate. The binder PDF is the one-page summary that outside counsel will sign.
$ kolm verify contract.kolm --binder contract.binder.pdf
✓ manifest CID matches canonical hash
✓ all 11 entries hashed and verified
✓ receipt HMAC valid
✓ K-score 0.927 (above declared gate 0.91)
artifact is valid.
# contract.binder.pdf lists:
# CID, base, recipe, compliance, K-score
# auditors sign the PDF, not the JSON
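For readers unfamiliar with what an HMAC check on a receipt involves, here is a generic sketch using Python's standard `hmac` module. It is not kolm's verifier. The hash (SHA-256), the canonical JSON serialization, the receipt field names, and the key are all assumptions made for illustration, since the tutorial does not describe the real receipt layout or key handling.

```python
import hashlib
import hmac
import json


def sign_receipt(body: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_receipt(body: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_receipt(body, key), signature)


# Hypothetical key and receipt fields, for illustration only.
key = b"demo-key-not-a-real-secret"
receipt = {"k_score": 0.927, "min_k": 0.91, "eval_pack": "msa-clause-extract-v1"}
signature = sign_receipt(receipt, key)

print(verify_receipt(receipt, signature, key))   # True

tampered = {**receipt, "k_score": 0.99}
print(verify_receipt(tampered, signature, key))  # False
```

Serializing with sorted keys matters: the same fields in a different order would otherwise produce a different tag. `hmac.compare_digest` avoids leaking how many leading characters of the tag matched.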