
Compile a contract clause extractor in 20 minutes.

By the end of this walkthrough you will have a 5.4 KB .kolm artifact that reads a Master Services Agreement (MSA) and returns four named clauses: arbitration, IP assignment, payment terms, and non-compete. Each clause comes back with a character span pointing back at the source PDF. The receipt is HMAC-signed, and the model runs offline.

Runtime: 20 min | Output: contract.kolm | Base: mistral-7b-instruct | K-target: 0.91 | Size: 5.4 KB

Step 1 . 60 seconds

Install and authenticate.

$ npm install -g kolm
$ kolm login
$ kolm version
  k o l m
  ------- the private AI compiler
kolm cli v0.1.0
spec rs-1

Step 2 . 3 minutes

Write the recipe.

The recipe declares four clause types, each with a span pointer (start and end character offsets). The constrained decoder enforces the schema, so the output is always machine-readable. min_k is the floor: the compile fails if the held-out F1 falls below it.

$ cat > contract.recipe.json <<'JSON'
{
  "task": "extract named clauses from a Master Services Agreement",
  "base": "mistral-7b-instruct",
  "objective": "clause-f1",
  "adapter": "lora + dpo",
  "context_window": 32768,
  "output_schema": {
    "clauses": [
      { "type": "arbitration",    "start": "int", "end": "int", "text": "string" },
      { "type": "ip_assignment",  "start": "int", "end": "int", "text": "string" },
      { "type": "payment_terms",  "start": "int", "end": "int", "text": "string" },
      { "type": "non_compete",    "start": "int", "end": "int", "text": "string" }
    ]
  },
  "target_k": 0.94,
  "min_k":    0.91,
  "eval_pack": "msa-clause-extract-v1"
}
JSON
Checkpoint: The eval pack msa-clause-extract-v1 ships with a held-out test set of 240 MSAs. You do not have to build it.
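Before spending compile time, it can help to sanity-check the recipe locally. The sketch below is not part of the kolm CLI. It assumes (this is our reading, not something the spec excerpt confirms) that both scores live in [0, 1], that min_k should not exceed target_k, and that each clause type appears once in output_schema. The helper name check_recipe is hypothetical.

```python
import json

# A trimmed copy of the recipe fields this check looks at.
RECIPE_TEXT = """
{
  "target_k": 0.94,
  "min_k": 0.91,
  "output_schema": {
    "clauses": [
      {"type": "arbitration",   "start": "int", "end": "int", "text": "string"},
      {"type": "ip_assignment", "start": "int", "end": "int", "text": "string"},
      {"type": "payment_terms", "start": "int", "end": "int", "text": "string"},
      {"type": "non_compete",   "start": "int", "end": "int", "text": "string"}
    ]
  }
}
"""


def check_recipe(recipe: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    min_k = recipe.get("min_k")
    target_k = recipe.get("target_k")

    # Assumption: both scores are F1-like values between 0 and 1.
    for name, value in (("min_k", min_k), ("target_k", target_k)):
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be a number in [0, 1], got {value!r}")

    # Assumption: a floor above the target is a mistake.
    if not problems and min_k > target_k:
        problems.append("min_k is above target_k")

    # Assumption: each clause type should be declared exactly once.
    clauses = recipe.get("output_schema", {}).get("clauses", [])
    types = [clause.get("type") for clause in clauses]
    if len(types) != len(set(types)):
        problems.append("duplicate clause type in output_schema")

    return problems


if __name__ == "__main__":
    print(check_recipe(json.loads(RECIPE_TEXT)) or "recipe looks consistent")
```

To check the real file, load contract.recipe.json with json.load and pass the result to check_recipe.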

Step 3 . 2 minutes

Add 30 labeled examples.

The eval pack covers the test set. You still want a small in-domain training set so the LoRA fits your contract style. Pull 30 of your own MSAs and tag the four clauses with character spans.

$ head -1 contract.examples.jsonl | jq .
{
  "input": "... MASTER SERVICES AGREEMENT ... 12. ARBITRATION. Any dispute arising under this Agreement shall be resolved by binding arbitration administered by JAMS ...",
  "expected": {
    "clauses": [
      { "type": "arbitration", "start": 1842, "end": 2104,
        "text": "Any dispute arising under this Agreement shall be resolved by binding arbitration administered by JAMS ..." }
    ]
  }
}

If you already have a labeled contract-extraction set in CUAD or LexGLUE format, kolm reads both directly.
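Hand-tagged character spans drift easily, so a quick consistency check on the examples file is worth running before you compile. The sketch below assumes offsets are 0-based and end-exclusive and index into the "input" string; the tutorial does not state the convention, so adjust the slice if kolm counts differently. It also assumes "text" holds the full clause rather than an elided preview. The function name span_problems and the sample contract text are hypothetical.

```python
def span_problems(example: dict) -> list[str]:
    """Report clauses whose start/end offsets do not reproduce their text."""
    text = example["input"]
    problems = []
    for clause in example["expected"]["clauses"]:
        start, end = clause["start"], clause["end"]
        if not 0 <= start < end <= len(text):
            problems.append(f"{clause['type']}: span [{start}, {end}) is out of range")
        elif text[start:end] != clause["text"]:
            problems.append(f"{clause['type']}: text does not match input[{start}:{end}]")
    return problems


# A tiny made-up example in the same shape as contract.examples.jsonl.
DOC = "12. ARBITRATION. Any dispute shall be resolved by binding arbitration."
CLAUSE = "Any dispute shall be resolved by binding arbitration."
START = DOC.index(CLAUSE)

GOOD_EXAMPLE = {
    "input": DOC,
    "expected": {"clauses": [
        {"type": "arbitration", "start": START, "end": START + len(CLAUSE), "text": CLAUSE},
    ]},
}

if __name__ == "__main__":
    print(span_problems(GOOD_EXAMPLE) or "all spans match")
```

For the real file, call span_problems(json.loads(line)) on each line of contract.examples.jsonl and fix any example that reports a problem.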

Step 4 . 12 minutes

Compile.

$ kolm compile --from contract.recipe.json --examples contract.examples.jsonl --out contract.kolm

[1/6] synthesizing pairs (Magpie + LegalBench seed) .. 2,420 pairs 1m 12s
[2/6] dedup + filter (MinHash, legal-language) ........ 2,108 pairs 14s
[3/6] LoRA + DPO (long context) ....................... 3 epochs 8m 44s
[4/6] constrained-decoder fit (clause-span schema) .... 52s
[5/6] K-score gate (msa-clause-extract-v1) ............ K = 0.927 > 0.91 floor
[6/6] sign + package .................................. 3s

  artifact: ./contract.kolm  (5.4 KB)
  receipt:  ./contract.receipt.json  (3.2 KB)
  CID:      cidv1:sha256:74a2c3...
Checkpoint: Compile cost on our account: $1.84. The K = 0.927 figure is the held-out F1 that the compile measured against the eval pack, and it is auditable.
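To make the gate in step [5/6] concrete, here is a minimal sketch of a clause-level F1 and a floor check. The matching rule is an assumption: a predicted clause counts as correct only when its type, start, and end all equal a gold clause. The eval pack's actual scoring rule is not described in this tutorial and may be more lenient (for example, partial span overlap). The names clause_f1 and passes_gate are hypothetical.

```python
def clause_f1(predicted: list[dict], gold: list[dict]) -> float:
    """F1 over (type, start, end) triples, using exact-match scoring (an assumption)."""
    pred = {(c["type"], c["start"], c["end"]) for c in predicted}
    ref = {(c["type"], c["start"], c["end"]) for c in gold}
    if not pred and not ref:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def passes_gate(k_score: float, min_k: float) -> bool:
    """The compile is described as failing when the score is below the floor."""
    return k_score >= min_k


if __name__ == "__main__":
    gold = [
        {"type": "arbitration", "start": 10, "end": 50},
        {"type": "ip_assignment", "start": 60, "end": 90},
        {"type": "payment_terms", "start": 100, "end": 140},
        {"type": "non_compete", "start": 150, "end": 200},
    ]
    # Three exact matches and one span that is off by five characters.
    predicted = gold[:3] + [{"type": "non_compete", "start": 155, "end": 200}]
    print(clause_f1(predicted, gold))
    print(passes_gate(0.927, 0.91))
```

Under exact matching, a span that is off by a few characters counts as both a false positive and a false negative, which is why span quality in your labeled examples matters.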

Step 5 . 1 minute

Run it on a real MSA.

$ kolm run contract.kolm --file ./acme-msa.pdf

{
  "clauses": [
    {
      "type": "arbitration",
      "start": 18420, "end": 18762,
      "text": "Any dispute, claim or controversy arising out of or relating to this Agreement ..."
    },
    {
      "type": "ip_assignment",
      "start": 22104, "end": 22418,
      "text": "All inventions, discoveries, developments and improvements ..."
    },
    {
      "type": "payment_terms",
      "start": 8240, "end": 8512,
      "text": "Customer shall pay all Fees within thirty (30) days of receipt of invoice ..."
    },
    {
      "type": "non_compete",
      "start": 26120, "end": 26450,
      "text": "Provider agrees that, for a period of twelve (12) months ..."
    }
  ],
  "latency_ms": 1840.2,
  "receipt_cid": "cidv1:sha256:74a2c3..."
}
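Downstream code usually wants the clauses in reading order and a quick answer to "was anything missed?". The sketch below works on the JSON shape shown above, with the clause text omitted for brevity. The helper names are hypothetical, and it assumes the command's output has been captured as a JSON string (for example, by redirecting stdout to a file).

```python
import json

# The run output from above, with the "text" fields dropped to keep this short.
RESULT = json.loads("""
{
  "clauses": [
    {"type": "arbitration",   "start": 18420, "end": 18762},
    {"type": "ip_assignment", "start": 22104, "end": 22418},
    {"type": "payment_terms", "start": 8240,  "end": 8512},
    {"type": "non_compete",   "start": 26120, "end": 26450}
  ],
  "latency_ms": 1840.2
}
""")

EXPECTED_TYPES = ("arbitration", "ip_assignment", "payment_terms", "non_compete")


def clauses_in_document_order(result: dict) -> list[dict]:
    """Sort clauses by where they start in the source document."""
    return sorted(result["clauses"], key=lambda clause: clause["start"])


def missing_types(result: dict, expected=EXPECTED_TYPES) -> list[str]:
    """List the expected clause types that the result does not contain."""
    found = {clause["type"] for clause in result["clauses"]}
    return [t for t in expected if t not in found]


if __name__ == "__main__":
    for clause in clauses_in_document_order(RESULT):
        length = clause["end"] - clause["start"]
        print(f"{clause['type']:<14} starts at {clause['start']:>6} ({length} chars)")
    print("missing:", missing_types(RESULT))
```

A contract that genuinely lacks a clause (many MSAs have no non-compete) would show up in missing_types, so treat a non-empty list as a prompt for review rather than as an error.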

Step 6 . optional

Verify and bind.

The verifier confirms the manifest, the receipt HMAC, and the K-score gate. The binder PDF is the one-pager that outside counsel will sign.

$ kolm verify contract.kolm --binder contract.binder.pdf

  ✓ manifest CID matches canonical hash
  ✓ all 11 entries hashed and verified
  ✓ receipt HMAC valid
  ✓ K-score 0.927 (above declared gate 0.91)
  artifact is valid.

# contract.binder.pdf lists:
#   CID, base, recipe, compliance, K-score
# auditors sign the PDF, not the JSON
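For readers who have not worked with HMACs, the sketch below shows the general idea behind a "receipt HMAC valid" check using Python's standard library. Everything specific here is an assumption for illustration: the tutorial does not document the receipt layout, the hash function, how the payload is canonicalized, or where the key comes from. This is not a reimplementation of kolm verify; use the CLI for real verification.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, key: bytes) -> str:
    """Hex HMAC-SHA256 over a canonical JSON encoding (both choices are assumptions)."""
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign_payload(payload, key), signature)


if __name__ == "__main__":
    demo_key = b"demo-key-not-a-real-secret"          # hypothetical key
    receipt = {"k_score": 0.927, "min_k": 0.91}       # hypothetical receipt fields
    tag = sign_payload(receipt, demo_key)

    print(verify_payload(receipt, tag, demo_key))     # untouched receipt
    tampered = dict(receipt, k_score=0.99)
    print(verify_payload(tampered, tag, demo_key))    # edited after signing
```

The point of the design is that any edit to the signed fields, such as raising the K-score after the fact, changes the recomputed tag, so the comparison fails unless the editor also holds the key.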