kolm recall · the data moat

Every modality. One index. Every compile, grounded.

Recall is the multimodal vector substrate beneath the compiler. Drop in a folder and get back a per-tenant hybrid index (BM25 + vector + RRF + reranker) that every Distill call grounds itself in. The same index ships inside the artifact as index.sqlite-vec, so the compiled model knows your specifics offline, forever.

Ingest anything.

```shell
# ingest a folder, auto-detects modality per file
$ kolm recall ingest ./my-data --namespace work
✔ tokenized 312 files (147 text, 88 image, 41 audio, 26 pdf, 10 video)
✔ embedded 4,802 chunks
✔ indexed in namespace "work"

# query the index directly
$ kolm recall query "the migration we ran last quarter" --namespace work
[{ "source": "./my-data/notes/2026-01-migration.md",
   "chunk_id": "c_4f2a",
   "score": 0.847,
   "text": "On Jan 14 we cut over to the new auth schema..." }, ...]

# compile a Specialist that grounds every call in this corpus
$ kolm compile "answer questions about our infra" --data ./my-data
```

Multimodal sidecars.

Every non-text file gets a Markdown sidecar, a structured tokenization that turns pixels, frames, and waveforms into something both humans and models can read. The sidecar is what gets embedded; the original is what gets returned. Inspired by Tobi Lutke's qmd pattern, extended to image / audio / video / PDF.
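For illustration, a sidecar for an image might look like the fragment below. The file name and field layout are assumptions for this sketch, not the tool's documented format:

```
<!-- whiteboard.heic.md (hypothetical sidecar) -->
# whiteboard.heic

modality: image
embeddings: CLIP (original pixels) + bge-m3 (this sidecar)

Caption: whiteboard diagram of the January auth schema migration,
old and new table layouts side by side.
```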

| extensions | modality | tokenization |
|---|---|---|
| `.txt` `.md` `.ts` `.py` | text + code | chunked by structure, embedded with bge-m3 |
| `.png` `.jpg` `.heic` | image | CLIP embed + caption sidecar via vision model |
| `.mp3` `.wav` `.m4a` | audio | Whisper transcript + CLAP embed of timbre |
| `.mp4` `.mov` | video | scene-detected → per-shot caption + ASR + CLIP |
| `.pdf` | pdf | unstructured layout parse + per-page sidecar |
| auto | everything else | magika sniffs, falls back to bytes-as-text |

Hybrid retrieval, reranked.

query → BM25 (top 50) + vector (top 50) → RRF fusion → cross-encoder rerank → top-k

Lexical recall catches names and IDs. Vector recall catches paraphrase. RRF fuses without tuning a weight. A small cross-encoder reorders the top of the list. The whole pipeline is what grounds every Distill k-sample and every /v1/wrap/verified call when a corpus_namespace is set.
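The fusion step above can be sketched in a few lines. This is a minimal Reciprocal Rank Fusion, not the actual kolm implementation; `k=60` is the conventional smoothing constant, and the document IDs are illustrative:

```python
# Reciprocal Rank Fusion: combine ranked lists without tuning a weight.
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists (best first) into one."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # each list contributes 1/(k + rank) for every doc it ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["doc_a", "doc_b", "doc_c"]    # lexical hits
vector_top = ["doc_b", "doc_d", "doc_a"]  # semantic hits
fused = rrf_fuse([bm25_top, vector_top])
# doc_b rises to the top: it sits high in both lists
```

A cross-encoder would then reorder only the head of `fused`, which is cheap because the candidate set is already small.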

Three back-ends, one interface.

```shell
# managed (default, Pinecone)
KOLMOGOROV_INDEX_BACKEND=pinecone

# self-host (Trieve docker)
KOLMOGOROV_INDEX_BACKEND=trieve
KOLM_TRIEVE_URL=http://localhost:8090

# on-device (sqlite-vec, what ships INSIDE every .kolm)
KOLMOGOROV_INDEX_BACKEND=sqlite-vec
```

Same SDK, same CLI, same wire format. Pick the back-end at the boundary; the rest of the stack doesn't notice. On-device is the default inside artifacts: every .kolm ships index.sqlite-vec so recall keeps working when the network is gone.
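"Pick the back-end at the boundary" can be sketched as a single dispatch point keyed off the environment variable above. The class and method names here are assumptions for illustration, not the kolm SDK:

```python
import os

class PineconeIndex:
    def query(self, text, top_k=10):
        return []  # would call the managed service

class TrieveIndex:
    def query(self, text, top_k=10):
        return []  # would hit KOLM_TRIEVE_URL

class SqliteVecIndex:
    def query(self, text, top_k=10):
        return []  # would read index.sqlite-vec on disk

BACKENDS = {
    "pinecone": PineconeIndex,
    "trieve": TrieveIndex,
    "sqlite-vec": SqliteVecIndex,
}

def open_index():
    # managed is the documented default
    name = os.environ.get("KOLMOGOROV_INDEX_BACKEND", "pinecone")
    return BACKENDS[name]()
```

Everything downstream calls `query()` and never learns which store answered, which is what lets the same index definition run in the cloud and inside an artifact.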

```
# index size on disk, measured
    10,000 docs   ~38 MB    # bge-m3 dim 1024, INT8 quantized
   100,000 docs   ~360 MB   # still ships inside a .kolm
 1,000,000 docs   ~3.4 GB   # tier up: managed back-end
```
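The back-of-envelope behind those numbers: bge-m3 vectors are 1024-dimensional, and INT8 quantization stores one byte per dimension, so each chunk vector costs about 1 KiB before text payloads and BM25 structures. The chunks-per-doc figure below is an inference from the table, not a measured value:

```python
DIM = 1024        # bge-m3 embedding dimension
INT8_BYTES = 1    # one byte per dimension after quantization

def vector_bytes(n_chunks):
    # raw vector storage, excluding payload and index overhead
    return n_chunks * DIM * INT8_BYTES

# 10,000 docs at ~38 MB works out to roughly 4 KiB per doc,
# i.e. a few ~1 KiB chunk vectors each plus stored text.
per_doc = 38 * 1024 * 1024 / 10_000
```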