Compile to your hardware.
Every .kolm artifact is bound to a base model and a target device. The defaults pick themselves from what kolm detects on the box. Override either at the CLI; both ride the receipt chain so verification is reproducible.
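A minimal sketch of the binding, assuming a hypothetical `KolmBinding` shape: base_model, target_device, and train_device are the field names this page uses; the interface around them is illustrative, not the actual .kolm schema.

```ts
// Illustrative binding fields carried by a .kolm manifest.
// base_model / target_device / train_device are named on this page;
// the interface itself is a sketch, not the real schema.
interface KolmBinding {
  base_model: string;           // e.g. "Qwen/Qwen2.5-3B-Instruct"
  target_device: string | null; // device id from `kolm gpu detect`; null = untargeted
  train_device: string;         // device the adapter was trained on
}

const binding: KolmBinding = {
  base_model: "Qwen/Qwen2.5-7B-Instruct",
  target_device: "rtx-5090",
  train_device: "rtx-5090",
};
```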
Defaults by device
What kolm compile picks when you don't pass --base-model or --target-device. kolm gpu detect returns the device id; the picker walks the table below. A sketch of that walk follows the table.
| Device | Class | VRAM | Default train | Default infer |
|---|---|---|---|---|
| rtx-5090 | training | 32 GB | Qwen2.5-7B-Instruct | Qwen2.5-7B-Instruct |
| rtx-4090 | training | 24 GB | Qwen2.5-3B-Instruct | Qwen2.5-3B-Instruct |
| rtx-3090 | training | 24 GB | Qwen2.5-3B-Instruct | Qwen2.5-3B-Instruct |
| a100-40gb | training | 40 GB | Qwen2.5-7B-Instruct | Qwen2.5-7B-Instruct |
| a100-80gb | training | 80 GB | Qwen2.5-14B-Instruct | Qwen2.5-7B-Instruct |
| h100-80gb | training | 80 GB | Qwen2.5-14B-Instruct | Qwen2.5-7B-Instruct |
| h200-141gb | training | 141 GB | Qwen2.5-14B-Instruct | Qwen2.5-7B-Instruct |
| apple-m3-max | training | 64 GB | Qwen2.5-3B-Instruct | Qwen2.5-3B-Instruct |
| apple-m2-pro | inference | 16 GB | n/a | Qwen2.5-3B-Instruct (MLX) |
| iphone-15-pro | inference | 4 GB | n/a | Qwen2.5-1.5B-Instruct (4-bit) |
| pixel-8-pro | inference | 3 GB | n/a | gemma-3-1b-it (4-bit) |
| laptop-igpu | inference | 2 GB | n/a | Qwen2.5-1.5B-Instruct |
| cpu-x86_64 | inference | n/a | SmolLM2-1.7B-Instruct | Qwen2.5-0.5B-Instruct |
| wasm | inference | n/a | n/a | Qwen2.5-0.5B-Instruct |
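The walk, sketched under two assumptions: the table lives in a `DEVICE_DEFAULTS` map (abridged here) and the resolver is a plain function. The real picker is internal to kolm compile.

```ts
// Hypothetical mirror of the defaults table above (abridged).
const DEVICE_DEFAULTS: Record<string, { train: string | null; infer: string }> = {
  "rtx-5090":      { train: "Qwen/Qwen2.5-7B-Instruct", infer: "Qwen/Qwen2.5-7B-Instruct" },
  "rtx-4090":      { train: "Qwen/Qwen2.5-3B-Instruct", infer: "Qwen/Qwen2.5-3B-Instruct" },
  "iphone-15-pro": { train: null, infer: "Qwen/Qwen2.5-1.5B-Instruct" }, // inference-only class
  // ...remaining 11 rows elided
};

// Explicit --base-model wins; then the detected device's row; then the
// documented global fallback when nothing is detected.
function resolveBaseModel(
  flag: string | undefined,
  deviceId: string | null,
  mode: "train" | "infer",
): string {
  if (flag) return flag;
  const pick = deviceId ? DEVICE_DEFAULTS[deviceId]?.[mode] : null;
  return pick ?? "Qwen/Qwen2.5-3B-Instruct";
}
```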
The default pick: Qwen 2.5 3B Instruct
When no device is detected and no flag is passed, kolm compile resolves base_model to Qwen/Qwen2.5-3B-Instruct. Why:
- Apache 2.0. Commercial-redistributable, no MAU clause, no acceptable-use policy that has to be re-read per buyer.
- Native tool use. Distillation targets that emit JSON tool-calls work zero-shot.
- 32K context native, 128K with YaRN. Long enough for clinical notes, contracts, long support threads.
- 29 languages. Healthcare, finance, and legal callers frequently need non-English.
- 3.09B params. Fits a single 24 GB consumer GPU at bf16 + LoRA r=16 with room for optimizer state (rough arithmetic in the sketch after this list); the LoRA adapter itself is under 2 GB on disk.
- Beats Llama-3.2-3B on MMLU, GSM8K, MATH, HumanEval, IFEval per the Qwen 2.5 tech report and the public Hugging Face Open LLM Leaderboard.
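The 24 GB claim in rough numbers, as a back-of-envelope sketch; the 30M LoRA-parameter count is an order-of-magnitude assumption, not a measured figure.

```ts
// Back-of-envelope VRAM math for bf16 base + LoRA r=16 on a 24 GB card.
const baseParams = 3.09e9;            // Qwen2.5-3B-Instruct
const bf16 = 2;                       // bytes per parameter
const frozenBase = baseParams * bf16; // ~6.2 GB of frozen weights

const loraParams = 30e6;              // ASSUMPTION: order of magnitude for r=16
const adapter = loraParams * bf16;    // ~60 MB of trainable weights
const grads = loraParams * bf16;      // gradients exist only for the adapter
const optState = loraParams * 2;      // 8-bit paged AdamW: ~2 bytes/param of state

const totalGB = (frozenBase + adapter + grads + optState) / 1e9;
console.log(`${totalGB.toFixed(1)} GB before activations — well inside 24 GB`);
```

Activations and KV cache take the remainder; that headroom is why r=16 fits comfortably.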
Full base-model registry
16 models. kolm models list prints the same shape at the terminal; a sketch of the row shape follows the table.
| Model | License | Params | Ctx | Tool use | Multilingual | Use |
|---|---|---|---|---|---|---|
| Qwen/Qwen2.5-0.5B-Instruct | Apache 2.0 | 0.50 B | 32 K | yes | 29 langs | edge / wasm |
| Qwen/Qwen2.5-1.5B-Instruct | Apache 2.0 | 1.54 B | 32 K | yes | 29 langs | mobile |
| Qwen/Qwen2.5-3B-Instruct ← default | Apache 2.0 | 3.09 B | 32 K / 128 K YaRN | yes | 29 langs | laptop / 4090 |
| Qwen/Qwen2.5-7B-Instruct | Apache 2.0 | 7.62 B | 128 K | yes | 29 langs | 5090 / A100 |
| Qwen/Qwen2.5-Coder-7B-Instruct | Apache 2.0 | 7.62 B | 128 K | yes | code-first | code distill |
| Qwen/Qwen2.5-14B-Instruct | Apache 2.0 | 14.7 B | 128 K | yes | 29 langs | A100 80 / H100 |
| meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 Community | 1.24 B | 128 K | yes | English-first | mobile alternate |
| meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 Community | 3.21 B | 128 K | yes | English-first | 3B alternate |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 Community | 8.03 B | 128 K | yes | English-first | 8B alternate |
| microsoft/Phi-3.5-mini-instruct | MIT | 3.82 B | 128 K | no | multilingual | reasoning-first |
| google/gemma-3-1b-it | Gemma ToU | 1.00 B | 32 K | partial | 140 langs | mobile (Pixel) |
| google/gemma-3-4b-it | Gemma ToU | 4.30 B | 128 K | partial | 140 langs | vision target |
| google/gemma-3-12b-it | Gemma ToU | 12.2 B | 128 K | partial | 140 langs | vision alternate |
| google/gemma-2-2b-it | Gemma ToU | 2.61 B | 8 K | no | English-first | tiny alt |
| mistralai/Ministral-3B-Instruct-2410 | MRL (Mistral Research) | 3.00 B | 128 K | yes | multilingual | 3B alternate |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | Apache 2.0 | 1.71 B | 8 K | no | English-first | CPU fallback |
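The row shape those columns imply, as a hypothetical TypeScript interface; the name and exact field spellings are assumptions, not kolm's API.

```ts
// Hypothetical shape of one row printed by `kolm models list`.
interface RegistryModel {
  id: string;                        // e.g. "Qwen/Qwen2.5-3B-Instruct"
  license: string;                   // e.g. "apache-2.0", "gemma-tou"
  paramsB: number;                   // parameter count in billions, e.g. 3.09
  ctx: string;                       // e.g. "32K" or "32K / 128K YaRN"
  toolUse: "yes" | "partial" | "no";
  multilingual: string;              // e.g. "29 langs", "English-first"
  use: string;                       // recommended slot, e.g. "laptop / 4090"
}
```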
License posture. kolm defaults to Apache 2.0 because that's the license least likely to require legal review in regulated industries. Llama and Gemma stay in the registry as opt-in; pin them with kolm models pin meta-llama/Llama-3.2-3B-Instruct if your buyer accepts the terms.
Device registry
14 device profiles. kolm gpu detect picks a row by parsing nvidia-smi output (NVIDIA), the Metal device id (Apple), or falling back to cpu-x86_64 / wasm; a sketch of that fallback order follows the table.
| Device | Arch | VRAM | Attention | Min CUDA | Min torch | FP4 / FP8 / BF16 |
|---|---|---|---|---|---|---|
| rtx-5090 | Blackwell sm_120 | 32 GB | fa3 | 12.8 | 2.7 | yes / yes / yes |
| rtx-4090 | Ada sm_89 | 24 GB | fa2 | 12.1 | 2.4 | no / yes / yes |
| rtx-3090 | Ampere sm_86 | 24 GB | fa2 | 11.8 | 2.2 | no / no / yes |
| a100-40gb | Ampere sm_80 | 40 GB | fa2 | 11.8 | 2.2 | no / no / yes |
| a100-80gb | Ampere sm_80 | 80 GB | fa2 | 11.8 | 2.2 | no / no / yes |
| h100-80gb | Hopper sm_90 | 80 GB | fa3 | 12.4 | 2.4 | no / yes / yes |
| h200-141gb | Hopper sm_90 | 141 GB | fa3 | 12.4 | 2.4 | no / yes / yes |
| apple-m3-max | Apple Silicon | 64 GB | mlx | n/a | n/a | no / no / yes |
| apple-m2-pro | Apple Silicon | 16 GB | mlx | n/a | n/a | no / no / yes |
| iphone-15-pro | A17 Pro | 4 GB | coreml | n/a | n/a | no / no / partial |
| pixel-8-pro | Tensor G3 | 3 GB | mediapipe | n/a | n/a | no / no / partial |
| laptop-igpu | Intel Arc / Iris | 2 GB | directml | n/a | n/a | no / no / partial |
| cpu-x86_64 | any x86_64 | n/a | sdpa | n/a | n/a | no / no / no |
| wasm | wasm32 | n/a | sdpa | n/a | n/a | no / no / no |
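A sketch of that fallback order. The probe helpers below are stand-ins: the nvidia-smi flags are real, but the name matching and the sysctl probe for Apple Silicon are assumptions about how a detector could work, not kolm's implementation.

```ts
import { execSync } from "node:child_process";

// Stand-in NVIDIA probe: map `nvidia-smi` GPU names to device ids.
function probeNvidiaSmi(): string | null {
  try {
    const out = execSync("nvidia-smi --query-gpu=name --format=csv,noheader").toString();
    if (out.includes("5090")) return "rtx-5090";
    if (out.includes("4090")) return "rtx-4090";
    // ...remaining NVIDIA rows elided
    return null;
  } catch {
    return null; // no NVIDIA driver on this box
  }
}

// Stand-in Apple probe: read the chip brand string on macOS.
function probeApple(): string | null {
  if (process.platform !== "darwin") return null;
  try {
    const out = execSync("sysctl -n machdep.cpu.brand_string").toString();
    if (out.includes("M3 Max")) return "apple-m3-max";
    if (out.includes("M2 Pro")) return "apple-m2-pro";
    return null;
  } catch {
    return null;
  }
}

// NVIDIA first, then Apple, then the CPU profile. A wasm build would
// hardcode "wasm", since it cannot shell out at all.
function detectDevice(): string {
  return probeNvidiaSmi() ?? probeApple() ?? "cpu-x86_64";
}
```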
Device-fit contract
Every .kolm artifact carries target_device and train_device in the manifest. Before the runtime loads an adapter it calls verifyDeviceFit(manifest, hostDeviceId) and acts on the result:
| Compile target | Host device | Result | Behavior |
|---|---|---|---|
| rtx-5090 | rtx-5090 | ok:true | load and run |
| iphone-15-pro | rtx-5090 | ok:true, soft:true | load with warning (cross-class) |
| null (no target pinned) | rtx-5090 | ok:true, soft:true | load with warning (untargeted) |
| rtx-5090 | iphone-15-pro | ok:false | refuse (4 GB host can't hold 32 GB compile) |
The runtime never lies about a mismatch: it either loads cleanly, loads with a structured warning, or refuses. The smoke test at scripts/smoke-device-bind.mjs proves all four cases pass; a sketch of the check follows.
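A sketch of a check that produces those four results, assuming an abridged VRAM map as a stand-in for the real device registry; the result shape follows the table, the body is illustrative.

```ts
type FitResult = { ok: boolean; soft?: boolean; reason?: string };

// Abridged stand-in for the device registry above.
const VRAM_GB: Record<string, number> = { "rtx-5090": 32, "iphone-15-pro": 4 /* ... */ };

function verifyDeviceFit(
  manifest: { target_device: string | null },
  hostDeviceId: string,
): FitResult {
  const target = manifest.target_device;
  if (target === null) {
    return { ok: true, soft: true, reason: "untargeted" };          // load with warning
  }
  if (target === hostDeviceId) {
    return { ok: true };                                            // exact match: load and run
  }
  if ((VRAM_GB[hostDeviceId] ?? 0) < (VRAM_GB[target] ?? 0)) {
    return { ok: false, reason: "host VRAM below compile target" }; // refuse
  }
  return { ok: true, soft: true, reason: "cross-class" };           // e.g. phone artifact on a 5090
}
```

Running the four table rows through this sketch reproduces ok:true, ok:true/soft:true (cross-class), ok:true/soft:true (untargeted), and ok:false in that order.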
CLI
Three verbs cover the surface: kolm models for the catalog, kolm gpu for the box, kolm compile for the binding.
```
$ kolm gpu detect
rtx-5090 . Blackwell sm_120 . 32 GB . cuda 12.8 . torch 2.7+

$ kolm models recommend --target-device rtx-5090
Qwen/Qwen2.5-7B-Instruct          (apache-2.0, 7.6B, 128K ctx, tool-use)
Qwen/Qwen2.5-3B-Instruct          (apache-2.0, 3.1B, 32K ctx, tool-use)
meta-llama/Llama-3.1-8B-Instruct  (llama-community, 8B, 128K ctx)

$ kolm models pin Qwen/Qwen2.5-7B-Instruct
pinned base model: Qwen/Qwen2.5-7B-Instruct

$ kolm compile --task "classify support tickets" --target-device rtx-5090
. resolves base model: Qwen/Qwen2.5-7B-Instruct
. attention: fa3
. optimizer: paged_adamw_8bit
. liger: on
. compiling: 100%  K=0.917
. signing receipt: HMAC-SHA256
. done.
```
Full verb tables at /docs. The decision matrix that picks defaults is at /spec under "device-fit".
Why we will revisit
- Qwen3. When Qwen3-3B-Instruct ships under Apache 2.0 with comparable tool use, the default reranks. The bigger tokenizer is a win for code and a latency cost at tiny-model scale; we will measure both.
- Llama 4. If Meta drops the MAU clause, Llama moves up in the picker.
- Gemma 3 vision. Once `kolm capture <image>` ships, the mobile-inference default flips from text-only Qwen to Gemma 3 4B.
- NVFP4 training. Torch 2.8 + cuBLASLt 12.9 lands native NVFP4 training on Blackwell. When that wheel hits the index, the 5090 trainer flips bf16 LoRA → fp4 LoRA at the same VRAM.
Honesty notes
Two things we are not claiming.
We did not train these base models. The base-model field selects a foundation; .kolm is the LoRA adapter on top, plus the receipt chain. The base remains under its upstream license; you remain the licensee.
Benchmark numbers in the picker come from public scorecards. The MMLU / GSM8K / MATH / HumanEval / IFEval scores quoted on this page come from the Qwen 2.5 tech report and the public Hugging Face Open LLM Leaderboard; kolm has not independently rerun them. The /leaderboard page tracks reproductions we have run; everything else is cited.