Models & devices . v0.1

Compile to your hardware.

Every .kolm artifact is bound to a base model and a target device. The defaults pick themselves from what kolm detects on the box. Override either at the CLI; both ride the receipt chain so verification is reproducible.

Defaults by device

What kolm compile picks when you don't pass --base-model or --target-device. kolm gpu detect returns the device id; the picker walks the table below.

| Device | Class | VRAM | Default train | Default infer |
|---|---|---|---|---|
| rtx-5090 | training | 32 GB | Qwen2.5-7B-Instruct | Qwen2.5-7B-Instruct |
| rtx-4090 | training | 24 GB | Qwen2.5-3B-Instruct | Qwen2.5-3B-Instruct |
| rtx-3090 | training | 24 GB | Qwen2.5-3B-Instruct | Qwen2.5-3B-Instruct |
| a100-40gb | training | 40 GB | Qwen2.5-7B-Instruct | Qwen2.5-7B-Instruct |
| a100-80gb | training | 80 GB | Qwen2.5-14B-Instruct | Qwen2.5-7B-Instruct |
| h100-80gb | training | 80 GB | Qwen2.5-14B-Instruct | Qwen2.5-7B-Instruct |
| h200-141gb | training | 141 GB | Qwen2.5-14B-Instruct | Qwen2.5-7B-Instruct |
| apple-m3-max | training | 64 GB | Qwen2.5-3B-Instruct | Qwen2.5-3B-Instruct |
| apple-m2-pro | inference | 16 GB | n/a | Qwen2.5-3B-Instruct (MLX) |
| iphone-15-pro | inference | 4 GB | n/a | Qwen2.5-1.5B-Instruct (4-bit) |
| pixel-8-pro | inference | 3 GB | n/a | gemma-3-1b-it (4-bit) |
| laptop-igpu | inference | 2 GB | n/a | Qwen2.5-1.5B-Instruct |
| cpu-x86_64 | inference | n/a | SmolLM2-1.7B-Instruct | Qwen2.5-0.5B-Instruct |
| wasm | inference | n/a | n/a | Qwen2.5-0.5B-Instruct |
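The walk over the defaults table can be sketched as a lookup keyed by device id. This is an illustrative sketch, not kolm's actual internals; the `pickBaseModel` function and `DEFAULTS` map are hypothetical names, and only a few table rows are shown:

```typescript
// Hypothetical sketch of the default picker; rows mirror the defaults table above.
type DeviceClass = "training" | "inference";

interface DeviceDefaults {
  class: DeviceClass;
  train: string | null; // null for inference-only devices (n/a in the table)
  infer: string;
}

const DEFAULTS: Record<string, DeviceDefaults> = {
  "rtx-5090":      { class: "training",  train: "Qwen/Qwen2.5-7B-Instruct", infer: "Qwen/Qwen2.5-7B-Instruct" },
  "rtx-4090":      { class: "training",  train: "Qwen/Qwen2.5-3B-Instruct", infer: "Qwen/Qwen2.5-3B-Instruct" },
  "iphone-15-pro": { class: "inference", train: null, infer: "Qwen/Qwen2.5-1.5B-Instruct" },
  // ...remaining rows elided
};

// The documented no-device / no-flag default.
const FALLBACK = "Qwen/Qwen2.5-3B-Instruct";

function pickBaseModel(deviceId: string | null, mode: "train" | "infer"): string {
  if (deviceId === null) return FALLBACK;      // nothing detected, no flag passed
  const row = DEFAULTS[deviceId];
  if (!row) return FALLBACK;                   // unknown device id
  if (mode === "train") {
    if (row.train === null) {
      throw new Error(`${deviceId} is an inference-only device`);
    }
    return row.train;
  }
  return row.infer;
}

console.log(pickBaseModel("rtx-5090", "train")); // → Qwen/Qwen2.5-7B-Instruct
console.log(pickBaseModel(null, "infer"));       // → Qwen/Qwen2.5-3B-Instruct
```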

The default pick: Qwen 2.5 3B Instruct

When no device is detected and no flag is passed, kolm compile resolves base_model to Qwen/Qwen2.5-3B-Instruct: it is Apache 2.0 licensed, supports tool use, covers 29 languages, and fits comfortably on a 24 GB card.

Full base-model registry

16 models. kolm models list prints the same shape at the terminal.

| Model | License | Params | Ctx | Tool use | Multilingual | Use |
|---|---|---|---|---|---|---|
| Qwen/Qwen2.5-0.5B-Instruct | Apache 2.0 | 0.50 B | 32 K | yes | 29 langs | edge / wasm |
| Qwen/Qwen2.5-1.5B-Instruct | Apache 2.0 | 1.54 B | 32 K | yes | 29 langs | mobile |
| Qwen/Qwen2.5-3B-Instruct (default) | Apache 2.0 | 3.09 B | 32 K / 128 K YaRN | yes | 29 langs | laptop / 4090 |
| Qwen/Qwen2.5-7B-Instruct | Apache 2.0 | 7.62 B | 128 K | yes | 29 langs | 5090 / A100 |
| Qwen/Qwen2.5-Coder-7B-Instruct | Apache 2.0 | 7.62 B | 128 K | yes | code-first | code distill |
| Qwen/Qwen2.5-14B-Instruct | Apache 2.0 | 14.7 B | 128 K | yes | 29 langs | A100 80 / H100 |
| meta-llama/Llama-3.2-1B-Instruct | Llama 3.2 Community | 1.24 B | 128 K | yes | English-first | mobile alternate |
| meta-llama/Llama-3.2-3B-Instruct | Llama 3.2 Community | 3.21 B | 128 K | yes | English-first | 3B alternate |
| meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 Community | 8.03 B | 128 K | yes | English-first | 8B alternate |
| microsoft/Phi-3.5-mini-instruct | MIT | 3.82 B | 128 K | no | multilingual | reasoning-first |
| google/gemma-3-1b-it | Gemma ToU | 1.00 B | 32 K | partial | 140 langs | mobile (Pixel) |
| google/gemma-3-4b-it | Gemma ToU | 4.30 B | 128 K | partial | 140 langs | vision target |
| google/gemma-3-12b-it | Gemma ToU | 12.2 B | 128 K | partial | 140 langs | vision alternate |
| google/gemma-2-2b-it | Gemma ToU | 2.61 B | 8 K | no | English-first | tiny alt |
| mistralai/Ministral-3B-Instruct-2410 | MRL (Mistral Research) | 3.00 B | 128 K | yes | multilingual | 3B alternate |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | Apache 2.0 | 1.71 B | 8 K | no | English-first | CPU fallback |

License posture. kolm defaults to Apache 2.0 because that's the license least likely to require legal review in regulated industries. Llama and Gemma stay in the registry as opt-in; pin them with kolm models pin meta-llama/Llama-3.2-3B-Instruct if your buyer accepts the terms.

Device registry

14 device profiles. kolm gpu detect picks a row by parsing nvidia-smi output (NVIDIA) or the Metal device id (Apple), falling back to cpu-x86_64 / wasm when neither is present.

| Device | Arch | VRAM | Attention | Min CUDA | Min torch | FP4 / FP8 / BF16 |
|---|---|---|---|---|---|---|
| rtx-5090 | Blackwell sm_120 | 32 GB | fa3 | 12.8 | 2.7 | yes / yes / yes |
| rtx-4090 | Ada sm_89 | 24 GB | fa2 | 12.1 | 2.4 | no / yes / yes |
| rtx-3090 | Ampere sm_86 | 24 GB | fa2 | 11.8 | 2.2 | no / no / yes |
| a100-40gb | Ampere sm_80 | 40 GB | fa2 | 11.8 | 2.2 | no / no / yes |
| a100-80gb | Ampere sm_80 | 80 GB | fa2 | 11.8 | 2.2 | no / no / yes |
| h100-80gb | Hopper sm_90 | 80 GB | fa3 | 12.4 | 2.4 | no / yes / yes |
| h200-141gb | Hopper sm_90 | 141 GB | fa3 | 12.4 | 2.4 | no / yes / yes |
| apple-m3-max | Apple Silicon | 64 GB | mlx | n/a | n/a | no / no / yes |
| apple-m2-pro | Apple Silicon | 16 GB | mlx | n/a | n/a | no / no / yes |
| iphone-15-pro | A17 Pro | 4 GB | coreml | n/a | n/a | no / no / partial |
| pixel-8-pro | Tensor G3 | 3 GB | mediapipe | n/a | n/a | no / no / partial |
| laptop-igpu | Intel Arc / Iris | 2 GB | directml | n/a | n/a | no / no / partial |
| cpu-x86_64 | any x86_64 | n/a | sdpa | n/a | n/a | no / no / no |
| wasm | wasm32 | n/a | sdpa | n/a | n/a | no / no / no |
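The detection order described above can be sketched as a cascade of probes. This is a hypothetical illustration: the `detectDevice` function, its parameters, and the name-matching patterns are assumptions, not kolm's real parser, and only a few registry rows are matched:

```typescript
// Illustrative sketch of the `kolm gpu detect` fallback order:
// NVIDIA first, then Apple Metal, then cpu-x86_64 / wasm.
function detectDevice(
  nvidiaSmiName: string | null, // GPU name parsed from nvidia-smi, if available
  metalDeviceId: string | null, // Metal device id on Apple hardware
  isWasm: boolean               // true when running under wasm32
): string {
  if (isWasm) return "wasm";
  if (nvidiaSmiName !== null) {
    // nvidia-smi names map onto registry rows, e.g. "GeForce RTX 5090" → rtx-5090
    if (/RTX 5090/.test(nvidiaSmiName)) return "rtx-5090";
    if (/RTX 4090/.test(nvidiaSmiName)) return "rtx-4090";
    if (/H100/.test(nvidiaSmiName)) return "h100-80gb";
    // ...remaining NVIDIA rows elided
  }
  if (metalDeviceId !== null) {
    if (/M3 Max/.test(metalDeviceId)) return "apple-m3-max";
    if (/M2 Pro/.test(metalDeviceId)) return "apple-m2-pro";
  }
  return "cpu-x86_64"; // documented fallback when nothing matches
}

console.log(detectDevice("NVIDIA GeForce RTX 5090", null, false)); // → rtx-5090
console.log(detectDevice(null, null, false));                      // → cpu-x86_64
```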

Device-fit contract

Every .kolm artifact carries target_device and train_device in the manifest. Before the runtime loads an adapter it calls verifyDeviceFit(manifest, hostDeviceId) and reads:

| Compile target | Host device | Result | Behavior |
|---|---|---|---|
| rtx-5090 | rtx-5090 | ok:true | load and run |
| iphone-15-pro | rtx-5090 | ok:true, soft:true | load with warning (cross-class) |
| null (no target pinned) | rtx-5090 | ok:true, soft:true | load with warning (untargeted) |
| rtx-5090 | iphone-15-pro | ok:false | refuse (4 GB host can't hold a 32 GB compile) |

The runtime never lies about a mismatch: it loads cleanly, loads with a structured warning, or refuses. The smoke test at scripts/smoke-device-bind.mjs proves all four cases pass.
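A minimal sketch of the contract in the table above: the `verifyDeviceFit` signature is the one named on this page, but the body, the `Manifest` and `FitResult` shapes, and the abbreviated `VRAM_GB` map are illustrative assumptions, not kolm's implementation:

```typescript
// Sketch of the device-fit contract: exact match loads, cross-class or
// untargeted loads with a soft warning, undersized hosts are refused.
interface Manifest { target_device: string | null }
interface FitResult { ok: boolean; soft?: boolean; reason?: string }

// VRAM figures from the device registry; most rows elided for brevity.
const VRAM_GB: Record<string, number> = {
  "rtx-5090": 32,
  "iphone-15-pro": 4,
};

function verifyDeviceFit(manifest: Manifest, hostDeviceId: string): FitResult {
  const target = manifest.target_device;
  if (target === null) {
    return { ok: true, soft: true, reason: "untargeted" }; // no pin → warn
  }
  if (target === hostDeviceId) {
    return { ok: true };                                   // exact match → run
  }
  const need = VRAM_GB[target];
  const have = VRAM_GB[hostDeviceId];
  if (need !== undefined && have !== undefined && have < need) {
    return { ok: false, reason: "host VRAM below compile target" }; // refuse
  }
  return { ok: true, soft: true, reason: "cross-class" };  // warn
}
```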

CLI

Three verbs cover the surface: kolm models for the catalog, kolm gpu for the box, kolm compile for the binding.

$ kolm gpu detect
rtx-5090 . Blackwell sm_120 . 32 GB . cuda 12.8 . torch 2.7+

$ kolm models recommend --target-device rtx-5090
Qwen/Qwen2.5-7B-Instruct       (apache-2.0, 7.6B, 128K ctx, tool-use)
Qwen/Qwen2.5-3B-Instruct       (apache-2.0, 3.1B, 32K ctx, tool-use)
meta-llama/Llama-3.1-8B-Instruct (llama-community, 8B, 128K ctx)

$ kolm models pin Qwen/Qwen2.5-7B-Instruct
pinned base model: Qwen/Qwen2.5-7B-Instruct

$ kolm compile --task "classify support tickets" --target-device rtx-5090
. resolves base model: Qwen/Qwen2.5-7B-Instruct
. attention: fa3 . optimizer: paged_adamw_8bit . liger: on
. compiling: 100% K=0.917 . signing receipt: HMAC-SHA256 . done.

Full verb tables at /docs. The decision matrix that picks defaults is at /spec under "device-fit".

Honesty notes

Two things we are not claiming.

We did not train these base models. The base-model field selects a foundation; .kolm is the LoRA adapter on top, plus the receipt chain. The base remains under its upstream license; you remain the licensee.

Benchmark numbers in the picker come from public scorecards. The MMLU / GSM8K / MATH / HumanEval / IFEval scores quoted on this page are taken from public scorecards and the Qwen 2.5 tech report; kolm has not independently rerun them. The /leaderboard page tracks reproductions we have run; everything else is cited.