Runtime target matrix
Aim each artifact at the smallest runtime that can carry the behavior.
Kolm does not replace serving engines, GPU clouds, local runners, or device frameworks. It keeps the behavior evidence, fit constraints, deployment recipe, caveats, and receipts with the signed .kolm artifact so platform teams can choose the right target.
- Best fit: vLLM · hosted GPU
- Targets: 18 ranked
- Blocked: llama.cpp · context
- Receipt: sha256:a1f0…
Runtime-neutral
01 · Target families
The matrix separates runtime ownership from artifact proof.
Each target family gets a fit signal, not a blanket promise. Kolm emits target instructions and receipts, then records when a feature does not fit. Runtime engines continue to own execution, scheduling, acceleration, and hardware support.
Serving engine
vLLM, SGLang, TensorRT-LLM, NVIDIA Triton, Hugging Face TGI. Best when the workload needs high-throughput OpenAI-compatible serving, batching, tensor parallelism, observability, or GPU-tuned execution.
Hosted GPU
Baseten, Modal, Runpod, Replicate, Hugging Face Inference Endpoints. Best when teams want managed endpoints, autoscaling, job execution, hosted packaging, or a faster path from artifact to deployed endpoint.
Local runner
llama.cpp, Ollama, LM Studio, MLX, MLC LLM. Best for small, quantized, local, demo, support, or offline-adjacent workflows where the compiled behavior fits within local model and memory constraints.
Portable inference
ONNX Runtime, OpenVINO, TVM, Triton model repository. Best for model-format portability, CPU/GPU/accelerator paths, model repository packaging, and enterprise deployment controls.
Device edge
Core ML, LiteRT, ExecuTorch. Best when a target-specific model format, memory ceiling, energy budget, OS/runtime version, and privacy posture are explicitly declared.
Enterprise fleet
KServe, Ray Serve, BentoML, Kubernetes, BYOC, restricted networks. Best for platform teams that need policy-controlled rollout, liveness checks, audit logs, canaries, internal registries, and governance export.
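A minimal Python sketch of "a fit signal, not a blanket promise": the catalog only names runtimes (execution stays with each engine), and the signal lists every required feature a family does not support. The runtime names are copied from the lists above; `TargetFamily`, `FitSignal`, `family_of`, and `fit_signal` are hypothetical names for illustration, not a Kolm API, and the supported-feature set is assumed to be supplied by the caller rather than taken from any real runtime.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TargetFamily(Enum):
    # Hypothetical enum mirroring the six families listed above.
    SERVING_ENGINE = "serving engine"
    HOSTED_GPU = "hosted GPU"
    LOCAL_RUNNER = "local runner"
    PORTABLE_INFERENCE = "portable inference"
    DEVICE_EDGE = "device edge"
    ENTERPRISE_FLEET = "enterprise fleet"


# Example runtimes per family, copied from the matrix.
# The catalog only names runtimes; it does not execute anything.
RUNTIMES: dict[TargetFamily, tuple[str, ...]] = {
    TargetFamily.SERVING_ENGINE: (
        "vLLM", "SGLang", "TensorRT-LLM", "NVIDIA Triton", "Hugging Face TGI",
    ),
    TargetFamily.HOSTED_GPU: (
        "Baseten", "Modal", "Runpod", "Replicate", "Hugging Face Inference Endpoints",
    ),
    TargetFamily.LOCAL_RUNNER: ("llama.cpp", "Ollama", "LM Studio", "MLX", "MLC LLM"),
    TargetFamily.PORTABLE_INFERENCE: (
        "ONNX Runtime", "OpenVINO", "TVM", "Triton model repository",
    ),
    TargetFamily.DEVICE_EDGE: ("Core ML", "LiteRT", "ExecuTorch"),
    TargetFamily.ENTERPRISE_FLEET: ("KServe", "Ray Serve", "BentoML", "Kubernetes", "BYOC"),
}


@dataclass(frozen=True)
class FitSignal:
    """Per-family verdict: fits or not, plus the features that did not fit."""
    family: TargetFamily
    fits: bool
    unfit_features: tuple[str, ...] = ()


def family_of(runtime: str) -> TargetFamily:
    """Look up which family a named runtime belongs to."""
    for family, names in RUNTIMES.items():
        if runtime in names:
            return family
    raise KeyError(f"unknown runtime: {runtime}")


def fit_signal(family: TargetFamily, required: set[str], supported: set[str]) -> FitSignal:
    """Record every required feature the family does not support."""
    missing = tuple(sorted(required - supported))
    return FitSignal(family, fits=not missing, unfit_features=missing)
```

For example, `fit_signal(TargetFamily.LOCAL_RUNNER, {"tool calls", "streaming"}, {"streaming"})` reports `fits=False` with `unfit_features=("tool calls",)`, so the missing feature is named instead of the whole family being silently dropped.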
02 · Readiness states
Every target is a state machine, not a logo list.
Runtime fit is inspectable by platform engineers before promotion. The same artifact can be ready for one target and blocked for another.
Fit candidate
Inputs are present, the target class is plausible, and Kolm can generate a recipe, but required runtime files or external proof may still be missing.
Recipe generated
Artifact manifest, model hints, eval gates, environment requirements, launch command, health check, and rollback policy are assembled.
Receipt verified
The target package has hashes, verifier output, eval result, policy decision, promotion identity, and exportable evidence.
Blocked with reason
Kolm records why promotion is blocked: context length, unsupported tools, missing quantization, data boundary, package gate, or runtime mismatch.
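The four readiness states can be modeled as a small state machine, which also shows how one artifact ends up ready for one target and blocked for another. This sketch assumes a forward-only promotion path (fit candidate, then recipe generated, then receipt verified) and lets any non-blocked state move to blocked only when a reason is given; `Readiness`, `TargetReadiness`, `advance`, and `block` are hypothetical names, not part of Kolm.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Readiness(Enum):
    FIT_CANDIDATE = "fit candidate"
    RECIPE_GENERATED = "recipe generated"
    RECEIPT_VERIFIED = "receipt verified"
    BLOCKED = "blocked with reason"


# Assumed forward-only promotion path; RECEIPT_VERIFIED and BLOCKED have no successor.
NEXT_STATE = {
    Readiness.FIT_CANDIDATE: Readiness.RECIPE_GENERATED,
    Readiness.RECIPE_GENERATED: Readiness.RECEIPT_VERIFIED,
}


@dataclass
class TargetReadiness:
    """Readiness of one artifact for one target (hypothetical type)."""
    target: str
    state: Readiness = Readiness.FIT_CANDIDATE
    reason: str | None = None
    history: list[Readiness] = field(default_factory=list)

    def advance(self) -> None:
        """Move one step along the promotion path, or fail loudly."""
        if self.state not in NEXT_STATE:
            raise ValueError(f"{self.target}: cannot advance from {self.state.value}")
        self.history.append(self.state)
        self.state = NEXT_STATE[self.state]

    def block(self, reason: str) -> None:
        """Block promotion; a non-empty reason is mandatory."""
        if not reason:
            raise ValueError(f"{self.target}: a block must carry a reason")
        if self.state is Readiness.BLOCKED:
            raise ValueError(f"{self.target}: already blocked ({self.reason})")
        self.history.append(self.state)
        self.state = Readiness.BLOCKED
        self.reason = reason


# Same artifact, two targets, two different outcomes.
vllm = TargetReadiness("vLLM")
vllm.advance()
vllm.advance()
llama = TargetReadiness("llama.cpp")
llama.block("context length")
print(vllm.state.value, "|", llama.state.value, "-", llama.reason)
```

Keeping the reason on the object, rather than only in a log, is what lets the blocked state be inspected before promotion.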
03 · Device fit
Rank every device path by what the behavior actually needs.
Kolm scores the compiled behavior against each target's model format, memory ceiling, context length, and data boundary, then ranks the device paths it fits and names the reason for anything it blocks.
- Model size and quantization hints checked against the target ceiling
- Context length, tool shape, and data boundary enforced per target
- Blocked paths return a reason, not a silent failure
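One way to express the device-fit check in code, assuming each device path declares a memory ceiling, a maximum context length, tool support, and the data boundaries it accepts. Fitting targets are ordered by ascending memory ceiling, following the "smallest runtime that can carry the behavior" rule at the top of this page; Kolm's actual scoring is not described here, so the fields, reason strings, and ranking rule are all illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """What the compiled behavior needs (hypothetical fields)."""
    model_bytes: int      # weights size after the hinted quantization
    context_tokens: int   # longest context the behavior must handle
    needs_tools: bool     # whether the behavior emits tool calls
    data_boundary: str    # where data may flow, e.g. "on-device"


@dataclass(frozen=True)
class DeviceTarget:
    """Limits a device path declares (hypothetical fields)."""
    name: str
    memory_ceiling_bytes: int
    max_context_tokens: int
    supports_tools: bool
    allowed_boundaries: frozenset[str]


def block_reasons(b: Behavior, t: DeviceTarget) -> list[str]:
    """Every reason the target cannot carry the behavior; empty means it fits."""
    reasons = []
    if b.model_bytes > t.memory_ceiling_bytes:
        reasons.append("model exceeds memory ceiling")
    if b.context_tokens > t.max_context_tokens:
        reasons.append("context length")
    if b.needs_tools and not t.supports_tools:
        reasons.append("unsupported tools")
    if b.data_boundary not in t.allowed_boundaries:
        reasons.append("data boundary")
    return reasons


def rank_targets(
    b: Behavior, targets: list[DeviceTarget]
) -> tuple[list[str], dict[str, list[str]]]:
    """Rank fitting targets smallest-first; return blocked targets with reasons."""
    fitting: list[DeviceTarget] = []
    blocked: dict[str, list[str]] = {}
    for t in targets:
        reasons = block_reasons(b, t)
        if reasons:
            blocked[t.name] = reasons
        else:
            fitting.append(t)
    fitting.sort(key=lambda t: t.memory_ceiling_bytes)
    return [t.name for t in fitting], blocked


# Hypothetical behavior and device paths.
GB = 1_000_000_000
behavior = Behavior(4 * GB, 8192, needs_tools=False, data_boundary="on-device")
targets = [
    DeviceTarget("workstation", 64 * GB, 131072, True, frozenset({"on-device", "vpc"})),
    DeviceTarget("phone", 3 * GB, 4096, False, frozenset({"on-device"})),
    DeviceTarget("laptop", 16 * GB, 32768, True, frozenset({"on-device"})),
]
ranked, blocked = rank_targets(behavior, targets)
print(ranked)   # → ['laptop', 'workstation']
print(blocked)  # → {'phone': ['model exceeds memory ceiling', 'context length']}
```

Collecting all reasons instead of stopping at the first one means a blocked path reports everything that would need to change, not just the first check that failed.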
04 · Evidence packet
The target decision travels with the artifact.
A runtime target is only credible when the buyer can inspect the proof outside the UI: what was compiled, what was tested, why the target was selected, and what still blocks release.
- Artifact manifest · recipe, examples, evaluators, tokenizer metadata, policy scope, dependency hashes, and compatible target classes
- Eval gate · regression threshold, failure set, human review state, replay window, fallback rule, and release decision
- Target recipe · runtime family, launch shape, package files, required environment, health check, logging path, and incompatible features
- Verifier receipt · signature, manifest hash, build identity, promotion actor, timestamp, export destination, and governance packet reference
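A sketch of how a manifest hash and verifier receipt could be produced and checked outside the UI. It hashes the manifest as canonical JSON with SHA-256 and signs the receipt with HMAC-SHA256 as a stand-in; the page does not say which signature scheme Kolm uses (an asymmetric scheme such as Ed25519 would let verifiers check receipts without holding the signing key), and the field names and helper functions are assumptions modeled on the verifier-receipt bullet above.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def canonical(obj: dict) -> bytes:
    """Deterministic JSON bytes: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def manifest_hash(manifest: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical(manifest)).hexdigest()


def issue_receipt(manifest: dict, *, key: bytes, build_id: str, actor: str,
                  destination: str, governance_ref: str) -> dict:
    """Build a receipt body and sign it (HMAC-SHA256 stands in for a real signature)."""
    body = {
        "manifest_hash": manifest_hash(manifest),
        "build_identity": build_id,
        "promotion_actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "export_destination": destination,
        "governance_packet": governance_ref,
    }
    # The signature covers every other field, computed before it is added.
    body["signature"] = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return body


def verify_receipt(manifest: dict, receipt: dict, *, key: bytes) -> bool:
    """Recompute the manifest hash and the signature; both must match."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    if body.get("manifest_hash") != manifest_hash(manifest):
        return False
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("signature", ""))


# Hypothetical manifest and identities for illustration.
manifest = {"recipe": "ticket-triage", "targets": ["vLLM"], "policy_scope": "internal"}
receipt = issue_receipt(manifest, key=b"demo-key", build_id="build-42",
                        actor="release-bot", destination="governance-export",
                        governance_ref="gov-001")
print(verify_receipt(manifest, receipt, key=b"demo-key"))  # → True
tampered = {**manifest, "policy_scope": "public"}
print(verify_receipt(tampered, receipt, key=b"demo-key"))  # → False
```

Because the hash is taken over canonical JSON, any reviewer with the manifest file can recompute it independently; changing a single field such as the policy scope makes verification fail.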
Compile once, then prove where it can safely run.
Start with one repeated API behavior, generate a signed artifact, and let the target matrix show which serving path is ready, blocked, or waiting on external proof.