kolm / learn / deploy to a phone
Deploy a .kolm to a phone.
Export the same artifact to iPhone (CoreML), Android (ONNX Mobile), or Apple Silicon (MLX). Forecast size, latency, and fit before shipping. Same K-score, smaller binary, fully offline on the device.
Golden Path 3 of 3 . v1.0 . assumes you already completed step 1
.mlpackage for iPhone, ONNX .onnx for Android, MLX bundle for M-series Macs. Predicted size and latency on your target device. Same task, same K-score, no cloud round-trip.
Targets supported
| Device class | Backend | Output | Typical size (3B int4) |
|---|---|---|---|
| iPhone 15 Pro | coreml | .mlpackage | 1.7 GB |
| iPhone 12 / 13 | coreml | .mlpackage | 1.7 GB |
| Pixel 8 / OnePlus 12 | onnx | .onnx | 1.7 GB |
| M-series Mac (M1+) | mlx | mlx bundle | 1.7 GB |
| Raspberry Pi 5 (8GB) | gguf | .gguf | 1.7 GB |
| Jetson Orin Nano | tensorrt | engine plan | 2.1 GB |
| Snapdragon X Elite | onnx | .onnx | 1.7 GB |
Full picker with predicted latency and memory headroom lives at /device-transfer.
Preview the fit (no toolchain).
30 seconds
Before installing coremltools or ONNX Runtime, kolm export --preview tells you in JSON whether the artifact will fit on the device, the predicted tokens/sec, and the K-score loss from quantization. Pure JS lookup, no Python.
kolm export my-redactor.kolm --preview --device iphone-15-pro --quant int4
{
  "device": "iPhone 15 Pro (8GB)",
  "quant": "int4",
  "size_mb": 1741,
  "estimated_latency_ms": 33.3,
  "tok_per_s": 30,
  "k_loss": -0.02,
  "k_score_est": 0.910,
  "fits": true,
  "fit_verdict": "fit",
  "backend": "coreml"
}
fits: true and a k_score_est ≥ 0.85 mean you can ship to that device. fit_verdict: tight means it will run but with little headroom; over means try a smaller quantization (int3) or a smaller base.
Same probe against a Pixel 8:
kolm export my-redactor.kolm --preview --device pixel-8 --quant int4
{
  "device": "Pixel 8 (12GB)",
  "quant": "int4",
  "size_mb": 1741,
  "estimated_latency_ms": 41.2,
  "tok_per_s": 24,
  "k_score_est": 0.910,
  "fits": true,
  "backend": "onnx"
}
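Because the preview is plain JSON, it drops straight into a release check. A minimal sketch in Python, assuming kolm export --preview prints a single JSON object to stdout as shown above (the device list and the 0.85 gate mirror this page; adjust to your targets):

import json
import subprocess
import sys

DEVICES = ["iphone-15-pro", "pixel-8"]  # devices you intend to ship to
MIN_K = 0.85                            # same gate as above

def preview(artifact: str, device: str) -> dict:
    # Assumes --preview writes one JSON object to stdout, as shown above.
    out = subprocess.run(
        ["kolm", "export", artifact, "--preview", "--device", device, "--quant", "int4"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

failures = []
for device in DEVICES:
    p = preview("my-redactor.kolm", device)
    if not p["fits"] or p["k_score_est"] < MIN_K:
        failures.append(f"{p['device']}: fits={p['fits']}, k_score_est={p['k_score_est']}")

if failures:
    sys.exit("preview gate failed:\n" + "\n".join(failures))
print("all target devices pass the preview gate")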
Run the export.
3 minutes
The actual export converts the LoRA-adapted base into the target binary format. CoreML uses Apple's coremltools; ONNX uses optimum; MLX uses mlx_lm; GGUF uses llama.cpp. kolm export shells out to whichever toolchain is on your PATH.
kolm export my-redactor.kolm --backend coreml --quant int4 --out ./exports
resolving base from manifest . llama-3.2-3b
applying LoRA adapter . ok
quantizing to int4 . ok
running coremltools convert . ok
verifying export against held-out cases . K=0.910 (-0.026 from fp16)
wrote ./exports/my-redactor.mlpackage (1.7 GB)
For Android (ONNX Mobile):
kolm export my-redactor.kolm --backend onnx --quant int4 --opset 17 --out ./exports
For Apple Silicon Macs (MLX, useful for desktop apps):
kolm export my-redactor.kolm --backend mlx --quantize --out ./exports
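If you ship to more than one platform, the three commands above can be driven from a single script so the exports stay in sync. A sketch, reusing the flags shown above and assuming each backend's toolchain is already on your PATH:

import subprocess

# Flags mirror the coreml / onnx / mlx commands above.
EXPORTS = [
    ["--backend", "coreml", "--quant", "int4"],
    ["--backend", "onnx", "--quant", "int4", "--opset", "17"],
    ["--backend", "mlx", "--quantize"],
]

for flags in EXPORTS:
    subprocess.run(
        ["kolm", "export", "my-redactor.kolm", *flags, "--out", "./exports"],
        check=True,  # stop at the first failed conversion
    )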
The K= printed at the end is real and reproducible. If it dropped below 0.85, you do not ship.
Bundle and ship.
90 seconds
iOS. Drag the .mlpackage into your Xcode target. Call from Swift:
import CoreML

// MyRedactor and MyRedactorInput are the classes Xcode generates
// when the .mlpackage is added to the target.
let model = try MyRedactor(configuration: MLModelConfiguration())
let out = try model.prediction(input: MyRedactorInput(text: input))
print(out.redacted)
Android. Drop the .onnx into your assets/ folder. Call from Kotlin via ONNX Runtime Mobile:
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment

// Load the exported model from app assets and run one inference.
val env = OrtEnvironment.getEnvironment()
val session = env.createSession(assets.open("my-redactor.onnx").readBytes())
val out = session.run(mapOf("text" to OnnxTensor.createTensor(env, input)))
print(out["redacted"]?.value)
macOS. MLX is Python or Swift. For a quick desktop app:
python -m mlx_lm.generate --model ./exports/my-redactor.mlx --prompt "..."
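The same bundle can also be called from Python for a quick smoke test before wiring it into an app. A sketch, assuming the export above landed at ./exports/my-redactor.mlx and that mlx-lm's load/generate helpers are available; the prompt is only an illustration and should match however the recipe was trained:

from mlx_lm import load, generate

# Path produced by the MLX export above.
model, tokenizer = load("./exports/my-redactor.mlx")

# Hypothetical prompt; use whatever input format your recipe expects.
prompt = "Redact all PII: My name is Jane Doe, SSN 123-45-6789."
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))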
What just happened
You took one .kolm and produced a device-native model in three commands. The K-score is reproducible; the size and latency match the preview. The same artifact, ejected, would let an external auditor trace every layer of the conversion. No remote inference, no API keys, no per-call cost.
Edge cases
- Toolchain missing. The export fails with a clear "coremltools not found; install with pip install coremltools" message. kolm doctor --export diagnoses your PATH.
- Quantization loss exceeds gate. Drop to int3 only if your task tolerates the K-loss. For PII redaction the loss is tiny; for code generation it is not.
- Custom recipes that call out to web APIs. kolm refuses to export a recipe that requires network. kolm inspect --net shows which recipes are air-gap-safe.
- FedRAMP / classified environments. The export step itself runs offline; kolm export --offline-only hard-fails if any sub-step would hit the network. A wrapper sketch follows below.
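For locked-down pipelines, both checks can be wrapped into a pre-export script. A sketch, assuming kolm inspect accepts the artifact plus --net (the exact argument order is an assumption) and that --offline-only signals failure with a non-zero exit code:

import subprocess
import sys

ARTIFACT = "my-redactor.kolm"

# Show which recipes are air-gap-safe before attempting the export.
subprocess.run(["kolm", "inspect", ARTIFACT, "--net"], check=True)

# Hard-fail the build if any sub-step would touch the network.
result = subprocess.run([
    "kolm", "export", ARTIFACT,
    "--backend", "coreml", "--quant", "int4",
    "--offline-only", "--out", "./exports",
])
if result.returncode != 0:
    sys.exit("export attempted a network call; aborting")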