Compiler cache for intelligence
Each task compile should become a reusable artifact with a spec, eval pack, target profile, receipt, and provenance. The customer pays for the governed compile loop, not for a local function call.
The runtime layer is being commoditized by Apple, Google, Microsoft, Meta, and open source. Kolm wins only if it becomes the compiler cache for intelligence: task evidence in, portable artifacts out, every artifact backed by evals, receipts, and a clear runtime target.
A credible decabillion-dollar plan has to explain why developers will use kolm when native runtimes are free. The answer is not generic on-device inference. The answer is repeatable compilation, device-targeted evals, signed release evidence, and a registry that makes good artifacts easier to trust than hand-rolled model glue.
Core ML, LiteRT, ONNX Runtime, ExecuTorch, llama.cpp, and MLC should be target backends. Kolm should optimize, verify, and package across them instead of pretending they do not exist.
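A minimal sketch of that fan-out, assuming a Python packaging layer. The backend names are real, but every function, field, and budget here is illustrative; real lowering would delegate to each runtime's own exporter (coremltools, the LiteRT converter, ONNX export, ExecuTorch, and so on):

```python
from dataclasses import dataclass
from enum import Enum

class Backend(Enum):
    COREML = "coreml"
    LITERT = "litert"
    ONNX_RUNTIME = "onnxruntime"
    EXECUTORCH = "executorch"
    LLAMA_CPP = "llama.cpp"
    MLC = "mlc"

@dataclass
class TargetProfile:
    backend: Backend
    device: str              # e.g. "iPhone 15 Pro"
    max_artifact_mb: float   # hard size budget for this target

def lower(task_spec: dict, backend: Backend) -> bytes:
    # Placeholder: in practice this delegates to the backend's own exporter.
    return b"compiled-payload-for-" + backend.value.encode()

def compile_for_targets(task_spec: dict, targets: list[TargetProfile]) -> dict:
    """One task spec in; one size-checked package per runtime target out."""
    artifacts = {}
    for t in targets:
        payload = lower(task_spec, t.backend)
        artifacts[t.backend.value] = {
            "device": t.device,
            "payload": payload,
            "within_size_budget": len(payload) / 1e6 <= t.max_artifact_mb,
        }
    return artifacts
```

The value is not in the loop itself; it is in owning the verification and packaging that wrap every backend the same way.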
The K-score matters only when it correlates with task performance, device fit, latency, size, and release policy. Publish the harness and make the score reproducible.
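One way to make that concrete is a published scoring function whose inputs and arithmetic are all visible. The weights and the gating rule below are assumptions, not a committed formula; the point is that anyone can recompute the number:

```python
def k_score(task_pass_rate: float, p95_ms: float, artifact_mb: float,
            p95_budget_ms: float, size_budget_mb: float) -> float:
    """Illustrative composite: task quality gated by device fit.

    Fit terms cap at 1.0, so beating a budget never inflates the score;
    missing a budget degrades it proportionally.
    """
    latency_fit = min(1.0, p95_budget_ms / max(p95_ms, 1e-9))
    size_fit = min(1.0, size_budget_mb / max(artifact_mb, 1e-9))
    return round(100 * task_pass_rate * latency_fit * size_fit, 1)

# Same inputs, same score, on any machine:
print(k_score(task_pass_rate=0.92, p95_ms=180.0, artifact_mb=45.0,
              p95_budget_ms=250.0, size_budget_mb=60.0))  # -> 92.0
```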
Regulated teams will pay for control evidence, retention policy, audit logs, BAAs where applicable, and reviewed claims. They will not pay for slogans.
The most valuable asset is a reviewed catalog of artifacts with evals, device profiles, receipts, and revocation history. That is the network-effect wedge.
Local personalization is compelling, but the mechanism must be named: retrieval, adapter training, calibration, or something else. Battery, memory, and storage limits decide the truth.
Kolm should assume the best execution engines are free or bundled. The strategy is to own the cross-runtime compile, eval, registry, and governance workflow.
| Layer | Source signal | Threat to kolm | Winning response |
|---|---|---|---|
| Apple Core ML | Apple positions Core ML as optimized for on-device performance, model conversion, compression, Xcode reports, and Apple silicon execution. | Native iOS teams already get a deeply integrated path. | Make Core ML a first-class target and compare artifact size, latency, adapter support, and release evidence against native baselines. |
| Apple Foundation Models | Apple gives apps access to an on-device language model for text generation, structured output, and tool calling. | For simple Apple-only language features, the platform path may be enough. | Position kolm for cross-platform, task-specific artifacts, non-Apple targets, and audited compile history. |
| Google LiteRT | Google describes LiteRT as a high-performance on-device framework with multi-platform support, conversion, optimization, and hardware acceleration. | Android and edge teams can stay inside Google AI Edge tooling. | Target LiteRT output, then own the higher-level spec, eval pack, receipts, and registry metadata. |
| MediaPipe | MediaPipe Solutions provide ready-made cross-platform tasks, models, Model Maker customization, and browser-based evaluation tooling. | Common perception and LLM tasks may be solved before kolm enters the workflow. | Focus on custom business tasks, regulated release evidence, and artifacts that combine task examples, policies, and eval cases. |
| ONNX Runtime Mobile | Microsoft documents mobile deployment across iOS and Android, execution providers, binary-size controls, latency, power, and model-size measurement. | Framework-neutral mobile teams already have a mature route. | Use ONNX as a target and publish repeatable measurements rather than generic cross-platform claims. |
| ExecuTorch | PyTorch frames ExecuTorch as an end-to-end mobile and edge inference stack with portability, productivity, and hardware acceleration. | PyTorch-native teams will not leave familiar export and deploy flows without proof. | Import PyTorch tasks, target ExecuTorch where appropriate, and sell governed artifact promotion over raw deployment. |
| Local LLM open source | llama.cpp emphasizes minimal setup and strong performance across local and cloud hardware; MLC WebLLM uses WebGPU for in-browser local acceleration. | Offline LLM inference alone is not a paid moat. | Sell small task artifacts, eval-backed specialization, release receipts, and local personalization governance. |
| Regulatory pressure | The EU AI Act applies progressively through 2027, and HHS frames the HIPAA Security Rule around administrative, physical, and technical safeguards for ePHI. | Blanket compliance claims become legal and sales risk. | Map every claim to a dated control, document owner, limitation, and customer-facing evidence artifact. |
These are the investor, buyer, and technical diligence asks that should shape the next build sprint. Each one converts a claim into a proof asset.
Publish which targets are supported now, which are planned, and which are intentionally out of scope. Include iOS, Android, browser target, laptop, server, and embedded classes with device names.
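A published matrix can be as plain as a checked-in file. The classes come from the ask above; every device name and tier assignment below is illustrative, not a commitment:

```python
# Illustrative target matrix; the structure matters, not these entries.
TARGETS = {
    "supported": [
        {"class": "ios",     "backend": "coreml",      "device": "iPhone 15 Pro"},
        {"class": "android", "backend": "litert",      "device": "Pixel 8"},
        {"class": "server",  "backend": "onnxruntime", "device": "x86_64 + CUDA"},
    ],
    "planned": [
        {"class": "browser", "backend": "mlc-webgpu",  "device": "Chrome with WebGPU"},
        {"class": "laptop",  "backend": "executorch",  "device": "Apple silicon MacBook"},
    ],
    "out_of_scope": [
        {"class": "embedded", "note": "microcontroller-class devices"},
    ],
}
```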
Run the same task through native Core ML, LiteRT, ONNX Runtime, ExecuTorch where applicable, and a kolm artifact. Report p50, p95, artifact size, binary impact, memory, energy proxy, and K-score.
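A sketch of the latency half of that harness, using only Python's standard library; `run_once` stands in for whichever runtime executes the task, so the identical harness produces every row of the comparison:

```python
import statistics
import time

def measure_latency(run_once, warmup: int = 5, iters: int = 50) -> dict:
    """p50/p95 wall-clock latency for one artifact on one device.

    `run_once` is any zero-arg callable that drives the task through a
    given runtime (Core ML, LiteRT, ONNX Runtime, ExecuTorch, or a kolm
    artifact). Warmup runs are discarded so JIT and cache effects do
    not pollute the percentiles.
    """
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    return {"p50_ms": round(cuts[49], 2), "p95_ms": round(cuts[94], 2), "iters": iters}
```

Publish the harness, the exact commands, and the raw samples alongside the summary numbers, so any row can be reproduced rather than trusted.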
Define whether personalization uses retrieval, calibration, adapter training, local examples, or another mechanism. Document storage, encryption, deletion, hardware limits, and failure behavior.
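A hedged sketch of what such a spec could pin down; every field name and budget here is an assumption, not a committed schema:

```python
from dataclasses import dataclass
from enum import Enum

class Mechanism(Enum):
    RETRIEVAL = "retrieval"            # local example store, no weight updates
    CALIBRATION = "calibration"        # thresholds and output mapping only
    ADAPTER_TRAINING = "adapter"       # small adapter trained on-device
    LOCAL_EXAMPLES = "local_examples"  # few-shot examples injected at runtime

@dataclass
class PersonalizationSpec:
    mechanism: Mechanism
    storage_budget_mb: float       # hard cap on local state
    encrypted_at_rest: bool
    deleted_on_uninstall: bool
    battery_budget_pct: float      # max battery per update session
    min_free_memory_mb: float      # refuse to run below this floor
    on_failure: str                # e.g. "fall back to the base artifact"
```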
Show what is actually inside each artifact tier: recipe, evals, metadata, target binary, adapter, weights, and receipt. No investor should need to infer whether an artifact contains model-bearing payloads.
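One way to make the contents non-inferable is a manifest that lists them explicitly and derives the model-bearing flag instead of asserting it. The tier names and boundaries below are illustrative:

```python
from dataclasses import dataclass

RECIPE = ["spec", "eval_pack", "metadata"]              # tier 1: no payload
PACKAGED = RECIPE + ["target_binary", "receipt"]        # tier 2: compiled payload
FULL = PACKAGED + ["adapter", "weights"]                # tier 3: raw weights included

def is_model_bearing(contents: list[str]) -> bool:
    """True when the bundle physically ships model payloads."""
    return any(x in contents for x in ("target_binary", "adapter", "weights"))

@dataclass
class ArtifactManifest:
    name: str
    tier: str
    contents: list[str]
    model_bearing: bool
    receipt_sha256: str

m = ArtifactManifest(
    name="invoice-field-extractor",            # hypothetical artifact
    tier="packaged",
    contents=PACKAGED,
    model_bearing=is_model_bearing(PACKAGED),  # derived, never hand-asserted
    receipt_sha256="<filled at signing time>",
)
```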
Prove that score movement predicts useful task outcomes across at least three workload families. Treat score drift as a release blocker.
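Once the score is reproducible, the release blocker is mechanical. A minimal sketch; the tolerance below is illustrative, and the real threshold should come out of the correlation study across workload families:

```python
def gate_release(current_score: float, baseline_score: float,
                 max_drop: float = 1.0) -> None:
    """CI gate: block artifact promotion when the K-score regresses."""
    drift = baseline_score - current_score
    if drift > max_drop:
        raise SystemExit(
            f"release blocked: K-score dropped {drift:.1f} points "
            f"({baseline_score} -> {current_score})"
        )

gate_release(current_score=91.2, baseline_score=92.0)  # within tolerance, passes
```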
Replace broad claims with an evidence map: BAA status, DPA, subprocessor list, retention policy, audit log, encryption controls, review date, and limitation notes.
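An evidence map is just structured data that pairs each customer-facing sentence with its control, owner, and limitation. A sketch with a hypothetical owner, URL, and review date:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EvidenceEntry:
    claim: str           # the exact customer-facing sentence
    control: str         # the control that backs it
    owner: str           # named document owner
    evidence_uri: str    # artifact a buyer can actually open
    reviewed_on: date
    limitation: str      # what the claim does NOT cover

ENTRY = EvidenceEntry(
    claim="Customer data is encrypted at rest.",
    control="AES-256 at rest on tenant storage",
    owner="security@kolm.example",                                    # hypothetical
    evidence_uri="https://kolm.example/evidence/encryption-at-rest",  # hypothetical
    reviewed_on=date(2025, 1, 15),
    limitation="Does not cover data the customer exports.",
)
```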
Make the registry the product surface: curated artifacts, device profiles, K-score history, review status, revocation, provenance, and customer-private namespaces.
Pick one wedge for the next 90 days. Healthcare workflow apps, fintech mobile teams, and enterprise mobile teams each require different proof and sales language.
At least one reproducible artifact exists per named target, with hardware, OS, runtime, benchmark command, and fallback behavior documented.
Tenant storage, auth, key generation, failure logs, benchmark reproducibility, CI gates, and rollback have passing evidence.
Control owner, legal review, BAA/DPA posture, data lifecycle, audit log, subprocessor list, and limitation copy are linked from the sales page.
Org controls, private registry, receipt retention, access logs, SSO path, SLA language, and at least one credible pilot workflow exist.
Inventory every site claim and map it to code, benchmark, source, or redline. Remove or soften anything without proof.
Run three tasks across at least two real devices and one server target. Publish raw output and commands.
Seed 8 to 12 artifacts with eval packs, target profiles, score history, receipts, and source notes.
Create one vertical pilot with a narrow outcome, price, success metric, and evidence pack.