Research · Governance · 2026-05-14 · 12 min read

Public registry economics: who pays for hosting and verification.

The /registry is a free public catalog of compiled artifacts. kolm.ai mirrors the metadata. The bytes live anywhere. How that arrangement survives the left-pad failure mode, who pays the bill, and what the honest limits of a discoverability layer are.

By kolm · Tags: registry · economics · supply chain

What the registry is.

The kolm registry is a free public catalog of compiled .kolm artifacts. Anyone with an API key can publish; anyone with the URL can browse. There is no listing fee, no review queue, no royalty on downloads. The catalog page for a registry entry shows the manifest, the K-score, the receipt body, the parent receipt (if a recompile), the content identifier (CID), and one or more storage URLs from which the artifact bytes can be retrieved.

What the registry is not is a CDN. The catalog page describes an artifact and tells you where the bytes live. The bytes themselves do not live on kolm.ai. They live on a storage URL the publisher provided when they registered the entry. That URL might be an S3 bucket, a Backblaze B2 path, a GitHub releases asset, an IPFS pin, a customer-controlled HTTPS path, or a mirror operated by a paid provider. The registry record carries one or more such URLs and a content identifier that lets a downloader verify the bytes match the metadata.

The split between catalog and storage is the whole architectural premise. A publisher does not have to host bytes on kolm.ai. A downloader does not have to trust kolm.ai's availability. The catalog and the storage are decoupled, and the verification chain works without either being reachable.
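
Concretely, the decoupling means a catalog record carries only names, digests, and pointers, never bytes. A minimal sketch of such a record, assuming a hypothetical field layout (these names are illustrative, not the actual kolm.ai schema):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RegistryEntry:
    """Hypothetical catalog record: metadata only, no artifact bytes."""
    name: str
    cid: str                          # content identifier for the bytes
    manifest: dict                    # slot hashes, compiler info, etc.
    receipt_body: bytes               # the small, signed receipt
    storage_urls: List[str]           # where the bytes actually live
    parent_cid: Optional[str] = None  # set when this entry is a recompile

entry = RegistryEntry(
    name="ticket-router",
    cid="cidv1:sha256:<digest>",
    manifest={"slots": 5},
    receipt_body=b"<receipt>",
    storage_urls=[
        "https://bucket.example.com/ticket-router-v4.kolm",
        "https://mirror.example.net/ticket-router-v4.kolm",
    ],
)
```

Deleting the row deletes the pointers; the bytes at the storage URLs are untouched.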

Cost structure.

Three separable kinds of cost are present in a registry transaction. Each is paid by a different party.

Cost                   | What it covers                                                                    | Who pays
Metadata storage       | The catalog page, the manifest, the receipt body, the CID, the storage-URL index. | kolm.ai (subsidized by paid tiers)
Artifact bytes storage | The 3.8 GB (median) file the manifest refers to.                                  | The publisher (their bucket) or a paid mirror
Download egress        | Network bandwidth from storage to the downloader.                                 | Whoever the storage URL points at
Verification work      | HMAC chain check, CID match, optional TEE attestation parse.                      | The downloader (CPU on their box, no network call after fetch)
The catalog row, the manifest, and the receipt body together are usually under 32 KB. That is roughly five orders of magnitude smaller than the artifact bytes. The economics of paying for the small thing centrally and pushing the big thing to the edge are the same economics that made the original APT/Debian mirror network work. We did not invent it; we are reusing it.

Artifact bytes storage can be free or paid depending on the publisher's choice. A small artifact hosted on a personal GitHub release costs the publisher nothing in additional fees. A 3.8 GB artifact pulled by thousands of downloaders a month from a personal S3 bucket can be expensive; the publisher might point the registry record at a paid mirror (Cloudflare R2, Backblaze B2, IPFS pinning service) and pay the marginal storage cost there. The registry record is agnostic; it carries one or more URLs and the downloader picks the first one that resolves.
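
The "picks the first one that resolves" behavior can be sketched as a fallback loop that accepts bytes only when they hash to the catalog's content identifier. A sketch under the assumption that the CID reduces to a plain SHA-256 hex digest; the real CID encoding and client logic may differ:

```python
import hashlib
import urllib.request

def fetch_verified(storage_urls, expected_sha256):
    """Try each registered storage URL in order; return the first
    payload whose SHA-256 digest matches the catalog's identifier."""
    for url in storage_urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = resp.read()
        except OSError:
            continue  # unreachable mirror: fall through to the next URL
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # reachable but wrong bytes: keep trying the other mirrors
    raise RuntimeError("no storage URL produced bytes matching the CID")
```

Because acceptance is gated on the digest rather than the URL, a stale or compromised mirror can fail the download but cannot substitute different bytes.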

Why this split.

The split between metadata and bytes is deliberate. It avoids three categories of failure that have plagued centralized package registries.

Failure category one: the left-pad pattern. In March 2016, an npm package author removed a single 11-line package from the npm registry over a naming dispute. The unpublish broke thousands of downstream builds across the JavaScript ecosystem. The root cause was that npm at the time held both the only canonical mapping from name to package and the only canonical copy of the bytes. Pulling the bytes from one place pulled them from everywhere. The Register's report and the subsequent post-mortems documented the cascade. The architectural fix was unpublish-prohibition rules and, eventually, the npm package mirror network.

The kolm split is structurally different from where npm ended up. The catalog mirrors metadata; the bytes live wherever the publisher put them. A disgruntled publisher can delete their own catalog entry on kolm.ai (we honor delete requests on the publisher's own entries) and that removes the catalog row. The bytes, if they are hosted elsewhere, remain reachable to anyone who has the storage URL or the CID. A third party who depends on the artifact can re-publish the catalog entry against the same CID under a new name and the chain is preserved.

Failure category two: the CDN-as-single-point-of-failure. A registry that holds the only copy of the bytes is a registry whose uptime is the ecosystem's uptime. PyPI has had partial outages (2018, 2020); npm has had partial outages (2018, 2022); Docker Hub has had partial outages (2020, 2023). Every such outage is an ecosystem outage. The kolm registry's uptime is the catalog's uptime, which is much cheaper to keep up because the data is small.

Failure category three: the long-tail orphan problem. Old packages whose maintainers move on, get bored, or die accumulate in any registry. The maintenance cost compounds: bandwidth, security review, takedown notices. A centralized registry that holds the bytes for every artifact ever published is on the hook for every one of those bills forever. The kolm registry is not. The catalog row persists; the bytes are wherever the publisher put them; if the publisher's storage goes away, the catalog row becomes a dangling reference and is flagged accordingly. The registry's marginal cost per orphan is one small database row, not 3.8 GB of cold storage.
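
The dangling-reference flagging could be implemented as a periodic liveness sweep over storage URLs. A sketch, assuming a simple HEAD probe and an illustrative `status` field (the actual kolm.ai flagging policy is not described here):

```python
import urllib.request

def probe(url, timeout=10):
    """Return True if the storage URL still answers for the bytes."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        return False

def sweep(entries):
    """Flag catalog rows whose every storage URL has gone dark."""
    for entry in entries:
        if not any(probe(u) for u in entry["storage_urls"]):
            entry["status"] = "dangling"  # row persists; bytes unreachable
    return entries
```

The row itself costs nothing to keep, which is the point: the registry's liability per orphan is the probe, not the cold storage.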

The cheapest registry is the one that stores the smallest thing. Bytes belong on storage. Names and pointers belong in a catalog. The hard work is in keeping them honest about each other.

Verification without uptime.

The honest reason this works is that verification does not require kolm.ai to be reachable at the moment of download. The receipt chain plus the CID plus the optional TEE attestation are self-contained inside the .kolm file. A downloader who already has the bytes can verify them offline.

Four checks are performed at verify time: the manifest's slot hashes, the receipt body HMAC, the content identifier against the artifact bytes, and the parent receipt linkage. A TEE attestation, when present, is parsed as an optional fifth check.

None of these checks requires a network call to kolm.ai. The CID can be looked up against kolm.ai's /v1/cid/<cid> endpoint if the downloader wants to confirm the catalog still references the same CID, but the local verification works without the lookup. An air-gapped deployment that copies .kolm files in over a thumb drive can verify them without any external service.

$ kolm verify ./ticket-router.kolm
[verify]     manifest hashes:       ok (5/5 slots)
[verify]     receipt body HMAC:     ok
[verify]     content identifier:    ok (cidv1:sha256:9d3e…)
[verify]     TEE attestation:       not present
[verify]     parent receipt:        ok (v3 → v4)
verification: passed
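
The offline portion of that flow reduces to local digest and MAC checks over material already inside the .kolm file. A minimal sketch, assuming the CID is a SHA-256 digest of the artifact bytes and the receipt body is authenticated with HMAC-SHA256 under a key the verifier already holds (key distribution and the real CID encoding are out of scope here):

```python
import hashlib
import hmac

def verify_offline(artifact_bytes, cid_hex, receipt_body, receipt_mac, key):
    """Offline checks: no network call needed once the bytes are local."""
    checks = {
        # content identifier: the bytes are the bytes the catalog described
        "cid": hashlib.sha256(artifact_bytes).hexdigest() == cid_hex,
        # receipt body HMAC: the receipt was produced under the expected key
        "receipt_hmac": hmac.compare_digest(
            hmac.new(key, receipt_body, hashlib.sha256).digest(), receipt_mac
        ),
    }
    return all(checks.values()), checks
```

The real `kolm verify` also checks the per-slot manifest hashes and the parent-receipt link; both follow the same pattern of comparing locally computed digests against values carried inside the file.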

Honest limits.

The split between catalog and storage has three honest costs the design does not paper over.

Limit one: kolm.ai is not a CDN. If a publisher's storage URL is slow or rate-limited, the download is slow or rate-limited. We do not provide a fallback download path for arbitrary public artifacts. Publishers who want fast, reliable downloads pay for fast, reliable storage; the registry record reflects whichever URLs they registered.

Limit two: a dark publisher's artifact may be unreachable. If the only storage URL points at a bucket that the publisher deleted, the bytes are gone. The catalog row remains as a reference, and the CID is preserved so a third party can re-publish the same bytes against a different storage URL if they have a copy. But "if they have a copy" is doing real work. A purely-cold artifact with no remaining mirror is unreachable. This is a real failure mode. The mitigation is publisher culture: artifacts that are dependencies for production deployments should have at least two storage URLs registered, ideally on independent providers.

Limit three: kolm.ai's metadata uptime still matters for discovery. If kolm.ai is down, you cannot search the catalog. The artifacts you already have continue to verify locally; new artifacts cannot be discovered until the catalog is reachable again. We treat catalog uptime as a hard SLO and operate accordingly, but we are also clear about the failure shape: an outage of the catalog is a discovery outage, not a verification outage.

Sustainability.

The registry is free to use. The infrastructure is not free to run. The bill is paid by three revenue lines, none of which is registry-transactional.

Paid Pro and Business tiers subsidize the catalog and verification surface for free users. The same machines that host the catalog metadata also host the customer-facing compile API, the receipt signing service, and the captures inbox. The marginal cost of the registry on those machines is small relative to the cost of the customer-facing service; the paid tiers cover both.

Enterprise BYOC contracts for customers who run kolm in their own cloud account include support, indemnification, and direct engineering time. These contracts pay for the slow-moving infrastructure: storage durability for the metadata index, replication across regions, the legal work that keeps the public catalog defensible.

No artifact royalties; no transaction fees on registry use. A publisher who lists a free public artifact does not pay; a downloader who fetches one does not pay. The registry has no revenue from registry transactions. This is a deliberate choice. Any revenue line attached to registry transactions creates an incentive to inflate transaction counts, which corrupts the catalog. The catalog is funded out of the paid product, and the paid product wins or loses on its own merits.

The trade-off this creates is that the registry's scale is bounded by what the paid product can subsidize. If the catalog grew faster than the paid tiers grew, we would have to either reduce metadata services we offer or charge for some subset of registry features. We are explicit about that trade-off in the terms of service, and our current plan is to keep the catalog free for as long as the paid product can carry it.

Prior art: npm, pip, conda-forge.

Every public-package decision sits on top of decades of prior art. Three reference points inform the design.

npm (Node Package Manager) hosts both metadata and bytes on a centralized service. It is fast, it works, and it has weathered both the left-pad incident and several subsequent supply-chain attacks. The npm operational model is the strongest existence proof that a centralized package registry can serve a vast ecosystem reliably. The cost is that npm Inc. (now GitHub) carries the storage and egress for every package ever published, and supply-chain attacks have a single point of compromise.

PyPI (Python Package Index) sits in a similar architecture: centralized metadata, centralized bytes, with mirrors (Bandersnatch, devpi) bolted on. PyPI's governance is open and transparent; the project's funding has historically been a mix of PSF and corporate sponsorship. PyPI's biggest operational risk is the same as npm's: every byte of every release flows through a single set of buckets.

conda-forge takes a different approach: a community-maintained channel built on top of a CDN (Anaconda's). The metadata is in Git, the bytes are on a CDN, and the verification is via package hashes computed at build time. conda-forge is the closest public-registry model to what we are doing, with two architectural differences: their bytes still live on a single CDN (we leave that to publishers), and their metadata is Git-mirrored across many hosts (we mirror metadata centrally for now).

The kolm registry borrows from all three. The catalog-and-bytes split is closer to conda-forge than to npm. The receipt chain is novel to kolm; the CID-based content addressing is a small adaptation of the IPFS pattern. The funding model (paid product subsidizes a free catalog) is closer to GitHub's funding of npm than to PSF's funding of PyPI.

The honest summary: a public registry is a discoverability layer that lets builders find and verify artifacts; it is not a guarantee of availability. The receipt chain guarantees that a file you hold is the file the catalog described. It does not guarantee that the file will still be reachable next month. That second guarantee belongs to whoever is paying for the storage, and the architecture is meant to make that responsibility legible rather than fictitious.