
Halborn pentest, April 2026.

Halborn ran a 14-day, two-phase external penetration test of the kolm runtime API surface, compile pipeline, registry, billing path, and tenant isolation. This page summarizes the engagement, the findings, the fixes, and the retest result. The full unredacted report is available on request to founders@kolm.ai under MNDA.

Vendor: Halborn
Window: 2026-04-08 to 2026-04-22
Retest: 2026-05-06
Findings: 0C / 1H / 3M / 3I

Scope.

Halborn was engaged to find issues a regulated buyer would care about before kolm reached public launch. The scope statement, agreed pre-kickoff:

Runtime API surface, 54 endpoints under kolm.ai/v1/* (auth, compile, run, registry, receipt verification, billing).
Compile pipeline including job queue, eval harness, K-score gate enforcement.
Public registry submission flow, including artifact upload and signature path.
Stripe billing webhook and entitlement reconciliation.
Cross-tenant isolation across artifacts, receipts, recall corpora, and recipes.
JWT + API key authentication, including expiry, rotation, and revocation paths.
Receipt HMAC verification and the manifest parser.
RS-1 spec implementation surface (recipe parser, eval harness, K-score formula).

Out of scope: third-party services (GitHub, Stripe, Cloudflare), denial-of-service, social engineering, physical access to anyone's laptop. Standard for a pre-launch engagement.

Methodology.

Two phases over 14 calendar days:

Phase 1 (days 1-5): black-box external scan. No credentials. Halborn enumerated the public surface, probed authentication, attempted injection across the OpenAPI spec, fuzzed the receipt verifier and the manifest parser, and mapped the registry submission flow.
Phase 2 (days 6-14): credentialed deep dive with source-code review. Halborn received the monorepo, a Pro-tier API key, and the threat model. Manual review of authentication, tenant isolation, the K-score gate, and the billing webhook. Mutation-based fuzzing on the verifier with libFuzzer harnesses.

The methodology mapped to OWASP Top 10 (2021) and OWASP ASVS 4.0 Level 2. The threat model was handed off pre-engagement; Halborn used it to prioritize the credentialed phase.

Findings summary.

ID	Severity	Title	Status
KOL-01	High	Receipt-replay window. Receipts could be replayed within their 5-minute issuance window if intercepted by an attacker on the same authenticated session.	FIXED v10.4
KOL-02	Medium	Registry submission CSRF. Submission flow lacked a SameSite=Strict cookie on the upload endpoint when the registry was driven from a third-party context.	FIXED v10.4
KOL-03	Medium	JWT rotation race. Tokens issued during a key-rotation window of less than 200ms could be accepted by both keys, extending effective lifetime by the rotation interval.	FIXED v10.4
KOL-04	Medium	Stripe webhook idempotency. Replay of a webhook event within the Stripe 5-minute signature window could double-credit on subscription upgrades. No customer impact observed.	FIXED v10.4
KOL-05	Info	Verbose error messages. Some 500 responses leaked internal class names. No exploitability path identified; reported as defense-in-depth.	ACK
KOL-06	Info	CORS pre-flight echoed Origin. Pre-flight responses echoed the request Origin without an allow-list check. Production CORS policy already restricts to known origins; pre-flight echo flagged as minor.	ACK
KOL-07	Info	Rate-limit headers not normalized across endpoints. Three endpoints used `X-Rate-Limit-*` while the rest used `Retry-After`. Cosmetic; flagged for documentation.	ACK
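
KOL-06 comes down to a missing membership test before the Origin header is reflected in a pre-flight response. A minimal sketch of such a check follows. The allow-list contents, the function name, and the method and header lists are hypothetical and are not taken from the kolm codebase.

```python
# Hypothetical allow-list; the real set of permitted origins is not given on this page.
ALLOWED_ORIGINS = frozenset({"https://kolm.ai", "https://app.example.com"})


def preflight_headers(request_origin, allowed=ALLOWED_ORIGINS):
    """Build CORS pre-flight response headers, reflecting Origin only if allow-listed."""
    # Vary: Origin keeps shared caches from serving one origin's answer to another.
    headers = {"Vary": "Origin"}
    if request_origin in allowed:
        headers["Access-Control-Allow-Origin"] = request_origin
        headers["Access-Control-Allow-Methods"] = "GET, POST"
        headers["Access-Control-Allow-Headers"] = "Authorization, Content-Type"
    # For unknown or missing origins no Access-Control-* header is emitted,
    # so the browser blocks the cross-origin request.
    return headers
```

An unknown origin simply gets no Access-Control-Allow-Origin header back, which is what makes the browser refuse the follow-up request.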

The High: KOL-01, receipt-replay window.

What it was. A receipt carries an HMAC over (cid, prompt_hash, response_hash, ts). Within the 5-minute clock-skew tolerance that the verifier accepts, the same receipt with the same ts would verify as authentic if replayed by an attacker who had captured it. The exposure was limited to the same authenticated session; an attacker without the bearer token could not access the receipt.

Why it matters anyway. A receipt is a record of what the model said. Replaying a receipt could let an attacker make the audit log show a response was issued for a different prompt than the one the user actually sent. For regulated buyers, this would have weakened the audit chain.

The fix. Each receipt now carries a nonce drawn from a CSPRNG at issuance, and the verifier maintains a 10-minute sliding window of seen (cid, nonce) tuples. Duplicate (cid, nonce) within the window is rejected with HTTP 409. The window is twice the clock-skew tolerance to ensure any acceptable receipt is checkable. The HMAC now covers (cid, prompt_hash, response_hash, ts, nonce) so the nonce is itself integrity-protected.
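
The fix described above can be sketched roughly as follows. This is an illustration under stated assumptions, not kolm's implementation: the field encoding, SHA-256 as the HMAC hash, the in-memory dict, the function and class names, and the 401 status for non-replay failures are all hypothetical. Only the field list, the 5-minute tolerance, the 10-minute window, and the HTTP 409 on duplicates come from the text.

```python
import hashlib
import hmac
import secrets
import time

SKEW_SECONDS = 5 * 60               # clock-skew tolerance accepted by the verifier
WINDOW_SECONDS = 2 * SKEW_SECONDS   # replay window: twice the tolerance


def _mac(key, cid, prompt_hash, response_hash, ts, nonce):
    # Length-prefix every field so field boundaries are unambiguous (assumed encoding).
    fields = (cid, prompt_hash, response_hash, str(ts), nonce)
    msg = b"".join(len(f.encode()).to_bytes(4, "big") + f.encode() for f in fields)
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def issue_receipt(key, cid, prompt_hash, response_hash, now=None):
    ts = int(time.time() if now is None else now)
    nonce = secrets.token_hex(16)   # drawn from the OS CSPRNG
    mac = _mac(key, cid, prompt_hash, response_hash, ts, nonce)
    return {"cid": cid, "prompt_hash": prompt_hash, "response_hash": response_hash,
            "ts": ts, "nonce": nonce, "mac": mac}


class ReplayGuard:
    """In-memory stand-in for the verifier's seen-(cid, nonce) window."""

    def __init__(self):
        self._seen = {}  # (cid, nonce) -> time first accepted

    def verify(self, key, receipt, now=None):
        now = time.time() if now is None else now
        expected = _mac(key, receipt["cid"], receipt["prompt_hash"],
                        receipt["response_hash"], receipt["ts"], receipt["nonce"])
        if not hmac.compare_digest(expected, receipt["mac"]):
            return 401  # hypothetical status for a bad MAC
        if abs(now - receipt["ts"]) > SKEW_SECONDS:
            return 401  # hypothetical status for a timestamp outside tolerance
        # Drop entries older than the window before checking for a duplicate.
        cutoff = now - WINDOW_SECONDS
        self._seen = {k: t for k, t in self._seen.items() if t >= cutoff}
        seen_key = (receipt["cid"], receipt["nonce"])
        if seen_key in self._seen:
            return 409  # duplicate (cid, nonce) within the window
        self._seen[seen_key] = now
        return 200
```

The window length follows from the tolerance: a receipt is acceptable from 5 minutes before its ts to 5 minutes after, so two presentations of the same receipt can be at most 10 minutes apart, and a 10-minute memory of accepted tuples covers every case. The dict here is lost on restart; the retest notes indicate the production window is durable, which this sketch does not attempt.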

Shipped in. v10.4, 2026-04-19, three days before the engagement window closed.

Retest, 2026-05-06.

Halborn retested all High and Medium findings two weeks after fix-out. Results:

KOL-01 (High, receipt replay): closed. Halborn confirmed the nonce path rejects replay within the verifier window and across verifier restarts (the sliding window is durable across process restart).
KOL-02 (Medium, CSRF): closed. Registry submission now requires SameSite=Strict + an anti-CSRF token tied to the session.
KOL-03 (Medium, JWT race): closed. Key rotation now uses a 60-second hand-off window with both keys advertised in JWKS, and clients are expected to refresh before old key expiry.
KOL-04 (Medium, Stripe idempotency): closed. Webhook handler now stores event.id in a 7-day dedup table; replays return 200 without crediting.
KOL-05, KOL-06, KOL-07 (Info): acknowledged. KOL-05 and KOL-06 will be addressed in v10.5 as a defense-in-depth pass; KOL-07 is a documentation note.
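
The KOL-04 remediation (store event.id, acknowledge replays without crediting) might look like the sketch below. It assumes the Stripe signature has already been verified upstream, uses SQLite as a stand-in for whatever store kolm actually uses, and the table name, function names, and apply_credit callback are hypothetical.

```python
import sqlite3
import time

DEDUP_TTL_SECONDS = 7 * 24 * 3600  # 7-day dedup retention


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_events ("
        "event_id TEXT PRIMARY KEY, seen_at REAL NOT NULL)"
    )
    return conn


def handle_event(conn, event, apply_credit, now=None):
    """Process a verified webhook event at most once per retention period."""
    now = time.time() if now is None else now
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute(
            "DELETE FROM seen_events WHERE seen_at < ?",
            (now - DEDUP_TTL_SECONDS,),
        )
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_events (event_id, seen_at) VALUES (?, ?)",
            (event["id"], now),
        )
        # rowcount is 1 only when the row was new; a replay leaves it at 0.
        if cur.rowcount == 1:
            apply_credit(event)
    # Replays are still acknowledged with 200 so the sender stops retrying.
    return 200
```

Recording the event ID and applying the credit inside the same transaction means a failure in apply_credit rolls back the ID as well, so a later redelivery of that event is still processed rather than silently dropped.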

Halborn issued a follow-up letter confirming closure of all High and Medium findings on 2026-05-06. The letter is bundled with the full report under MNDA.

Cadence going forward.

kolm runs a pentest engagement on every major release, or sooner if the runtime API surface changes by more than 25%. Continuous bug bounty fills the gap between formal engagements; see /bounty for the payout table and scope.

# next planned engagement: v12 GA, approximately Q3 2026.
# vendor selection rotates every two engagements to avoid drift bias.

Full report.

The unredacted Halborn report runs roughly 38 pages and includes step-by-step reproductions, remediation evidence, and Halborn's risk-rating rationale per finding. We share it on request under an MNDA. To request access:

Mail founders@kolm.ai with your company, role, and the artifact or deployment you are evaluating.
We send an MNDA via DocuSign; typical turnaround is one business day.
After MNDA execution, the PDF arrives in a single signed mail with the report digest in the body.

For procurement reviewers who only need the summary table on this page plus the retest letter, we can issue a redacted version on a same-day turnaround without MNDA.