feat: Docker distribution + determinism receipts + SCIP/LSP breadth (spec 008)#243

Open
theagenticguy wants to merge 14 commits into main from feat/v1-distribution-breadth

@theagenticguy

Owner

Summary

Distribution + determinism + breadth for OpenCodeHub (ERPAVal spec 008). Adds a Docker distribution channel, turns the deterministic code-pack into a signed/replayable receipt, conforms the MCP server to the 2026-07-28 stateless model, and extends language coverage — without eroding the byte-identical packHash contract.

Branch: 13 commits, sequenced by dependency (Docker → SCIP/LSP breadth ‖ determinism/MCP). Full spec at .erpaval/specs/008-distribution-determinism-breadth/.

Tracks

B — Docker distribution

  • Multistage node:24 + pnpm 11 lite image (parser + graph + CLI + stdio MCP + TS/Python SCIP) and full multi-arch image (+ jlink JRE + scip-java/go + uv).
  • .github/workflows/docker.yml builds both arches + smoke-tests och-mcp and the bundled indexers. No HTTP surface — docker run -i stdio only.

C — Determinism receipts + MCP conformance

  • code-pack --prove emits an in-toto/SLSA-v1 statement whose subject digest is the packHash; codehub replay <hash> re-derives byte-for-byte and names the drifted item on a mismatch. Keyless cosign (CI-signed; degrades honestly off-CI).
  • BOM reordered cache-prefix-stable; docs reframed around prompt-cache stability.
  • MCP server: stateless per-request _meta protocol negotiation, server/discover, ttlMs/cacheScope, deprecated methods removed. (SDK predates 2026-07-28 → the version pin is a documented TODO; no hand-rolled transport.)

A — Language breadth

  • scip-php + scip-dart at a new scip-unofficial (Tier 1.5) confidence label, distinct from first-party scip:. ADR 0006 refreshed.
  • @opencodehub/lsp-tier: a packHash-quarantined LSP tier-3 for SCIP-blind languages (Swift/Zig/Elixir/Terraform/Clojure). ADR 0019 amends ADR 0005. Quarantine proven by a with/without-Tier-3 byte-identical test.

Verified

  • mise run check green tree-wide (lint + typecheck + tests + banned-strings); packHash + graphHash byte-identity suites pass; license allowlist green.
  • pack --prove / replay proven live locally (cosign 3.1.1): subject == packHash; clean → "reproduced"; tampered byte → "NOT reproduced — drifted item".
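The "reproduced" / "NOT reproduced" outcome above amounts to a byte comparison that names the first differing item. A minimal sketch, assuming a pack can be modeled as a map from item name to bytes; `PackItems`, `ReplayResult`, and `replayCompare` are hypothetical names for illustration, not the codehub API.

```typescript
// Hypothetical model: a pack is a set of named items, each a byte array.
type PackItems = Map<string, Uint8Array>;

type ReplayResult =
  | { status: "reproduced" }
  | { status: "not-reproduced"; driftedItem: string };

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Compare a recorded pack with a freshly re-derived one. Names are visited
// in sorted order so the reported item is itself deterministic. An item
// that is missing on either side counts as drift.
function replayCompare(recorded: PackItems, rederived: PackItems): ReplayResult {
  const names = Array.from(
    new Set(Array.from(recorded.keys()).concat(Array.from(rederived.keys()))),
  ).sort();
  for (const name of names) {
    const a = recorded.get(name);
    const b = rederived.get(name);
    if (a === undefined || b === undefined || !bytesEqual(a, b)) {
      return { status: "not-reproduced", driftedItem: name };
    }
  }
  return { status: "reproduced" };
}
```

A caller running in a strict mode would map `not-reproduced` to a non-zero exit code and print `driftedItem`.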

Known follow-ups (documented, not blockers)

  • lsp-tier live backend: ships extraction logic with an injectable backend; the concrete agent-lsp-spawning backend + codehub analyze --tier3-lsp flag are not yet wired — SCIP-blind languages stay on tree-sitter until then.
  • cosign signing is CI-only (needs OIDC id-token: write); replay needs no cosign.
  • T-C7 CORE-Bench L3 harness lives in the opencodehub-testbed repo (not in this PR).

🤖 Generated with ERPAVal (session-893add).

…on + breadth)

Plan-phase durables for session-893add. Binary track dropped; Docker multistage
pnpm+Node24 is the sole non-npm artifact. 3 tracks, 10 Act packets, wave graph.
Q1 resolved: amend ADR 0005 (new ADR 0019) for a quarantined Tier-3 LSP fallback.

Read protocolVersion/clientInfo/clientCapabilities from per-request _meta via
withProtocolGate proxy over all 29 tools; UnsupportedProtocolVersionError on
mismatch. SDK@1.29.0 lacks 2026-07-28 so transport handshake stays SDK-native;
full negotiation is a documented TODO. T-C9, spec 008 E-C9/AC-C14/U7.
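
The per-request gate described above can be sketched as a wrapper around the tool handlers. This is an illustration under stated assumptions, not the code in this PR: the request shape, the use of a JavaScript `Proxy`, and the choice to reject a missing `protocolVersion` are all assumptions; only the names `withProtocolGate` and `UnsupportedProtocolVersionError` come from the commit message.

```typescript
// Hypothetical request shapes; the real SDK types are not shown in this PR.
interface RequestMeta {
  protocolVersion?: string;
}
interface ToolRequest {
  params: Record<string, unknown>;
  _meta?: RequestMeta;
}
type ToolHandler = (req: ToolRequest) => Promise<unknown>;

class UnsupportedProtocolVersionError extends Error {
  requested: string | undefined;
  constructor(requested: string | undefined) {
    super("unsupported protocol version: " + (requested ?? "(missing)"));
    this.name = "UnsupportedProtocolVersionError";
    this.requested = requested;
  }
}

// Wrap every handler so the version check runs on each request; nothing is
// remembered between calls, which is what makes the gate stateless.
function withProtocolGate<T extends Record<string, ToolHandler>>(
  tools: T,
  supported: readonly string[],
): T {
  return new Proxy(tools, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      const handler = value as ToolHandler;
      return async (req: ToolRequest): Promise<unknown> => {
        const requested = req._meta?.protocolVersion;
        // Assumption: a request without a version is rejected too.
        if (requested === undefined || supported.indexOf(requested) === -1) {
          throw new UnsupportedProtocolVersionError(requested);
        }
        return handler(req);
      };
    },
  });
}
```

Because the gate sits on property access, a tool added to the object later is covered without extra wiring.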
…stability

Emit skeleton/file-tree/deps first, volatile ast-chunks/findings/embeddings last,
so a byte-identical pack maximizes the cache-eligible prompt prefix (0.1x read).
Docs lead with cache-prefix stability over token savings. packHash byte-identity
holds (no golden literal; determinism suite asserts cross-run equality). T-C2, AC-C4/E-C5.
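
To see why the ordering matters for prompt caching, here is a rough sketch that renders sections stable-first and measures the prefix two packs share. The section names come from the commit message; their order within each group, the `## name` rendering, and the helper names are assumptions made for illustration.

```typescript
// Stable sections first, volatile sections last (order within each group
// is an assumption).
const SECTION_ORDER = [
  "skeleton", "file-tree", "deps",         // stable across runs
  "ast-chunks", "findings", "embeddings",  // volatile
];

interface Section {
  name: string;
  body: string;
}

function rank(name: string): number {
  const i = SECTION_ORDER.indexOf(name);
  return i === -1 ? SECTION_ORDER.length : i; // unknown sections go last
}

// Render sections in canonical order regardless of input order.
function renderPack(sections: readonly Section[]): string {
  return sections
    .slice()
    .sort((a, b) => rank(a.name) - rank(b.name) || (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((s) => "## " + s.name + "\n" + s.body + "\n")
    .join("");
}

// Length of the common leading run of two rendered packs: the part a
// prefix-based prompt cache could reuse.
function sharedPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}
```

If only `findings` changes between two runs, everything rendered before that section stays byte-identical and remains cache-eligible.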
Builder installs+builds+pnpm-deploy-prunes; node:24-slim runtime carries the
pruned closure + wasm grammars, embedder removed. och-mcp shim runs stdio MCP via
docker run -i. scope-enum += docker; ROADMAP rejects the single-binary track.
Lite ~600MB (lockfile-faithful: DuckDB+graph natives+SCIP TS compilers). T-B1.

prove() emits an in-toto SLSA-v1 statement whose subject sha256 == manifest
packHash, predicate carries (commit, tokenizer, budget, pins) + BOM inputs;
keyless cosign sign-blob (degrades to documented-cmd when cosign absent). replay
re-derives + byte-compares: strict drift exits non-zero naming the item,
best_effort mismatch is expected-drift. Fixes code-pack manifest commit:'' so
packs are replayable. cosign live-sign is env-gated, not faked. T-C1, E-C1/E-C2/AC-C3/U2.
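
For orientation, the statement described above might look roughly like the object built below. The `_type` and `predicateType` URIs are the published in-toto Statement v1 and SLSA provenance v1 identifiers; everything else (the `buildStatement` helper, the subject name, the `example.com` URIs, and placing commit/tokenizer/budget/pins under `externalParameters`) is an assumption, not the layout this PR emits.

```typescript
interface PackParams {
  commit: string;
  tokenizer: string;
  budget: number;
  pins: Record<string, string>;
}

interface InTotoStatement {
  _type: string;
  subject: { name: string; digest: { sha256: string } }[];
  predicateType: string;
  predicate: {
    buildDefinition: {
      buildType: string;
      externalParameters: PackParams;
      resolvedDependencies: { uri: string }[];
    };
    runDetails: { builder: { id: string } };
  };
}

// Build a statement whose single subject digest is the packHash.
function buildStatement(
  packHash: string,
  params: PackParams,
  bomInputs: readonly string[],
): InTotoStatement {
  // Assumption: packHash is a lowercase hex sha256.
  if (!/^[0-9a-f]{64}$/.test(packHash)) {
    throw new Error("packHash must be a 64-character lowercase hex sha256");
  }
  return {
    _type: "https://in-toto.io/Statement/v1",
    subject: [{ name: "code-pack", digest: { sha256: packHash } }],
    predicateType: "https://slsa.dev/provenance/v1",
    predicate: {
      buildDefinition: {
        buildType: "https://example.com/code-pack/v1", // placeholder URI
        externalParameters: params,
        // Sorted so the statement itself is stable across runs.
        resolvedDependencies: bomInputs.slice().sort().map((uri) => ({ uri })),
      },
      runDetails: { builder: { id: "https://example.com/builder" } }, // placeholder
    },
  };
}
```

A verifier can then check that `subject[0].digest.sha256` equals the manifest's packHash before trusting the predicate.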
server/discover advertises identity + lex-sorted protocol versions + the live
29-tool catalog (app-level handler; SDK@1.29.0 has no native discover). Remove
ping; logging.setLevel + roots.list_changed never installed; log level via
per-request _meta.logLevel. tools/list, resources/list+read carry ttlMs +
cacheScope (not etag). README documents the stdio-only rail. T-C10-13, E-C10/E-C11/E-C12/AC-C13.
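
A small sketch of the two response shapes mentioned above: a discover payload with lex-sorted protocol versions, and list results carrying `ttlMs` + `cacheScope`. Field names other than `ttlMs` and `cacheScope` are guesses, the helper names are hypothetical, and `cacheScope` is kept as an opaque string because its allowed values are not stated in this PR.

```typescript
interface DiscoverResult {
  serverInfo: { name: string; version: string };
  protocolVersions: string[];
  tools: string[];
}

// Versions are sorted lexicographically; for YYYY-MM-DD strings that is
// also chronological order. The tool catalog is passed through as given.
function buildDiscover(
  serverInfo: { name: string; version: string },
  protocolVersions: readonly string[],
  toolNames: readonly string[],
): DiscoverResult {
  return {
    serverInfo,
    protocolVersions: protocolVersions.slice().sort(),
    tools: toolNames.slice(),
  };
}

interface CacheHints {
  ttlMs: number;
  cacheScope: string; // opaque here; real values are not shown in the PR
}

// Attach cache hints to any list/read result without mutating it.
function withCacheHints<T extends object>(result: T, hints: CacheHints): T & CacheHints {
  return Object.assign({}, result, hints);
}

// Client-side freshness check driven by ttlMs rather than an etag round trip.
function isFresh(fetchedAtMs: number, ttlMs: number, nowMs: number): boolean {
  return nowMs - fetchedAtMs < ttlMs;
}
```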
…+ CI

Adds jre-build (jlink JRE-21, 62MB + scip-java 0.12.3), scip-go-dl (SHA-verified
scip-go v0.2.7 per-arch), and a full target (FROM lite + indexer toolchains + uv).
docker.yml builds lite+full for amd64+arm64 and smoke-tests och-mcp + indexers;
all actions SHA-pinned. No GPL/MPL binaries. Lite stage untouched. T-B2, E-D2/E-D3/AC-D6/AC-D7.
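
The per-arch SHA verification is done at image build time; the same check, expressed in TypeScript for illustration only, looks like the sketch below. `verifyDownload` and the pin map are hypothetical, and no real scip-go digests are reproduced here.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Refuse an artifact unless its digest matches the pin for this arch.
// `pins` maps an arch label (e.g. "amd64") to an expected hex sha256.
function verifyDownload(
  arch: string,
  bytes: Uint8Array,
  pins: ReadonlyMap<string, string>,
): void {
  const expected = pins.get(arch);
  if (expected === undefined) {
    throw new Error("no pinned digest for arch " + arch);
  }
  const actual = sha256Hex(bytes);
  if (actual !== expected.toLowerCase()) {
    throw new Error("sha256 mismatch for " + arch + ": expected " + expected + ", got " + actual);
  }
}
```

Failing closed on an unknown arch keeps a new build platform from silently skipping verification.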
IndexerKind += php/dart, ALLOWED_COMMANDS += scip-php/scip_dart, detectLanguages
maps composer.json/pubspec.yaml; both gated behind --allow-build-scripts. New
SCIP_UNOFFICIAL_PROVENANCE_PREFIXES (Tier 1.5, distinct from first-party scip:),
surfaced in confidence-breakdown. scip_dart binary is underscore (verified vs
upstream tag). ADR 0006 refreshed (scip-code/scip-go@v0.2.7, scip@0.8.1). T-A-S.
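
The detection and tiering rules in this commit can be sketched as two lookups. The manifest-to-indexer mapping, the command names, and the first-party `scip:` prefix come from the text above; the concrete value `"scip-unofficial:"` and the function names are assumptions for illustration.

```typescript
type IndexerKind = "php" | "dart";

const MANIFEST_TO_INDEXER = new Map<string, IndexerKind>([
  ["composer.json", "php"],
  ["pubspec.yaml", "dart"],
]);

// Note the underscore in scip_dart, per the commit message.
const INDEXER_COMMAND = new Map<IndexerKind, string>([
  ["php", "scip-php"],
  ["dart", "scip_dart"],
]);

const SCIP_FIRST_PARTY_PREFIX = "scip:";
// Assumed prefix value; the real constant's contents are not shown in the PR.
const SCIP_UNOFFICIAL_PROVENANCE_PREFIXES: readonly string[] = ["scip-unofficial:"];

// Both indexers are gated: without the opt-in flag nothing is detected.
function detectIndexers(fileNames: readonly string[], allowBuildScripts: boolean): IndexerKind[] {
  if (!allowBuildScripts) return [];
  const found: IndexerKind[] = [];
  for (const name of fileNames) {
    const kind = MANIFEST_TO_INDEXER.get(name);
    if (kind !== undefined && found.indexOf(kind) === -1) found.push(kind);
  }
  return found.sort();
}

type ConfidenceTier = "scip" | "scip-unofficial" | "other";

function confidenceTier(provenance: string): ConfidenceTier {
  if (provenance.startsWith(SCIP_FIRST_PARTY_PREFIX)) return "scip";
  for (const prefix of SCIP_UNOFFICIAL_PROVENANCE_PREFIXES) {
    if (provenance.startsWith(prefix)) return "scip-unofficial";
  }
  return "other";
}
```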
New @opencodehub/lsp-tier vendors agent-lsp logic (workspace/symbol + blast_radius)
for Swift/Zig/Elixir/Terraform/Clojure etc. Facts tagged lsp:<bin>@<ver>,
canonically re-sorted, kept in a packHash-EXCLUDED sidecar — packHash byte-identical
with/without Tier-3 (proven by quarantine.test). Opt-in only (O-A7); warmup hard-fail
(S-A4b); per-wrapped-server SPDX audit (AC-A5). ADR 0019 amends 0005. T-A-L.
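
The quarantine invariant is easiest to see in miniature: the hash is computed over core facts only, while Tier-3 facts go to a separately serialized sidecar. The sketch below is a toy model under that assumption; `Fact`, `canonical`, and `buildPack` are hypothetical and the real packHash covers far more than a fact list.

```typescript
import { createHash } from "node:crypto";

interface Fact {
  id: string;
  provenance: string; // e.g. "scip:..." or "lsp:<bin>@<ver>"
}

function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Canonical form: sort by (id, provenance), then serialize as arrays so
// key order can never vary.
function canonical(facts: readonly Fact[]): string {
  const sorted = facts.slice().sort((a, b) => cmp(a.id, b.id) || cmp(a.provenance, b.provenance));
  return JSON.stringify(sorted.map((f) => [f.id, f.provenance]));
}

// Tier-3 facts never feed the hash; they are only written to the sidecar.
function buildPack(
  coreFacts: readonly Fact[],
  tier3Facts: readonly Fact[],
): { packHash: string; sidecar: string } {
  return {
    packHash: createHash("sha256").update(canonical(coreFacts)).digest("hex"),
    sidecar: canonical(tier3Facts),
  };
}
```

The with/without test then reduces to asserting that `buildPack(core, []).packHash` equals `buildPack(core, tier3).packHash`.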
… + stale-dist)

Two durable ERPAVal lessons: (1) quarantine nondeterministic LSP/heuristic facts
in a packHash-excluded sidecar to extend breadth without eroding determinism
(the load-bearing byte-identical-with/without test); (2) new workspace package or
export reads as a phantom missing-member error until clean rebuild + relink.
…plit

Lite was described by what it omits (no JVM/scip-go), which read as 'no SCIP'.
It bakes in scip-typescript + scip-python (CLI prod deps). Full adds the trimmed
JRE + remaining indexers + uv; lite fetches those via codehub setup on demand.

code-pack staged the BOM under os.tmpdir() then rename()d into the repo's
.codehub/packs/<hash>/. When /tmp is a different mount (tmpfs) from the repo
(EFS/NFS), rename throws EXDEV: cross-device link not permitted and the pack
crashes. Stage under the destination's own parent dir so the move is an atomic
on-device rename. Found running pack --prove live on an EFS-backed checkout.
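
The fix pattern, staging next to the destination so `rename` never crosses a mount, can be sketched as follows. `writePackAtomically` and the `.staging-` prefix are hypothetical names; the Node `fs` calls are standard.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write files into a staging directory that is a sibling of the final
// destination, then rename it into place. Because both paths share a
// parent directory they are on the same filesystem, so rename cannot
// fail with EXDEV the way a move out of os.tmpdir() can.
function writePackAtomically(destDir: string, files: ReadonlyMap<string, string>): void {
  const parent = path.dirname(destDir);
  fs.mkdirSync(parent, { recursive: true });
  const staging = fs.mkdtempSync(path.join(parent, ".staging-"));
  try {
    files.forEach((content, name) => {
      fs.writeFileSync(path.join(staging, name), content);
    });
    // Assumes destDir does not exist yet; renaming onto a non-empty
    // directory fails.
    fs.renameSync(staging, destDir);
  } catch (err) {
    // Leave no partial staging directory behind on failure.
    fs.rmSync(staging, { recursive: true, force: true });
    throw err;
  }
}
```

An alternative is to catch `EXDEV` and fall back to copy-then-delete, but that gives up atomicity, which is why staging on the destination device is the simpler choice.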
…p-tier status

README was missing the spec-008 surface: add a 'prove a code-pack is reproducible'
section (code-pack --prove + replay + keyless cosign + offline verify), fix the
stale 28->29 tool count (server.test asserts 29), note PHP/Dart scip-unofficial
tier + the SCIP-blind lsp-tier, and add a 'Since v1' status para that honestly
flags the lsp-tier live backend (+ --tier3-lsp flag) as the remaining follow-up.
Comment thread Dockerfile
# Default to the stdio MCP server. `docker run -i` keeps stdin open for the
# JSON-RPC stream; override the command (e.g. `... codehub analyze`) to drive
# the CLI. No EXPOSE / port / listener — stdio is the only transport (U9).
ENTRYPOINT []
Comment thread Dockerfile
# JSON-RPC stream; override the command (e.g. `... codehub analyze`) to drive
# the CLI. No EXPOSE / port / listener — stdio is the only transport (U9).
ENTRYPOINT []
CMD ["och-mcp"]
theagenticguy added a commit that referenced this pull request Jun 19, 2026
lsp-tier's quarantine.test.ts imported @opencodehub/pack, but the graph is
lsp-tier -> pack -> ingestion -> lsp-tier, so a tsconfig ref back to pack is a
TS6202 circular project graph. Local check passed only on stale dist. Move the
test into pack (downstream of lsp-tier, cycle-free), importing the sidecar
writer from @opencodehub/lsp-tier. The invariant is pack's hash guarantee anyway.
Fixes the CI typecheck/test failures on PR #243.
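
The cycle described in this commit can be reproduced with a small depth-first search over project references. This is an illustrative sketch only: `findCycle` is a hypothetical helper, and the graphs used with it model just the three packages named above, not the full workspace.

```typescript
// Each key is a project; its value lists the projects it references.
type ProjectGraph = ReadonlyMap<string, readonly string[]>;

// Depth-first search that returns one reference cycle as a path whose
// first and last entries are the same project, or null if none exists.
function findCycle(graph: ProjectGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph.get(node) ?? []) {
      const seen = state.get(dep);
      if (seen === "visiting") {
        // dep is already on the stack: the slice from dep onward is the cycle.
        return stack.slice(stack.indexOf(dep)).concat(dep);
      }
      if (seen === undefined) {
        const found = visit(dep);
        if (found !== null) return found;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };

  for (const node of Array.from(graph.keys()).sort()) {
    if (!state.has(node)) {
      const found = visit(node);
      if (found !== null) return found;
    }
  }
  return null;
}
```

With the test living in lsp-tier, the edge lsp-tier -> pack closes the loop; once the test moves into pack, that edge disappears and the search finds nothing.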
@theagenticguy theagenticguy force-pushed the feat/v1-distribution-breadth branch from 4cfff0b to bdef5ff Compare June 19, 2026 22:59
@github-advanced-security


You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

