feat(analysis): plumbing sieve + candidate_business tag (deterministic, advisory) #248
Merged
Adds `classifyPlumbing` to @opencodehub/analysis: a pure, deterministic per-symbol rule that flags high-confidence plumbing (serialization, DTO mapping, transport, DI wiring) and ABSTAINS everywhere else. It never asserts "business logic": calling a domain rule plumbing and hiding it is the costly error, so the rule is tuned for plumbing precision and stays silent when unsure.

Provenance: distilled from a teacher/student loop (3-model LLM panel labeled ~300 symbols across Python/Java/Go; shallow tree fit; two cleanest plumbing leaves lifted out). Measured plumbing precision: 0.936 aggregate, >= 0.85 on EVERY repo under per-repo eval (flask 1.00, petclinic 0.94, go-clean 0.92, cosmic-ddd 0.89). The full business-asserting classifier did not generalize cross-repo and is intentionally NOT shipped; only the plumbing direction is.

Two tiers, both requiring zero domain signal so any real decision vetoes the sieve:

- serialization-pure (conf 0.95): serializer call, no domain signal
- plumbing-no-domain (conf 0.90): plumbing signals, no domain signal, not ORM

Pure function of a small PlumbingFeatures struct (mirrors the page-rank.ts deterministic-kernel idiom), so the verdict is safe to persist into nodes.payload and survives the graphHash byte-identity contract.

Validated on python/java/go (SIEVE_VALIDATED_LANGUAGES); other languages should be skipped by the analyze pass rather than emit an unbacked verdict.

Tests pin the iter-0 regressions: AbstractRepository reads plumbing, Batch.allocate (domain rule) never does.

Also refreshes the stale `sql` MCP tool description in CLAUDE.md to ADR 0019 (nodes/edges are directly SQL-queryable; cypher is fork-only).
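The two-tier rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the shipped kernel: `nPlumbingSignals` and `isOrmModel` are names that appear in the follow-up PR, while `hasSerializerCall`, `nDomainSignals`, the verdict shape, and the function name `classifyPlumbingSketch` are assumptions made for the example.

```typescript
// Hypothetical feature struct; the real PlumbingFeatures may differ.
interface PlumbingFeatures {
  hasSerializerCall: boolean; // assumed name
  nPlumbingSignals: number;   // name taken from the follow-up PR text
  nDomainSignals: number;     // assumed name
  isOrmModel: boolean;        // name taken from the follow-up PR text
}

// Hypothetical verdict shape: either a confident plumbing tier or an abstention.
type PlumbingVerdict =
  | { likelyPlumbing: true; tier: "serialization-pure" | "plumbing-no-domain"; confidence: number }
  | { likelyPlumbing: false };

function classifyPlumbingSketch(f: PlumbingFeatures): PlumbingVerdict {
  // Any domain signal vetoes both tiers: the rule abstains.
  if (f.nDomainSignals > 0) return { likelyPlumbing: false };

  // Tier 1: serializer call, no domain signal.
  if (f.hasSerializerCall) {
    return { likelyPlumbing: true, tier: "serialization-pure", confidence: 0.95 };
  }

  // Tier 2: plumbing signals, no domain signal, not an ORM model.
  if (f.nPlumbingSignals > 0 && !f.isOrmModel) {
    return { likelyPlumbing: true, tier: "plumbing-no-domain", confidence: 0.9 };
  }

  // Everything else: abstain rather than assert "business logic".
  return { likelyPlumbing: false };
}
```

Because the function reads only its argument and performs no I/O, the same features always yield the same verdict, which is the property the commit relies on for persisting verdicts.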
…he sieve

Adds `classifyBusinessCandidate`: a symbol is a business-logic candidate unless the sieve is confident it is plumbing (`candidateBusiness === !likelyPlumbing`). This is the "look here for domain logic" tag the user gets at analyze time with no query, no labels, no embeddings.

Recall-first by construction: a symbol only loses the candidate tag when we are confident it is plumbing, so real domain logic cannot be silently dropped. Measured on 286 labeled symbols (Python/Java/Go): business recall 0.925 (misses 6 of 80), per-repo recall 0.80-1.00. Precision 0.385 (tags ~67%) is the intended trade: the tag is the safety net, and an optional embedding-derived rank (follow-up) orders candidates so the most domain-like surface first.

Same feature inputs as the sieve, so the two tags can never disagree: every symbol is either confident-plumbing or a candidate, never both, never neither (pinned by a complement-invariant test).

5 new tests; 14 total in the suite.
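The complement rule and the reported figures can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the tagged count of 192 is not stated in the commit; it is inferred here from the reported precision (0.385) and the "~67% of 286" figure.

```typescript
// Hypothetical tag pair; the complement rule is the one stated in the commit:
// candidateBusiness === !likelyPlumbing.
interface ConcernTags {
  likelyPlumbing: boolean;
  candidateBusiness: boolean;
}

function tagFromSieve(likelyPlumbing: boolean): ConcernTags {
  return { likelyPlumbing, candidateBusiness: !likelyPlumbing };
}

// Exactly one of the two tags is set, for either sieve outcome.
function holdsComplementInvariant(t: ConcernTags): boolean {
  return t.likelyPlumbing !== t.candidateBusiness;
}

// Figures from the commit: 286 labeled symbols, 80 business, 6 missed.
const labeled = 286;
const business = 80;
const missed = 6;
const truePositives = business - missed; // 74

// Assumption: about 192 symbols carry the candidate tag (inferred, not reported).
const tagged = 192;

const recall = truePositives / business;  // 74 / 80
const precision = truePositives / tagged; // 74 / 192
const taggedShare = tagged / labeled;     // 192 / 286

console.log(recall.toFixed(3), precision.toFixed(3), taggedShare.toFixed(2));
// prints "0.925 0.385 0.67"
```

Under that assumed count, the three reported numbers are mutually consistent: tagging about two thirds of the symbols is what buys the 0.925 recall.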
theagenticguy added a commit that referenced this pull request Jun 22, 2026
…ing + candidate_business (#249)

## Summary

Makes the merged sieve kernels (#248) end-to-end: `codehub analyze` now writes `likelyPlumbing` + `candidateBusiness` into `nodes.payload` for every Function / Method / Constructor / Class / Interface / Struct in a Python / Java / Go repo. The user gets both concern tags from two commands (`codehub init` + `codehub analyze`) with **no query, no labels, no embeddings**:

```
likely_plumbing     precision-first (~0.94)  "this is plumbing"
candidate_business  recall-first (~0.93)     "look here for domain logic"
```

Queryable via SQLite JSON1: `SELECT name FROM nodes WHERE payload->>'$.candidateBusiness' = 'true'`.

## Components

- **core-types**: two optional `CallableShape` fields. Auto-persist through `nodes.payload`; no storage-adapter change (the SQLite store rehydrates payload verbatim).
- **`extract/business-logic-features.ts`**: faithful Python→TS port of the feature extractor. Reproduces the marker logic exactly: word-boundary / camelCase-component matching, the precise `nPlumbingSignals` formula (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and ORM-base class-head detection. **44 unit tests.**
- **`pipeline/phases/business-logic.ts`**: the analyze-time phase (after `complexity`). Slices each symbol body, runs `classifyPlumbing` + `classifyBusinessCandidate`, re-adds the node with the tags (richer-entry-wins merge, same contract complexity uses). Python/Java/Go only; other languages skip silently. The class-head slice scans upward over a Javadoc block to reach the real `@Entity` / `@MappedSuperclass` annotation while excluding `@author`-style comment tags.
- **default-set + orchestrator test**: registered after complexity; the topological-order assertion updated for the new position.

## The parity contract

The whole point of the port is that the shipped numbers survive it. An independent per-symbol harness diffs the TS analyze-pass verdicts against the Python oracle across all four corpus repos: **1368 / 1368 = 100.0% verdict agreement, 0 disagreements.** This was re-verified independently of the porting agent's own report, which is how a JPA-entity divergence surfaced (a Javadoc `@author` tag shadowing the real ORM annotation, dropping `isOrmModel` and flipping 5 entity classes). Caught at 99.63%, fixed to 100%. The 0.936 plumbing precision / 0.925 business recall hold end-to-end.

## Determinism

`computePlumbingFeatures` and both kernels are pure; files + definitions iterate in sorted order. Tags are byte-stable across runs (the three `graphHash`-determinism tests pass), safe under the reproducibility contract.

## Verification

- core-types / analysis / ingestion typecheck clean.
- ingestion **629/629**, core-types **83/83**, analysis **14/14**.
- biome + banned-strings + commitlint + pre-push hook pass.
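A per-symbol parity check of the kind described under "The parity contract" might look like the sketch below. It is an assumption-laden illustration: the actual harness is not shown in this PR, and the `Verdict` shape, the `diffVerdicts` name, and the symbol-id keys are all hypothetical.

```typescript
// Hypothetical verdict shape, using the two payload field names from this PR.
interface Verdict {
  likelyPlumbing: boolean;
  candidateBusiness: boolean;
}

interface ParityReport {
  total: number;
  disagreements: string[]; // symbol ids whose verdicts differ or are missing
  agreement: number;       // fraction in [0, 1]
}

// Compare the port's verdicts against the oracle's, symbol by symbol.
function diffVerdicts(oracle: Map<string, Verdict>, port: Map<string, Verdict>): ParityReport {
  // Sorted ids keep the report itself deterministic across runs.
  const ids = Array.from(oracle.keys()).sort();
  const disagreements: string[] = [];
  for (const id of ids) {
    const expected = oracle.get(id);
    const actual = port.get(id);
    if (expected === undefined) continue;
    const same =
      actual !== undefined &&
      actual.likelyPlumbing === expected.likelyPlumbing &&
      actual.candidateBusiness === expected.candidateBusiness;
    if (!same) disagreements.push(id);
  }
  const total = ids.length;
  const agreement = total === 0 ? 1 : (total - disagreements.length) / total;
  return { total, disagreements, agreement };
}
```

With 1368 symbols and 5 flipped verdicts, such a report gives 1363 / 1368, which rounds to the 99.63% figure quoted above; with no flips it gives 100%.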
theagenticguy pushed a commit that referenced this pull request Jun 25, 2026
🤖 Automated release via release-please

---

<details><summary>root: 0.9.2</summary>

## [0.9.2](root-v0.9.1...root-v0.9.2) (2026-06-24)

### Features

* **analysis:** plumbing sieve + candidate_business tag (deterministic, advisory) ([#248](#248)) ([383b719](383b719))
* **ingestion:** business-logic analyze phase: populate likely_plumbing + candidate_business ([#249](#249)) ([a3d44ad](a3d44ad))
* **storage:** single-file SQLite + WASM embedder: zero native dependencies ([#245](#245)) ([c72c84f](c72c84f))

### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent bugs ([#247](#247)) ([90f40a2](90f40a2))

</details>

<details><summary>cli: 0.9.2</summary>

## [0.9.2](cli-v0.9.1...cli-v0.9.2) (2026-06-24)

### Features

* **storage:** single-file SQLite + WASM embedder: zero native dependencies ([#245](#245)) ([c72c84f](c72c84f))

### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent bugs ([#247](#247)) ([90f40a2](90f40a2))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
## Summary

Two deterministic, label-free per-symbol tags for `@opencodehub/analysis`, the two halves of one idea:

- `classifyPlumbing` (precision-first): flags high-confidence plumbing (serialization, DTO mapping, transport, DI wiring), abstains otherwise. Never asserts "business logic". Plumbing precision 0.936 aggregate, ≥0.85 on every repo (flask 1.00, java 0.94, go 0.92, cosmic 0.89) under per-repo eval.
- `classifyBusinessCandidate` (recall-first): the exact complement. A symbol is a `candidate_business` unless the sieve is confident it's plumbing. Business recall 0.925 (misses 6 of 80), per-repo 0.80–1.00. This is the "look here for domain logic" tag the user gets at analyze time with no query.

## Why this shape

Asserting "this IS business logic" needs a trained classifier and didn't generalize across repos (held-out F1 ~0.3). Subtracting confident plumbing does generalize, because the sieve does. So:

Recall-first by construction: a symbol only loses the candidate tag when we're confident it's plumbing, so real domain logic can't be silently dropped. The 0.385 candidate precision (tags ~67%) is the intended trade: the tag is the safety net, and an optional embedding-derived `business_rank` (follow-up) orders candidates so the most domain-like surface first. Embeddings become a rank, not a yes/no judge.

## Provenance

Distilled from a teacher/student loop: a 3-model LLM panel labeled ~300 symbols across Python/Java/Go, a shallow tree was fit, and the cleanest high-precision plumbing leaves were lifted out. Both kernels are pure functions of a small `PlumbingFeatures` struct (no I/O, no model, no randomness; mirrors `page-rank.ts`), so verdicts are safe to persist into `nodes.payload` and survive the `graphHash` byte-identity contract. The complement invariant (every symbol is either confident-plumbing or a candidate, never both/neither) is pinned by test.

## End-user flow (unchanged)

`codehub init` + `codehub analyze [--embeddings]`. The tags land in `nodes.payload`, queryable as `payload->>'$.candidate_business'`. No labeling, ever: labeling was only the offline method-validation; it does not ship.

## Scope

- … `AbstractRepository` → plumbing, `Batch.allocate` → candidate), exported from the package index.
- … `PlumbingFeatures` from the AST and writes `likely_plumbing` + `candidate_business` (+ optional embedding `business_rank`) into `nodes.payload`.

Also refreshes the stale `sql` MCP tool description in `CLAUDE.md` to ADR 0019.

## Verification

`packages/analysis` typecheck clean; full suite 154 pass / 0 fail.
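The "rank, not a yes/no judge" idea from "Why this shape" can be illustrated with a small sketch. Everything here is hypothetical, since `business_rank` is described only as a follow-up: the `TaggedSymbol` shape, the `orderCandidates` name, the assumption that a higher rank means more domain-like, and the choice to put unranked candidates last.

```typescript
// Hypothetical row shape; businessRank is optional because the rank is a follow-up.
interface TaggedSymbol {
  name: string;
  candidateBusiness: boolean;
  businessRank?: number; // assumption: higher means more domain-like
}

// Order candidates by rank without ever dropping one: the tag decides
// membership, the rank only decides position.
function orderCandidates(symbols: TaggedSymbol[]): TaggedSymbol[] {
  return symbols
    .filter((s) => s.candidateBusiness)
    .sort((a, b) => {
      const ra = a.businessRank ?? Number.NEGATIVE_INFINITY; // unranked sorts last
      const rb = b.businessRank ?? Number.NEGATIVE_INFINITY;
      if (ra !== rb) return rb > ra ? 1 : -1;
      // Tie-break on name so the ordering is deterministic.
      return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
    });
}
```

A low-ranked or unranked candidate still appears in the result, which preserves the recall-first guarantee while letting the most domain-like symbols surface first.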