feat(ingestion): business-logic analyze phase — populate likely_plumbing + candidate_business #249
Merged
…date_business into the graph

Wires the merged @opencodehub/analysis sieve kernels into `codehub analyze`. A new `businessLogicPhase` (after `complexity`) slices each Function / Method / Constructor / Class / Interface / Struct body, computes the deterministic PlumbingFeatures vector, runs classifyPlumbing + classifyBusinessCandidate, and tags the node with `likelyPlumbing` + `candidateBusiness`. The tags land in `nodes.payload` (queryable via `payload->>'$.candidateBusiness'`), so the user gets both concern tags from `codehub analyze` with no query, no labels, no embeddings.

Components:
- core-types: two optional `CallableShape` fields (likelyPlumbing / candidateBusiness). Auto-persist through nodes.payload; no adapter change.
- extract/business-logic-features.ts: faithful Python→TS port of the feature extractor (computePlumbingFeatures), reproducing the marker logic: word-boundary / camelCase-component matching, the exact n_plumbing_signals formula (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and the ORM-base class-head detection. 44 unit tests.
- pipeline/phases/business-logic.ts: the analyze-time phase. Python/Java/Go only (the sieve's validated set); other languages skip silently. Class-head slice scans upward over a Javadoc block to reach the real `@Entity` / `@MappedSuperclass` annotation while excluding `@author`-style comment tags.
- default-set: registered after complexity; orchestrator test updated for the new topological position.

PARITY GATE (the contract): the TS analyze-pass verdicts match the Python oracle 1368/1368 = 100.0% per-symbol across all four corpus repos (py-cosmic-ddd / py-flask / java-petclinic / go-clean), independently re-verified, so the shipped 0.936 plumbing precision / 0.925 business recall hold through the port. A JPA-entity divergence (Javadoc @author shadowing the ORM annotation) was caught by the gate at 99.63% and fixed to reach 100%.
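The Javadoc divergence mentioned above comes down to telling a real annotation apart from an `@`-tag inside a comment. A minimal sketch of that idea follows; it is not the code in `pipeline/phases/business-logic.ts`. The function name `collectClassHeadAnnotations`, the line-based comment detection, and the sample Java source are all assumptions made for illustration.

```typescript
// Hypothetical sketch: collect the annotations that sit above a class
// declaration, ignoring anything inside /* ... */ or /** ... */ comments.
function collectClassHeadAnnotations(lines: string[], classLineIndex: number): string[] {
  const annotations: string[] = [];
  let insideComment = false; // true while the upward scan is inside a block comment

  for (let i = classLineIndex - 1; i >= 0; i--) {
    const line = (lines[i] ?? "").trim();

    if (insideComment) {
      // Scanning upward, the comment ends at the line that opens it.
      if (line.indexOf("/*") === 0) insideComment = false;
      continue;
    }
    if (line.slice(-2) === "*/") {
      // Entering a comment from below; a one-line /** ... */ closes immediately.
      insideComment = line.indexOf("/*") !== 0;
      continue;
    }
    if (line === "" || line.indexOf("//") === 0) continue;

    const match = /^@([A-Za-z_][A-Za-z0-9_.]*)/.exec(line);
    const name = match ? match[1] : undefined;
    if (name !== undefined) {
      annotations.push(name);
      continue;
    }
    break; // any other code line ends the class head
  }
  return annotations;
}

// Illustrative input (made up): the @author tag lives inside the Javadoc block.
const ownerSource = [
  "@Entity",
  "/**",
  " * Owner of a pet.",
  " * @author Jane Doe",
  " */",
  "public class Owner {",
];
console.log(collectClassHeadAnnotations(ownerSource, 5)); // [ 'Entity' ]
```

Because the scan tracks whether it is inside a comment, `@author` is never considered, while `@Entity` above the Javadoc block is still reached.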
Verified: core-types/analysis/ingestion typecheck clean; ingestion 629/629, core-types 83/83, analysis 14/14; biome + banned-strings pass.
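The marker matching and the n_plumbing_signals formula named in the components list can be illustrated with a small sketch. The names below (`PlumbingSignalInputs`, `identifierComponents`, `hasMarker`, `countPlumbingSignals`) and the example marker list are hypothetical, and treating each term of the formula as a 0/1 indicator is an assumption; the real `PlumbingFeatures` vector is not shown in this PR.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface PlumbingSignalInputs {
  hasSerializationMarker: boolean;
  hasObservabilityMarker: boolean;
  isGetterOrSetter: boolean;
  dtoMapperRatio: number; // fraction in [0, 1]
}

// Split an identifier into lowercase components on camelCase humps and on
// non-alphanumeric separators such as "_".
function identifierComponents(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// A marker matches only a whole component, never a substring of one.
function hasMarker(identifier: string, markers: readonly string[]): boolean {
  return identifierComponents(identifier).some((part) => markers.indexOf(part) !== -1);
}

// Assumed reading of the formula: one point per signal that is present.
function countPlumbingSignals(f: PlumbingSignalInputs): number {
  return (
    Number(f.hasSerializationMarker) +
    Number(f.hasObservabilityMarker) +
    Number(f.isGetterOrSetter) +
    Number(f.dtoMapperRatio >= 0.5)
  );
}

const observabilityMarkers = ["log", "metric", "trace"]; // made-up marker list
console.log(hasMarker("writeLog", observabilityMarkers)); // true
console.log(hasMarker("loginUser", observabilityMarkers)); // false: "login" is not "log"
```

Component-level matching is what keeps `log` from firing on `loginUser` or `dialog`, which a plain substring test would get wrong.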
theagenticguy pushed a commit that referenced this pull request on Jun 25, 2026
🤖 Automated release via release-please

---

<details><summary>root: 0.9.2</summary>

## [0.9.2](root-v0.9.1...root-v0.9.2) (2026-06-24)

### Features

* **analysis:** plumbing sieve + candidate_business tag (deterministic, advisory) ([#248](#248)) ([383b719](383b719))
* **ingestion:** business-logic analyze phase — populate likely_plumbing + candidate_business ([#249](#249)) ([a3d44ad](a3d44ad))
* **storage:** single-file SQLite + WASM embedder — zero native dependencies ([#245](#245)) ([c72c84f](c72c84f))

### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent bugs ([#247](#247)) ([90f40a2](90f40a2))

</details>

<details><summary>cli: 0.9.2</summary>

## [0.9.2](cli-v0.9.1...cli-v0.9.2) (2026-06-24)

### Features

* **storage:** single-file SQLite + WASM embedder — zero native dependencies ([#245](#245)) ([c72c84f](c72c84f))

### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent bugs ([#247](#247)) ([90f40a2](90f40a2))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Makes the merged sieve kernels (#248) end-to-end: `codehub analyze` now writes `likelyPlumbing` + `candidateBusiness` into `nodes.payload` for every Function / Method / Constructor / Class / Interface / Struct in a Python / Java / Go repo. The user gets both concern tags from two commands (`codehub init` + `codehub analyze`) with no query, no labels, no embeddings.

Queryable via SQLite JSON1: `SELECT name FROM nodes WHERE payload->>'$.candidateBusiness' = 'true'`.

Components
- core-types: two optional `CallableShape` fields. Auto-persist through `nodes.payload`; no storage-adapter change (the SQLite store rehydrates payload verbatim).
- `extract/business-logic-features.ts`: faithful Python→TS port of the feature extractor. Reproduces the marker logic exactly: word-boundary / camelCase-component matching, the precise `nPlumbingSignals` formula (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and ORM-base class-head detection. 44 unit tests.
- `pipeline/phases/business-logic.ts`: the analyze-time phase (after `complexity`). Slices each symbol body, runs `classifyPlumbing` + `classifyBusinessCandidate`, and re-adds the node with the tags (richer-entry-wins merge, the same contract the complexity phase uses). Python/Java/Go only; other languages skip silently. The class-head slice scans upward over a Javadoc block to reach the real `@Entity` / `@MappedSuperclass` annotation while excluding `@author`-style comment tags.

The parity contract
The whole point of the port is that the shipped numbers survive it. An independent per-symbol harness diffs the TS analyze-pass verdicts against the Python oracle across all four corpus repos:
1368 / 1368 = 100.0% verdict agreement, 0 disagreements.
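For readers who want to see what a per-symbol agreement check amounts to, here is a minimal sketch. It is not the harness used for this PR: the `Verdict` shape, the `diffVerdicts` and `agreementPercent` helpers, and the toy symbol ids are assumptions.

```typescript
// Hypothetical verdict shape: the two tags this PR writes per symbol.
interface Verdict {
  likelyPlumbing: boolean;
  candidateBusiness: boolean;
}
type VerdictsBySymbol = { [symbolId: string]: Verdict };

// Return the sorted ids whose verdicts differ, or that exist on one side only.
function diffVerdicts(oracle: VerdictsBySymbol, port: VerdictsBySymbol): string[] {
  const seen: { [id: string]: true } = {};
  for (const id of Object.keys(oracle).concat(Object.keys(port))) seen[id] = true;

  const disagreements: string[] = [];
  for (const id of Object.keys(seen).sort()) {
    const a: Verdict | undefined = oracle[id];
    const b: Verdict | undefined = port[id];
    const same =
      a !== undefined &&
      b !== undefined &&
      a.likelyPlumbing === b.likelyPlumbing &&
      a.candidateBusiness === b.candidateBusiness;
    if (!same) disagreements.push(id);
  }
  return disagreements;
}

// Agreement as a percentage string with two decimals.
function agreementPercent(agreements: number, total: number): string {
  return ((agreements / total) * 100).toFixed(2);
}

// Toy data (made up): three symbols, one flipped verdict.
const oracleVerdicts: VerdictsBySymbol = {
  "Owner": { likelyPlumbing: true, candidateBusiness: false },
  "allocate": { likelyPlumbing: false, candidateBusiness: true },
  "toJson": { likelyPlumbing: true, candidateBusiness: false },
};
const portVerdicts: VerdictsBySymbol = {
  "Owner": { likelyPlumbing: false, candidateBusiness: false },
  "allocate": { likelyPlumbing: false, candidateBusiness: true },
  "toJson": { likelyPlumbing: true, candidateBusiness: false },
};
const flipped = diffVerdicts(oracleVerdicts, portVerdicts);
console.log(flipped, agreementPercent(3 - flipped.length, 3)); // [ 'Owner' ] 66.67
```

Applied to the figures in this PR, `agreementPercent(1363, 1368)` gives `99.63`, which is consistent with 5 flipped symbols out of 1368.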
This was re-verified independently of the porting agent's own report, which is how a JPA-entity divergence surfaced (a Javadoc `@author` tag shadowing the real ORM annotation, dropping `isOrmModel` and flipping 5 entity classes). Caught at 99.63%, fixed to 100%. The 0.936 plumbing precision / 0.925 business recall hold end-to-end.

Determinism
`computePlumbingFeatures` and both kernels are pure; files + definitions iterate in sorted order. Tags are byte-stable across runs (the three `graphHash`-determinism tests pass), safe under the reproducibility contract.

Verification