feat(ingestion): business-logic analyze phase — populate likely_plumbing + candidate_business by theagenticguy · Pull Request #249 · theagenticguy/opencodehub

theagenticguy · 2026-06-22T21:40:27Z

Summary

Wires the merged sieve kernels (#248) in end to end: codehub analyze now writes likelyPlumbing + candidateBusiness into nodes.payload for every Function / Method / Constructor / Class / Interface / Struct in a Python / Java / Go repo. The user gets both concern tags from two commands (codehub init + codehub analyze) with no query, no labels, and no embeddings:

likely_plumbing    — precision-first (~0.94)  "this is plumbing"
candidate_business — recall-first    (~0.93)  "look here for domain logic"

Queryable via SQLite JSON1: SELECT name FROM nodes WHERE payload->>'$.candidateBusiness' = 'true'.
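
To make the payload contract concrete, here is a minimal in-memory sketch of what that query selects. The `NodeRow` shape, the sample rows, and the helper name are assumptions for illustration, not the project's storage API. Note that SQLite's `->>` returns an SQL value rather than JSON text, so whether the literal should be `'true'` or `1` depends on how the store serializes the boolean, which this PR does not state.

```typescript
// Hypothetical row shape: `payload` holds the JSON text stored in nodes.payload.
interface NodeRow {
  name: string;
  payload: string;
}

// The two tag fields named in this PR; both are optional on the payload.
interface ConcernTags {
  likelyPlumbing?: boolean;
  candidateBusiness?: boolean;
}

// In-memory equivalent of selecting names where $.candidateBusiness is true.
function candidateBusinessNames(rows: NodeRow[]): string[] {
  const names: string[] = [];
  for (const row of rows) {
    const tags = JSON.parse(row.payload) as ConcernTags;
    if (tags.candidateBusiness === true) {
      names.push(row.name);
    }
  }
  return names;
}

// Illustrative rows only; the names and tag values are made up.
const sampleRows: NodeRow[] = [
  { name: "allocate", payload: '{"candidateBusiness":true,"likelyPlumbing":false}' },
  { name: "to_dict", payload: '{"candidateBusiness":false,"likelyPlumbing":true}' },
  { name: "untagged", payload: "{}" },
];

console.log(candidateBusinessNames(sampleRows)); // only "allocate" is selected
```

Rows without the field (for example, symbols in a language the phase skips) simply fall out of the result, which matches the optional-field design described under Components.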

Components

core-types: two optional CallableShape fields (likelyPlumbing / candidateBusiness). They auto-persist through nodes.payload, so no storage-adapter change is needed (the SQLite store rehydrates payload verbatim).
extract/business-logic-features.ts: a faithful Python→TS port of the feature extractor. It reproduces the marker logic exactly: word-boundary / camelCase-component matching, the precise nPlumbingSignals formula (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and ORM-base class-head detection. 44 unit tests.
pipeline/phases/business-logic.ts: the analyze-time phase, which runs after complexity. It slices each symbol body, runs classifyPlumbing + classifyBusinessCandidate, and re-adds the node with the tags (richer-entry-wins merge, the same contract the complexity phase uses). Python/Java/Go only (the sieve's validated set); other languages skip silently. The class-head slice scans upward over a Javadoc block to reach the real @Entity / @MappedSuperclass annotation while excluding @author-style comment tags.
default-set + orchestrator test: the phase is registered after complexity, and the topological-order assertion is updated for the new position.
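
As a rough illustration of the marker logic described above, the sketch below shows camelCase-component matching and a signal count built from the four terms listed in the formula. It is not the shipped extractor: the helper names, the input field names, and the marker lists are hypothetical, and the real `computePlumbingFeatures` may split identifiers differently.

```typescript
// Split an identifier into lowercase components on camelCase humps and on
// non-alphanumeric separators such as "_".
function identifierComponents(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// A marker matches only a whole component, never a substring of one, so
// "get" matches "getUser" but not "budgetTotal".
function hasMarker(identifier: string, markers: string[]): boolean {
  return identifierComponents(identifier).some(
    (component) => markers.indexOf(component) !== -1
  );
}

// Hypothetical inputs for the four terms named in the formula.
interface PlumbingSignalInputs {
  hasSerialization: boolean;
  hasObservability: boolean;
  isGetterSetter: boolean;
  dtoMapperRatio: number;
}

// Count one signal per term; the DTO-mapper term fires at ratio >= 0.5.
function countPlumbingSignals(inputs: PlumbingSignalInputs): number {
  return (
    Number(inputs.hasSerialization) +
    Number(inputs.hasObservability) +
    Number(inputs.isGetterSetter) +
    Number(inputs.dtoMapperRatio >= 0.5)
  );
}

console.log(identifierComponents("getUserName")); // ["get", "user", "name"]
console.log(hasMarker("budgetTotal", ["get", "set"])); // false
console.log(
  countPlumbingSignals({
    hasSerialization: true,
    hasObservability: false,
    isGetterSetter: false,
    dtoMapperRatio: 0.5,
  })
); // 2
```

Matching on whole components rather than substrings is what keeps a precision-first tag from firing on names that merely contain a marker.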

The parity contract

The whole point of the port is that the shipped numbers survive it. An independent per-symbol harness diffs the TS analyze-pass verdicts against the Python oracle across all four corpus repos:

1368 / 1368 = 100.0% verdict agreement, 0 disagreements.
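
A per-symbol diff of this kind could look roughly like the sketch below. The `Verdict` and `ParityReport` shapes and the symbol ids are assumptions; the actual harness is not part of this PR description. A symbol missing on either side is counted as a disagreement here, which is one possible policy.

```typescript
// Hypothetical verdict pair produced for one symbol.
interface Verdict {
  likelyPlumbing: boolean;
  candidateBusiness: boolean;
}

interface ParityReport {
  total: number;
  agreed: number;
  disagreements: string[];
  agreementPercent: number;
}

// Compare oracle and port verdicts symbol by symbol, in sorted id order.
function diffVerdicts(
  oracle: Map<string, Verdict>,
  port: Map<string, Verdict>
): ParityReport {
  // Sorted, de-duplicated union of the symbol ids from both sides.
  const ids = Array.from(oracle.keys())
    .concat(Array.from(port.keys()))
    .sort()
    .filter((id, i, all) => i === 0 || all[i - 1] !== id);

  const disagreements: string[] = [];
  for (const id of ids) {
    const expected = oracle.get(id);
    const actual = port.get(id);
    const same =
      expected !== undefined &&
      actual !== undefined &&
      expected.likelyPlumbing === actual.likelyPlumbing &&
      expected.candidateBusiness === actual.candidateBusiness;
    if (!same) {
      disagreements.push(id);
    }
  }

  const total = ids.length;
  const agreed = total - disagreements.length;
  const agreementPercent = total === 0 ? 100 : (agreed / total) * 100;
  return { total, agreed, disagreements, agreementPercent };
}

// Made-up verdicts: the port flips one class.
const oracleVerdicts = new Map<string, Verdict>([
  ["Owner", { likelyPlumbing: true, candidateBusiness: false }],
  ["allocate", { likelyPlumbing: false, candidateBusiness: true }],
  ["to_dict", { likelyPlumbing: true, candidateBusiness: false }],
]);
const portVerdicts = new Map<string, Verdict>([
  ["Owner", { likelyPlumbing: false, candidateBusiness: true }],
  ["allocate", { likelyPlumbing: false, candidateBusiness: true }],
  ["to_dict", { likelyPlumbing: true, candidateBusiness: false }],
]);

const parityReport = diffVerdicts(oracleVerdicts, portVerdicts);
console.log(parityReport.disagreements); // ["Owner"]
console.log(((1363 / 1368) * 100).toFixed(2)); // "99.63"
```

The last line is plain arithmetic on the figures quoted in this PR: five flipped verdicts out of 1368 is consistent with the 99.63% reading reported before the fix.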

This was re-verified independently of the porting agent's own report — which is how a JPA-entity divergence surfaced (a Javadoc @author tag shadowing the real ORM annotation, dropping isOrmModel and flipping 5 entity classes). Caught at 99.63%, fixed to 100%. The 0.936 plumbing precision / 0.925 business recall hold end-to-end.
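
The divergence is easier to see in code. The sketch below scans upward from a class declaration, collects real annotations, and skips anything inside a block comment so that a Javadoc `@author` line is never mistaken for one. It is a simplified assumption of how such a scan might work, not the shipped class-head slicer; the function names are hypothetical, and only `@Entity` / `@MappedSuperclass` are taken from this PR.

```typescript
// Collect annotation names found directly above a class declaration,
// ignoring block comments (such as Javadoc) and line comments.
function classHeadAnnotations(lines: string[], classLineIndex: number): string[] {
  const annotations: string[] = [];
  let inBlockComment = false;
  for (let i = classLineIndex - 1; i >= 0; i--) {
    const line = (lines[i] ?? "").trim();
    if (inBlockComment) {
      // Scanning upward, the comment ends at the line that opens it.
      if (line.indexOf("/*") === 0) {
        inBlockComment = false;
      }
      continue;
    }
    if (line.length >= 2 && line.lastIndexOf("*/") === line.length - 2) {
      // Bottom edge of a block comment; a one-line comment closes at once.
      inBlockComment = line.indexOf("/*") !== 0;
      continue;
    }
    if (line.indexOf("//") === 0) {
      continue;
    }
    const match = /^@([A-Za-z_][A-Za-z0-9_.]*)/.exec(line);
    const name = match ? match[1] : undefined;
    if (name !== undefined) {
      annotations.push(name);
      continue;
    }
    break; // blank line or unrelated code: the class head ends here
  }
  return annotations;
}

// ORM check limited to the two annotations named in this PR.
function isOrmModelHead(lines: string[], classLineIndex: number): boolean {
  return classHeadAnnotations(lines, classLineIndex).some(
    (name) => name === "Entity" || name === "MappedSuperclass"
  );
}

// Illustrative Java source, not taken from the corpus.
const javaLines = [
  "package demo;",
  "",
  "/**",
  " * Base class for persistent entities.",
  " * @author Example Author",
  " */",
  "@MappedSuperclass",
  "public class BaseEntity {",
  "}",
];

console.log(classHeadAnnotations(javaLines, 7)); // ["MappedSuperclass"]
console.log(isOrmModelHead(javaLines, 7)); // true
```

Because comment lines are skipped rather than parsed, the same scan also handles a Javadoc block that sits between the annotation and the class declaration.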

Determinism

computePlumbingFeatures and both kernels are pure; files + definitions iterate in sorted order. Tags are byte-stable across runs (the three graphHash-determinism tests pass), safe under the reproducibility contract.
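
A minimal sketch of why sorted iteration gives byte-stable output, assuming hypothetical `SourceFile` / `Definition` shapes and sort keys (path, then start line, then name); the pipeline's real ordering keys are not stated in this PR. Plain code-unit comparison is used instead of locale-aware comparison so the order does not depend on the host environment.

```typescript
interface Definition {
  name: string;
  startLine: number;
}

interface SourceFile {
  path: string;
  definitions: Definition[];
}

// Locale-independent string comparison by UTF-16 code units.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Visit files and definitions in a fixed order regardless of input order.
function stableVisitOrder(files: SourceFile[]): string[] {
  const visited: string[] = [];
  const sortedFiles = files
    .slice()
    .sort((a, b) => compareStrings(a.path, b.path));
  for (const file of sortedFiles) {
    const sortedDefs = file.definitions
      .slice()
      .sort((a, b) => a.startLine - b.startLine || compareStrings(a.name, b.name));
    for (const def of sortedDefs) {
      visited.push(file.path + "#" + def.name + "@" + def.startLine);
    }
  }
  return visited;
}

// The same made-up files in two different discovery orders.
const fileA: SourceFile = {
  path: "src/a.py",
  definitions: [
    { name: "save", startLine: 20 },
    { name: "load", startLine: 5 },
  ],
};
const fileB: SourceFile = {
  path: "src/b.py",
  definitions: [{ name: "allocate", startLine: 1 }],
};

const firstRun = JSON.stringify(stableVisitOrder([fileA, fileB]));
const secondRun = JSON.stringify(stableVisitOrder([fileB, fileA]));
console.log(firstRun === secondRun); // true
```

With pure feature and kernel functions, a fixed visit order is the remaining ingredient for output that is identical across runs.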

Verification

core-types / analysis / ingestion typecheck clean.
ingestion 629/629, core-types 83/83, analysis 14/14.
biome + banned-strings + commitlint + pre-push hook pass.

…date_business into the graph

Wires the merged @opencodehub/analysis sieve kernels into `codehub analyze`. A new `businessLogicPhase` (after `complexity`) slices each Function / Method / Constructor / Class / Interface / Struct body, computes the deterministic PlumbingFeatures vector, runs classifyPlumbing + classifyBusinessCandidate, and tags the node with `likelyPlumbing` + `candidateBusiness`. The tags land in `nodes.payload` (queryable via `payload->>'$.candidateBusiness'`), so the user gets both concern tags from `codehub analyze` with no query, no labels, no embeddings.

Components:
- core-types: two optional `CallableShape` fields (likelyPlumbing / candidateBusiness). Auto-persist through nodes.payload; no adapter change.
- extract/business-logic-features.ts: faithful Python→TS port of the feature extractor (computePlumbingFeatures), reproducing the marker logic: word-boundary / camelCase-component matching, the exact n_plumbing_signals formula (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and the ORM-base class-head detection. 44 unit tests.
- pipeline/phases/business-logic.ts: the analyze-time phase. Python/Java/Go only (the sieve's validated set); other languages skip silently. Class-head slice scans upward over a Javadoc block to reach the real `@Entity` / `@MappedSuperclass` annotation while excluding `@author`-style comment tags.
- default-set: registered after complexity; orchestrator test updated for the new topological position.

PARITY GATE (the contract): the TS analyze-pass verdicts match the Python oracle 1368/1368 = 100.0% per-symbol across all four corpus repos (py-cosmic-ddd / py-flask / java-petclinic / go-clean), independently re-verified, so the shipped 0.936 plumbing precision / 0.925 business recall hold through the port. A JPA-entity divergence (Javadoc @author shadowing the ORM annotation) was caught by the gate at 99.63% and fixed to reach 100%.

Verified: core-types/analysis/ingestion typecheck clean; ingestion 629/629, core-types 83/83, analysis 14/14; biome + banned-strings pass.

🤖 Automated release via release-please

---

<details><summary>root: 0.9.2</summary>

## [0.9.2](root-v0.9.1...root-v0.9.2) (2026-06-24)

### Features

* **analysis:** plumbing sieve + candidate_business tag (deterministic, advisory) ([#248](#248)) ([383b719](383b719))
* **ingestion:** business-logic analyze phase — populate likely_plumbing + candidate_business ([#249](#249)) ([a3d44ad](a3d44ad))
* **storage:** single-file SQLite + WASM embedder — zero native dependencies ([#245](#245)) ([c72c84f](c72c84f))

### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent bugs ([#247](#247)) ([90f40a2](90f40a2))

</details>

<details><summary>cli: 0.9.2</summary>

## [0.9.2](cli-v0.9.1...cli-v0.9.2) (2026-06-24)

### Features

* **storage:** single-file SQLite + WASM embedder — zero native dependencies ([#245](#245)) ([c72c84f](c72c84f))

### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent bugs ([#247](#247)) ([90f40a2](90f40a2))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

theagenticguy merged commit a3d44ad into main Jun 22, 2026
38 checks passed

theagenticguy deleted the feat/business-logic-analyze-pass branch June 22, 2026 21:57

github-actions Bot mentioned this pull request Jun 22, 2026

chore: release main #246

Merged

theagenticguy merged 1 commit into main from feat/business-logic-analyze-pass
