feat(ingestion): business-logic analyze phase — populate likely_plumbing + candidate_business#249

Merged
theagenticguy merged 1 commit into main from feat/business-logic-analyze-pass
Jun 22, 2026

@theagenticguy

Owner

Summary

Makes the merged sieve kernels (#248) end-to-end: codehub analyze now writes likelyPlumbing + candidateBusiness into nodes.payload for every Function / Method / Constructor / Class / Interface / Struct in a Python / Java / Go repo. The user gets both concern tags from two commands (codehub init + codehub analyze) with no query, no labels, no embeddings:

likely_plumbing    — precision-first (~0.94)  "this is plumbing"
candidate_business — recall-first    (~0.93)  "look here for domain logic"

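Queryable via SQLite JSON1: SELECT name FROM nodes WHERE payload->>'$.candidateBusiness' = 1. (SQLite's ->> operator returns a JSON true as the integer 1, so this form assumes the tag is stored as a JSON boolean; comparing against the string 'true' only matches if the tag is stored as text.)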

Components

  • core-types — two optional CallableShape fields. Auto-persist through nodes.payload; no storage-adapter change (the SQLite store rehydrates payload verbatim).
  • extract/business-logic-features.ts — faithful Python→TS port of the feature extractor. Reproduces the marker logic exactly: word-boundary / camelCase-component matching, the precise nPlumbingSignals formula (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and ORM-base class-head detection. 44 unit tests.
  • pipeline/phases/business-logic.ts — the analyze-time phase (after complexity). Slices each symbol body, runs classifyPlumbing + classifyBusinessCandidate, re-adds the node with the tags (richer-entry-wins merge, same contract complexity uses). Python/Java/Go only; other languages skip silently. The class-head slice scans upward over a Javadoc block to reach the real @Entity / @MappedSuperclass annotation while excluding @author-style comment tags.
  • default-set + orchestrator test — registered after complexity; the topological-order assertion updated for the new position.
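
The marker matching and signal count described in the extractor bullet can be pictured with a small sketch. This is illustrative only and is not the code in extract/business-logic-features.ts: the names `splitIdentifier`, `hasMarker`, `SignalInputs`, and `countPlumbingSignals` are hypothetical, and it assumes each of the four listed signals contributes 0 or 1 to the total. Only the four signal kinds and the 0.5 ratio threshold come from the description above.

```typescript
// Hypothetical helper: split an identifier into lowercase components on
// non-alphanumeric separators and camelCase boundaries.
function splitIdentifier(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // logRequest -> log Request
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // JSONParser -> JSON Parser
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// A marker matches only as a whole component, never as a substring,
// so "log" matches logRequest but not catalog.
function hasMarker(name: string, marker: string): boolean {
  return splitIdentifier(name).includes(marker.toLowerCase());
}

// Hypothetical shape for the inputs to the signal count.
interface SignalInputs {
  hasSerialization: boolean;
  hasObservability: boolean;
  isGetterSetter: boolean;
  dtoMapperRatio: number; // assumed to be a fraction in [0, 1]
}

// Assumption: each signal adds 1 when present; the ratio counts at >= 0.5.
function countPlumbingSignals(f: SignalInputs): number {
  return (
    Number(f.hasSerialization) +
    Number(f.hasObservability) +
    Number(f.isGetterSetter) +
    Number(f.dtoMapperRatio >= 0.5)
  );
}
```

Matching on components rather than substrings is what keeps a name like `catalog` from tripping a `log` marker, which matters for a precision-first tag.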

The parity contract

The whole point of the port is that the shipped numbers survive it. An independent per-symbol harness diffs the TS analyze-pass verdicts against the Python oracle across all four corpus repos:

1368 / 1368 = 100.0% verdict agreement, 0 disagreements.
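
A per-symbol parity check of this kind can be sketched as below. This is a minimal illustration, not the harness used for the PR: the `Verdict` shape, the symbol-id keys, and `diffVerdicts` are assumptions made for the example.

```typescript
// Hypothetical verdict shape: the two tags produced per symbol.
type Verdict = { likelyPlumbing: boolean; candidateBusiness: boolean };

interface ParityReport {
  total: number;
  disagreements: string[]; // symbol ids whose verdicts differ or are missing
  agreement: number; // fraction in [0, 1]
}

// Compare the port's verdicts against the oracle's, symbol by symbol.
// A symbol missing from the port counts as a disagreement.
function diffVerdicts(
  oracle: Map<string, Verdict>,
  port: Map<string, Verdict>,
): ParityReport {
  const disagreements: string[] = [];
  for (const id of [...oracle.keys()].sort()) {
    const expected = oracle.get(id);
    const actual = port.get(id);
    if (
      expected === undefined ||
      actual === undefined ||
      expected.likelyPlumbing !== actual.likelyPlumbing ||
      expected.candidateBusiness !== actual.candidateBusiness
    ) {
      disagreements.push(id);
    }
  }
  const total = oracle.size;
  const agreement = total === 0 ? 1 : (total - disagreements.length) / total;
  return { total, disagreements, agreement };
}
```

Reporting the disagreeing ids, not just the ratio, is what makes a gate like this useful for locating a divergence rather than merely detecting one.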

This was re-verified independently of the porting agent's own report — which is how a JPA-entity divergence surfaced (a Javadoc @author tag shadowing the real ORM annotation, dropping isOrmModel and flipping 5 entity classes). Caught at 99.63%, fixed to 100%. The 0.936 plumbing precision / 0.925 business recall hold end-to-end.
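
The shape of that fix, scanning upward from the class declaration and skipping a Javadoc block so that comment tags such as @author are never read as annotations, can be sketched as follows. This is an illustration under assumptions, not the code in pipeline/phases/business-logic.ts: `classHeadAnnotations` and `isOrmModel` are hypothetical names, and the sketch works on whole trimmed lines only.

```typescript
// Hypothetical: collect annotation names attached to the class declared at
// lines[classLine] (0-based), scanning upward and skipping block comments.
function classHeadAnnotations(lines: string[], classLine: number): string[] {
  const found: string[] = [];
  let inComment = false;
  for (let i = classLine - 1; i >= 0; i--) {
    const text = (lines[i] ?? "").trim();
    if (inComment) {
      // Moving upward, the comment opener is where the block ends for us.
      if (text.startsWith("/*")) inComment = false;
      continue; // everything inside the comment is ignored, including @author
    }
    if (text === "" || text.startsWith("//")) continue;
    if (text.endsWith("*/")) {
      // Entering a block comment from its last line; a one-line /** ... */
      // comment opens and closes on the same line.
      inComment = !text.startsWith("/*");
      continue;
    }
    const match = /^@([A-Za-z_][A-Za-z0-9_.]*)/.exec(text);
    if (match && match[1] !== undefined) {
      found.push(match[1]);
      continue;
    }
    break; // any other code line ends the class head
  }
  return found;
}

// Assumption: a class counts as an ORM model if either annotation is present.
function isOrmModel(annotations: string[]): boolean {
  return annotations.some((a) => a === "Entity" || a === "MappedSuperclass");
}
```

In this sketch the scan continues past the comment instead of stopping at it, so an annotation placed above a Javadoc block is still found.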

Determinism

computePlumbingFeatures and both kernels are pure; files + definitions iterate in sorted order. Tags are byte-stable across runs (the three graphHash-determinism tests pass), safe under the reproducibility contract.
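
Why sorted iteration gives byte-stable output can be shown with a small sketch. It is not the project's graphHash: `serializeTags` and the `Tags` shape are hypothetical, and FNV-1a over UTF-16 code units is used purely as a dependency-free stand-in for whatever hash the real tests use.

```typescript
// Hypothetical tag shape for the example.
type Tags = { likelyPlumbing: boolean; candidateBusiness: boolean };

// Serialize in sorted id order so the output does not depend on the order
// in which symbols were discovered or inserted.
function serializeTags(tagsById: Map<string, Tags>): string {
  const ids = [...tagsById.keys()].sort(); // default sort: UTF-16 code-unit order
  const rows: string[] = [];
  for (const id of ids) {
    const t = tagsById.get(id);
    if (t === undefined) continue;
    rows.push(`${id}\t${Number(t.likelyPlumbing)}\t${Number(t.candidateBusiness)}`);
  }
  return rows.join("\n");
}

// Stand-in hash (32-bit FNV-1a over UTF-16 code units), chosen only so the
// example needs no imports. It is an assumption, not the real graphHash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}
```

Two maps holding the same tags in different insertion orders serialize to the same string here, so their hashes agree. That is the property a determinism test would assert.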

Verification

  • core-types / analysis / ingestion typecheck clean.
  • ingestion 629/629, core-types 83/83, analysis 14/14.
  • biome + banned-strings + commitlint + pre-push hook pass.

…date_business into the graph

Wires the merged @opencodehub/analysis sieve kernels into `codehub analyze`.
A new `businessLogicPhase` (after `complexity`) slices each Function / Method /
Constructor / Class / Interface / Struct body, computes the deterministic
PlumbingFeatures vector, runs classifyPlumbing + classifyBusinessCandidate, and
tags the node with `likelyPlumbing` + `candidateBusiness`. The tags land in
`nodes.payload` (queryable via `payload->>'$.candidateBusiness'`), so the user
gets both concern tags from `codehub analyze` with no query, no labels, no
embeddings.

Components:
- core-types: two optional `CallableShape` fields (likelyPlumbing /
  candidateBusiness). Auto-persist through nodes.payload; no adapter change.
- extract/business-logic-features.ts: faithful Python→TS port of the feature
  extractor (computePlumbingFeatures), reproducing the marker logic — word-
  boundary / camelCase-component matching, the exact n_plumbing_signals formula
  (serialization + observability + getter/setter + dto-mapper-ratio≥0.5), and
  the ORM-base class-head detection. 44 unit tests.
- pipeline/phases/business-logic.ts: the analyze-time phase. Python/Java/Go
  only (the sieve's validated set); other languages skip silently. Class-head
  slice scans upward over a Javadoc block to reach the real `@Entity` /
  `@MappedSuperclass` annotation while excluding `@author`-style comment tags.
- default-set: registered after complexity; orchestrator test updated for the
  new topological position.

PARITY GATE (the contract): the TS analyze-pass verdicts match the Python
oracle 1368/1368 = 100.0% per-symbol across all four corpus repos
(py-cosmic-ddd / py-flask / java-petclinic / go-clean), independently
re-verified — so the shipped 0.936 plumbing precision / 0.925 business recall
hold through the port. A JPA-entity divergence (Javadoc @author shadowing the
ORM annotation) was caught by the gate at 99.63% and fixed to reach 100%.

Verified: core-types/analysis/ingestion typecheck clean; ingestion 629/629,
core-types 83/83, analysis 14/14; biome + banned-strings pass.
@theagenticguy theagenticguy merged commit a3d44ad into main Jun 22, 2026
38 checks passed
@theagenticguy theagenticguy deleted the feat/business-logic-analyze-pass branch June 22, 2026 21:57
@github-actions github-actions Bot mentioned this pull request Jun 22, 2026
theagenticguy pushed a commit that referenced this pull request Jun 25, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.9.2</summary>

## [0.9.2](root-v0.9.1...root-v0.9.2) (2026-06-24)


### Features

* **analysis:** plumbing sieve + candidate_business tag (deterministic,
advisory)
([#248](#248))
([383b719](383b719))
* **ingestion:** business-logic analyze phase — populate likely_plumbing
+ candidate_business
([#249](#249))
([a3d44ad](a3d44ad))
* **storage:** single-file SQLite + WASM embedder — zero native
dependencies
([#245](#245))
([c72c84f](c72c84f))


### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent
bugs ([#247](#247))
([90f40a2](90f40a2))
</details>

<details><summary>cli: 0.9.2</summary>

## [0.9.2](cli-v0.9.1...cli-v0.9.2) (2026-06-24)


### Features

* **storage:** single-file SQLite + WASM embedder — zero native
dependencies
([#245](#245))
([c72c84f](c72c84f))


### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent
bugs ([#247](#247))
([90f40a2](90f40a2))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>