
feat(analysis): plumbing sieve + candidate_business tag (deterministic, advisory)#248

Merged
theagenticguy merged 3 commits into main from feat/business-logic-sieve-analyzer
Jun 22, 2026

theagenticguy (Owner) commented Jun 22, 2026

Summary

Two deterministic, label-free per-symbol tags for @opencodehub/analysis, the two halves of one idea:

  • classifyPlumbing (precision-first) — flags high-confidence plumbing (serialization, DTO mapping, transport, DI wiring), abstains otherwise. Never asserts "business logic". Plumbing precision 0.936 aggregate, ≥0.85 on every repo (flask 1.00, java 0.94, go 0.92, cosmic 0.89) under per-repo eval.
  • classifyBusinessCandidate (recall-first) — the exact complement: a symbol is a candidate_business unless the sieve is confident it's plumbing. Business recall 0.925 (misses 6 of 80), per-repo 0.80–1.00. This is the "look here for domain logic" tag the user gets at analyze time with no query.

Why this shape

Asserting "this IS business logic" needs a trained classifier and didn't generalize across repos (held-out F1 ~0.3). Subtracting confident plumbing does generalize, because the sieve does. So:

likely_plumbing    — sieve, precision-first (0.94)   "this is plumbing"
candidate_business — !likely_plumbing, recall-first (0.925)  "look here for domain logic"

Recall-first by construction: a symbol loses the candidate tag only when we're confident it's plumbing, so real domain logic can't be silently dropped. The 0.385 candidate precision (the tag lands on ~67% of symbols) is the intended trade: the tag is the safety net, and an optional embedding-derived business_rank (follow-up) orders candidates so the most domain-like surface first. Embeddings become a rank, not a yes/no judge.
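The complement relation above is small enough to sketch directly. This is an illustration only: the `SieveVerdict` and `CandidateVerdict` shapes and both function names are hypothetical, not the shipped types from @opencodehub/analysis.

```typescript
// Hypothetical verdict shapes; the real types in the package may differ.
interface SieveVerdict {
  likelyPlumbing: boolean; // true only when the sieve is confident
}

interface CandidateVerdict {
  candidateBusiness: boolean;
}

// Complement rule: a symbol stays a candidate unless the sieve confidently
// calls it plumbing. An abstention (likelyPlumbing = false) keeps the tag.
function toCandidate(verdict: SieveVerdict): CandidateVerdict {
  return { candidateBusiness: !verdict.likelyPlumbing };
}

// Invariant: exactly one of the two tags is set for every symbol.
function holdsComplementInvariant(s: SieveVerdict, c: CandidateVerdict): boolean {
  return s.likelyPlumbing !== c.candidateBusiness;
}
```

Because the candidate tag is derived from the sieve verdict rather than computed separately, the two tags cannot disagree in this sketch.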

Provenance

Distilled from a teacher/student loop: a 3-model LLM panel labeled ~300 symbols across Python/Java/Go, a shallow tree was fit, and the cleanest high-precision plumbing leaves were lifted out. Both kernels are pure functions of a small PlumbingFeatures struct (no I/O, no model, no randomness, mirroring page-rank.ts), so verdicts are safe to persist into nodes.payload and survive the graphHash byte-identity contract. The complement invariant (every symbol is either confident-plumbing or a candidate, never both and never neither) is pinned by a test.

End-user flow (unchanged)

codehub init + codehub analyze [--embeddings]. The tags land in nodes.payload, queryable as payload->>'$.candidate_business'. No labeling, ever — labeling was only the offline method-validation, it does not ship.

Scope

  • This PR: both kernels + types + 14 unit tests (incl. pinned regressions: AbstractRepository → plumbing, Batch.allocate → candidate), exported from the package index.
  • Follow-up: wire the analyze-time pass that computes PlumbingFeatures from the AST and writes likely_plumbing + candidate_business (+ optional embedding business_rank) into nodes.payload.

Also refreshes the stale sql MCP tool description in CLAUDE.md to ADR 0019.

Verification

  • packages/analysis typecheck clean; full suite 154 pass / 0 fail.
  • banned-strings + commitlint + pre-push test hook all pass.

Adds `classifyPlumbing` to @opencodehub/analysis — a pure, deterministic
per-symbol rule that flags high-confidence plumbing (serialization, DTO
mapping, transport, DI wiring) and ABSTAINS everywhere else. It never asserts
"business logic": calling a domain rule plumbing and hiding it is the costly
error, so the rule is tuned for plumbing precision and stays silent when unsure.

Provenance: distilled from a teacher/student loop (3-model LLM panel labeled
~300 symbols across Python/Java/Go; shallow tree fit; two cleanest plumbing
leaves lifted out). Measured plumbing precision: 0.936 aggregate, >= 0.85 on
EVERY repo under per-repo eval (flask 1.00, petclinic 0.94, go-clean 0.92,
cosmic-ddd 0.89). The full business-asserting classifier did not generalize
cross-repo and is intentionally NOT shipped — only the plumbing direction.

Two tiers, both requiring zero domain signal so any real decision vetoes the
sieve:
  - serialization-pure (conf 0.95): serializer call, no domain signal
  - plumbing-no-domain (conf 0.90): plumbing signals, no domain signal, not ORM
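A minimal sketch of how the two tiers might compose, assuming a feature struct with the fields below. The field names, the `classifyPlumbingSketch` name, and the "at least one plumbing signal" threshold are assumptions for illustration; only the tier names and confidences come from this message.

```typescript
// Hypothetical feature struct; not the shipped PlumbingFeatures.
interface PlumbingFeaturesSketch {
  hasSerializerCall: boolean;
  nPlumbingSignals: number;
  nDomainSignals: number;
  isOrmModel: boolean;
}

type SieveResult =
  | { likelyPlumbing: true; tier: "serialization-pure" | "plumbing-no-domain"; confidence: number }
  | { likelyPlumbing: false };

function classifyPlumbingSketch(f: PlumbingFeaturesSketch): SieveResult {
  // Any domain signal vetoes both tiers, so the sieve abstains.
  if (f.nDomainSignals > 0) return { likelyPlumbing: false };

  // Tier 1: a serializer call with no domain signal.
  if (f.hasSerializerCall) {
    return { likelyPlumbing: true, tier: "serialization-pure", confidence: 0.95 };
  }

  // Tier 2: plumbing signals, no domain signal, and not an ORM model.
  // ASSUMPTION: "plumbing signals" is read here as "at least one".
  if (f.nPlumbingSignals > 0 && !f.isOrmModel) {
    return { likelyPlumbing: true, tier: "plumbing-no-domain", confidence: 0.9 };
  }

  // Everything else: abstain rather than guess.
  return { likelyPlumbing: false };
}
```

The function reads nothing but its argument, which is the property that makes a verdict safe to persist.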

Pure function of a small PlumbingFeatures struct (mirrors the page-rank.ts
deterministic-kernel idiom), so the verdict is safe to persist into
nodes.payload and survives the graphHash byte-identity contract. Validated on
python/java/go (SIEVE_VALIDATED_LANGUAGES); other languages should be skipped
by the analyze pass rather than emit an unbacked verdict.

Tests pin the iter-0 regressions: AbstractRepository reads plumbing,
Batch.allocate (domain rule) never does.

Also refreshes the stale `sql` MCP tool description in CLAUDE.md to ADR 0019
(nodes/edges are directly SQL-queryable; cypher is fork-only).
…he sieve

Adds `classifyBusinessCandidate`: a symbol is a business-logic candidate unless
the sieve is confident it is plumbing (`candidateBusiness === !likelyPlumbing`).
This is the "look here for domain logic" tag the user gets at analyze time with
no query, no labels, no embeddings.

Recall-first by construction: a symbol only loses the candidate tag when we are
confident it is plumbing, so real domain logic cannot be silently dropped.
Measured on 286 labeled symbols (Python/Java/Go): business recall 0.925 (misses
6 of 80), per-repo recall 0.80-1.00. Precision 0.385 (tags ~67%) is the intended
trade — the tag is the safety net; an optional embedding-derived rank (follow-up)
orders candidates so the most domain-like surface first.
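The quoted figures can be cross-checked with a few lines of arithmetic. The labeled, business, and missed counts come from this message; the tagged count of 192 is NOT stated anywhere in the source and is inferred here as the value consistent with both the 0.385 precision and the ~67% tag rate.

```typescript
const labeled = 286;  // labeled symbols (from the message)
const business = 80;  // symbols labeled as business logic (from the message)
const missed = 6;     // business symbols that lost the candidate tag (from the message)
const tagged = 192;   // ASSUMPTION: inferred, not reported in the source

const truePositives = business - missed;   // business symbols that kept the tag
const recall = truePositives / business;   // share of business symbols tagged
const precision = truePositives / tagged;  // share of tagged symbols that are business
const tagRate = tagged / labeled;          // share of all symbols carrying the tag

console.log(recall.toFixed(3), precision.toFixed(3), tagRate.toFixed(2));
```

Under that assumption the three reported numbers are mutually consistent: roughly two in three symbols carry the tag, and a little over a third of those are labeled business logic.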

Same feature inputs as the sieve, so the two tags can never disagree: every
symbol is either confident-plumbing or a candidate, never both, never neither
(pinned by a complement-invariant test). 5 new tests; 14 total in the suite.

theagenticguy changed the title from "feat(analysis): deterministic plumbing-sieve classifier (advisory)" to "feat(analysis): plumbing sieve + candidate_business tag (deterministic, advisory)" Jun 22, 2026
theagenticguy merged commit 383b719 into main Jun 22, 2026
38 checks passed
theagenticguy deleted the feat/business-logic-sieve-analyzer branch June 22, 2026 20:07
github-actions bot mentioned this pull request Jun 22, 2026
theagenticguy added a commit that referenced this pull request Jun 22, 2026
…ing + candidate_business (#249)

## Summary

Makes the merged sieve kernels (#248) end-to-end: `codehub analyze` now
writes `likelyPlumbing` + `candidateBusiness` into `nodes.payload` for
every Function / Method / Constructor / Class / Interface / Struct in a
Python / Java / Go repo. The user gets both concern tags from two
commands (`codehub init` + `codehub analyze`) with **no query, no
labels, no embeddings**:

```
likely_plumbing    — precision-first (~0.94)  "this is plumbing"
candidate_business — recall-first    (~0.93)  "look here for domain logic"
```

Queryable via SQLite JSON1: `SELECT name FROM nodes WHERE
payload->>'$.candidateBusiness' = 'true'`.

## Components

- **core-types** — two optional `CallableShape` fields. Auto-persist
through `nodes.payload`; no storage-adapter change (the SQLite store
rehydrates payload verbatim).
- **`extract/business-logic-features.ts`** — faithful Python→TS port of
the feature extractor. Reproduces the marker logic exactly:
word-boundary / camelCase-component matching, the precise
`nPlumbingSignals` formula (serialization + observability +
getter/setter + dto-mapper-ratio≥0.5), and ORM-base class-head
detection. **44 unit tests.**
- **`pipeline/phases/business-logic.ts`** — the analyze-time phase
(after `complexity`). Slices each symbol body, runs `classifyPlumbing` +
`classifyBusinessCandidate`, re-adds the node with the tags
(richer-entry-wins merge, same contract complexity uses). Python/Java/Go
only; other languages skip silently. The class-head slice scans upward
over a Javadoc block to reach the real `@Entity` / `@MappedSuperclass`
annotation while excluding `@author`-style comment tags.
- **default-set + orchestrator test** — registered after complexity; the
topological-order assertion updated for the new position.
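To make the extractor description concrete, here is a hedged sketch of camelCase-component matching and the additive signal count. All names (`identifierComponents`, `hasMarker`, `SignalInputs`, `nPlumbingSignalsSketch`) are hypothetical, and the meaning given to the DTO-mapper ratio is an assumption. Only the four summed terms and the 0.5 threshold come from the description above.

```typescript
// Split an identifier into lowercase components on camelCase humps,
// acronym boundaries, underscores, and other non-alphanumeric characters.
function identifierComponents(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")     // fooBar -> foo Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")  // JSONParser -> JSON Parser
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// A marker matches only a whole component, never a substring of one,
// so "log" does not fire on "catalog".
function hasMarker(name: string, marker: string): boolean {
  const wanted = marker.toLowerCase();
  return identifierComponents(name).some((part) => part === wanted);
}

// Hypothetical inputs to the signal count.
interface SignalInputs {
  hasSerialization: boolean;
  hasObservability: boolean;
  isGetterSetter: boolean;
  dtoMapperRatio: number; // ASSUMPTION: fraction of the body that is field-to-field mapping
}

// Each of the four conditions contributes one signal.
function nPlumbingSignalsSketch(s: SignalInputs): number {
  return (
    Number(s.hasSerialization) +
    Number(s.hasObservability) +
    Number(s.isGetterSetter) +
    Number(s.dtoMapperRatio >= 0.5)
  );
}
```

Matching on whole components rather than raw substrings is what keeps a name like `catalogItems` from counting as a logging marker.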

## The parity contract

The whole point of the port is that the shipped numbers survive it. An
independent per-symbol harness diffs the TS analyze-pass verdicts
against the Python oracle across all four corpus repos:

**1368 / 1368 = 100.0% verdict agreement, 0 disagreements.**
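A per-symbol agreement check of this kind could look roughly like the sketch below. It is not the harness used here: the `VerdictTable` shape, the `diffVerdicts` name, and the choice to count a symbol missing from either side as a disagreement are all assumptions.

```typescript
// Hypothetical verdict table: symbol id -> likelyPlumbing verdict.
type VerdictTable = Record<string, boolean>;

interface ParityReport {
  total: number;
  disagreements: string[]; // sorted ids whose verdicts differ or are missing on one side
  agreementPct: number;
}

function diffVerdicts(oracle: VerdictTable, port: VerdictTable): ParityReport {
  // Union of ids from both sides, sorted so the report itself is deterministic.
  const ids = Object.keys({ ...oracle, ...port }).sort();
  // A symbol present on only one side compares undefined to a boolean and counts as a disagreement.
  const disagreements = ids.filter((id) => oracle[id] !== port[id]);
  const total = ids.length;
  const agreementPct = total === 0 ? 100 : ((total - disagreements.length) / total) * 100;
  return { total, disagreements, agreementPct };
}
```

For scale, five flipped verdicts out of 1368 give 1363 / 1368 ≈ 99.63%, which matches the pre-fix figure quoted in the next paragraph.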

This was re-verified independently of the porting agent's own report —
which is how a JPA-entity divergence surfaced (a Javadoc `@author` tag
shadowing the real ORM annotation, dropping `isOrmModel` and flipping 5
entity classes). Caught at 99.63%, fixed to 100%. The 0.936 plumbing
precision / 0.925 business recall hold end-to-end.
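The `@author` shadowing bug is easier to see in code. The sketch below shows one way to scan upward from a class declaration while stepping over comment blocks. It is an assumption-laden illustration, not the shipped class-head slicer: the function name, the line-based input, and the stopping rule are all hypothetical.

```typescript
// Collect annotation names attached to a class declaration by scanning
// upward from the declaration line. Comment lines are stepped over, so a
// Javadoc tag such as "@author" is never mistaken for a real annotation.
function classHeadAnnotations(lines: readonly string[], declLine: number): string[] {
  const found: string[] = [];
  let inBlockComment = false;

  for (let i = declLine - 1; i >= 0; i--) {
    const text = lines[i].trim();

    if (inBlockComment) {
      // Scanning upward, the comment ends at the line that opens it.
      if (/^\/\*/.test(text)) inBlockComment = false;
      continue;
    }
    if (text === "" || /^\/\//.test(text)) continue; // blank or line comment

    if (/\*\/$/.test(text)) {
      // Entering a block comment from below; a one-line comment closes at once.
      inBlockComment = !/^\/\*/.test(text);
      continue;
    }

    const match = /^@([A-Za-z_][\w.]*)/.exec(text);
    if (match) {
      found.push(match[1]);
      continue;
    }
    break; // any other code line ends the class head
  }
  return found.reverse(); // restore source order
}
```

Because comment bodies are skipped wholesale, only annotations outside the Javadoc block can influence an ORM check such as `isOrmModel`.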

## Determinism

`computePlumbingFeatures` and both kernels are pure; files + definitions
iterate in sorted order. Tags are byte-stable across runs (the three
`graphHash`-determinism tests pass), safe under the reproducibility
contract.
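A small sketch of what sorted iteration can look like, assuming a hypothetical `Definition` record. The field names and the tie-break order are illustrative, not taken from the phase itself.

```typescript
// Hypothetical definition record; the real shape is not shown in this PR.
interface Definition {
  file: string;
  name: string;
  startLine: number;
}

// Code-unit comparison: unlike localeCompare, it does not depend on the
// host locale, so the order is the same on every machine.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Return a sorted copy: by file, then position, then name as a tie-break.
function inStableOrder(defs: readonly Definition[]): Definition[] {
  return defs.slice().sort(
    (a, b) =>
      compareStrings(a.file, b.file) ||
      a.startLine - b.startLine ||
      compareStrings(a.name, b.name),
  );
}
```

Processing symbols in this order means the written tags do not depend on filesystem or parser enumeration order, which is one ingredient of byte-stable output.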

## Verification

- core-types / analysis / ingestion typecheck clean.
- ingestion **629/629**, core-types **83/83**, analysis **14/14**.
- biome + banned-strings + commitlint + pre-push hook pass.
theagenticguy pushed a commit that referenced this pull request Jun 25, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.9.2</summary>

## [0.9.2](root-v0.9.1...root-v0.9.2) (2026-06-24)


### Features

* **analysis:** plumbing sieve + candidate_business tag (deterministic,
advisory)
([#248](#248))
([383b719](383b719))
* **ingestion:** business-logic analyze phase — populate likely_plumbing
+ candidate_business
([#249](#249))
([a3d44ad](a3d44ad))
* **storage:** single-file SQLite + WASM embedder — zero native
dependencies
([#245](#245))
([c72c84f](c72c84f))


### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent
bugs ([#247](#247))
([90f40a2](90f40a2))
</details>

<details><summary>cli: 0.9.2</summary>

## [0.9.2](cli-v0.9.1...cli-v0.9.2) (2026-06-24)


### Features

* **storage:** single-file SQLite + WASM embedder — zero native
dependencies
([#245](#245))
([c72c84f](c72c84f))


### Bug Fixes

* **storage:** purge stale lbug/DuckDB refs after ADR 0019; fix 2 latent
bugs ([#247](#247))
([90f40a2](90f40a2))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>