perf(tbr): IW/XPIWE/NA x4 reroot batch + dirty-region (default-OFF, opt-in)#256

Merged
ms609 merged 6 commits into cpp-search from claude/na-iw-x4-keep
Jun 23, 2026

ms609 commented Jun 23, 2026

Owner

Preserves the IW/XPIWE/NA-aware x4 reroot batch and extract_char_steps dirty-region TBR opts on cpp-search so they survive the disposable worktree they were built on. Mission-null at morphology scale, but the kernel-level win is real (NA-IW x4 cuts the reroot scan 10–14%; no-NA x4+dirty 1.1–1.2× on a full direct climb) — kept default-OFF, opt-in for the large-N / recipe-retune reopen.

What's here

  • indirect_iw_cached_flat_x4 — pure-IW 4-wide reroot batch (T-245 ILP ported to IW).
  • indirect_na_iw_cached_flat_x4 — the first x4 kernel that crosses the !has_na gate onto the native inapplicable-bearing corpus (serves plain IW-NA and the production XPIWE-NA path).
  • extract_char_steps dirty-region — per-clip divided_steps = F + cs_delta − nx instead of the O(n_node) walk.
  • XPIWE gate widening (iw_family = IW || XPIWE) so the opts reach the production scoring mode, plus a genuine nx_cs active-mask underflow fix (a real broken invariant reachable today via MaximizeParsimony(extended_iw = FALSE) rc≥3, independent of these opts).

Default-OFF / opt-in

Flipped from the worktree's default-on kill-switches to opt-in env vars TS_IW_X4 and TS_IW_DIRTY (matches the recorded "do not default-on merge" disposition; byte-identical either way, so nothing is imposed on the shared branch by default).
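
A minimal sketch of what an opt-in environment switch can look like. Only the variable names TS_IW_X4 and TS_IW_DIRTY come from this PR; the helper `env_opt_in`, the `TbrOpts` struct, and the parsing rule (set, non-empty, and not "0" means on) are assumptions for illustration, since the PR does not state how the values are interpreted.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the PR): an option counts as enabled only
// when the variable is set to a non-empty value other than "0".
inline bool env_opt_in(const char* name) {
    const char* v = std::getenv(name);
    return v != nullptr && v[0] != '\0' && std::strcmp(v, "0") != 0;
}

// Hypothetical bundle of the two opt-in switches named in this PR.
struct TbrOpts {
    bool iw_x4;     // TS_IW_X4: 4-wide reroot batch
    bool iw_dirty;  // TS_IW_DIRTY: dirty-region extract_char_steps
};

// Read both switches once (e.g. at the start of a search call) so the hot
// loop never calls getenv. With neither variable set, both stay off.
inline TbrOpts read_tbr_opts() {
    return TbrOpts{env_opt_in("TS_IW_X4"), env_opt_in("TS_IW_DIRTY")};
}
```

With neither variable exported, `read_tbr_opts()` returns both flags false, which is the "nothing is imposed by default" behaviour described above.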

Correctness (hard gate)

Rebuilt on cpp-search (post-#254) and ran both suites:

  • test-ts-tbr-dirty-rescore.R — x4/dirty byte-identity guards (pure-IW, XPIWE, and NA-IW), updated to the opt-in env vars; the "on" arm explicitly enables the kernel so firing is preserved by construction.
  • test-ts-na-incremental.R: the byte-identity tests from #254 ("perf(na): incremental exact_verify rescore", default-on, ~25-30% native-NA mission wall), incl. the per-candidate audit (5658/5658 candidates byte-matched full_rescore), confirming the x4/dirty interleave did not perturb the incremental rescore.

Deliberately excluded

Supporting infra (dev/)

NA-IW x4 element + mission A/B harnesses (on TS_IW_X4) and the two large real NA matrices (Sun2018, lobo) the large-N reopen will need.

Deferred / caveats

  • Diagnostic env reads (TS_IW_TIMING/DIRTYCHK/SCANCHK) remain once-per-tbr_search-call (sub-1%); getenv consolidation deferred.
  • Constrained/MPT not separately exercised (opts are scoring-only; do_reroot is off under constraints anyway).

🤖 Generated with Claude Code

ms609 and others added 6 commits June 23, 2026 14:34
Cherry-pick of 69febb4 onto cpp-search (post-#254). Resolved the
exact_verify candidate-scoring conflict by keeping #254's incremental
rescore path and dropping the directional-audit (na_dir_audit) block
that #254 deliberately removed. Stripped the settled-dead
ts_iw_gather_bench microbench (gather micro-opt proven a dead heat) and
trimmed superseded dev A/B harnesses, keeping only bench_iw_realized.R.

Opts remain kill-switched (TS_IW_NOX4 / TS_IW_NODIRTY); a follow-up
commit flips them default-off for the shared branch. Mission-null at
morphology scale; kept for the large-N / recipe-retune reopen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rflow

The extract_char_steps dirty-region + x4 reroot-batch opts were gated to
ScoringMode::IW, but production MaximizeParsimony defaults to extended IW
(ScoringMode::XPIWE, ts_data.cpp:297). So the opts were dead on the production
path for every dataset, and the mission-wall A/B (which toggles TS_IW_NOX4/
NODIRTY through MaximizeParsimony) was toggling code that never executed -> the
1.005x "null result". Widen the gate to iw_family = (IW || XPIWE): both modes
score via compute_iw/precompute_iw_delta from the same weighting-agnostic
char_steps, differing only in per-pattern eff_k/phi (which those functions
already consume). PROFILE stays excluded (compute_profile + info_amounts +
precomputed_steps is a genuinely different char_steps convention).
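
The gate change described above can be sketched as follows. ScoringMode::IW and ScoringMode::XPIWE are named in this commit; the EW and PROFILE enumerators, the enum layout, and both function names are assumptions for illustration and are not copied from the actual headers.

```cpp
// Assumed enum layout; only IW and XPIWE are confirmed as ScoringMode
// members by the commit message.
enum class ScoringMode { EW, IW, XPIWE, PROFILE };

// Old gate: the opts fired only under plain implied weights.
constexpr bool old_gate(ScoringMode m) {
    return m == ScoringMode::IW;
}

// Widened gate: IW and XPIWE share the same char_steps convention, so both
// qualify. PROFILE stays excluded because it uses a different convention.
constexpr bool is_iw_family(ScoringMode m) {
    return m == ScoringMode::IW || m == ScoringMode::XPIWE;
}
```

Under this sketch, `old_gate(ScoringMode::XPIWE)` is false while `is_iw_family(ScoringMode::XPIWE)` is true, which is why the earlier A/B through MaximizeParsimony toggled code that never ran.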

Perf-only / byte-identical: opts-on vs opts-off scores identical across
Dikow/Vinther/Zanol/Giles/Wortley x 3 seeds; SCANCHK clean on XPIWE.

Firing the opts on the production path exposed a pre-existing dirty-region bug:
the nx_cs accumulation did not skip active_mask==0 blocks, but extract_char_steps
(which builds the F cache) does. Under a ratchet ZERO_ONLY perturbation (the
default, ratchetCycles>=3) that fully deactivates a block, divided_steps =
F + cs_delta - nx_cs went negative (TS_IW_DIRTYCHK: dirty=-1 ref=0 on
Dikow2009/Vinther2008). Weighting-independent (plain IW mismatches identically;
reachable today via MaximizeParsimony(extended_iw=FALSE)). The candidate scan
masks needs_step by active_mask so scores were unaffected, but the invariant was
broken. Fix: skip active_mask==0 in the nx_cs loop only; EW nx_cost is left
counting all blocks (its best_score/delta length convention is self-consistent).
DIRTYCHK now clean across 5 datasets x ratchetCycles {0,3,6}.
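
A toy model of the broken invariant, as a sketch only. The real code works on per-character bit-packed blocks; here each block carries single integer counters, and the `Block` fields, function names, and the meaning given to each term are assumptions chosen to reproduce the reported dirty=-1 ref=0 mismatch.

```cpp
#include <cstdint>
#include <vector>

// Toy block: one counter per term of divided_steps = F + cs_delta - nx_cs.
struct Block {
    std::uint64_t active_mask;  // 0 => block fully deactivated by the ratchet
    int full_steps;             // contribution to the cached total F
    int cs_delta;               // change along the dirty path
    int nx_steps;               // contribution to nx_cs
};

// F cache: skips inactive blocks, as extract_char_steps is described to do.
int build_f_cache(const std::vector<Block>& blocks) {
    int f = 0;
    for (const Block& b : blocks) {
        if (b.active_mask == 0) continue;
        f += b.full_steps;
    }
    return f;
}

// skip_inactive_nx == false models the pre-fix nx_cs loop, which counted
// inactive blocks; true models the fix (skip active_mask == 0).
int dirty_divided_steps(const std::vector<Block>& blocks,
                        bool skip_inactive_nx) {
    const int f = build_f_cache(blocks);
    int cs_delta = 0;
    int nx = 0;
    for (const Block& b : blocks) {
        const bool active = b.active_mask != 0;
        if (active) cs_delta += b.cs_delta;
        if (active || !skip_inactive_nx) nx += b.nx_steps;
    }
    return f + cs_delta - nx;
}
```

For a single fully deactivated block `{0, 3, 0, 1}`, the pre-fix variant returns -1 and the fixed variant returns 0: F never saw the block, so subtracting its nx contribution removes steps that were never added.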

Add a testthat guard pinning XPIWE opts-on == opts-off (tolerance=0) under
ratchet (the discriminating config; prior tests ran plain-IW or ratchetCycles=0
and were tautological for this path). 398 existing tests still pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…[perf TBD]

Port the T-245 4-wide reroot ILP batch to the implied-weights + inapplicable
(IW/XPIWE + NA) TBR scan -- the one scoring path that never received it. The NA
reroot scan already had an EW-NA x4 (fitch_na_indirect_cached_flat_x4) but
IW-NA/XPIWE-NA fell to the one-at-a-time scalar indirect_na_iw_length_cached;
the IW x4 branch was !has_na-gated.

New kernel indirect_na_iw_cached_flat_x4 (src/ts_fitch.cpp) fuses the NA
active-mask candidate logic of the EW-NA x4 (from1 reduce, shared
clip_has_active, per-candidate below_actives AND) with the per-candidate
iw_delta ctz-gather of indirect_iw_cached_flat_x4. Each accumulator keeps the
scalar add order of indirect_na_iw_length_cached, so per-candidate results are
bit-identical; the shared all-4-exceed-cutoff bail only changes early-exit on
cutoff-losing candidates. Wired by widening the IW x4 branch gate
(ts_tbr.cpp:1985) to dispatch on has_na, mirroring its own per-candidate skip
re-check and main_edges[ei] indexing; the no-NA path stays byte-identical.
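
The bit-identity argument can be illustrated with a simplified sketch. It assumes dense, non-negative per-character deltas and plain double accumulators; the real kernel uses a ctz-gather over iw_delta plus NA active-mask logic, none of which is reproduced here, and both function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar baseline: accumulate one candidate's deltas in character order and
// stop as soon as the running score exceeds the cutoff.
double score_scalar(const std::vector<std::array<double, 4>>& delta,
                    std::size_t k, double cutoff) {
    double acc = 0.0;
    for (const auto& row : delta) {
        acc += row[k];
        if (acc > cutoff) break;
    }
    return acc;
}

// 4-wide batch: four independent accumulators, each adding its own deltas
// in the same character order as the scalar loop. The only shared control
// flow is the bail-out once ALL four already exceed the cutoff.
std::array<double, 4> score_x4(
        const std::vector<std::array<double, 4>>& delta, double cutoff) {
    std::array<double, 4> acc{0.0, 0.0, 0.0, 0.0};
    for (const auto& row : delta) {
        for (std::size_t k = 0; k < 4; ++k) acc[k] += row[k];
        if (acc[0] > cutoff && acc[1] > cutoff &&
            acc[2] > cutoff && acc[3] > cutoff) break;
    }
    return acc;
}
```

Because each accumulator performs the same floating-point additions in the same order, any candidate that finishes at or below the cutoff gets exactly the scalar result. A candidate that loses to the cutoff may stop at a different point than the scalar loop, so its reported value can differ, but it exceeds the cutoff in both versions.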

Gated on iw_family (IW||XPIWE), so it fires on the production XPIWE+NA path
(MaximizeParsimony default) -- the first banked IW kernel opt that lands on the
NATIVE inapplicable-bearing corpus rather than only recoded matrices.

The dirty-region scan shortcut is deliberately NOT ported (stays !has_na): the
top-down down2 pass breaks the path-bounded F+cs_delta-nx_cs decomposition, and
it is the smaller (once-per-clip) prize.

CORRECTNESS validated, PERF still unknown (Hamilton A/B pending):
- byte-identical x4-on vs TS_IW_NOX4-off: 24/24 (15 direct IW-NA ts_tbr_search +
  9 MaximizeParsimony XPIWE-NA under ratchet; datasets at 10-38% NA)
- existing independent full-rescore oracle passes; 519 testthat pass
- kernel confirmed to execute (throw-probe), so the byte-identity is not vacuous
- new regression guard in test-ts-tbr-dirty-rescore.R keeps inapplicables

Kill-switch TS_IW_NOX4 (shared with the no-NA x4). NOT for cpp-search merge
until the A/B reports and the getenv-consolidation merge-gate is addressed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Flip the kill-switch semantics from default-on (TS_IW_NOX4 /
TS_IW_NODIRTY) to opt-in (TS_IW_X4 / TS_IW_DIRTY). These opts are
mission-null at morphology scale (validated byte-identical, 1.005x wall)
and are preserved on the shared branch only for the large-N /
recipe-retune reopen, so they should impose nothing by default.

Test guards updated in lockstep: the "on" arm now explicitly enables the
opt so the kernel still fires (data triggers the iw_family + has_na gate
as before); byte-identity vs the scalar baseline is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Preserve the dilution-free element-level and mission-wall A/B harnesses
for the NA-IW x4 reroot batch (updated to the TS_IW_X4 opt-in env var),
plus the two large real inapplicable-bearing matrices (Sun2018, lobo)
the large-N reopen will need to re-test whether the scan share rises.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…br.cpp

The worktree's 69febb4 bundled the exact directional (Regime-C) NA
scoring audit/scorer into ts_tbr.cpp alongside the x4/dirty opts; the
header it needs (ts_fitch_na_directional.h, added separately by b2e03a9)
is not on cpp-search and was deliberately excluded (the directional path
is dead -- 24-89x slower than SIMD full_rescore). Strip the include,
na_dir_audit / na_dir_scorer setup, whole-tree cross-check,
build_clip_folds, and the per-candidate directional fast-path so
exact_verify_sweep matches cpp-search's (#254 incremental) version. Kept
<chrono> (needed by the iw_timing diagnostic). Builds clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ms609 merged commit 20c96e5 into cpp-search Jun 23, 2026
4 of 7 checks passed
ms609 deleted the claude/na-iw-x4-keep branch June 23, 2026 15:53