perf(tbr): IW/XPIWE/NA x4 reroot batch + dirty-region (default-OFF, opt-in) #256
Merged
Cherry-pick of 69febb4 onto cpp-search (post-#254).

Resolved the exact_verify candidate-scoring conflict by keeping #254's incremental rescore path and dropping the directional-audit (na_dir_audit) block that #254 deliberately removed. Stripped the settled-dead ts_iw_gather_bench microbench (gather micro-opt proven a dead heat) and trimmed superseded dev A/B harnesses, keeping only bench_iw_realized.R.

Opts remain kill-switched (TS_IW_NOX4 / TS_IW_NODIRTY); a follow-up commit flips them default-off for the shared branch. Mission-null at morphology scale; kept for the large-N / recipe-retune reopen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rflow
The extract_char_steps dirty-region + x4 reroot-batch opts were gated to
ScoringMode::IW, but production MaximizeParsimony defaults to extended IW
(ScoringMode::XPIWE, ts_data.cpp:297). So the opts were dead on the production
path for every dataset, and the mission-wall A/B (which toggles TS_IW_NOX4/
NODIRTY through MaximizeParsimony) was toggling code that never executed -> the
1.005x "null result". Widen the gate to iw_family = (IW || XPIWE): both modes
score via compute_iw/precompute_iw_delta from the same weighting-agnostic
char_steps, differing only in per-pattern eff_k/phi (which those functions
already consume). PROFILE stays excluded (compute_profile + info_amounts +
precomputed_steps is a genuinely different char_steps convention).
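
To make the widened gate concrete, here is a minimal C++ sketch. The enum is a hypothetical stand-in for the real ScoringMode (the EW enumerator and the enumerator order are assumptions), and old_gate is named here only for contrast; only the IW || XPIWE predicate is taken from the message above.

```cpp
// Hypothetical mirror of the scoring modes named in this commit message.
// The real enum lives in the package sources and may differ.
enum class ScoringMode { EW, IW, XPIWE, PROFILE };

// Old gate (for contrast): true only for plain IW, so it never fires
// when the production default is XPIWE.
constexpr bool old_gate(ScoringMode m) {
    return m == ScoringMode::IW;
}

// Widened gate: IW and XPIWE share the same char_steps convention.
// PROFILE stays excluded because its char_steps convention differs.
constexpr bool iw_family(ScoringMode m) {
    return m == ScoringMode::IW || m == ScoringMode::XPIWE;
}

// Compile-time checks of the intended behaviour.
static_assert(!old_gate(ScoringMode::XPIWE), "old gate was dead on XPIWE");
static_assert(iw_family(ScoringMode::XPIWE), "widened gate reaches XPIWE");
```

Under these assumptions the old predicate is false for the default mode, which would explain why toggling the kill-switches through MaximizeParsimony changed nothing.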
Perf-only / byte-identical: opts-on vs opts-off scores identical across
Dikow/Vinther/Zanol/Giles/Wortley x 3 seeds; SCANCHK clean on XPIWE.
Firing the opts on the production path exposed a pre-existing dirty-region bug:
the nx_cs accumulation did not skip active_mask==0 blocks, but extract_char_steps
(which builds the F cache) does. Under a ratchet ZERO_ONLY perturbation (the
default, ratchetCycles>=3) that fully deactivates a block, divided_steps =
F + cs_delta - nx_cs went negative (TS_IW_DIRTYCHK: dirty=-1 ref=0 on
Dikow2009/Vinther2008). Weighting-independent (plain IW mismatches identically;
reachable today via MaximizeParsimony(extended_iw=FALSE)). The candidate scan
masks needs_step by active_mask so scores were unaffected, but the invariant was
broken. Fix: skip active_mask==0 in the nx_cs loop only; EW nx_cost is left
counting all blocks (its best_score/delta length convention is self-consistent).
DIRTYCHK now clean across 5 datasets x ratchetCycles {0,3,6}.
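
The underflow can be reproduced in a toy model. The sketch below is not the real kernel: it collapses per-pattern char_steps to a single integer, and the Block struct and all function names are hypothetical. It only shows why the F cache and the nx_cs loop must agree on skipping active_mask==0 blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-block summary; the real code works on packed state sets.
struct Block {
    std::uint64_t active_mask;  // 0 means the perturbation deactivated the block
    int full_steps;             // steps this block contributes over the whole tree
    int nx_steps;               // steps this block contributes at the removed node
};

// Stand-in for the F cache: deactivated blocks are skipped.
int build_F(const std::vector<Block>& blocks) {
    int f = 0;
    for (const Block& b : blocks) {
        if (b.active_mask == 0) continue;
        f += b.full_steps;
    }
    return f;
}

// nx_cs accumulation. skip_inactive=false models the pre-fix loop,
// skip_inactive=true models the fixed loop.
int accumulate_nx_cs(const std::vector<Block>& blocks, bool skip_inactive) {
    int nx = 0;
    for (const Block& b : blocks) {
        assert(b.nx_steps >= 0);
        if (skip_inactive && b.active_mask == 0) continue;
        nx += b.nx_steps;
    }
    return nx;
}

// divided_steps = F + cs_delta - nx_cs, as in the commit message.
int divided_steps(const std::vector<Block>& blocks, int cs_delta,
                  bool skip_inactive) {
    return build_F(blocks) + cs_delta -
           accumulate_nx_cs(blocks, skip_inactive);
}
```

With a single fully deactivated block that would contribute one step at the removed node, the pre-fix variant yields -1 and the fixed variant yields 0, which is the same shape as the dirty=-1 ref=0 report.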
Add a testthat guard pinning XPIWE opts-on == opts-off (tolerance=0) under
ratchet (the discriminating config; prior tests ran plain-IW or ratchetCycles=0
and were tautological for this path). 398 existing tests still pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…[perf TBD]

Port the T-245 4-wide reroot ILP batch to the implied-weights + inapplicable (IW/XPIWE + NA) TBR scan -- the one scoring path that never received it. The NA reroot scan already had an EW-NA x4 (fitch_na_indirect_cached_flat_x4) but IW-NA/XPIWE-NA fell to the one-at-a-time scalar indirect_na_iw_length_cached; the IW x4 branch was !has_na-gated.

New kernel indirect_na_iw_cached_flat_x4 (src/ts_fitch.cpp) fuses the NA active-mask candidate logic of the EW-NA x4 (from1 reduce, shared clip_has_active, per-candidate below_actives AND) with the per-candidate iw_delta ctz-gather of indirect_iw_cached_flat_x4. Each accumulator keeps the scalar add order of indirect_na_iw_length_cached, so per-candidate results are bit-identical; the shared all-4-exceed-cutoff bail only changes early-exit on cutoff-losing candidates.

Wired by widening the IW x4 branch gate (ts_tbr.cpp:1985) to dispatch on has_na, mirroring its own per-candidate skip re-check and main_edges[ei] indexing; the no-NA path stays byte-identical. Gated on iw_family (IW||XPIWE), so it fires on the production XPIWE+NA path (MaximizeParsimony default) -- the first banked IW kernel opt that lands on the NATIVE inapplicable-bearing corpus rather than only recoded matrices.

The dirty-region scan shortcut is deliberately NOT ported (stays !has_na): the top-down down2 pass breaks the path-bounded F+cs_delta-nx_cs decomposition, and it is the smaller (once-per-clip) prize.

CORRECTNESS validated, PERF still unknown (Hamilton A/B pending):
- byte-identical x4-on vs TS_IW_NOX4-off: 24/24 (15 direct IW-NA ts_tbr_search + 9 MaximizeParsimony XPIWE-NA under ratchet; datasets at 10-38% NA)
- existing independent full-rescore oracle passes; 519 testthat pass
- kernel confirmed to execute (throw-probe), so the byte-identity is not vacuous
- new regression guard in test-ts-tbr-dirty-rescore.R keeps inapplicables

Kill-switch TS_IW_NOX4 (shared with the no-NA x4).
NOT for cpp-search merge until the A/B reports and the getenv-consolidation merge-gate is addressed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
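
The bit-identity argument for the x4 batch can be sketched in isolation. The code below is a simplified model, not the real kernel: it omits the NA active-mask logic and the ctz-gather, assumes non-negative per-pattern deltas, and uses hypothetical names (Deltas, score_scalar, score_x4). It shows four accumulators that each add in the scalar order, with a shared bail once all four exceed the cutoff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Deltas = std::vector<double>;  // non-negative per-pattern increments

// Scalar reference: one candidate at a time, stop once it has lost.
double score_scalar(const Deltas& d, double cutoff) {
    double sum = 0.0;
    for (double x : d) {
        sum += x;
        if (sum > cutoff) break;  // candidate can no longer win
    }
    return sum;
}

// 4-wide batch: each accumulator sees its own deltas in the same order as
// the scalar loop, so a candidate that stays within the cutoff gets exactly
// the scalar result. The loop stops only when all four have lost.
std::array<double, 4> score_x4(const std::array<const Deltas*, 4>& cand,
                               double cutoff) {
    const std::size_t n = cand[0]->size();
    for (const Deltas* c : cand) assert(c->size() == n);

    std::array<double, 4> sum{0.0, 0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 4; ++k) sum[k] += (*cand[k])[i];
        if (sum[0] > cutoff && sum[1] > cutoff &&
            sum[2] > cutoff && sum[3] > cutoff) {
            break;  // shared bail: every candidate in the batch has lost
        }
    }
    return sum;
}
```

In this model a losing candidate may report a different (larger) partial sum than the scalar loop, because the batch keeps running while any sibling is still alive. Both values exceed the cutoff, so the accept/reject decision is unchanged.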
Flip the kill-switch semantics from default-on (TS_IW_NOX4 / TS_IW_NODIRTY) to opt-in (TS_IW_X4 / TS_IW_DIRTY). These opts are mission-null at morphology scale (validated byte-identical, 1.005x wall) and are preserved on the shared branch only for the large-N / recipe-retune reopen, so they should impose nothing by default. Test guards updated in lockstep: the "on" arm now explicitly enables the opt so the kernel still fires (data triggers the iw_family + has_na gate as before); byte-identity vs the scalar baseline is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
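
The difference between the two env-var conventions can be illustrated with a short C++ sketch. This is an assumption-laden model: the commit does not say how the real code parses the variables, so "set and non-empty" is assumed here, and env_set, IwOpts, and read_iw_opts are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

// Assumed convention: a variable counts as set when it exists and is non-empty.
inline bool env_set(const char* name) {
    assert(name != nullptr);
    const char* v = std::getenv(name);
    return v != nullptr && v[0] != '\0';
}

// Old semantics (kill-switch): the opt runs unless TS_IW_NOX4 is set.
inline bool x4_enabled_kill_switch() { return !env_set("TS_IW_NOX4"); }

// New semantics (opt-in): the opt runs only when TS_IW_X4 is set.
struct IwOpts {
    bool x4;
    bool dirty;
};

// Read both flags in one place, e.g. once per search call, so the scan
// loops never touch the environment themselves.
inline IwOpts read_iw_opts() {
    return IwOpts{env_set("TS_IW_X4"), env_set("TS_IW_DIRTY")};
}
```

With a clean environment the kill-switch form enables the opt and the opt-in form disables it, which is the behaviour change this commit describes.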
Preserve the dilution-free element-level and mission-wall A/B harnesses for the NA-IW x4 reroot batch (updated to the TS_IW_X4 opt-in env var), plus the two large real inapplicable-bearing matrices (Sun2018, lobo) the large-N reopen will need to re-test whether the scan share rises. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…br.cpp

The worktree's 69febb4 bundled the exact directional (Regime-C) NA scoring audit/scorer into ts_tbr.cpp alongside the x4/dirty opts; the header it needs (ts_fitch_na_directional.h, added separately by b2e03a9) is not on cpp-search and was deliberately excluded (the directional path is dead -- 24-89x slower than SIMD full_rescore). Strip the include, na_dir_audit / na_dir_scorer setup, whole-tree cross-check, build_clip_folds, and the per-candidate directional fast-path so exact_verify_sweep matches cpp-search's (#254 incremental) version. Kept <chrono> (needed by the iw_timing diagnostic). Builds clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Preserves the IW/XPIWE/NA-aware x4 reroot batch and extract_char_steps dirty-region TBR opts on cpp-search so they survive the disposable worktree they were built on. Mission-null at morphology scale, but the kernel-level win is real (NA-IW x4 cuts the reroot scan 10–14%; no-NA x4+dirty 1.1–1.2× on a full direct climb), so they are kept default-OFF, opt-in for the large-N / recipe-retune reopen.

What's here
- indirect_iw_cached_flat_x4: pure-IW 4-wide reroot batch (T-245 ILP ported to IW).
- indirect_na_iw_cached_flat_x4: the first x4 kernel that crosses the !has_na gate onto the native inapplicable-bearing corpus (serves plain IW-NA and the production XPIWE-NA path).
- Dirty-region extract_char_steps: divided_steps = F + cs_delta − nx_cs instead of the O(n_node) walk.
- Gate widened (iw_family = IW || XPIWE) so the opts reach the production scoring mode, plus a genuine nx_cs active-mask underflow fix (a real broken invariant reachable today via MaximizeParsimony(extended_iw = FALSE) at rc≥3, independent of these opts).

Default-OFF / opt-in
Flipped from the worktree's default-on kill-switches to opt-in env vars TS_IW_X4 and TS_IW_DIRTY (matches the recorded "do not default-on merge" disposition; byte-identical either way, so nothing is imposed on the shared branch by default).

Correctness (hard gate)
Rebuilt on cpp-search (post-#254) and ran both suites:
- test-ts-tbr-dirty-rescore.R: x4/dirty byte-identity guards (pure-IW, XPIWE, and NA-IW), updated to the opt-in env vars; the "on" arm explicitly enables the kernel so firing is preserved by construction.
- test-ts-na-incremental.R: #254's byte-identity tests, incl. the per-candidate audit (5658/5658 candidates byte-matched full_rescore), confirming the x4/dirty interleave did not perturb the incremental rescore.

Deliberately excluded
- Gather microbench (ts_iw_gather_bench): settled as a dead heat; not added to the shared branch.
- Directional NA scorer (ts_fitch_na_directional.h, na_dir_audit / na_dir_scorer): the worktree's 69febb48 bundled its usage into ts_tbr.cpp; stripped so exact_verify_sweep matches cpp-search's #254 version (the directional path is dead, 24–89× slower).

Supporting infra (dev/)
NA-IW x4 element + mission A/B harnesses (on TS_IW_X4) and the two large real NA matrices (Sun2018, lobo) the large-N reopen will need.

Deferred / caveats
- Diagnostic env reads (TS_IW_TIMING / DIRTYCHK / SCANCHK) remain once-per-tbr_search-call (sub-1%); getenv consolidation deferred.
- […] (… do_reroot is off under constraints anyway).

🤖 Generated with Claude Code