C++ search #210
Draft
ms609 wants to merge 803 commits into
ms609
added a commit
that referenced
this pull request
Mar 28, 2026
Phase 1 diagnostic completed 2026-03-29. Hypothesis falsified: tip clips are UNDER-represented in TBR acceptances (0.43-0.76x enrichment across 4 datasets). Medium-small clips most productive. All three ordering variants (inv-weight, tips-first, bucket) favour tips — counterproductive. Branch feature/weighted-clip-order closed. See completed-tasks.md entry PA-001 and AGENTS.md item 12.
5 datasets (62-180t), 20 seeds, EW/IW10/IW3. IW hypothesis weak signal (closed). Real finding: XSS benefit scales with tree size. At 180t: TAEB delta -6.8 to -9.8 EW steps (12-19% overhead). At ≤88t: zero TAEB benefit. No preset change needed.
Stage 5 benchmark (SLURM 16622483, EPYC 7702, 5 datasets 131-206t, 10 seeds, 60s+120s) showed pr_nni (NNI full-tree polish) fixes the Stage 4 showstopper (0 reps at 206t/60s) while improving 131-180t:
- project3701 (146t): -178 steps at 60s, -128 at 120s
- project804 (173t): -9 / -2 steps
- mbank_X30754 (180t): -4 / -7 steps
- syab07205 (206t): +17.5 at 60s, neutral at 120s
Enable in large preset: pruneReinsertCycles=5L, pruneReinsertNni=TRUE. Update AGENTS.md and completed-tasks.md. Results in dev/benchmarks/t289f_pr_nni_polish.csv.
…_search When params.nni_full is true but a ConstraintData is active, guard falls through to TBR (which enforces constraints). One-line change mirroring the nni_wagner guard in ts_driven.cpp. Only affects users who combine pruneReinsertNni=TRUE with topological constraints; no preset does this. Also: S-COORD round 46 (task queue, PR status), to-do cleanup.
Agents now check remote-jobs.md at /assign time (new step 4) for retrievable results before claiming tasks. Prevents SLURM results from being silently lost across conversation boundaries.
C++ instrumentation of tbr_search() with post-acceptance sector-masked TBR on clip subtree. Hit rate ~35% regardless of scoring mode (no IW-specific benefit), but NET HARMFUL: disrupts global TBR trajectory. mbank_X30754 EW: +17 to +34 steps TAEB at 30-120s. Validates existing pipeline design (XSS as separate post-convergence phase). Closed.
Phase 1 (a159311) added diagnostic instrumentation and the TIPS_FIRST, INV_WEIGHT, BUCKET, ANTI_TIP, LARGE_FIRST ordering variants to ts_tbr.cpp. Phase 2 completes the implementation:
Bug fix: clip_order was only propagated to the initial TBR and final TBR polish (~10% of replicate time). The ratchet and all sectorial TBR calls defaulted to RANDOM, making the ordering variants effectively inert for the dominant phase (ratchet ~76%).
Fix: add clip_order field to RatchetParams and SectorParams, propagate from SearchControl through ts_driven.cpp into every TBR call site in ts_ratchet.cpp and ts_sector.cpp (6 sites + search_sector signature).
Empirical validation (5 seeds, 30s, default config):
- Agnarsson2004 (62t, default preset): TIPS_FIRST -2%, INV_WEIGHT neutral
- Zhu2013 (75t, thorough preset): TIPS_FIRST +13%, INV_WEIGHT +9%
- Dikow2009 (88t, thorough preset): TIPS_FIRST +8%, INV_WEIGHT +3%
Theoretical model (Poisson bucket, corrected): TIPS_FIRST saves ~48% per productive TBR pass at 88t; practical throughput gain is ~8-13% because null passes (ordering-invariant, exhaust all clips) dilute savings.
Benefit is dataset-size dependent:
- < ~65t: tip enrichment is low (Agnarsson2004: 0.43); TIPS_FIRST neutral
- 65-120t (thorough): tip enrichment moderate; TIPS_FIRST +8-13%
No preset defaults changed yet, pending GHA 10-seed validation. bench_clip_ordering.R contains the full benchmark driver.
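To make the ordering variants above concrete, here is a minimal sketch of how a clip visit order could be derived from clip subtree sizes. It assumes TIPS_FIRST means "smallest clipped subtree first" and LARGE_FIRST the reverse; the enum, the function name `clip_visit_order` and the `subtree_size` input are illustrative and are not taken from ts_tbr.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Illustrative subset of the ordering variants; names are assumptions.
enum class ClipOrder { RANDOM, TIPS_FIRST, LARGE_FIRST };

// Return the order in which clip candidates are visited.
// subtree_size[i] = number of tips below clip candidate i (1 = a tip clip).
std::vector<std::size_t> clip_visit_order(const std::vector<int>& subtree_size,
                                          ClipOrder mode, std::mt19937& rng) {
  std::vector<std::size_t> order(subtree_size.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // Shuffle first so that ties within a size class are broken randomly.
  std::shuffle(order.begin(), order.end(), rng);
  if (mode == ClipOrder::TIPS_FIRST) {
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                       return subtree_size[a] < subtree_size[b];
                     });
  } else if (mode == ClipOrder::LARGE_FIRST) {
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                       return subtree_size[a] > subtree_size[b];
                     });
  }
  return order;  // RANDOM keeps the plain shuffle
}
```

Whatever the ordering rule, the bug fix in this commit is about the mode reaching every TBR call site; a sketch like this only has an effect in phases that actually receive the chosen `ClipOrder`.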
The SearchControl.Rd usage section was generated from an old installed
build (missing clipOrder and many parameters added since). The codoc
check correctly flagged the mismatch.
- Added @param clipOrder documentation in R/SearchControl.R
- Regenerated man/SearchControl.Rd with correct \usage and \item{clipOrder}
TBR clip-ordering strategy (SearchControl clipOrder)
When concordance mode uses the dataset (qc/mcc/spc/clc/phc) and the plotted tree's taxa don't match the dataset's names, return NULL rather than passing mismatched objects to QuartetConcordance/LabelSplits etc. This prevents the `mat[i, j, ..., drop=FALSE]: subscript out of bounds` crash when loading trees with non-overlapping taxa (T-293) or reloading a dataset with removed taxa (T-300). Remove T-292, T-293, T-300 from to-do.md (all fixed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ts_search.cpp spr_search uses the bounded scorer but is off-default (sprFirst=FALSE everywhere), exact-verify-gated (never false-accepts), and a warmup washed by the subsequent exact tbr_search → silent-miss mooted, no action. All remaining union-of-finals sites accounted for and benign. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Read of ts_driven.cpp orchestration: per-phase score_tree prints are verbosity>=2-gated (off at default v=1L); only un-gated full rescores are the 2 per-outer-cycle convergence checks (~µs each, ~0.001% wall, one redundant but sub-floor) + 1 final/replicate. Step-switching minimal (each phase owns its state). R/C marshalling already T-P5o'd as amortizable. Last undone non-gated aspect of the isolation plan; addressable wall now lives in composition #40. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…I blocker
- ns=9 representation/bit-packing reopen CLOSED analytically: transposed bitset already bit-dense (0.14 op/pattern); states-per-word packing serializes patterns -> strictly worse; scalar reopen is ns<=4 only.
- Cherry-pick build-check PASSED (HEAD: fuse 22/0, tbr 28/0, prune 44/0).
- Hamilton mission-KPI re-measure BLOCKED: ratchet 12->6 flip is uncommitted shared WIP; cannot define clean reproducible code-state unattended. Flagged for user.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (closure holds)
Settles the unverified literature pillar of the T-P5p TBR closure. Primary-source check (one full-text chapter + Goloboff 1996 abstract): TNT/Goloboff builds equivalent two-pass down+up state sets and scores reinsertion by a root-to-root comparison, the same structure as TreeSearch's edge_set[D].
Disambiguates two amortization levels:
- Level-1 (per-candidate, within one clip): TS already matches (full-text confirmed).
- Level-2 (per-clip incremental view derivation) = the already-deferred lever-b, supported by the 1996 abstract only (unread full text) → revisit at large-N, not via literature.
TBR closure HOLDS, now on stronger evidence.
Minor: Goloboff's up-aware approximate "check one node" screen differs from lever-c's up-ignoring admissible bounds (flagged if lever-c ever reopens; it screens, doesn't bound).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Profiling (T-P5d, 2026-06-19) found the ratchet over-provisioned: halving cycles saved 20-38% wall on the mid-size EW benchmarks (Wills/Zanol/Zhu/Giles) at zero quality loss (gapB unchanged at full budget). Flips the formal SearchControl default and the `default` strategy preset; updates the vignette. The `large` preset deliberately keeps 12 (large-tree tradeoff, T-179). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n race (#39 closed)
Post-flip cpp-search KPI (Hamilton, freshness-asserted ratchetCycles==6L):
- QUALITY CLOSED: TS reaches the optimum on every dataset/seed; on Zanol (ns=9) TS is the ONLY reliably-1261 config (TNT fast configs miss +1).
- Wall gap is NOT algorithmic: candidate-efficiency ~1.2-1.9x near-parity (count-based), throughput ~2x at-limit; the 8-110x is a default-budget mismatch (TS default heavy / TNT default light), corrected from an initial overreach (advisor).
- #39 CLOSED: ratchet isolated race = cycle-quality PARITY (TNT does NOT reach the optimum in fewer reweight cycles) + ~2x at-limit throughput, no lever.
- Component-isolation program now COMPLETE; only composition #40 (gated, modest + reliability-bounded) remains.
Adds the ratchet-race driver, KPI CSVs, and the component-isolation plan STATUS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…te before #40 All components closed on both gates (ratchet cycle-parity race 2026-06-21). Resolve the stale Next-task/TBD sections; add the pre-composition fresh-eyes re-audit gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…kernels stand Adversarial re-audit (27 agents, 8 lenses): 18 candidates -> 3 survived -> 15 killed. Core kernel/TBR throughput verdicts STAND; no second getenv-class hotspot. Survivors: fuse value stale post-reroot-fix (#55, re-measuring), sectorial column-axis reduction (#56), x4 reroot wasted-block (#57, weak). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lock counter (#57) #55: capture FuseResult + verbosity>=2 'Fuse attempt' print (pool size + n_exchanges) to distinguish fires-but-useless from never-fires (pool-collapse). #57: TS_AUDIT_PROBE-gated counter in fitch_indirect_cached_flat_x4 measuring blocks scanned past each member's individual bail (the x4 'deepest-bailing member' ceiling). Default build unaffected (counter fully #ifdef'd; print is verbosity>=2 only). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…UDIT_PROBE) Measures the realized informative-within-sector fraction on the actual sectors: a char is droppable iff some state is shared by ALL sector tips (incl HTU) -> 0 Fitch steps -> ranking-preserving. fp/tot_blocks = the no-bail precompute saving (compute_insertion_edge_sets scans all n_blocks/node). Inert in production. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s (TS_AUDIT_PROBE) RAII timer to measure the no-bail precompute's share of SECTOR wall (the load-bearing multiplier in #56's saving estimate). Inert in production.
…e namespace ts The precompute timer's mid-namespace #include <chrono> made std::ratio parse as ts::std::ratio (compile error under -DTS_AUDIT_PROBE). Move includes to global scope. Default build unaffected (all ifdef'd).
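The two commits above (an RAII timer, then the include-placement fix) can be illustrated with a small sketch. The class name `ScopedTimer` and its nanosecond sink are hypothetical; only the rule it demonstrates, that standard headers are included at global scope before `namespace ts` is opened, comes from the commit message.

```cpp
// Standard headers are included at global scope, before any namespace opens.
#include <cassert>
#include <chrono>

namespace ts {

// Hypothetical RAII timer: adds the elapsed nanoseconds of its lifetime
// to a caller-owned counter when it goes out of scope.
class ScopedTimer {
 public:
  explicit ScopedTimer(long long& sink)
      : sink_(sink), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    const auto dt = std::chrono::steady_clock::now() - start_;
    sink_ += std::chrono::duration_cast<std::chrono::nanoseconds>(dt).count();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  long long& sink_;
  std::chrono::steady_clock::time_point start_;
};

// Writing `#include <chrono>` at this point instead would declare the
// header's contents inside namespace ts, so a name such as std::ratio
// would be looked up as ts::std::ratio and fail to compile.

}  // namespace ts
```

In the real code the timer is compiled only under -DTS_AUDIT_PROBE; the sketch omits the `#ifdef` guard for brevity.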
…OLREDUCE)
Drops characters constant-within-{sector tips + HTU} (0 Fitch steps -> scores
stay exact) and re-packs informative survivors into fewer blocks, shrinking the
per-node block scan in the inner-sector TBR (esp. the no-bail precompute
compute_insertion_edge_sets). EW only (weight 1, no upweight, no inapplicable).
Off by default.
Validated (Hamilton 17533059): dScore=0 on 9/9 full searches, valgrind clean,
adversarial review verified the 0-step invariance + bit arithmetic. The review
also caught (and this fixes) a stale rd.subtree stride that would OOB. rss-
isolated saving: Giles 17%, Zhu 9%, Zanol ~0% (uniform ns=9 = least reduction =
the load-bearing case). Changes the search trajectory on mixed-n_states data
(dCand!=0, equally-optimal path) => OPT-IN, NOT a default flip. Before any
default-on: run a sector-score oracle (reduced vs full, same topology, mixed
state); an accept-gated search cannot discriminate a masked packing bug.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
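The droppability rule behind TS_SECT_COLREDUCE (a character costs 0 Fitch steps within a sector when some state is shared by every sector tip, HTU included) can be sketched as follows. This is a simplified illustration under stated assumptions: one state-set word per character rather than the package's bit-packed blocks, and the names `StateSet` and `informative_columns` are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit s set = state s is allowed. Illustrative layout, not the real packing.
using StateSet = std::uint32_t;

// tip_states[t][c] is the state set of sector tip t for character c.
// Returns the indices of characters that must be kept: those for which
// no single state is shared by all tips.
std::vector<std::size_t> informative_columns(
    const std::vector<std::vector<StateSet>>& tip_states) {
  std::vector<std::size_t> keep;
  if (tip_states.empty()) return keep;
  const std::size_t n_char = tip_states.front().size();
  for (std::size_t c = 0; c < n_char; ++c) {
    StateSet shared = ~StateSet{0};
    for (const auto& tip : tip_states) shared &= tip[c];
    // A non-empty intersection means every Fitch downpass step can take the
    // intersection branch, so the character adds 0 steps on any topology.
    if (shared == 0) keep.push_back(c);
  }
  return keep;
}
```

Only the surviving column indices would then be re-packed into fewer blocks. The commit's own caveat still applies: an accept-gated search cannot detect a packing bug, so a reduced-versus-full score comparison on the same topology is the stronger check.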
Every MaximizeParsimony/SearchControl switch + preset + the opt-in env levers (TS_SECT_COLREDUCE), each with a when-relevant assessment grounded in this session's findings (ratchet 6, fuse=dead-weight, col-reduce mixed-state-only, rasStarts, prune-reinsert >=120t, clipOrder, the 3x trailing-TBR consolidation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…5-6% slower) Force-scalar A/B (17533065): GATE dScore=0 & dCand=0 9/9; speedup x4/scalar Giles 0.939 / Zhu 0.945 / Zanol 1.001. ~1.9% waste ceiling not realizable (x4 ILP covers it). All 3 audit survivors now resolved. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ot global Jobs 17533071 (20-rep) + 17541277 (40-rep), 3 seeds EW. clipOrder=2 (tips-first) ~1.25x faster / ~26% fewer candidates, but biases the trajectory: clean ~1.5x win on Zanol (uniform ns=9, 3/3 optima); +1 quality tradeoff on Zhu that 2x budget does NOT recover; wall-unstable on Giles. Complements TS_SECT_COLREDUCE. Default stays 0L. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ve-NA mission wall) (#254) * perf(na): incremental dirty rescore for exact_verify candidates (default-off) exact_verify_sweep is ~95% of native-NA TBR/mission wall: it scores every TBR neighbour of the converged tree by apply_tbr_move + full three-pass full_rescore (NA needs the exact sweep because its indirect scan is approximate). This makes each candidate an INCREMENTAL rescore instead. - 3-seed dirty passes: fitch_na_dirty_{down,up}pass gain an optional third seed (default -1 = no-op, so the SPR accept path is unchanged). exact_verify seeds {nz (sibling reconnect), nx (regraft node), clip_node (covers the reversed subtree path + its rootward chain)} -- covering BOTH SPR and reroot candidates. - Per candidate: dirty Pass1/Pass2 (O(dirty) not O(n)) + full Pass3 + per-pattern extract + compute_weighted (IW/XPIWE/PROFILE) or +ew_offset (EW); on reject, restore_prealloc_undo/restore_saved_states undoes the dirty state before restore_topology, keeping the base state valid for the next candidate. Behind TS_NA_INCR (fast path) and TS_NA_INCR_AUDIT (cross-check vs full_rescore); both DEFAULT OFF, so production is byte-identical (519 testthat pass; 3-seed default -1). VALIDATION: - TS_NA_INCR_AUDIT: every candidate's incremental score byte-matched full_rescore -- 5486..13674 candidates each on Vinther/Longrich/DeAssis x {EW, IW}. - End-to-end byte-identical search outcome (TS_NA_INCR on vs off): 18/18 (3 datasets x 2 regimes x 3 starts). - Element wall (exact_verify-heavy direct climb): Zanol 1.12x, Dikow 1.21x, scores identical. (Below the ~2.5x ceiling: NA final_ uppass propagation makes the dirty region large; Pass3/extract stay full -- headroom remains.) Mission A/B pending (should translate, unlike the washed extract-fusion: this speeds exact_verify directly rather than relocating work). NOT default-on / not for cpp-search until the mission A/B + broader byte-identity confirm. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(na): fuse per-pattern extraction into fitch_na_pass3_score (incremental path) The incremental exact_verify path ran fitch_na_pass3_score AND a separate extract_char_steps over the full tree -- both recompute the identical per-node step bits (standard from local_cost, NA from the Pass3 needs_step formula). Add an optional char_steps_out to fitch_na_pass3_score that buckets per-pattern during its existing walk; the IW incremental path uses it and skips extract. Unlike the dirty Pass1/2 (capped by NA's ~0.4-0.48 dirty fraction), this is a redundant-walk ELIMINATION, not dirty-fraction-limited. Default-nullptr param => all other callers byte-identical. Element wall (exact_verify-heavy climb), incremental + this fold vs legacy: Zanol 1.23x (was 1.12x), Dikow 1.32x (was 1.21x); scores byte-identical. Audit byte-matches full_rescore on every candidate; fast-vs-legacy 18/18 byte-identical; 519 tests pass (production unchanged). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(na): make incremental exact_verify rescore the production default Flip the na_incr gate to default-ON (kill-switch TS_NA_NOINCR restores the legacy full_rescore path). The incremental dirty rescore is byte-identical to legacy (per-candidate audit + 180/180 full-roster climbs + 40/40 mission cells) and cuts native-NA mission wall ~25-30% (1.30x mean, p=2e-06). Disabled in TS_NA_INCR_AUDIT mode (which uses full_rescore for decisions). getenv read is per-convergence (exact_verify), not per-candidate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(na): guard incremental exact_verify rescore (default vs TS_NA_NOINCR + audit) Two enduring tests: (1) the default incremental path is byte-identical to the legacy full_rescore (TS_NA_NOINCR) across Vinther/DeAssis x {EW, IW} x starts; (2) TS_NA_INCR_AUDIT runs clean (per-candidate incr == full_rescore, else stop). 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Fix SearchControl expectation and spelling terms (#255) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
…efault-OFF (#B2) Adds compute_collapsed_flags_aggressive: collapse internal branches of minimum possible length 0 (final[p] & final[c] != 0 per char), validated bit-for-bit against a brute-force MPR oracle (0/206 internal, fp=0). Gated TS_COLLAPSE_AGGRESSIVE (default OFF), scoped to tbr_search neighbourhood reduction; pool dedup keeps exact flags; NA datasets fall back to conservative. Byte-identical OFF (156/156 tests). Inert on the char-rich roster (0% density at optimum) but the mbank corpus has a char-poor tail where density is high; b2_speed_cell.R + hamilton_b2_speed_array.sh run the wall-clock-to-optimum ON/OFF A/B there. Diagnostic export ts_collapsed_flags_debug + b2_*.R probes + #40 handoff briefing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
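The collapse criterion quoted above can be written as a per-edge predicate. The sketch assumes one final state-set word per character and uses the invented name `edge_min_length_zero`; it is not the signature of compute_collapsed_flags_aggressive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using StateSet = std::uint32_t;  // bit s set = state s in the final set

// True when the branch between parent and child can have length 0:
// for every character the two final state sets share at least one state.
bool edge_min_length_zero(const std::vector<StateSet>& final_parent,
                          const std::vector<StateSet>& final_child) {
  assert(final_parent.size() == final_child.size());
  for (std::size_t i = 0; i < final_parent.size(); ++i) {
    // Parentheses matter: in C++ `a & b != 0` parses as `a & (b != 0)`.
    if ((final_parent[i] & final_child[i]) == 0) return false;
  }
  return true;
}
```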
…liable speed lever Hamilton job 17591484 (896 cells, 10 char-poor + 4 control × 32 seeds): faster-frac 42%, median speedup ≈ 0, dScore ≈ 0; project2144 ≈1.6× SLOWER. char/tip ratio does not predict winners → no selector → CLOSED. Add b2_speed_analyze.R (paired wall-clock-to-floor analysis script). Briefing updated with "TESTED, NOT RECOMMENDED" TL;DR for #40. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trategy briefing Prototype: ts_driven.cpp / ts_parallel.cpp wired with TS_FUSE_PAIRWISE env flag. Functional on 88t (validated), no quality gain over matched-wall-clock multistart on 4 hardest datasets; TS reaches TNT optimum everywhere → recombination NOT a quality lever, residual = budget. B1 closed; flag retained for completeness. Add pairwise-fuse strategy briefing and zero-exchanges analysis for #40. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…est-known README Add basin_diversity.R (B1 harness), fuse_efficiency.R (FeRun/FeAnalyze helpers), basin_cell.R / hamilton_basin_array.sh / hamilton_basin_build.sh / hamilton_fe_array.sh (Hamilton dispatch scripts), b3b1_endstate_probe.R (endstate sampling probe), README-best-known-targets.md (canonical target derivation guide), .gitignore (exclude *_pull/ *_partials/ *.rds *.pdf binary results). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…reproducible The Monte Carlo profile-parsimony information estimate (used when the exact solver is infeasible) scores random trees via mc_fitch_scores() -> random_tree() in build_postorder.h. random_tree() drew from an unseeded global Marsaglia MWC generator whose state was a fixed initial constant that advanced across calls and ignored R's RNG entirely. Two MaximizeParsimony(concavity = "profile") runs with the same set.seed() therefore drew different random trees, producing different info.amounts tables and -- amplified through the search's tie-breaking -- different final scores. EW and IW were unaffected (they never touch this path); the default reproducer happened to stop reproducing only because the exact solver now handles that small dataset (auto -> exact), but the MC path remained non-deterministic for any larger / infeasible character. Fix: add seed_random_tree() to reseed the MWC generator, and call it from mc_fitch_scores() with two non-zero 32-bit seeds drawn from R's RNG (the Rcpp wrapper already establishes an RNGScope). Profile search is now reproducible under set.seed(); the exact path is byte-identical (no regression). Regression tests: same-seed reproducibility of mc_fitch_scores() and StepInformation(approx = "mc") (test-pp-multistate.R), and of a MaximizeParsimony(concavity = "profile", profile_approx = "mc") search (test-ts-profile.R). Note: RANDOM_TREE / RANDOM_TREE_SCORE consume the same unseeded MWC generator (same set.seed-blindness, same bug class) but are out of scope for this fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ility RANDOM_TREE and RANDOM_TREE_SCORE are plain .Call() entry points that wrap the Marsaglia MWC generator. Without GetRNGstate/PutRNGstate they ignored set.seed() entirely. Now both functions draw two non-zero 32-bit seeds from R's RNG (bracketed with GetRNGstate/PutRNGstate) and call seed_random_tree(), matching the pattern already applied to mc_fitch_scores (fe9dd6f). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
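The reseeding pattern described in the two commits above can be sketched in isolation. Two assumptions are made loudly: the generator shown is a classic two-lag form of Marsaglia's multiply-with-carry, which may differ in detail from the one in build_postorder.h, and `std::mt19937` stands in for R's RNG (in the package the seeds are drawn from R, bracketed by GetRNGstate/PutRNGstate or an RNGScope). The names `Mwc`, `nonzero_seed` and `seed_from_host` are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Two-lag multiply-with-carry generator with fixed default state.
// Left unseeded, every process starts from the same constants and the
// state simply advances across calls, ignoring the host RNG.
struct Mwc {
  std::uint32_t z = 362436069u;
  std::uint32_t w = 521288629u;
  std::uint32_t next() {
    z = 36969u * (z & 65535u) + (z >> 16);
    w = 18000u * (w & 65535u) + (w >> 16);
    return (z << 16) + w;
  }
};

// Draw a non-zero 32-bit value from the host RNG; a zero lag would
// make that half of the generator stick at zero.
template <class HostRng>
std::uint32_t nonzero_seed(HostRng& host) {
  std::uniform_int_distribution<std::uint32_t> dist(1u, 0xFFFFFFFFu);
  return dist(host);
}

// Reseed both lags from the host RNG so the stream follows the host seed.
template <class HostRng>
void seed_from_host(Mwc& g, HostRng& host) {
  g.z = nonzero_seed(host);
  g.w = nonzero_seed(host);
}
```

With this shape, two runs that start from the same host seed produce the same random-tree stream, which is the property the new regression tests pin down.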
…pt-in) (#256) * perf(iw): x4 reroot batch + extract_char_steps dirty-region for IW TBR Cherry-pick of 69febb4 onto cpp-search (post-#254). Resolved the exact_verify candidate-scoring conflict by keeping #254's incremental rescore path and dropping the directional-audit (na_dir_audit) block that #254 deliberately removed. Stripped the settled-dead ts_iw_gather_bench microbench (gather micro-opt proven a dead heat) and trimmed superseded dev A/B harnesses, keeping only bench_iw_realized.R. Opts remain kill-switched (TS_IW_NOX4 / TS_IW_NODIRTY); a follow-up commit flips them default-off for the shared branch. Mission-null at morphology scale; kept for the large-N / recipe-retune reopen. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tbr): fire IW dirty/x4 opts on XPIWE; fix nx_cs active_mask underflow The extract_char_steps dirty-region + x4 reroot-batch opts were gated to ScoringMode::IW, but production MaximizeParsimony defaults to extended IW (ScoringMode::XPIWE, ts_data.cpp:297). So the opts were dead on the production path for every dataset, and the mission-wall A/B (which toggles TS_IW_NOX4/ NODIRTY through MaximizeParsimony) was toggling code that never executed -> the 1.005x "null result". Widen the gate to iw_family = (IW || XPIWE): both modes score via compute_iw/precompute_iw_delta from the same weighting-agnostic char_steps, differing only in per-pattern eff_k/phi (which those functions already consume). PROFILE stays excluded (compute_profile + info_amounts + precomputed_steps is a genuinely different char_steps convention). Perf-only / byte-identical: opts-on vs opts-off scores identical across Dikow/Vinther/Zanol/Giles/Wortley x 3 seeds; SCANCHK clean on XPIWE. Firing the opts on the production path exposed a pre-existing dirty-region bug: the nx_cs accumulation did not skip active_mask==0 blocks, but extract_char_steps (which builds the F cache) does. 
Under a ratchet ZERO_ONLY perturbation (the default, ratchetCycles>=3) that fully deactivates a block, divided_steps = F + cs_delta - nx_cs went negative (TS_IW_DIRTYCHK: dirty=-1 ref=0 on Dikow2009/Vinther2008). Weighting-independent (plain IW mismatches identically; reachable today via MaximizeParsimony(extended_iw=FALSE)). The candidate scan masks needs_step by active_mask so scores were unaffected, but the invariant was broken. Fix: skip active_mask==0 in the nx_cs loop only; EW nx_cost is left counting all blocks (its best_score/delta length convention is self-consistent). DIRTYCHK now clean across 5 datasets x ratchetCycles {0,3,6}. Add a testthat guard pinning XPIWE opts-on == opts-off (tolerance=0) under ratchet (the discriminating config; prior tests ran plain-IW or ratchetCycles=0 and were tautological for this path). 398 existing tests still pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tbr): NA-IW 4-wide reroot batch (indirect_na_iw_cached_flat_x4) [perf TBD] Port the T-245 4-wide reroot ILP batch to the implied-weights + inapplicable (IW/XPIWE + NA) TBR scan -- the one scoring path that never received it. The NA reroot scan already had an EW-NA x4 (fitch_na_indirect_cached_flat_x4) but IW-NA/XPIWE-NA fell to the one-at-a-time scalar indirect_na_iw_length_cached; the IW x4 branch was !has_na-gated. New kernel indirect_na_iw_cached_flat_x4 (src/ts_fitch.cpp) fuses the NA active-mask candidate logic of the EW-NA x4 (from1 reduce, shared clip_has_active, per-candidate below_actives AND) with the per-candidate iw_delta ctz-gather of indirect_iw_cached_flat_x4. Each accumulator keeps the scalar add order of indirect_na_iw_length_cached, so per-candidate results are bit-identical; the shared all-4-exceed-cutoff bail only changes early-exit on cutoff-losing candidates. 
Wired by widening the IW x4 branch gate (ts_tbr.cpp:1985) to dispatch on has_na, mirroring its own per-candidate skip re-check and main_edges[ei] indexing; the no-NA path stays byte-identical. Gated on iw_family (IW||XPIWE), so it fires on the production XPIWE+NA path (MaximizeParsimony default) -- the first banked IW kernel opt that lands on the NATIVE inapplicable-bearing corpus rather than only recoded matrices. The dirty-region scan shortcut is deliberately NOT ported (stays !has_na): the top-down down2 pass breaks the path-bounded F+cs_delta-nx_cs decomposition, and it is the smaller (once-per-clip) prize. CORRECTNESS validated, PERF still unknown (Hamilton A/B pending): - byte-identical x4-on vs TS_IW_NOX4-off: 24/24 (15 direct IW-NA ts_tbr_search + 9 MaximizeParsimony XPIWE-NA under ratchet; datasets at 10-38% NA) - existing independent full-rescore oracle passes; 519 testthat pass - kernel confirmed to execute (throw-probe), so the byte-identity is not vacuous - new regression guard in test-ts-tbr-dirty-rescore.R keeps inapplicables Kill-switch TS_IW_NOX4 (shared with the no-NA x4). NOT for cpp-search merge until the A/B reports and the getenv-consolidation merge-gate is addressed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(iw): land IW/NA x4 + dirty-region opts default-OFF (opt-in) Flip the kill-switch semantics from default-on (TS_IW_NOX4 / TS_IW_NODIRTY) to opt-in (TS_IW_X4 / TS_IW_DIRTY). These opts are mission-null at morphology scale (validated byte-identical, 1.005x wall) and are preserved on the shared branch only for the large-N / recipe-retune reopen, so they should impose nothing by default. Test guards updated in lockstep: the "on" arm now explicitly enables the opt so the kernel still fires (data triggers the iw_family + has_na gate as before); byte-identity vs the scalar baseline is unchanged. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * dev(benchmarks): NA-IW x4 A/B harnesses + large-N matrices for reopen Preserve the dilution-free element-level and mission-wall A/B harnesses for the NA-IW x4 reroot batch (updated to the TS_IW_X4 opt-in env var), plus the two large real inapplicable-bearing matrices (Sun2018, lobo) the large-N reopen will need to re-test whether the scan share rises. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tbr): drop directional-audit orphan bundled in cherry-picked ts_tbr.cpp The worktree's 69febb4 bundled the exact directional (Regime-C) NA scoring audit/scorer into ts_tbr.cpp alongside the x4/dirty opts; the header it needs (ts_fitch_na_directional.h, added separately by b2e03a9) is not on cpp-search and was deliberately excluded (the directional path is dead -- 24-89x slower than SIMD full_rescore). Strip the include, na_dir_audit / na_dir_scorer setup, whole-tree cross-check, build_clip_folds, and the per-candidate directional fast-path so exact_verify_sweep matches cpp-search's (#254 incremental) version. Kept <chrono> (needed by the iw_timing diagnostic). Builds clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
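The active_mask underflow fixed in the squashed commits above comes down to two accumulations disagreeing about which blocks count. The toy model below is an assumption-laden illustration, not the real data layout: `Block`, `cached_steps` and `node_steps` are invented names, and each block carries a single step count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One entry per character block; field names are illustrative.
struct Block {
  std::uint64_t active_mask;  // 0 = block fully deactivated by a perturbation
  int steps;                  // steps this block contributes at the node
};

// Cache builder: counts active blocks only, as the commit says
// extract_char_steps does when it builds the F cache.
int cached_steps(const std::vector<Block>& blocks) {
  int total = 0;
  for (const Block& b : blocks) {
    if (b.active_mask == 0) continue;
    total += b.steps;
  }
  return total;
}

// Amount subtracted for the regraft node. skip_inactive = false models the
// pre-fix accumulation, which also counted deactivated blocks.
int node_steps(const std::vector<Block>& blocks, bool skip_inactive) {
  int total = 0;
  for (const Block& b : blocks) {
    if (skip_inactive && b.active_mask == 0) continue;
    total += b.steps;
  }
  return total;
}
```

With one deactivated block, `cached_steps(blocks) - node_steps(blocks, false)` goes negative while the `skip_inactive = true` variant stays consistent, which mirrors the `dirty=-1 ref=0` symptom reported by TS_IW_DIRTYCHK.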
…Seconds
XSS/RSS/CSS ignored the timeout callback, letting an in-flight sector pass overrun the deadline by up to one full XSS pass per worker. On Agnarsson2004 (62t) the parallel path previously took 2.4-9.3 s with maxSeconds=2; it now returns in ~2.7 s.
Changes:
- ts_sector.h: add <functional> + check_timeout=nullptr to all three signatures
- ts_sector.cpp: poll check_timeout at every existing check_interrupt site; forward it to all tbr_search() calls inside xss/rss/css_search
- ts_driven.cpp: pass check_timeout at all 6 xss/rss/css_search call sites
- ts_parallel.cpp: give workers a real wall-clock deadline lambda (not stop_flag, which is also set by target_hits and would break byte-identity)
No-timeout path is byte-identical by construction: check_timeout=nullptr short-circuits the new branch and nothing else changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
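A minimal sketch of the optional-callback pattern this commit describes, assuming `check_timeout` is a `std::function<bool()>` defaulted to empty. The loop `run_passes` and the helper `make_deadline` are hypothetical stand-ins for the sector-search loops and the worker deadline lambda.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Stand-in for a sector-search loop: performs up to max_passes passes and
// stops early once the optional timeout callback reports true.
int run_passes(int max_passes,
               const std::function<bool()>& check_timeout = nullptr) {
  int done = 0;
  for (int i = 0; i < max_passes; ++i) {
    // An empty std::function is false in a boolean context, so the
    // no-timeout path never evaluates the callback.
    if (check_timeout && check_timeout()) break;
    ++done;
  }
  return done;
}

// Wall-clock deadline callback of the kind a worker could be handed.
std::function<bool()> make_deadline(
    std::chrono::steady_clock::time_point deadline) {
  return [deadline] { return std::chrono::steady_clock::now() >= deadline; };
}
```

Calling `run_passes(n)` without a callback behaves exactly as a loop with no timeout check, which is the "byte-identical by construction" argument made in the commit message.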
…ndows CI hang) (#258) * fix(parallel): move weighted-rescore char_steps off thread_local (MinGW emutls) `fitch_score_ew`'s IW/profile full-rescore path kept its per-pattern step buffer in a function-local `static thread_local std::vector<int>`. On MinGW a thread_local with a non-trivial destructor is torn down via emutls when each std::thread worker exits, and that teardown corrupted the heap across the parallel search's repeated worker spawn/exit cycles — a Windows-only failure in test-ts-parallel.R (the suite reached "Parallel hits_to_best" and the process hung on a corrupted worker whose replicate never completed, so the driver's main poll loop spun forever; on CI it surfaced as a hard error). Linux uses native TLS and was unaffected (gcc-ASan + valgrind both clean). Fix mirrors the evs_false_cache change already in this file's history: the scratch now lives on DataSet as a `mutable std::vector<int> char_steps_scratch`. Each parallel worker owns a private `ds_local` copy for its whole lifetime, so this preserves the same per-thread, cross-call capacity persistence the thread_local provided — without any emutls teardown. Single-writer per copy (workers touch only their own ds_local; the prototype is used only in the post-join single-threaded MPT phase), so no synchronisation is required. Byte-identical scores; this is the last remaining `static thread_local` object with a destructor on the worker scoring path. Verified on Windows: the full test-ts-parallel.R sequence, which hung 2/2 before, completed 3/3 after. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(parallel): assert timeout behaviour, not a fragile wall-clock ceiling "Parallel search respects timeout" asserted `elapsed < 15.0` for a 2 s budget. That is a wall-clock precision assertion a shared CI runner cannot guarantee: the per-test wall time is observed to vary from ~5 s to ~30 s on the same machine (one in-flight sectorial pass per worker, plus scheduling jitter). 
Replace it with a working-vs-broken discriminator: `expect_lt(elapsed, 60)` (a working timeout returns in seconds-to-low-tens; a broken one runs the full 1000-replicate budget, >1000 s — a ~100x gap) plus `expect_lt(replicates, 1000L)`, keeping the `timed_out` flag check. Supersedes #257. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(parallel): drop the wall-clock timeout ceiling entirely (covr-robust) The relaxed `elapsed < 60` bound still failed the Windows covr step at 84.7 s: covr rebuilds the C++ engine -O0 + gcov-instrumented, so the single in-flight sectorial pass that overruns the 2 s deadline runs ~10-20x slower than the optimised build — slow code, not a broken timeout. Any wall-clock ceiling is fundamentally unworkable under instrumentation. Drop the elapsed assertion: `result$timed_out` (set only on the deadline/cancel path) plus `result$replicates < maxReplicates` (proves the search stopped before exhausting its budget) are a robust working-vs-broken discriminator with zero wall-clock dependency, holding on any hardware and under covr. A real unbounded run would exhaust the budget (replicates == 1000) and fail the second check; a true hang would never return (caught as a job timeout, not this assertion). This surfaced only after the emutls fix let R CMD check pass on Windows, so the covr step finally ran the parallel tests to completion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
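The emutls fix in the first commit of this squash replaces a function-local `static thread_local` buffer with scratch storage owned by the data object. The sketch below shows only that ownership change; `DataSetSketch` and `weighted_score` are invented names, and threads are left out so the example stays self-contained (in the package each parallel worker holds its own `ds_local` copy for its whole lifetime).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DataSetSketch {
  std::vector<int> pattern_weight;
  // Scratch reused across scoring calls. `mutable` lets const scoring
  // functions write to it; this is safe only while each copy has a
  // single writer, as with one private copy per worker.
  mutable std::vector<int> char_steps_scratch;
};

// Before the fix the buffer would have been
//   static thread_local std::vector<int> char_steps;
// whose destructor runs at thread exit (via emutls on MinGW).
int weighted_score(const DataSetSketch& ds, const std::vector<int>& steps) {
  assert(steps.size() <= ds.pattern_weight.size());
  // assign() reuses existing capacity, so repeated calls on the same
  // copy avoid reallocating, as the thread_local buffer did.
  ds.char_steps_scratch.assign(steps.begin(), steps.end());
  int total = 0;
  for (std::size_t i = 0; i < ds.char_steps_scratch.size(); ++i) {
    total += ds.char_steps_scratch[i] * ds.pattern_weight[i];
  }
  return total;
}
```

Because the buffer's lifetime is now tied to the object rather than to the thread, nothing with a non-trivial destructor is left for thread-exit teardown.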
Manual testing is underway; the Shiny app in particular has some usability issues.