interactive/explain: filter both sides' times before splitting join demand by frankmcsherry · Pull Request #758 · TimelyDataflow/differential-dataflow

frankmcsherry · 2026-06-12T00:35:09Z

The join backward rule should map demand for (k, (v1, v2)) @ t to (k, v1) @ ≤t and (k, v2) @ ≤t. The implementation applied each side's ≤ t filter only to its own contribution, projecting the partner's user-time coordinates away unfiltered. Over the signed host tables (witness + forward carry stable −1 rows at retraction iterations) that projection is not a faithful existential: projecting away a time coordinate computes the accumulation at time ∞ rather than at the demanded time, so candidate pairs differing only in the partner's time — a +1 while the partner held the row, a −1 from its later retraction — merge into the same contribution row and cancel, annihilating demand the time-valid configuration justifies.

Concretely (scc at 1000 nodes / 1100 edges, querying output edge (773, 466) on a 13-node cycle): demand for the fwd-label proposal at node 25 paired labels(25) = 0 with the scc edge (25, 236), which scc retracts at outer iteration 1 as trim shrinks. The left contribution's +1/−1 rows cancelled, so the edge was demanded but label(25) never was; the 0-flood justification chase died there, the forward clone's labels diverged, and the 66-row demand-set failed to regenerate the queried output (cycle edges 893→689, 689→327 missing).

Fix: one dep ⋈ pair join carrying both sides' user chains, with both outer-aligned u_in ≤ u_out filters applied before either side is projected away. A (left, right) pair only explains output demanded at u_out if both inputs were present at times ≤ u_out. This is also a sound narrowing: pairs whose partner arrived later no longer contribute demand.

Three commits:

backend::vec::evaluate() — the driver half of the interactive: extract scope-tree renderer into a substrate-generic backend #757 split: run a Program in-process on explicitly provided rows and return every export's consolidated contents as data. Retires the env-var pattern (QUERY/CLONE_RT-style) for semantic questions; the rewrite's demand:input* exports are read directly.
The fix in explain::emit_lookup_join.
tests/explain.rs — end-to-end sufficiency tests: a demand-set, fed back through the original program as its only input, must regenerate the queried output row. Clone-identity, a 12-query sweep at 50/55, demand ⊆ input, plus #[ignore]d heavies: the 22-query sweep at 100/110 and the (773, 466) regression. The regression fails against commit 2's parent with the exact signature above and passes with it. Full suite, ignored tests included: ~1.4s in release.

… as data Adds backend::vec::evaluate(program, inputs): run a scope-tree Program on explicitly provided rows (single worker, in-process) and return every export's consolidated final contents as data. This is the missing driver half of the #757 split: render_tree moved into the library, but inputs could still only come from gen_row and outputs could only leave via Inspect eprintln, which is why semantic questions kept growing one-off env vars (QUERY, CLONE_RT, EDGES_FILE) on the example binaries. With evaluate(), end-to-end properties — clone identity, explanation sufficiency, demand-set excess — are ordinary #[test]s over data, and the explain rewrite's demand:input* exports are read directly instead of scraped from stderr. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…emand The join backward rule maps demand for (k, (v1, v2)) @ t to (k, v1) @ <= t and (k, v2) @ <= t. The implementation applied each side's <= t filter only to that side's contribution, projecting the partner's user-time coordinates away unfiltered. Over the signed host tables (witness + forward carry stable -1 rows at retraction iterations) that projection is not a faithful existential: pair rows differing only in the partner's time -- a +1 while the partner held the row and a -1 from its later retraction -- merge into the same contribution row and cancel, annihilating demand the time-valid configuration justifies. Concretely (scc, 1000 nodes / 1100 edges, QUERY=773:466): demand for the fwd-label proposal at node 25 paired labels(25)=0 with scc edge (25,236), which scc retracts at outer iter 1 as trim shrinks. The left contribution's +1/-1 pair cancelled, so the edge was demanded but label(25) never was; the 0-flood justification chase died there, the forward clone's labels diverged (label(466) = 11 != 0), trim_fwd lost 466->28, and the demand-set (66 edges) failed to regenerate the queried output. Fix: one dep-pair join carrying BOTH user chains, with both outer-aligned u_in <= u_out filters applied before either side is projected away. A (left, right) pair only explains output demanded at u_out if both inputs were present at times <= u_out. This is also a sound narrowing: pairs whose partner arrived later no longer contribute demand. Verified: all 22 scc-edge queries at 100 nodes / 110 edges and 7/7 sampled at 1000/1100 now produce demand-sets that regenerate their queried row (EDGES_FILE re-run); 773:466 grows 66 -> 278 edges and is sufficient; several 100/110 queries shrink (e.g. 25 -> 20, 16 -> 11); the global query is unchanged at 110; interactive lib tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The central property: for a query against a program's output, the reported demand-set — fed back through the original program as its only input — must regenerate the queried output row. - clone_identity_preserves_scc_output: CLONE_RT as a test. - scc_row_explanations_sufficient_small: all 12 scc-edge queries at 50 nodes / 55 edges, each demand-set replayed and checked. - demand_is_a_subset_of_the_input. - [ignored] scc_row_explanations_sufficient_100: the 22-query sweep. - [ignored] scc_join_partner_time_regression: the 1000/1100 (773,466) query. Fails against the parent of the join-time fix with the exact signature from its investigation (66 demanded rows, queried row not regenerated); passes with it. The whole suite, ignored tests included, runs in ~1.4s in release. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The join backward rule matched demand to candidate (left, right) pairs on K_out alone; the demanded output VALUE never constrained which pairs were charged, so every same-key sibling was demanded — every in-edge of a node and its ancestor label cone, not just the pair that carried the demanded label. The pair table now also records the chained V_out each pair produces (the projection's value fields, Pos-expanded against the original arities), and the dep-pair join filters produced == demanded element-wise, alongside the two time filters. Measured on scc row queries (demand-set size vs greedy 1-minimal): 50 nodes / 55 edges: 119 -> 71 demanded over 12 queries (3.72x -> 2.22x); 100 / 110: 736 -> 395 over 22 queries (4.72x -> 2.53x). Sufficiency suite (including the 1000/1100 partner-time regression) passes. An earlier attempt at this narrowing (pre #757-era experiments) appeared to break the demand fixed point's self-correction; that failure was the partner-time cancellation fixed in the previous commit, not a property of value narrowing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

An ignored, non-asserting test that prints per-query demand-set size against a greedy 1-minimal shrink (drop rows while the replay still regenerates the queried output) — the working metric for the over-approximation work. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Generalizes the sufficiency/excess helpers to per-import shapes and multiple demand-set exports (reach demands both edges and roots), and adds transitive closure and reachability cases to the default suite and the excess report. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…uite - fuzz_explanations_sufficient: 10 seeded graphs x {scc, tc, reach}, every output row queried and its demand-set replayed (~300 sufficiency checks, ~11s release). Both soundness bugs so far surfaced only at particular scales/instances; this is the standing net for the next one. - two_simultaneous_queries_sufficient: two queries seeded together with distinct q ids; the union demand must regenerate both rows. Exercises the q-id plumbing through every reverse rule. - count_explanations_sufficient_small: count over duplicate-free inputs is sufficient today (the keyed all-rows demand is exactly the count's multiset). - count_duplicate_inputs_known_gap (#[ignore]d, fails by design): demand- sets are sets but count outputs depend on input multiplicities; with a duplicated row, indeg(5)=3 demands {(1,5),(2,5)} and replays as 2. The motivating case for a multiplicity ("diff") dimension on demand. - CI: run the suite with --include-ignored in release (skipping the metric report and known-gap tests), so the soundness regression gates merges. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…plicities) Demand means "this row, at its reference count": the rewrite's forward clone consumes demand-sets by semijoining the actual input, preserving source multiplicities. The test oracle instead replayed the demand-set itself at multiplicity 1, which understates counts when inputs contain duplicate rows — the supposed count 'known gap' was this oracle bug, not a rewrite gap. Replays now restrict the original inputs to the demanded rows; the count-with-duplicates case flips to a passing test and the CI known-gap skip is removed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- CLONE_RT: deleted; the identity check is the clone_identity_preserves_* test. - EXPLAIN_DEBUG_DEP: the library no longer reads the environment; the rewrite takes explain::Options { debug_inspects } via explain_with(), and ddir_vec maps a --debug-demand flag onto it. - QUERY / DIAG: become ddir_vec flags (--query=K:V,Q / --diag); flag args are stripped before positional parsing, and the timely tail is forwarded from the cleaned argument list instead of raw std::env::args(). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

frankmcsherry and others added 9 commits June 11, 2026 20:31

frankmcsherry merged commit 04b1483 into master-next Jun 12, 2026
6 checks passed

frankmcsherry deleted the explain-join-time-filters branch June 12, 2026 01:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

interactive/explain: filter both sides' times before splitting join demand#758

interactive/explain: filter both sides' times before splitting join demand#758
frankmcsherry merged 9 commits into
master-nextfrom
explain-join-time-filters

frankmcsherry commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

frankmcsherry commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant