Skip to content

Implement solver-based branch pruning#8979

Open
tautschnig wants to merge 8 commits into
diffblue:developfrom
tautschnig:branch-pruning
Open

Implement solver-based branch pruning#8979
tautschnig wants to merge 8 commits into
diffblue:developfrom
tautschnig:branch-pruning

Conversation

@tautschnig

Copy link
Copy Markdown
Collaborator

Query the SAT solver at each branch point to eliminate infeasible branches. Works in both --paths mode (push/pop scoping) and monolithic mode (incremental conversion with converted_for_pruning flag). Uses a non-simplifying MiniSat solver to avoid backwardSubsumptionCheck overhead; falls back to solver_factoryt when MiniSat is not available (e.g., CaDiCaL builds).

Adaptive pruning: cumulative 2-second time budget auto-disables pruning when overhead exceeds benefit.

Assumes are explicitly asserted as guard => cond. Includes fix for symex::args name collisions across paths via a static type registry.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Apr 27, 2026
Copilot AI review requested due to automatic review settings April 27, 2026 10:49

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds solver-assisted feasibility checks at branch points to prune unreachable branches during symbolic execution, reducing path explosion.

Changes:

  • Introduces a shared, push/pop-capable SAT solver for branch pruning (with MiniSat-no-simplifier fast path and solver-factory fallback).
  • Implements pruning in both --paths and monolithic modes, with an adaptive time-budget auto-disable.
  • Adds CLI toggle --no-branch-pruning, regression tests, and an architecture note.

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/goto-symex/symex_target_equation.cpp Avoids argument symbol name/type collisions across shared solver conversions
src/goto-symex/symex_goto.cpp Adds solver queries to prune infeasible branches in paths + monolithic execution
src/goto-symex/ssa_step.h Adds converted_for_pruning flag for incremental pruning conversions
src/goto-symex/goto_symex.h Stores pruning solver pointer and pruning budget state; exposes setter
src/goto-checker/single_path_symex_only_checker.h Adds persistent pruning solver member and setup_symex hook
src/goto-checker/single_path_symex_only_checker.cpp Builds/configures pruning solver and wires it into symex
src/goto-checker/multi_path_symex_only_checker.h Adds persistent pruning solver member
src/goto-checker/multi_path_symex_only_checker.cpp Builds/configures pruning solver and wires it into symex
src/cbmc/cbmc_parse_options.h Adds no-branch-pruning command-line option
src/cbmc/cbmc_parse_options.cpp Parses --no-branch-pruning and documents it in --help
regression/cbmc/unwind_counters4/test.desc Updates expected output regex
regression/cbmc/paths-branch-pruning/*.c/.desc Adds new regression tests covering pruning behavior and refine-soundness
doc/architectural/branch-pruning.md Documents pruning architecture and limitations
Comments suppressed due to low confidence (4)

src/goto-symex/symex_goto.cpp:1

  • In monolithic pruning, assignments/constraints are asserted unconditionally via set_to_true(step.cond_expr). SSA steps are typically guarded, and the normal conversion asserts them as guard => cond_expr; dropping the guard can over-constrain the pruning solver and incorrectly mark branches infeasible. Fix by asserting guarded steps as implies_exprt{step.guard, step.cond_expr} (or the equivalent used by the normal equation conversion) for assignments and constraints as well.
    src/goto-symex/symex_goto.cpp:1
  • SSA_stept::converted is documented/used for incremental conversion. Resetting it here for pruning risks interfering with later conversions into other solvers (and can force repeated reconversion), which can change performance and potentially duplicate constraints depending on how conversion is orchestrated downstream. Prefer tracking pruning conversion exclusively with converted_for_pruning (already added) and avoid mutating converted in the pruning path.
    src/goto-symex/symex_target_equation.cpp:1
  • This function-local static registry is shared across all runs/instances and is not thread-safe. It also grows monotonically with every unique symex::args::<n> ever produced, which can create unbounded memory growth in long-running or multi-check workflows. A more robust approach is to make the registry solver- or symex-instance scoped (or incorporate a type-derived discriminator into the identifier) so state doesn’t leak globally and can be managed/reset deterministically.
    src/goto-symex/symex_goto.cpp:1
  • The pruning query is checking satisfiability under the converted equation, but it doesn’t appear to assert the current path condition explicitly (e.g., state.guard). That can significantly reduce pruning power (SAT models can satisfy implications by setting earlier guards false). Consider adding the current path guard as an assumption (or conjoining it with new_guard / !new_guard) when calling push(...) so the solver checks feasibility under the active path condition.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

^SIGNAL=0$
^VERIFICATION FAILED$
^\** 4 of 4 failed
^\*\* [34] of [34] failed
Comment on lines +22 to +25
#ifdef HAVE_MINISAT2
#include <solvers/sat/satcheck_minisat2.h>
#endif
#endif
Comment thread doc/architectural/branch-pruning.md Outdated
@@ -0,0 +1,86 @@
/// \file
/// Branch Pruning in --paths Mode
Comment thread doc/architectural/branch-pruning.md Outdated
Comment on lines +69 to +71
/// ## Push/Pop Overhead
///
/// Each branch check converts the entire equation inside a push/pop scope.
tautschnig and others added 8 commits May 28, 2026 10:05
…pport; add CI tests

The `solver-time-limit` option has been read by `solver_factoryt`
since 38a370c (Jan 2019), but was never registered as a
command-line flag in `OPT_SOLVER` -- the original use case was
purely programmatic, via dependent projects calling
`options.set_option("solver-time-limit", N)` directly. That left
the option's plumbing untested in CI and the only backend that
honoured it (MiniSat 2) had no end-to-end coverage.

This change:

  * Registers `--solver-time-limit N` in `OPT_SOLVER` /
    `HELP_SOLVER` and wires the cmdline-to-options forwarding in
    `parse_solver_options`. `cbmc`, `jbmc`, `goto-synthesizer` and
    `libcprover-cpp` (which all use `parse_solver_options`) now
    accept the flag from the command line.

  * Documents the flag in `doc/man/cbmc.1`,
    `doc/man/goto-synthesizer.1` and `doc/man/jbmc.1`, so
    `scripts/check_help.sh` continues to pass.

  * Implements `set_time_limit_seconds` for the IPASIR back-end
    (`satcheck_ipasirt`). The implementation registers an
    `ipasir_set_terminate` callback that returns 1 once the
    per-solve deadline has passed; IPASIR solvers poll the
    callback during solving and abort cleanly when it does. On
    timeout the back-end returns `P_ERROR` (matching the MiniSat
    2 back-end's behaviour) instead of throwing
    `analysis_exceptiont`.

Tests:

  * `regression/cbmc/solver-time-limit/`: a `CORE` regression
    test that pins down the happy path -- the option is parsed
    and propagated through to the solver layer without breaking
    a trivial solve. Runs on every CI build.

  * `unit/solvers/sat/satcheck_minisat2.cpp`: extended with a
    pigeonhole-PHP(20) `SCENARIO` that builds the formula
    directly via `new_variable` / `lcnf`, sets a 1-second time
    limit, and asserts that `prop_solve` returns `P_ERROR`
    within five seconds. PHP is provably exponential for
    resolution-based SAT (Haken, 1985), so the timeout is
    reliably exceeded on any reasonable hardware.

  * `unit/solvers/sat/satcheck_ipasir.cpp`: new file, mirroring
    the satcheck_minisat2 test. Conditional on `HAVE_IPASIR`,
    so it only compiles in CI builds that link an IPASIR back-end
    (currently the `check-ubuntu-24_04-make-gcc` job, which uses
    Riss as the IPASIR provider).

This is the first piece of foundation work for a wider redesign of
CBMC's solver-timeout infrastructure (sub-second granularity,
cross-OS, watchdog-based interrupt). Subsequent commits will
build on a now-tested baseline.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The solver-time-limit infrastructure used to be expressed in whole
seconds, was implemented for MiniSat 2 via POSIX `alarm()` +
`SIGALRM` (so it was POSIX-only and process-wide), and was
explicitly skipped on Windows with a "Time limit ignored (not
supported on Win32 yet)" warning. In practice this meant:

  * Sub-second budgets -- needed for fine-grained per-check
    timeouts inside symbolic execution -- could not be expressed.
  * Windows builds had no time-limit support at all.
  * The signal handler relied on a process-wide
    `solver_to_interrupt` pointer; nesting solve calls across
    threads or having two solvers active in the same process was
    fragile.

This change reworks the API and the MiniSat 2 implementation:

  * `solver_resource_limitst::set_time_limit_milliseconds` and
    `propt::set_time_limit_milliseconds` are now the primary
    virtual API. The default `propt` implementation logs a
    warning, mirroring the previous behaviour for back-ends with
    no native timeout support.

  * `set_time_limit_seconds` survives as a non-virtual helper
    that forwards to `set_time_limit_milliseconds(lim * 1000)`,
    so external callers (e.g. the CLI option-parser, dependent
    projects) keep working without source changes.

  * The MiniSat 2 back-end now implements the time limit using a
    `std::thread` watchdog: when the limit is non-zero, a
    watchdog thread waits on a `std::condition_variable` for
    either the solve to complete (signalled via a flag) or the
    deadline to elapse, and on timeout calls
    `solver->interrupt()` (which is just a flag set on the
    Minisat::Solver object and is safe to invoke from another
    thread). The watchdog is joined before `do_prop_solve`
    returns. Result: cross-OS, sub-millisecond granularity, no
    global signal state, no `SIGALRM` collision risk.

  * The IPASIR back-end's existing
    `set_time_limit_seconds` override is renamed/repurposed as
    `set_time_limit_milliseconds`; the deadline is computed in
    `std::chrono::milliseconds` instead of seconds. The
    callback already polled a `chrono::steady_clock` deadline,
    so this is essentially a stored-precision change.

  * The `solver_factoryt` call site in `solver_factory.cpp`
    keeps using the seconds helper (the CLI flag
    `--solver-time-limit` remains expressed in seconds for
    user familiarity); only the internal API is now in
    milliseconds.

Tests:

  * Existing pigeonhole(20) SCENARIOs in
    `unit/solvers/sat/satcheck_minisat2.cpp` and
    `unit/solvers/sat/satcheck_ipasir.cpp` now request a
    200-millisecond limit and verify
    (a) that `prop_solve` returns `P_ERROR`, and
    (b) that wall-clock elapsed time is well under five seconds
        (a generous slack for slow CI hardware and watchdog-join
        latency).
    Locally the satcheck_minisat2 test completes in ~207 ms,
    confirming sub-millisecond accuracy of the watchdog.

This is the second piece of the wider solver-timeout rework.
Subsequent work will add CaDiCaL terminator support and let the
branch-pruning code use sub-second per-call budgets directly.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The CaDiCaL back-end was the last SAT back-end in CBMC for which
`set_time_limit_milliseconds` (and its `set_time_limit_seconds`
helper) silently fell through to `propt`'s default no-op + warning.
This change adds native time-limit support via CaDiCaL's
`Terminator` callback API:

  * `satcheck_cadical_baset` declares a nested `terminatort` class
    in the header (forward-declared to keep `cadical.hpp` out of
    the header).

  * `terminatort` is defined in the .cpp and inherits
    `CaDiCaL::Terminator`. It stores a deadline as a
    `steady_clock::time_point` and returns true once the deadline
    has passed.

  * `set_time_limit_milliseconds(uint32_t)` stores the limit on
    the solver instance.

  * `do_prop_solve` constructs / refreshes the deadline before
    calling `solver->solve()`, calls
    `solver->connect_terminator(...)` while solving, and calls
    `solver->disconnect_terminator()` afterwards. CaDiCaL polls
    the terminator at decision points during solving and aborts
    cleanly on a true return.

  * On interrupt, `do_prop_solve` now returns `propt::resultt
    ::P_ERROR` -- matching the MiniSat 2 and IPASIR back-ends --
    instead of throwing `analysis_exceptiont`. The
    `<util/exception_utils.h>` include is dropped as it is no
    longer needed.

A `set_time_limit_milliseconds(200)` SCENARIO is added to
`unit/solvers/sat/satcheck_cadical.cpp` mirroring the existing
MiniSat 2 / IPASIR timeout tests: a pigeonhole(20) CNF (provably
exponential for resolution-based SAT) is built directly via
`new_variable` / `lcnf`, and the solver is asserted to return
`P_ERROR` within five seconds. Locally the test completes in
~208 ms (200 ms deadline + ~8 ms terminator-polling latency),
confirming sub-millisecond accuracy of the CaDiCaL implementation.

After this commit, all three SAT back-ends with native interrupt
infrastructure (MiniSat 2 via std::thread watchdog, IPASIR via
`ipasir_set_terminate`, CaDiCaL via `connect_terminator`) honour
`--solver-time-limit` and `set_time_limit_milliseconds` with
sub-second granularity, are portable across operating systems,
and return `P_ERROR` on interrupt for callers to detect.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Query the SAT solver at each branch point to eliminate infeasible
branches. Works in both --paths mode (push/pop scoping) and
monolithic mode (incremental conversion with converted_for_pruning
flag). Uses a non-simplifying MiniSat solver to avoid
backwardSubsumptionCheck overhead; falls back to solver_factoryt
when MiniSat is not available (e.g., CaDiCaL builds).

Adaptive pruning: cumulative 2-second time budget auto-disables
pruning when overhead exceeds benefit.

Assumes are explicitly asserted as guard => cond. Includes fix for
symex::args name collisions across paths via a static type registry.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The branch pruning solver performs a single SAT check without the
iterative refinement loop. With --refine, --refine-arrays, or
--refine-strings, this can give unsound results on the initial
over-approximation (e.g., incorrectly determining a branch is
infeasible). Disable branch pruning for all refinement modes.

The --refine flag (used by --incremental-loop) also creates a
bv_refinement solver. The branch pruning solver doesn't perform
the refinement loop, so it must be disabled for all refinement
modes, not just --refine-arrays and --refine-strings.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The branch pruning solver performs a single SAT check without the
iterative refinement loop. With --refine, --refine-arrays, or
--refine-strings, this can give unsound results on the initial
over-approximation (e.g., incorrectly determining a branch is
infeasible). Disable branch pruning for all refinement modes.
Add a command-line option to disable the solver-based branch pruning
that is now active by default in both --paths and monolithic modes.
This is useful for:
- Comparative benchmarking (measuring pruning impact)
- Working around potential edge cases in branch pruning
- Debugging verification results

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Tests covering: prune_assume, no_prune_nondet, prune_chained,
prune_loop, prune_malloc, refine_arrays_sound, refine_sound,
guarded_assume, monolithic_prune, malloc_dynamic_bounds,
stdlib_no_paths_explosion.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants