Implement solver-based branch pruning#8979
Open
tautschnig wants to merge 8 commits into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds solver-assisted feasibility checks at branch points to prune unreachable branches during symbolic execution, reducing path explosion.
Changes:
- Introduces a shared, push/pop-capable SAT solver for branch pruning (with MiniSat-no-simplifier fast path and solver-factory fallback).
- Implements pruning in both
--pathsand monolithic modes, with an adaptive time-budget auto-disable. - Adds CLI toggle
--no-branch-pruning, regression tests, and an architecture note.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/goto-symex/symex_target_equation.cpp | Avoids argument symbol name/type collisions across shared solver conversions |
| src/goto-symex/symex_goto.cpp | Adds solver queries to prune infeasible branches in paths + monolithic execution |
| src/goto-symex/ssa_step.h | Adds converted_for_pruning flag for incremental pruning conversions |
| src/goto-symex/goto_symex.h | Stores pruning solver pointer and pruning budget state; exposes setter |
| src/goto-checker/single_path_symex_only_checker.h | Adds persistent pruning solver member and setup_symex hook |
| src/goto-checker/single_path_symex_only_checker.cpp | Builds/configures pruning solver and wires it into symex |
| src/goto-checker/multi_path_symex_only_checker.h | Adds persistent pruning solver member |
| src/goto-checker/multi_path_symex_only_checker.cpp | Builds/configures pruning solver and wires it into symex |
| src/cbmc/cbmc_parse_options.h | Adds no-branch-pruning command-line option |
| src/cbmc/cbmc_parse_options.cpp | Parses --no-branch-pruning and documents it in --help |
| regression/cbmc/unwind_counters4/test.desc | Updates expected output regex |
| regression/cbmc/paths-branch-pruning/*.c/.desc | Adds new regression tests covering pruning behavior and refine-soundness |
| doc/architectural/branch-pruning.md | Documents pruning architecture and limitations |
Comments suppressed due to low confidence (4)
src/goto-symex/symex_goto.cpp:1
- In monolithic pruning, assignments/constraints are asserted unconditionally via
set_to_true(step.cond_expr). SSA steps are typically guarded, and the normal conversion asserts them asguard => cond_expr; dropping the guard can over-constrain the pruning solver and incorrectly mark branches infeasible. Fix by asserting guarded steps asimplies_exprt{step.guard, step.cond_expr}(or the equivalent used by the normal equation conversion) for assignments and constraints as well.
src/goto-symex/symex_goto.cpp:1 SSA_stept::convertedis documented/used for incremental conversion. Resetting it here for pruning risks interfering with later conversions into other solvers (and can force repeated reconversion), which can change performance and potentially duplicate constraints depending on how conversion is orchestrated downstream. Prefer tracking pruning conversion exclusively withconverted_for_pruning(already added) and avoid mutatingconvertedin the pruning path.
src/goto-symex/symex_target_equation.cpp:1- This function-local
staticregistry is shared across all runs/instances and is not thread-safe. It also grows monotonically with every uniquesymex::args::<n>ever produced, which can create unbounded memory growth in long-running or multi-check workflows. A more robust approach is to make the registry solver- or symex-instance scoped (or incorporate a type-derived discriminator into the identifier) so state doesn’t leak globally and can be managed/reset deterministically.
src/goto-symex/symex_goto.cpp:1 - The pruning query is checking satisfiability under the converted equation, but it doesn’t appear to assert the current path condition explicitly (e.g.,
state.guard). That can significantly reduce pruning power (SAT models can satisfy implications by setting earlier guards false). Consider adding the current path guard as an assumption (or conjoining it withnew_guard/!new_guard) when callingpush(...)so the solver checks feasibility under the active path condition.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ^SIGNAL=0$ | ||
| ^VERIFICATION FAILED$ | ||
| ^\** 4 of 4 failed | ||
| ^\*\* [34] of [34] failed |
Comment on lines
+22
to
+25
| #ifdef HAVE_MINISAT2 | ||
| #include <solvers/sat/satcheck_minisat2.h> | ||
| #endif | ||
| #endif |
| @@ -0,0 +1,86 @@ | |||
| /// \file | |||
| /// Branch Pruning in --paths Mode | |||
Comment on lines
+69
to
+71
| /// ## Push/Pop Overhead | ||
| /// | ||
| /// Each branch check converts the entire equation inside a push/pop scope. |
6498011 to
3834a1a
Compare
…pport; add CI tests The `solver-time-limit` option has been read by `solver_factoryt` since 38a370c (Jan 2019), but was never registered as a command-line flag in `OPT_SOLVER` -- the original use case was purely programmatic, via dependent projects calling `options.set_option("solver-time-limit", N)` directly. That left the option's plumbing untested in CI and the only backend that honoured it (MiniSat 2) had no end-to-end coverage. This change: * Registers `--solver-time-limit N` in `OPT_SOLVER` / `HELP_SOLVER` and wires the cmdline-to-options forwarding in `parse_solver_options`. `cbmc`, `jbmc`, `goto-synthesizer` and `libcprover-cpp` (which all use `parse_solver_options`) now accept the flag from the command line. * Documents the flag in `doc/man/cbmc.1`, `doc/man/goto-synthesizer.1` and `doc/man/jbmc.1`, so `scripts/check_help.sh` continues to pass. * Implements `set_time_limit_seconds` for the IPASIR back-end (`satcheck_ipasirt`). The implementation registers an `ipasir_set_terminate` callback that returns 1 once the per-solve deadline has passed; IPASIR solvers poll the callback during solving and abort cleanly when it does. On timeout the back-end returns `P_ERROR` (matching the MiniSat 2 back-end's behaviour) instead of throwing `analysis_exceptiont`. Tests: * `regression/cbmc/solver-time-limit/`: a `CORE` regression test that pins down the happy path -- the option is parsed and propagated through to the solver layer without breaking a trivial solve. Runs on every CI build. * `unit/solvers/sat/satcheck_minisat2.cpp`: extended with a pigeonhole-PHP(20) `SCENARIO` that builds the formula directly via `new_variable` / `lcnf`, sets a 1-second time limit, and asserts that `prop_solve` returns `P_ERROR` within five seconds. PHP is provably exponential for resolution-based SAT (Haken, 1985), so the timeout is reliably exceeded on any reasonable hardware. * `unit/solvers/sat/satcheck_ipasir.cpp`: new file, mirroring the satcheck_minisat2 test. Conditional on `HAVE_IPASIR`, so it only compiles in CI builds that link an IPASIR back-end (currently the `check-ubuntu-24_04-make-gcc` job, which uses Riss as the IPASIR provider). This is the first piece of foundation work for a wider redesign of CBMC's solver-timeout infrastructure (sub-second granularity, cross-OS, watchdog-based interrupt). Subsequent commits will build on a now-tested baseline. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The solver-time-limit infrastructure used to be expressed in whole
seconds, was implemented for MiniSat 2 via POSIX `alarm()` +
`SIGALRM` (so it was POSIX-only and process-wide), and was
explicitly skipped on Windows with a "Time limit ignored (not
supported on Win32 yet)" warning. In practice this meant:
* Sub-second budgets -- needed for fine-grained per-check
timeouts inside symbolic execution -- could not be expressed.
* Windows builds had no time-limit support at all.
* The signal handler relied on a process-wide
`solver_to_interrupt` pointer; nesting solve calls across
threads or having two solvers active in the same process was
fragile.
This change reworks the API and the MiniSat 2 implementation:
* `solver_resource_limitst::set_time_limit_milliseconds` and
`propt::set_time_limit_milliseconds` are now the primary
virtual API. The default `propt` implementation logs a
warning, mirroring the previous behaviour for back-ends with
no native timeout support.
* `set_time_limit_seconds` survives as a non-virtual helper
that forwards to `set_time_limit_milliseconds(lim * 1000)`,
so external callers (e.g. the CLI option-parser, dependent
projects) keep working without source changes.
* The MiniSat 2 back-end now implements the time limit using a
`std::thread` watchdog: when the limit is non-zero, a
watchdog thread waits on a `std::condition_variable` for
either the solve to complete (signalled via a flag) or the
deadline to elapse, and on timeout calls
`solver->interrupt()` (which is just a flag set on the
Minisat::Solver object and is safe to invoke from another
thread). The watchdog is joined before `do_prop_solve`
returns. Result: cross-OS, sub-millisecond granularity, no
global signal state, no `SIGALRM` collision risk.
* The IPASIR back-end's existing
`set_time_limit_seconds` override is renamed/repurposed as
`set_time_limit_milliseconds`; the deadline is computed in
`std::chrono::milliseconds` instead of seconds. The
callback already polled a `chrono::steady_clock` deadline,
so this is essentially a stored-precision change.
* The `solver_factoryt` call site in `solver_factory.cpp`
keeps using the seconds helper (the CLI flag
`--solver-time-limit` remains expressed in seconds for
user familiarity); only the internal API is now in
milliseconds.
Tests:
* Existing pigeonhole(20) SCENARIOs in
`unit/solvers/sat/satcheck_minisat2.cpp` and
`unit/solvers/sat/satcheck_ipasir.cpp` now request a
200-millisecond limit and verify
(a) that `prop_solve` returns `P_ERROR`, and
(b) that wall-clock elapsed time is well under five seconds
(a generous slack for slow CI hardware and watchdog-join
latency).
Locally the satcheck_minisat2 test completes in ~207 ms,
confirming sub-millisecond accuracy of the watchdog.
This is the second piece of the wider solver-timeout rework.
Subsequent work will add CaDiCaL terminator support and let the
branch-pruning code use sub-second per-call budgets directly.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The CaDiCaL back-end was the last SAT back-end in CBMC for which
`set_time_limit_milliseconds` (and its `set_time_limit_seconds`
helper) silently fell through to `propt`'s default no-op + warning.
This change adds native time-limit support via CaDiCaL's
`Terminator` callback API:
* `satcheck_cadical_baset` declares a nested `terminatort` class
in the header (forward-declared to keep `cadical.hpp` out of
the header).
* `terminatort` is defined in the .cpp and inherits
`CaDiCaL::Terminator`. It stores a deadline as a
`steady_clock::time_point` and returns true once the deadline
has passed.
* `set_time_limit_milliseconds(uint32_t)` stores the limit on
the solver instance.
* `do_prop_solve` constructs / refreshes the deadline before
calling `solver->solve()`, calls
`solver->connect_terminator(...)` while solving, and calls
`solver->disconnect_terminator()` afterwards. CaDiCaL polls
the terminator at decision points during solving and aborts
cleanly on a true return.
* On interrupt, `do_prop_solve` now returns `propt::resultt
::P_ERROR` -- matching the MiniSat 2 and IPASIR back-ends --
instead of throwing `analysis_exceptiont`. The
`<util/exception_utils.h>` include is dropped as it is no
longer needed.
A `set_time_limit_milliseconds(200)` SCENARIO is added to
`unit/solvers/sat/satcheck_cadical.cpp` mirroring the existing
MiniSat 2 / IPASIR timeout tests: a pigeonhole(20) CNF (provably
exponential for resolution-based SAT) is built directly via
`new_variable` / `lcnf`, and the solver is asserted to return
`P_ERROR` within five seconds. Locally the test completes in
~208 ms (200 ms deadline + ~8 ms terminator-polling latency),
confirming sub-millisecond accuracy of the CaDiCaL implementation.
After this commit, all three SAT back-ends with native interrupt
infrastructure (MiniSat 2 via std::thread watchdog, IPASIR via
`ipasir_set_terminate`, CaDiCaL via `connect_terminator`) honour
`--solver-time-limit` and `set_time_limit_milliseconds` with
sub-second granularity, are portable across operating systems,
and return `P_ERROR` on interrupt for callers to detect.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Query the SAT solver at each branch point to eliminate infeasible branches. Works in both --paths mode (push/pop scoping) and monolithic mode (incremental conversion with converted_for_pruning flag). Uses a non-simplifying MiniSat solver to avoid backwardSubsumptionCheck overhead; falls back to solver_factoryt when MiniSat is not available (e.g., CaDiCaL builds). Adaptive pruning: cumulative 2-second time budget auto-disables pruning when overhead exceeds benefit. Assumes are explicitly asserted as guard => cond. Includes fix for symex::args name collisions across paths via a static type registry. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The branch pruning solver performs a single SAT check without the iterative refinement loop. With --refine, --refine-arrays, or --refine-strings, this can give unsound results on the initial over-approximation (e.g., incorrectly determining a branch is infeasible). Disable branch pruning for all refinement modes. The --refine flag (used by --incremental-loop) also creates a bv_refinement solver. The branch pruning solver doesn't perform the refinement loop, so it must be disabled for all refinement modes, not just --refine-arrays and --refine-strings. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com> The branch pruning solver performs a single SAT check without the iterative refinement loop. With --refine, --refine-arrays, or --refine-strings, this can give unsound results on the initial over-approximation (e.g., incorrectly determining a branch is infeasible). Disable branch pruning for all refinement modes.
Add a command-line option to disable the solver-based branch pruning that is now active by default in both --paths and monolithic modes. This is useful for: - Comparative benchmarking (measuring pruning impact) - Working around potential edge cases in branch pruning - Debugging verification results Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Tests covering: prune_assume, no_prune_nondet, prune_chained, prune_loop, prune_malloc, refine_arrays_sound, refine_sound, guarded_assume, monolithic_prune, malloc_dynamic_bounds, stdlib_no_paths_explosion. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
3834a1a to
9c48042
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Query the SAT solver at each branch point to eliminate infeasible branches. Works in both --paths mode (push/pop scoping) and monolithic mode (incremental conversion with converted_for_pruning flag). Uses a non-simplifying MiniSat solver to avoid backwardSubsumptionCheck overhead; falls back to solver_factoryt when MiniSat is not available (e.g., CaDiCaL builds).
Adaptive pruning: cumulative 2-second time budget auto-disables pruning when overhead exceeds benefit.
Assumes are explicitly asserted as guard => cond. Includes fix for symex::args name collisions across paths via a static type registry.