ci_analysis: focus on post-merge develop runs; medians, trend, critical path #9035

Open
tautschnig wants to merge 3 commits into
diffblue:develop from
tautschnig:ci-analysis-improvement

@tautschnig

Collaborator

The CI performance analysis previously selected the most recent successful runs of the workflow regardless of trigger, so it mixed in pull_request runs (which execute on throw-away merge commits and run far more often) and excluded the failed/timed-out runs that a performance investigation most needs. It also reported per-run means, which are easily skewed by the large run-to-run variance of the macOS runners.

This reworks scripts/ci_analysis.py to:

  • select runs by branch (--branch develop) and event (--event push) so the default is post-merge develop runs, and include completed runs of any conclusion (failed/timed-out included);
  • report medians (robust to runner variance) with min/max/n rather than means;
  • add a per-run trend table (date, commit, result, slowest job) so a genuine upward trend is distinguishable from one-off slow runs;
  • add a per-job "critical path" section reporting the longest single ctest suite (which cannot be parallelised and hence bounds wall-clock under -jN) alongside the sum of all suites, highlighting suites worth splitting.
  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.
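The run selection described in the first bullet of the description above (branch and event filtering, any conclusion) can be sketched roughly as follows. This is only an illustration, not the code in scripts/ci_analysis.py: the function names are hypothetical, and the field names (`headBranch`, `event`, `status`, `conclusion`, `databaseId`) are assumed to mirror the JSON fields of `gh run list --json`.

```python
def gh_run_list_argv(workflow, branch="develop", event="push", limit=20):
    """Build a `gh run list` command line (hypothetical helper, not run here)."""
    return [
        "gh", "run", "list",
        "--workflow", workflow,
        "--branch", branch,
        "--event", event,
        "--status", "completed",  # any conclusion, as long as the run finished
        "--limit", str(limit),
        "--json", "databaseId,headBranch,event,status,conclusion",
    ]


def select_runs(runs, branch="develop", event="push"):
    """Keep completed runs for one branch/event, whatever their conclusion."""
    return [
        r for r in runs
        if r.get("headBranch") == branch
        and r.get("event") == event
        and r.get("status") == "completed"
    ]
```

Filtering on `status` rather than `conclusion` is what keeps failed and timed-out runs in the sample.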

@tautschnig tautschnig self-assigned this Jun 12, 2026
Copilot AI review requested due to automatic review settings June 12, 2026 13:09
@tautschnig tautschnig requested review from a team, kroening and peterschrammel as code owners June 12, 2026 13:09

Copilot AI left a comment

Pull request overview

This PR refines scripts/ci_analysis.py to make CI performance investigations more representative of post-merge behavior by filtering to branch/event runs (default: develop + push), including failed/timed-out runs, and switching summary statistics from mean to median. It also expands reporting to include a per-run trend view and a per-job “critical path” section to highlight serial bottlenecks.

Changes:

  • Filter analysed workflow runs by branch/event (defaults to post-merge develop push) while including all completed conclusions (not just successes).
  • Replace cross-run mean-based summaries with median/min/max/n reporting.
  • Add run trend output (slowest job per run) and per-job critical path reporting (longest single ctest suite vs total).
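As a rough illustration of the run trend output mentioned in the last item, the sketch below produces one row per run with its slowest job. The input shape (`created_at`, `head_sha`, `conclusion`, and a `jobs` mapping of job name to duration in seconds) is an assumption made for this example and may not match the data structures in the script.

```python
def slowest_job_per_run(results):
    """Return (date, short sha, conclusion, slowest job, seconds) per run,
    oldest first, so a sustained slowdown shows up as a run of slow rows."""
    rows = []
    for r in sorted(results, key=lambda r: r["created_at"]):
        jobs = r.get("jobs", {})  # hypothetical: job name -> duration in seconds
        if jobs:
            name, dur = max(jobs.items(), key=lambda kv: kv[1])
        else:
            name, dur = "?", 0.0
        rows.append((r["created_at"][:10], r["head_sha"][:7],
                     r["conclusion"], name, dur))
    return rows
```

Listing runs chronologically is what lets a reader tell a single slow outlier from a genuine upward trend.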

Comment thread scripts/ci_analysis.py Outdated
Comment on lines +594 to +615
    longest: dict[str, list[float]] = defaultdict(list)
    longest_name: dict[str, str] = {}
    total: dict[str, list[float]] = defaultdict(list)
    for r in results:
        for job_name, detail in r.get("details", {}).items():
            suites = detail.get("ctest_tests", [])
            if not suites:
                continue
            top_suite = max(suites, key=lambda s: s["duration_s"])
            longest[job_name].append(top_suite["duration_s"])
            total[job_name].append(sum(s["duration_s"] for s in suites))
            # Remember the name of the longest suite seen most recently.
            longest_name[job_name] = top_suite["name"]
    out: dict[str, dict] = {}
    for job_name in longest:
        out[job_name] = {
            "longest_med": median(longest[job_name]),
            "longest_suite": longest_name.get(job_name, "?"),
            "total_med": median(total[job_name]),
            "n": len(longest[job_name]),
        }
    return out
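In the snippet above, the reported suite name comes from the most recent run while the duration is a median over all runs, so the two can refer to different suites. One possible way to keep them consistent, sketched here under assumptions and not necessarily the approach the PR adopted, is to keep each run's (duration, name) pair together and report the pair at the lower-median position. The function name is hypothetical; the input shape and output keys follow the snippet.

```python
from collections import defaultdict
from statistics import median


def critical_paths_consistent(results):
    """Per job: median longest suite (with its own name) and median total."""
    longest = defaultdict(list)  # job -> [(duration_s, suite name), ...]
    total = defaultdict(list)    # job -> [sum of all suite durations, ...]
    for r in results:
        for job_name, detail in r.get("details", {}).items():
            suites = detail.get("ctest_tests", [])
            if not suites:
                continue
            top = max(suites, key=lambda s: s["duration_s"])
            longest[job_name].append((top["duration_s"], top["name"]))
            total[job_name].append(sum(s["duration_s"] for s in suites))
    out = {}
    for job_name, tops in longest.items():
        tops.sort()
        # Lower median: always an observed pair, so name and value match.
        dur, name = tops[(len(tops) - 1) // 2]
        out[job_name] = {
            "longest_med": dur,
            "longest_suite": name,
            "total_med": median(total[job_name]),
            "n": len(tops),
        }
    return out
```

Note the trade-off: for an even number of runs this reports the lower of the two middle values rather than their average, which differs slightly from a conventional median.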
Comment thread scripts/ci_analysis.py
Comment on lines +568 to +572
def _stats(durs: list[float]) -> str:
    """``<median> (med) · <min>–<max> · n=<count>`` for a list of durations."""
    return (f"{fmt_duration(median(durs))} (med) · "
            f"{fmt_duration(min(durs))}–{fmt_duration(max(durs))} · "
            f"n={len(durs)}")
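As written, the helper above calls `min` and `max` on its argument, both of which raise `ValueError` on an empty list. The self-contained sketch below shows one way to guard that case. The `median` and `fmt_duration` implementations and the `"n=0"` return value are assumptions for illustration; the script's own helpers and its chosen empty-input behaviour may differ.

```python
def median(values):
    """Median of a list; 0.0 for an empty list (an assumed convention)."""
    if not values:
        return 0.0
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def fmt_duration(seconds):
    """Hypothetical formatter: seconds -> '<m>m<ss>s'."""
    m, s = divmod(int(round(seconds)), 60)
    return f"{m}m{s:02d}s"


def stats_line(durs):
    """'<median> (med) · <min>–<max> · n=<count>', or 'n=0' when empty."""
    if not durs:
        return "n=0"
    return (f"{fmt_duration(median(durs))} (med) · "
            f"{fmt_duration(min(durs))}–{fmt_duration(max(durs))} · "
            f"n={len(durs)}")
```

For durations of 60, 90 and 600 seconds the mean is 250 seconds, while the median stays at 90, which is the robustness to outlier runs that motivates the switch.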
Comment thread scripts/ci_analysis.py Outdated
@@ -580,14 +666,10 @@
if test_durations:
info(f"\n Slowest individual ctest tests (by mean, across runs):")
Comment thread scripts/ci_analysis.py Outdated
@@ -600,14 +682,10 @@
if suite_durations:
info(f"\n Slowest make test suites (by mean, across runs):")
Comment thread scripts/ci_analysis.py
Comment on lines 700 to 702
info(f"\n Slowest individual tests within ctest suites "
f"(by mean, across runs):")
sorted_indiv = sorted(indiv_durations.items(),
@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.65%. Comparing base (40cbfd8) to head (0a0b544).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #9035   +/-   ##
========================================
  Coverage    80.65%   80.65%           
========================================
  Files         1713     1713           
  Lines       189427   189427           
  Branches        73       73           
========================================
  Hits        152783   152783           
  Misses       36644    36644           

tautschnig and others added 3 commits June 12, 2026 20:16
…al path

The CI performance analysis previously selected the most recent successful
runs of the workflow regardless of trigger, so it mixed in pull_request runs
(which execute on throw-away merge commits and run far more often) and
excluded the failed/timed-out runs that a performance investigation most
needs. It also reported per-run means, which are easily skewed by the large
run-to-run variance of the macOS runners.

This reworks scripts/ci_analysis.py to:
- select runs by branch (--branch develop) and event (--event push) so the
  default is post-merge develop runs, and include completed runs of any
  conclusion (failed/timed-out included);
- report medians (robust to runner variance) with min/max/n rather than means;
- add a per-run trend table (date, commit, result, slowest job) so a genuine
  upward trend is distinguishable from one-off slow runs;
- add a per-job "critical path" section reporting the longest single ctest
  suite (which cannot be parallelised and hence bounds wall-clock under -jN)
  alongside the sum of all suites, highlighting suites worth splitting.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The PR introducing the median-based aggregation, trend table and critical-path
section had no test coverage. Most of ci_analysis.py is I/O-bound on the gh
CLI, but the new helpers (median, _stats, _slowest_job_per_run,
_critical_paths) are pure functions over plain dicts/lists and are worth
pinning down.

Add a stdlib-only, pytest-style module covering odd/even/empty/single-element
medians, the empty-input behaviour of _stats, slowest-job selection, and -- in
particular -- that the critical-path output reports a suite name consistent
with the median it shows even when the slowest suite differs between runs.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

We have a number of Python helper scripts under scripts/ but no automated
tests for any of them. Add a workflow that runs the pytest suite over
scripts/, so test coverage for these scripts can be built out incrementally
(ci_analysis.py is the first to gain tests).

The job is restricted to pull requests and path-filtered to runs that touch a
Python script under scripts/ (or this workflow), so it stays off the critical
path of unrelated changes.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the ci-analysis-improvement branch from 406aabb to 0a0b544 Compare June 12, 2026 20:17