ci_analysis: focus on post-merge develop runs; medians, trend, critical path #9035
tautschnig wants to merge 3 commits into
Pull request overview
This PR refines scripts/ci_analysis.py to make CI performance investigations more representative of post-merge behavior by filtering to branch/event runs (default: develop + push), including failed/timed-out runs, and switching summary statistics from mean to median. It also expands reporting to include a per-run trend view and a per-job “critical path” section to highlight serial bottlenecks.
Changes:
- Filter analysed workflow runs by branch/event (defaults to post-merge develop + push) while including all completed conclusions (not just successes).
- Replace cross-run mean-based summaries with median/min/max/n reporting.
- Add run trend output (slowest job per run) and per-job critical path reporting (longest single ctest suite vs total).
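The run selection in the first bullet can be sketched as a pure filter over run records. This is a minimal illustration, not the script's actual code: `select_runs` is a hypothetical helper, and the field names (`headBranch`, `event`, `status`, `conclusion`, `databaseId`) are assumed to follow the JSON that `gh run list --json` emits.

```python
def select_runs(runs, branch="develop", event="push"):
    """Keep completed runs for one branch/event, whatever their conclusion.

    Hypothetical helper: the real script may select runs differently.
    """
    return [
        r for r in runs
        if r.get("headBranch") == branch
        and r.get("event") == event
        # "completed" covers success, failure, timed_out, cancelled, ...
        and r.get("status") == "completed"
    ]


# Made-up sample records, for illustration only.
runs = [
    {"databaseId": 1, "headBranch": "develop", "event": "push",
     "status": "completed", "conclusion": "success"},
    {"databaseId": 2, "headBranch": "develop", "event": "push",
     "status": "completed", "conclusion": "failure"},
    {"databaseId": 3, "headBranch": "develop", "event": "pull_request",
     "status": "completed", "conclusion": "success"},
    {"databaseId": 4, "headBranch": "develop", "event": "push",
     "status": "in_progress", "conclusion": None},
    {"databaseId": 5, "headBranch": "feature", "event": "push",
     "status": "completed", "conclusion": "timed_out"},
]

selected = select_runs(runs)
print([r["databaseId"] for r in selected])  # [1, 2]
```

Filtering on `status` rather than `conclusion` is what keeps the failed run (id 2) in the sample while still dropping the run that has not finished.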
    longest: dict[str, list[float]] = defaultdict(list)
    longest_name: dict[str, str] = {}
    total: dict[str, list[float]] = defaultdict(list)
    for r in results:
        for job_name, detail in r.get("details", {}).items():
            suites = detail.get("ctest_tests", [])
            if not suites:
                continue
            top_suite = max(suites, key=lambda s: s["duration_s"])
            longest[job_name].append(top_suite["duration_s"])
            total[job_name].append(sum(s["duration_s"] for s in suites))
            # Remember the name of the longest suite seen most recently.
            longest_name[job_name] = top_suite["name"]
    out: dict[str, dict] = {}
    for job_name in longest:
        out[job_name] = {
            "longest_med": median(longest[job_name]),
            "longest_suite": longest_name.get(job_name, "?"),
            "total_med": median(total[job_name]),
            "n": len(longest[job_name]),
        }
    return out

    def _stats(durs: list[float]) -> str:
        """``<median> (med) · <min>–<max> · n=<count>`` for a list of durations."""
        return (f"{fmt_duration(median(durs))} (med) · "
                f"{fmt_duration(min(durs))}–{fmt_duration(max(durs))} · "
                f"n={len(durs)}")
    @@ -580,14 +666,10 @@
    if test_durations:
        info(f"\n Slowest individual ctest tests (by mean, across runs):")
    @@ -600,14 +682,10 @@
    if suite_durations:
        info(f"\n Slowest make test suites (by mean, across runs):")
        info(f"\n Slowest individual tests within ctest suites "
             f"(by mean, across runs):")
        sorted_indiv = sorted(indiv_durations.items(),
Codecov Report
✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff            @@
    ##           develop    #9035   +/-   ##
    ========================================
      Coverage    80.65%   80.65%
    ========================================
      Files         1713     1713
      Lines       189427   189427
      Branches        73       73
    ========================================
      Hits        152783   152783
      Misses       36644    36644
…al path

The CI performance analysis previously selected the most recent successful runs of the workflow regardless of trigger, so it mixed in pull_request runs (which execute on throw-away merge commits and run far more often) and excluded the failed/timed-out runs that a performance investigation most needs. It also reported per-run means, which are easily skewed by the large run-to-run variance of the macOS runners.

This reworks scripts/ci_analysis.py to:
- select runs by branch (--branch develop) and event (--event push) so the default is post-merge develop runs, and include completed runs of any conclusion (failed/timed-out included);
- report medians (robust to runner variance) with min/max/n rather than means;
- add a per-run trend table (date, commit, result, slowest job) so a genuine upward trend is distinguishable from one-off slow runs;
- add a per-job "critical path" section reporting the longest single ctest suite (which cannot be parallelised and hence bounds wall-clock under -jN) alongside the sum of all suites, highlighting suites worth splitting.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
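The "critical path" argument in this commit message can be made concrete with a small sketch. Under an idealised scheduler, the wall-clock time of a job running its suites with N parallel workers is at least the larger of the longest single suite and the total divided by N. The helper name `wall_clock_lower_bound` and the durations are made up for illustration; they are not part of ci_analysis.py.

```python
def wall_clock_lower_bound(suite_durations_s, jobs):
    """Idealised lower bound on wall-clock time for `jobs` parallel workers.

    A single suite cannot be split across workers, so the longest suite
    bounds the result no matter how large `jobs` is.
    """
    if not suite_durations_s:
        return 0.0
    longest = max(suite_durations_s)
    total = sum(suite_durations_s)
    return max(longest, total / jobs)


# Hypothetical suite durations in seconds; the total is 900 s.
suites = [600, 120, 90, 60, 30]

print(wall_clock_lower_bound(suites, jobs=1))  # 900.0
print(wall_clock_lower_bound(suites, jobs=4))  # 600 (the longest suite dominates)
```

In this made-up example, going from one worker to four only saves 300 s because the 600 s suite dominates; splitting that suite is what would lower the bound further, which is why the report shows the longest suite next to the sum.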
The PR introducing the median-based aggregation, trend table and critical-path section had no test coverage. Most of ci_analysis.py is I/O-bound on the gh CLI, but the new helpers (median, _stats, _slowest_job_per_run, _critical_paths) are pure functions over plain dicts/lists and are worth pinning down. Add a stdlib-only, pytest-style module covering odd/even/empty/single-element medians, the empty-input behaviour of _stats, slowest-job selection, and -- in particular -- that the critical-path output reports a suite name consistent with the median it shows even when the slowest suite differs between runs. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
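One way to keep the reported suite name consistent with the reported median is to aggregate per suite first and only then pick the dominant suite. The sketch below is an assumption about how such a check could look, not the PR's actual implementation or test: `dominant_suite` is a hypothetical helper, and the input shape (`details` / `ctest_tests` / `name` / `duration_s`) mirrors the dictionaries used elsewhere in this PR.

```python
from collections import defaultdict
from statistics import median


def dominant_suite(results, job_name):
    """Return (suite name, median duration) for the suite with the largest
    per-suite median across runs, or None if the job has no suite data.

    Hypothetical helper: name and median always refer to the same suite.
    """
    per_suite = defaultdict(list)
    for r in results:
        detail = r.get("details", {}).get(job_name, {})
        for s in detail.get("ctest_tests", []):
            per_suite[s["name"]].append(s["duration_s"])
    if not per_suite:
        return None
    name = max(per_suite, key=lambda n: median(per_suite[n]))
    return name, median(per_suite[name])


def test_name_matches_reported_median():
    def run(unit_s, regression_s):
        return {"details": {"linux": {"ctest_tests": [
            {"name": "unit", "duration_s": unit_s},
            {"name": "regression", "duration_s": regression_s},
        ]}}}

    # In the last run "unit" is the slowest suite (a one-off outlier),
    # but "regression" has the larger median across all three runs.
    results = [run(100, 300), run(110, 310), run(500, 290)]
    assert dominant_suite(results, "linux") == ("regression", 300)
    assert dominant_suite(results, "macos") is None
```

Note that this reports the median of the dominant suite rather than the median of each run's longest suite, so it is a different statistic from the one computed in the reviewed snippet; which of the two is preferable is a design choice for the script's author.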
We have a number of Python helper scripts under scripts/ but no automated tests for any of them. Add a workflow that runs the pytest suite over scripts/, so test coverage for these scripts can be built out incrementally (ci_analysis.py is the first to gain tests). The job is restricted to pull requests and path-filtered to runs that touch a Python script under scripts/ (or this workflow), so it stays off the critical path of unrelated changes. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
406aabb to 0a0b544
The CI performance analysis previously selected the most recent successful runs of the workflow regardless of trigger, so it mixed in pull_request runs (which execute on throw-away merge commits and run far more often) and excluded the failed/timed-out runs that a performance investigation most needs. It also reported per-run means, which are easily skewed by the large run-to-run variance of the macOS runners.
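A small worked example shows why a mean is a poor summary when one runner is occasionally very slow. The durations are made up, and `summarise` is a hypothetical helper that only mirrors the median/min/max/n shape described in this PR.

```python
from statistics import mean, median


def summarise(durations_s):
    """Median with min/max/n, in the shape this PR describes for its reports."""
    return {
        "median": median(durations_s),
        "min": min(durations_s),
        "max": max(durations_s),
        "n": len(durations_s),
    }


# Four typical runs and one slow outlier (seconds, illustrative only).
durations_s = [1800, 1850, 1900, 1820, 5400]

print(mean(durations_s))       # 2554: pulled far above every typical run
print(summarise(durations_s))  # median 1850, min 1800, max 5400, n 5
```

The outlier stays visible through `max`, while the median continues to describe a typical run.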
This reworks scripts/ci_analysis.py to: