build(ci): ensure C++ toolchain before pytest by wanghan-iapcm · Pull Request #5573 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-06-22T11:29:50Z

Summary

The Test Python job (.github/workflows/test_python.yml) does checkout → setup-python → pip install → pytest and never verifies the C++ compiler, yet torch inductor shells out to the system g++ to JIT-compile C++ kernels at test time (e.g. the _CompiledModel / torch.compile tests).

When a GitHub-hosted runner VM occasionally boots with a broken gcc — the g++ driver present but its backend cc1plus missing — this surfaces as an opaque failure deep inside the suite:

torch._inductor.exc.InductorError: CppCompileError: C++ compile error
g++: fatal error: cannot execute 'cc1plus': execvp: No such file or directory

This was observed once on an unrelated docs-only PR. The re-run on a fresh VM with identical code passed, confirming it is a transient per-VM toolchain corruption, not a code defect — and a survey of other recent Test Python failures showed all of them were genuine code/test failures, none with this cc1plus signature.

Change

Add a step right after setup-python that:

1. reinstalls build-essential (re-laying down gcc, which ships cc1plus), self-healing a corrupted VM, and
2. verifies a trivial C++ compile, so if the toolchain is still broken the job fails immediately with a clear message instead of 40 minutes into pytest.
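The verification half of this step could look roughly like the sketch below. It is an illustration, not the workflow's actual script: the function name `check_cxx_toolchain` and the probe file name are hypothetical, and the compiler is taken as a parameter so the check can be exercised with a stand-in command.

```shell
# Sketch only: a fail-fast compile check (names here are hypothetical).
# Usage: check_cxx_toolchain [COMPILER]   (COMPILER defaults to g++)
check_cxx_toolchain() {
  local cxx="${1:-g++}"
  local tmpdir status=0
  tmpdir="$(mktemp -d)"

  # The smallest program that still forces the driver to invoke cc1plus.
  printf 'int main() { return 0; }\n' > "${tmpdir}/probe.cpp"

  # Record the exit status instead of aborting, so cleanup always runs.
  "${cxx}" "${tmpdir}/probe.cpp" -o "${tmpdir}/probe" || status=$?
  rm -rf "${tmpdir}"

  if [ "${status}" -ne 0 ]; then
    echo "ERROR: ${cxx} cannot compile a trivial C++ program (exit ${status})" >&2
    return 1
  fi
  echo "C++ toolchain OK (${cxx})"
}
```

In the workflow step this would run after the reinstall, for example as `check_cxx_toolchain g++`; a non-zero return fails the job before pytest starts.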

Notes / known limitations

- This makes the flake self-correct without a manual job re-run. It does not eliminate the underlying hosted-runner image variability; pinning to a container image would be the fully deterministic alternative but is a larger change.
- pytest-rerunfailures would not help here: retrying the same test on the same VM hits the same missing cc1plus.

Summary by CodeRabbit

Chores
- Updated internal CI/CD workflow to enhance build validation during automated testing.

Hosted-runner VMs occasionally boot with a broken gcc where the g++ driver exists but cc1plus is missing. torch inductor JIT-compiles C++ kernels at test time, so this surfaces as an opaque 'g++: cannot execute cc1plus' CppCompileError deep inside the Test Python suite (observed once on an unrelated docs PR; a re-run on a fresh VM passed with identical code). Add a step that reinstalls build-essential and verifies a trivial C++ compile before the test run, self-healing a corrupted VM and failing fast with a clear message if the toolchain is still broken.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed5b92ddc6

chatgpt-codex-connector · 2026-06-22T11:32:23Z

+        # fail fast here with a clear message if it is still broken.
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --reinstall build-essential


Reinstall the compiler package, not the metapackage

For the runner corruption described here (the g++ driver exists but cc1plus is missing), this step does not actually restore the missing backend: build-essential is only a small informational metapackage whose dependency list includes g++, and an apt-get -s install --reinstall build-essential dry run reinstalls exactly one package, build-essential, not its already-installed dependencies. As a result, a VM with a damaged g++/g++-11 package will still fail the compile check instead of self-healing; reinstall g++/the versioned compiler package directly.
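One way to act on this suggestion is to ask dpkg which package owns cc1plus and reinstall that package rather than the metapackage. The sketch below assumes a Debian/Ubuntu runner; the function names are hypothetical, and the sample dpkg line in the comment is illustrative rather than taken from the failing VM.

```shell
# Sketch only: find and reinstall the package(s) that ship cc1plus.

# Reads `dpkg -S` output on stdin and prints the owning package names,
# one per line. A typical input line looks like (illustrative):
#   g++-11: /usr/lib/gcc/x86_64-linux-gnu/11/cc1plus
parse_dpkg_owners() {
  sed -E 's|: /.*$||' | tr ',' '\n' | tr -d ' ' | sort -u
}

# Not invoked here: requires dpkg, apt-get and sudo on the runner.
reinstall_cc1plus_owner() {
  local pkgs
  # dpkg -S searches the package database, so it can name the owner
  # even when the file itself has gone missing from disk.
  pkgs="$(dpkg -S '*/cc1plus' | parse_dpkg_owners)"
  if [ -z "${pkgs}" ]; then
    echo "ERROR: no installed package claims cc1plus" >&2
    return 1
  fi
  sudo apt-get update
  # Word splitting is intentional: one argument per package name.
  # shellcheck disable=SC2086
  sudo apt-get install -y --reinstall ${pkgs}
}
```

Resolving the owner at run time avoids hard-coding a versioned package name, which may differ between runner images.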

coderabbitai · 2026-06-22T11:33:36Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 03682bf and ed5b92d.

📒 Files selected for processing (1)

.github/workflows/test_python.yml

Walkthrough

The Python CI workflow gains a new step in the testpython job that reinstalls build-essential via apt-get and compiles a minimal C++ program with g++ to verify that cc1plus is present before the torch inductor tests execute.

Changes

Python CI C++ Toolchain Verification

| Layer / File(s) | Summary |
|---|---|
| g++ compile verification step `.github/workflows/test_python.yml` | Adds a step that reinstalls `build-essential` and compiles a trivial C++ program with `g++`, failing the job early if `cc1plus` is absent. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'build(ci): ensure C++ toolchain before pytest' directly and specifically describes the main change: adding a CI step to verify the C++ toolchain before running pytest. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

codecov · 2026-06-22T12:25:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.15%. Comparing base (03682bf) to head (ed5b92d).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5573   +/-   ##
=======================================
  Coverage   82.14%   82.15%           
=======================================
  Files         900      900           
  Lines      104139   104138    -1     
  Branches     4471     4470    -1     
=======================================
  Hits        85550    85550           
  Misses      17178    17178           
+ Partials     1411     1410    -1

wanghan-iapcm · 2026-06-22T15:43:58Z

Closing this — the diagnosis is wrong, as @njzjz pointed out: if the gcc toolchain were broken at boot, the editable pip install -e . (which compiles deepmd's PyTorch C++ ops) would have failed first.

The failed-run log confirms it. On the affected VM (run 27929887537, job Test Python (10, 3.13)):

- 04:36:50 -> 04:38:16: deepmd-kit's C++ built successfully (~86s, needs cc1plus)
- 04:38:47: dp --version works -> install fully succeeded
- 04:38:49: pytest starts
- ~05:32:16 (~53 min later): torch inductor's g++ ... main.cpp fails with cannot execute 'cc1plus'

So the toolchain was healthy at install and only failed deep into the test run. A reinstall-at-start (this PR) runs while the toolchain is still fine and does nothing for a mid-run degradation, so it would not reliably prevent this. Only 1 of 24 shards failed and a re-run on a fresh VM passed, i.e. a transient single-VM event. Closing as misdiagnosed; investigating the real mid-run cause separately.
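Since the failure appears only mid-run, one possible aid for the follow-up investigation is a small probe that records whether cc1plus is still reachable at a given moment. This is a sketch under assumptions, not something from this PR: the function name `probe_cc1plus` and the log format are hypothetical. It relies on GCC's `-print-prog-name` option, which prints the path the driver would use for a subprogram, or just the bare name when it cannot find it.

```shell
# Sketch only: log a timestamped snapshot of the cc1plus state.
# Usage: probe_cc1plus [COMPILER]   (COMPILER defaults to g++)
probe_cc1plus() {
  local cxx="${1:-g++}"
  local path state="missing"

  # Ask the driver where it expects cc1plus; tolerate a failing driver.
  path="$("${cxx}" -print-prog-name=cc1plus 2>/dev/null || true)"

  # A bare "cc1plus" (driver could not resolve it) fails the -x test
  # unless a file of that name happens to sit in the working directory.
  if [ -n "${path}" ] && [ -x "${path}" ]; then
    state="present"
  fi

  printf '%s cc1plus=%s path=%s\n' \
    "$(date -u +%H:%M:%S)" "${state}" "${path:-unknown}"
}
```

Called periodically in the background while pytest runs (for instance once a minute), its output would narrow down when the backend disappears, which a check at job start cannot show.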

dosubot Bot added the build label Jun 22, 2026

wanghan-iapcm requested a review from njzjz June 22, 2026 11:30

chatgpt-codex-connector Bot reviewed Jun 22, 2026

wanghan-iapcm closed this Jun 22, 2026

wanghan-iapcm wants to merge 1 commit into deepmodeling:master from wanghan-iapcm:build/ci-ensure-cxx-toolchain
