Skip to content

[DRAFT] experiment: add test sharding#17438

Draft
daniel-sanche wants to merge 39 commits into
mainfrom
ci_sharding
Draft

[DRAFT] experiment: add test sharding#17438
daniel-sanche wants to merge 39 commits into
mainfrom
ci_sharding

Conversation

@daniel-sanche

@daniel-sanche daniel-sanche commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Experimenting with test sharding

Unit Tests

  • if less than 10 packages are changed, execute one test per python version (existing behaviour)
  • if more than 10 packages would be in a test, add an additional shard for each python runtime, and spread out packages between them
  • use an upper limit of 10 shards (per runtime), with packages evenly distributed
  • adds an end unit test complete step, which is only green if all shards pass. This can be our new required check for unit tests

System Tests

  • Previously system tests run sequentially as a single script
  • With this change, they run in parallel, reported as separate Targets in TestGrid
  • test logs are still all combined into the primary build.log for viewing in one place

Fail Fast

  • if any single unit test shard or system test job fails, the remaining end early to provide quick feedback

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces parallel execution for system tests and implements a sharding mechanism for CI jobs, including a new Python script to group packages and updates to the test runner script. Feedback on these changes focuses on improving reliability and safety: first, by using an EXIT trap in .kokoro/system.sh to guarantee cleanup of isolated gcloud configuration directories in case of test failures; and second, by avoiding global toggles of set -e in ci/run_conditional_tests.sh and instead capturing test exit codes using the || operator.

Comment thread .kokoro/system.sh
Comment on lines +42 to +43
# Isolate gcloud config for parallel execution
export CLOUDSDK_CONFIG=$(mktemp -d)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since run_package_test runs with set -e enabled, any test failure or error will cause the subshell to exit immediately. This prevents the manual cleanup at the end of the function from running, leading to leaked temporary directories in /tmp.

Using an EXIT trap ensures that the temporary directory is reliably cleaned up when the subshell exits, regardless of whether the tests succeeded or failed.

Suggested change
# Isolate gcloud config for parallel execution
export CLOUDSDK_CONFIG=$(mktemp -d)
# Isolate gcloud config for parallel execution
export CLOUDSDK_CONFIG=$(mktemp -d)
trap 'rm -rf "${CLOUDSDK_CONFIG}"' EXIT

Comment thread .kokoro/system.sh
Comment on lines +93 to +94
# Clean up isolated gcloud config
rm -rf "${CLOUDSDK_CONFIG}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With the EXIT trap handling the cleanup of CLOUDSDK_CONFIG automatically, this manual cleanup is redundant and can be safely removed.

Comment on lines +92 to +95
set +e
${test_script}
ret=$?
set -e

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Toggling set -e globally can lead to unexpected side effects or mask errors in other parts of the script if not carefully managed. A more idiomatic and safer way to capture the exit status of a command under set -e is to use the || operator, which prevents the shell from exiting on failure without needing to disable set -e globally.

Suggested change
set +e
${test_script}
ret=$?
set -e
ret=0
${test_script} || ret=$?

@daniel-sanche daniel-sanche changed the title [DRAFT] chore: add test sharding [DRAFT] experiment: add test sharding Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant