-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[DRAFT] experiment: add test sharding #17438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
daniel-sanche
wants to merge
39
commits into
main
Choose a base branch
from
ci_sharding
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
39 commits
Select commit
Hold shift + click to select a range
01d4753
added unit test sharding
daniel-sanche a7502d4
run system tests concurrently
daniel-sanche 9ad0f0c
added changes to many packages for testing
daniel-sanche 02c7653
added sharding test file
daniel-sanche 3c8d057
added unit-complete to gather all shards
daniel-sanche 73ab091
changed shard params
daniel-sanche 74f1ef7
fixed tests
daniel-sanche a450254
fix coverage
daniel-sanche af4d10c
fix system tests
daniel-sanche 15916df
split out system test logs
daniel-sanche 49cac0f
attempt fix for lint
daniel-sanche d8dd522
update system tests to show logs for each target
daniel-sanche de74a68
update sharding logic
daniel-sanche 46d1583
fixed lint/mypy runs
daniel-sanche 6e3531d
removed sqlalchemy-spanner from touched packages
daniel-sanche e6167bd
system tests print all logs in main build log
daniel-sanche fd5792d
removed many SHARD_TEST files
daniel-sanche 8b825d5
updated lint and mypy logic
daniel-sanche 32968a9
add shard descriptions
daniel-sanche ff1274f
remove global run on ci/ change
daniel-sanche c49b06d
attempt cover fix
daniel-sanche 1cd8c1f
attempt fix for cover
daniel-sanche a0f5c9a
allow hidden files for cover
daniel-sanche 71f444a
fail fast
daniel-sanche cf1a1f1
use individual coverage checks
daniel-sanche b00967f
loosen firestore coverage requirement
daniel-sanche 7bb323c
activated more packages
daniel-sanche 3633089
10 packages total
daniel-sanche 980d8eb
change default for coverage percent
daniel-sanche a5cc497
activated 11th package (enable sharding)
daniel-sanche b36b559
fixed typo
daniel-sanche c3cf49e
change shard logic
daniel-sanche acdfb18
iterating on shard logic
daniel-sanche 9c80a51
rename tests
daniel-sanche cf554ca
add label to number in shard
daniel-sanche 886d6d9
unit tests fail if initialize fails
daniel-sanche 83ca04a
added known-bad package
daniel-sanche ae05642
added summary to cover step
daniel-sanche 7b565a4
added no-fail .coveragerc to sqlalchemy-spanner
daniel-sanche File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ | |
| action { | ||
| define_artifacts { | ||
| regex: "**/*sponge_log.xml" | ||
| regex: "**/*sponge_log.log" | ||
| } | ||
| } | ||
|
|
||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ | |
| action { | ||
| define_artifacts { | ||
| regex: "**/*sponge_log.xml" | ||
| regex: "**/*sponge_log.log" | ||
| } | ||
| } | ||
|
|
||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -38,7 +38,18 @@ pwd | |
| run_package_test() { | ||
| local package_name=$1 | ||
| local package_path="packages/${package_name}" | ||
|
|
||
|
|
||
| # Create a dedicated directory for this package's artifacts | ||
| local artifact_dir="${KOKORO_ARTIFACTS_DIR}/${package_name}" | ||
| mkdir -p "${artifact_dir}" | ||
|
|
||
| # Define standard output paths for logs and XML results | ||
| export XML_OUTPUT_FILE="${artifact_dir}/sponge_log.xml" | ||
| local log_file="${artifact_dir}/sponge_log.log" | ||
|
|
||
| # Isolate gcloud config for parallel execution | ||
| export CLOUDSDK_CONFIG=$(mktemp -d) | ||
|
|
||
| # Declare local overrides to prevent bleeding into the next loop iteration | ||
| local PROJECT_ID | ||
| local GOOGLE_APPLICATION_CREDENTIALS | ||
|
|
@@ -48,6 +59,8 @@ run_package_test() { | |
|
|
||
| echo "------------------------------------------------------------" | ||
| echo "Configuring environment for: ${package_name}" | ||
| echo "Log file: ${log_file}" | ||
| echo "XML results: ${XML_OUTPUT_FILE}" | ||
| echo "------------------------------------------------------------" | ||
|
|
||
| case "${package_name}" in | ||
|
|
@@ -82,17 +95,86 @@ run_package_test() { | |
| # Run the actual test | ||
| pushd "${package_path}" > /dev/null | ||
| set +e | ||
| "${system_test_script}" | ||
| # *** CRITICAL: system-single.sh MUST use XML_OUTPUT_FILE to generate the JUnit XML *** | ||
| "${system_test_script}" > "${log_file}" 2>&1 | ||
| local res=$? | ||
| set -e | ||
| popd > /dev/null | ||
|
|
||
|
|
||
| # Clean up isolated gcloud config | ||
| rm -rf "${CLOUDSDK_CONFIG}" | ||
|
Comment on lines
+104
to
+105
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| return $res | ||
| } | ||
|
|
||
| # A file for running system tests | ||
| system_test_script="${PROJECT_ROOT}/.kokoro/system-single.sh" | ||
| # Ensure KOKORO_ARTIFACTS_DIR is set, fallback to a local directory | ||
| export KOKORO_ARTIFACTS_DIR="${KOKORO_ARTIFACTS_DIR:-${PROJECT_ROOT}/.artifacts}" | ||
|
|
||
| # Parallel execution settings | ||
| MAX_PARALLEL=4 | ||
| running_pids=() | ||
| declare -A pid_to_pkg | ||
| declare -A pid_to_resfile | ||
|
|
||
| # Array to keep track of results for the final summary | ||
| results=() | ||
| handle_finished_job() { | ||
| local pid=$1 | ||
| local pkg=${pid_to_pkg[$pid]} | ||
| local resfile=${pid_to_resfile[$pid]} | ||
| local pkg_log="${KOKORO_ARTIFACTS_DIR}/${pkg}/sponge_log.log" | ||
|
|
||
| # wait $pid might fail if it was already reaped by wait -n, | ||
| # so we ignore its exit code and use the resfile. | ||
| wait "$pid" 2>/dev/null || true | ||
|
|
||
| local res=1 | ||
| if [ -s "$resfile" ]; then | ||
| res=$(cat "$resfile") | ||
| rm "$resfile" | ||
| else | ||
| # If the file is empty or missing, the subshell crashed early | ||
| res=1 | ||
| fi | ||
|
|
||
| echo "============================================================" | ||
| echo "System tests for ${pkg} finished (Exit code: ${res})" | ||
| echo "Artifacts in: ${KOKORO_ARTIFACTS_DIR}/${pkg}" | ||
| echo "============================================================" | ||
|
|
||
| # Print the package log to the console | ||
| if [ -f "${pkg_log}" ]; then | ||
| echo "--- Start of Logs for ${pkg} ---" | ||
| cat "${pkg_log}" | ||
| echo "--- End of Logs for ${pkg} ---" | ||
| else | ||
| echo "Console Fallback Warning: Log file not found at ${pkg_log}" | ||
| fi | ||
| echo "" | ||
|
|
||
| if [ -z "${res}" ] || [ "${res}" -ne 0 ]; then | ||
| echo "============================================================" | ||
| echo "FAIL-FAST: System tests for ${pkg} failed!" | ||
| echo "Cancelling all remaining running background jobs..." | ||
| echo "============================================================" | ||
|
|
||
| # Kill all other active background processes | ||
| for active_pid in "${running_pids[@]}"; do | ||
| if [ "$active_pid" != "$pid" ] && kill -0 "$active_pid" 2>/dev/null; then | ||
| pkg_to_cancel=${pid_to_pkg[$active_pid]} | ||
| echo "Cancelling active system tests for ${pkg_to_cancel} (PID: ${active_pid})..." | ||
| # Send SIGTERM to allow graceful cleanup of resources if possible, or SIGKILL | ||
| kill -9 "$active_pid" 2>/dev/null || true | ||
| fi | ||
| done | ||
|
|
||
| exit "${res}" | ||
| else | ||
| results+=("${pkg}: PASSED") | ||
| fi | ||
| } | ||
| # Run system tests for each package with directory packages/*/tests/system | ||
| for path in `find 'packages' \ | ||
| \( -type d -wholename 'packages/*/tests/system' \) -o \ | ||
|
|
@@ -140,10 +222,55 @@ for path in `find 'packages' \ | |
| set -e | ||
|
|
||
| if [[ "${package_modified}" -gt 0 || "$KOKORO_BUILD_ARTIFACTS_SUBDIR" == *"continuous"* ]]; then | ||
| # Call the function - its internal exports won't affect the next loop | ||
| run_package_test "$package_name" || RETVAL=$? | ||
| # Wait if we have reached MAX_PARALLEL | ||
| while [[ ${#running_pids[@]} -ge $MAX_PARALLEL ]]; do | ||
| set +e | ||
| wait -n | ||
| set -e | ||
| # Find which job finished | ||
| new_pids=() | ||
| for pid in "${running_pids[@]}"; do | ||
| if kill -0 "$pid" 2>/dev/null; then | ||
| new_pids+=("$pid") | ||
| else | ||
| handle_finished_job "$pid" | ||
| fi | ||
| done | ||
| running_pids=("${new_pids[@]}") | ||
| done | ||
|
|
||
| # Start the next test in the background | ||
| res_file=$(mktemp) | ||
| ( | ||
| if run_package_test "$package_name"; then | ||
| echo 0 > "$res_file" | ||
| else | ||
| res=$? | ||
| echo $res > "$res_file" | ||
| fi | ||
| ) & | ||
| pid=$! | ||
| running_pids+=($pid) | ||
| pid_to_pkg[$pid]=$package_name | ||
| pid_to_resfile[$pid]=$res_file | ||
| echo "Started system tests for ${package_name} (PID: ${pid})" | ||
| else | ||
| echo "No changes in ${package_name} and not a continuous build, skipping." | ||
| fi | ||
| done | ||
|
|
||
| # Wait for all remaining jobs | ||
| for pid in "${running_pids[@]}"; do | ||
| handle_finished_job "$pid" | ||
| done | ||
|
|
||
| echo "------------------------------------------------------------" | ||
| echo "System Test Summary" | ||
| echo "------------------------------------------------------------" | ||
| for res in "${results[@]}"; do | ||
| echo "$res" | ||
| done | ||
| echo "------------------------------------------------------------" | ||
|
|
||
| exit ${RETVAL} | ||
|
|
||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since
run_package_testruns withset -eenabled, any test failure or error will cause the subshell to exit immediately. This prevents the manual cleanup at the end of the function from running, leading to leaked temporary directories in/tmp.Using an
EXITtrap ensures that the temporary directory is reliably cleaned up when the subshell exits, regardless of whether the tests succeeded or failed.