Migrate ILAMB regression baselines to Framework B (cmip6 batch) #738
Merged
First batch of Framework-B committed baselines for ILAMB,
covering the mrsos-wangmao, gpp-fluxnet2015 and lai-avh15c1 cmip6 test cases.
Native blobs are not yet minted (native={});
the gated mint workflow populates them and is the vehicle
for validating CI/local committed-bundle consistency.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Diagnostics may emit their series in an implementation-defined order that differs across platforms (e.g. macOS vs the Linux CI runner). The regression baseline comparator walks JSON arrays positionally, so an unstable order made a committed bundle minted on one platform falsely fail the gate on another, even when every series value agreed within tolerance. Sort the series by their dimensions (with index_name as a tie-breaker) at the single serialisation point so the order is canonical everywhere. This makes local and CI mints interchangeable and keeps series.json diffs to value changes only.
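A minimal sketch of the idea, not the code from this PR: the entry layout (`dimensions`, `index_name`, `values`) and both function names are assumptions made for illustration, and the positional comparison only mimics how a positional comparator would behave.

```python
import math


def canonical_series(series):
    """Return the series sorted by dimensions, with index_name as tie-breaker."""
    def sort_key(entry):
        # Sort the dimension items so dict insertion order cannot change the key.
        dims = tuple(sorted((str(k), str(v)) for k, v in entry["dimensions"].items()))
        return (dims, str(entry.get("index_name", "")))

    return sorted(series, key=sort_key)


def positional_match(left, right, rtol=1e-6):
    """Compare two series lists position by position (hypothetical comparator)."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if a["dimensions"] != b["dimensions"] or a.get("index_name") != b.get("index_name"):
            return False
        if len(a["values"]) != len(b["values"]):
            return False
        if not all(math.isclose(x, y, rel_tol=rtol) for x, y in zip(a["values"], b["values"])):
            return False
    return True


# The same two series, emitted in a different order on two platforms.
linux_order = [
    {"dimensions": {"source_id": "A", "region": "global"}, "index_name": "time", "values": [1.0, 2.0]},
    {"dimensions": {"source_id": "B", "region": "global"}, "index_name": "time", "values": [3.0, 4.0]},
]
macos_order = list(reversed(linux_order))

unsorted_ok = positional_match(linux_order, macos_order)
sorted_ok = positional_match(canonical_series(linux_order), canonical_series(macos_order))
```

Without the sort the positional walk reports a mismatch even though every value agrees; with it, both orders serialise identically.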
ILAMB's build_execution_result re-reads the per-execution scalar CSVs (via
_load_csv_and_merge) and the netCDF time-trace files to reconstruct its
metrics and series, but the CMEC output bundle only declared the plots and
HTML. So those data files were never persisted with the results and were
absent from the regression native baseline, leaving the committed bundle
impossible to replay ("No objects to concatenate").
Register the *.csv and *.nc files in the output bundle's data section so the
curated capture persists them and replay can reconstruct the execution. This
is also a real persistence fix: the scientific scalar data was not being saved
from production runs either.
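As a rough illustration of the registration step, assuming the bundle is a plain dict with a `data` mapping keyed by relative path (the helper name and the entry shape are hypothetical, not the project's API):

```python
from pathlib import Path


def register_data_files(output_dir, bundle, patterns=("*.csv", "*.nc")):
    """Add every matching file under output_dir to the bundle's data section."""
    output_dir = Path(output_dir)
    data = bundle.setdefault("data", {})
    for pattern in patterns:
        # rglob walks subdirectories; sorting keeps the registration order stable.
        for path in sorted(output_dir.rglob(pattern)):
            rel = path.relative_to(output_dir).as_posix()
            data[rel] = {"filename": rel}  # assumed entry shape
    return bundle
```

Anything registered this way would be picked up by whatever persists the declared bundle contents, which is what lets replay find the CSV and netCDF inputs again.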
The committed catalog.yaml _metadata.hash for gpp-fluxnet2015, lai-avh15c1 and mrsos-wangmao no longer matched the catalog_hash recorded in each manifest.json, so the PR coupling gate failed with "input catalog.yaml does not match manifest.json catalog_hash". datasets.hash is not reproducible across pandas versions: the latest mint recomputed a different hash into the manifest, while the committed catalog kept the developer-authored value. Re-point each catalog to its manifest's hash, the value tied to the baseline that was actually minted.
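The coupling check and the re-pointing can be sketched as follows. Both functions are hypothetical and operate on already-parsed mappings; the only details taken from the description above are the `_metadata.hash` and `catalog_hash` keys and the error message.

```python
def check_catalog_coupling(catalog, manifest):
    """Raise if the catalog's recorded hash differs from the manifest's catalog_hash."""
    catalog_hash = catalog.get("_metadata", {}).get("hash")
    if catalog_hash != manifest.get("catalog_hash"):
        raise ValueError("input catalog.yaml does not match manifest.json catalog_hash")


def repoint_catalog(catalog, manifest):
    """Return a copy of the catalog whose _metadata.hash is the manifest's catalog_hash."""
    updated = dict(catalog)
    updated["_metadata"] = {**catalog.get("_metadata", {}), "hash": manifest["catalog_hash"]}
    return updated


manifest = {"catalog_hash": "hash-from-mint"}          # placeholder values
catalog = {"_metadata": {"hash": "developer-authored"}}

fixed = repoint_catalog(catalog, manifest)
check_catalog_coupling(fixed, manifest)  # passes once the two agree
```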
The mint commit step staged manifest.json and regression/** but not catalog.yaml. datasets.hash (catalog.yaml's _metadata.hash, recorded as the manifest's catalog_hash) is not reproducible across pandas versions, so the runner regenerated catalog.yaml with a new hash and wrote it into the committed manifest, but discarded the regenerated catalog. The committed manifest and catalog then disagreed, breaking the PR coupling gate. Stage catalog.yaml alongside the manifest so they cannot drift apart on future mints.
Switch the mint and nightly-drift workflows from ubuntu-latest to the self-hosted arc-climate-ref runners, which mount the persistent dataset, software and intake-esgf caches. Set TQDM_DISABLE so the cache-warming downloads do not flood the logs. The drift job runs only on schedule/workflow_dispatch and stays guarded by the repository check, so moving it to self-hosted infra does not expose the runners to untrusted fork pull-request code.
The PR-tier regression gate replays only the cases a pull request touches,
and replay rebuilds the committed bundle from the native output blobs plus
the committed catalog/manifest -- `build_execution_result` never reads the
input datasets. The gate was nonetheless fetching the full CMIP sample data
and every provider's ESGF inputs before replaying, which dominated the job
runtime (tens of minutes) for no benefit.
Remove the `fetch-sample-data` and `test-cases fetch` block from the gate so
it only pays for the small native blobs of the cases it actually replays.
Also make the gate script runnable locally: it emits GitHub Actions log groups
and annotations only under Actions, prints plain output otherwise, and accepts
an optional base ref (defaulting to origin/${GITHUB_BASE_REF:-main}). A
`make regression-gate` target is the convenient entry point.
Update the regression-baselines reference and the testing-diagnostics how-to to
document the no-input-data behaviour and the local entry point.
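For illustration only, the local/CI switching described above could look like this in Python. The gate script itself is not shown here, so the function names are assumptions; `GITHUB_ACTIONS`, `GITHUB_BASE_REF` and the `::group::` / `::endgroup::` workflow commands are standard GitHub Actions features.

```python
import os
from contextlib import contextmanager


def resolve_base_ref(explicit=None, env=None):
    """Return the base ref: an explicit argument wins, else origin/<GITHUB_BASE_REF or main>."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    # Mirrors origin/${GITHUB_BASE_REF:-main}: unset or empty falls back to main.
    return "origin/" + (env.get("GITHUB_BASE_REF") or "main")


def running_under_actions(env=None):
    """GitHub Actions sets GITHUB_ACTIONS=true for every step."""
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS") == "true"


@contextmanager
def log_group(title, env=None, out=print):
    """Emit a collapsible log group under Actions, a plain header otherwise."""
    if running_under_actions(env):
        out(f"::group::{title}")
        try:
            yield
        finally:
            out("::endgroup::")
    else:
        out(f"== {title} ==")
        yield
```

Passing `env` and `out` explicitly keeps the helpers easy to exercise outside CI.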
…-baselines
* origin/main:
  chore: label change as breaking
  fix(core): make dataset hash deterministic across pandas versions
  docs(changelog): name the fragment with the PR number
  fix: mark execution failed when result ingestion fails
  docs(changelog): name the fragment with the PR number
  fix: compare dataset versions numerically when selecting latest
  chore(deps): bump tornado from 6.5.6 to 6.5.7
… merge PR #741 (merged into main) changed datasets.hash to a pandas-version-independent algorithm. Recompute the catalog_hash for the gpp-fluxnet2015, lai-avh15c1 and mrsos-wangmao cmip6 baselines with the new algorithm so each catalog.yaml _metadata.hash matches its manifest.json catalog_hash and the recomputed value, keeping the coupling gate green now that the algorithm has changed.
…to feat/regression-ilamb-baselines
* origin/ci/faster-regression-pr-gate:
  docs(changelog): add fragment for PR #742
  ci(regression): speed up the PR gate by dropping the unused input fetch
Description
Begins the ILAMB regression-baseline migration to the per-package (Framework B) layout — RFC 0005 PR-6a, the first of the provider migrations (ilamb → pmp → esmvaltool) following the merged baseline machinery (#724, #727, #732, #733).
This first batch adds committed baselines for three cmip6 ILAMB test cases (`mrsos-wangmao`, `gpp-fluxnet2015`, `lai-avh15c1`), generated locally via `ref test-cases run --force-regen`. Native blobs are not yet minted (`native={}`); the gated `regression-mint` workflow populates them against R2 and is the vehicle for validating CI/local committed-bundle consistency (macOS vs Linux, gate tolerance `rtol=1e-6`).
Notes:
`cmip6` + `cmip7`); this PR covers `cmip6` only. The `cmip7` cases depend on in-flight CMIP7 compatibility work (Make ECS diagnostic compatible with CMIP7 data #671, Make TCR diagnostic compatible with CMIP7 data #686, Make TCRE diagnostic compatible with CMIP7 data #702, Make ozone diagnostic compatible with CMIP7 data #704) and will follow.
`tests/test-data/regression/ilamb`) is left in place and removed in the teardown PR once all ILAMB cases are migrated.
Checklist
Please confirm that this pull request has done the following:
`test_validate_test_case_regression` (previously skipped for lack of data)
`changelog/`