Migrate ILAMB regression baselines to Framework B (cmip6 batch) by lewisjared · Pull Request #738 · Climate-REF/climate-ref

lewisjared · 2026-06-18T05:03:46Z

Description

Begins the ILAMB regression-baseline migration to the per-package (Framework B) layout. This is RFC 0005 PR-6a, the first of the provider migrations (ilamb → pmp → esmvaltool), and it follows the merged baseline machinery (#724, #727, #732, #733).

This first batch adds committed baselines for three cmip6 ILAMB test cases (mrsos-wangmao, gpp-fluxnet2015, lai-avh15c1), generated locally via `ref test-cases run --force-regen`. Native blobs are not yet minted (`native={}`). The gated regression-mint workflow populates them against R2 and is the vehicle for validating CI/local committed-bundle consistency (macOS vs Linux, gate tolerance `rtol=1e-6`).
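To make the gate tolerance concrete, here is a minimal sketch of a positional JSON comparison with `rtol=1e-6`. It assumes both bundles are already parsed from JSON. `values_match` is a hypothetical name, not the project's comparator. `math.isclose` applies a symmetric relative tolerance, which may differ in detail from an `rtol` in the numpy sense.

```python
import math
from typing import Any


def values_match(expected: Any, actual: Any, rtol: float = 1e-6) -> bool:
    """Compare two parsed JSON values, walking arrays strictly by position."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        # Objects must have the same keys; each value is compared recursively
        if expected.keys() != actual.keys():
            return False
        return all(values_match(expected[k], actual[k], rtol) for k in expected)
    if isinstance(expected, list) and isinstance(actual, list):
        # Positional walk: element i is only ever compared with element i
        if len(expected) != len(actual):
            return False
        return all(values_match(e, a, rtol) for e, a in zip(expected, actual))
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        # Numbers agree if they are within a relative tolerance of each other
        return math.isclose(expected, actual, rel_tol=rtol)
    # Strings, None and mismatched types must be exactly equal
    return expected == actual
```

With a walk like this, `[1.0, 2.0]` compared against `[2.0, 1.0]` fails even though both lists hold the same values. This is why emission order matters when a bundle minted on one platform is checked on another.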

Notes:

- ILAMB diagnostics now define two test cases each (cmip6 + cmip7); this PR covers cmip6 only. The cmip7 cases depend on in-flight CMIP7 compatibility work (Make ECS diagnostic compatible with CMIP7 data #671, Make TCR diagnostic compatible with CMIP7 data #686, Make TCRE diagnostic compatible with CMIP7 data #702, Make ozone diagnostic compatible with CMIP7 data #704) and will follow.
- The deprecated Framework-A central tree (`tests/test-data/regression/ilamb`) is left in place and will be removed in the teardown PR once all ILAMB cases are migrated.

Checklist

Please confirm that this pull request has done the following:

- Tests added: N/A (data-only; consumed by the existing parametrized `test_validate_test_case_regression`, which was previously skipped for lack of data)
- Documentation added (where applicable)
- Changelog item added to `changelog/`

First batch of Framework-B committed baselines for ILAMB, covering the mrsos-wangmao, gpp-fluxnet2015 and lai-avh15c1 cmip6 test cases. Native blobs are not yet minted (native={}); the gated mint workflow populates them and is the vehicle for validating CI/local committed-bundle consistency.

codecov · 2026-06-18T05:06:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Flag | Coverage Δ | |
|---|---|---|
| core | `92.56% <100.00%> (+<0.01%)` | ⬆️ |
| providers | `91.82% <100.00%> (+0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...-core/src/climate_ref_core/metric_values/typing.py | `93.84% <100.00%> (+0.09%)` | ⬆️ |
| ...limate-ref-ilamb/src/climate_ref_ilamb/standard.py | `85.49% <100.00%> (+0.29%)` | ⬆️ |


Diagnostics may emit their series in an implementation-defined order that differs across platforms (e.g. macOS vs the Linux CI runner). The regression baseline comparator walks JSON arrays positionally, so an unstable order made a committed bundle minted on one platform falsely fail the gate on another, even when every series value agreed within tolerance. Sort the series by their dimensions (with index_name as a tie-breaker) at the single serialisation point so the order is canonical everywhere. This makes local and CI mints interchangeable and keeps series.json diffs to value changes only.
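The canonical ordering described above can be sketched as follows. The sketch assumes each series serialises to a dict with a `dimensions` mapping and an `index_name` string; the function names are hypothetical, and the real serialisation point in `climate_ref_core` may be structured differently.

```python
import json
from typing import Any


def series_sort_key(series: dict[str, Any]) -> tuple[tuple[tuple[str, str], ...], str]:
    # Sort the dimension items so {"a": .., "b": ..} and {"b": .., "a": ..} give the same key;
    # str() avoids TypeErrors if dimension values ever mix types
    dims = tuple(sorted((str(k), str(v)) for k, v in series["dimensions"].items()))
    # index_name breaks ties between series that share identical dimensions
    return dims, str(series.get("index_name", ""))


def dump_series(series: list[dict[str, Any]]) -> str:
    """Serialise series in a canonical order, independent of emission order."""
    ordered = sorted(series, key=series_sort_key)
    return json.dumps(ordered, indent=2, sort_keys=True)
```

Because sorting happens only at serialisation, diagnostics can keep emitting series in whatever order their implementation produces. Two runs that emit the same series in different orders write byte-identical `series.json` files.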

ILAMB's build_execution_result re-reads the per-execution scalar CSVs (via _load_csv_and_merge) and the netCDF time-trace files to reconstruct its metrics and series, but the CMEC output bundle only declared the plots and HTML. So those data files were never persisted with the results and were absent from the regression native baseline, leaving the committed bundle impossible to replay ("No objects to concatenate"). Register the *.csv and *.nc files in the output bundle's data section so the curated capture persists them and replay can reconstruct the execution. This is also a real persistence fix: the scientific scalar data was not being saved from production runs either.
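Here is a minimal sketch of that registration. It assumes the output bundle is a plain dict and that each data entry uses the `filename`/`long_name`/`description` fields of the CMEC output-bundle convention. `register_data_files` is hypothetical, and the project's actual helper is not shown here.

```python
from pathlib import Path
from typing import Any


def register_data_files(bundle: dict[str, Any], output_dir: Path) -> dict[str, Any]:
    """Declare every *.csv and *.nc file under output_dir in the bundle's "data" section."""
    data = bundle.setdefault("data", {})
    for pattern in ("*.csv", "*.nc"):
        # sorted() keeps the declaration order stable across filesystems
        for path in sorted(output_dir.rglob(pattern)):
            rel = path.relative_to(output_dir).as_posix()
            # Keyed by relative path; any existing entry for the same file is left untouched
            data.setdefault(
                rel,
                {
                    "filename": rel,
                    "long_name": path.stem,
                    "description": f"{path.suffix.lstrip('.')} file read back by build_execution_result",
                },
            )
    return bundle
```

Plots and HTML stay in their existing sections. Only the data files needed for replay are added.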

The committed catalog.yaml _metadata.hash for gpp-fluxnet2015, lai-avh15c1 and mrsos-wangmao no longer matched the catalog_hash recorded in each manifest.json, so the PR coupling gate failed with "input catalog.yaml does not match manifest.json catalog_hash". datasets.hash is not reproducible across pandas versions: the latest mint recomputed a different hash into the manifest, while the committed catalog kept the developer-authored value. Re-point each catalog to its manifest's hash, the value tied to the baseline that was actually minted.

The mint commit step staged manifest.json and regression/** but not catalog.yaml. datasets.hash (catalog.yaml's _metadata.hash, recorded as the manifest's catalog_hash) is not reproducible across pandas versions, so the runner regenerated catalog.yaml with a new hash and wrote it into the committed manifest, but discarded the regenerated catalog. The committed manifest and catalog then disagreed, breaking the PR coupling gate. Stage catalog.yaml alongside the manifest so they cannot drift apart on future mints.
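The coupling check that failed here compares two values. The following is a minimal sketch, assuming `catalog.yaml` has already been parsed (for example with `yaml.safe_load`) and `manifest.json` with `json.load`. `check_coupling` and `CouplingError` are hypothetical names; the message mirrors the error quoted above.

```python
from typing import Any


class CouplingError(Exception):
    """Raised when catalog.yaml and manifest.json disagree (hypothetical name)."""


def check_coupling(catalog: dict[str, Any], manifest: dict[str, Any]) -> None:
    # catalog.yaml records the dataset hash under _metadata.hash
    catalog_hash = catalog.get("_metadata", {}).get("hash")
    # manifest.json records the hash of the catalog it was minted against
    manifest_hash = manifest.get("catalog_hash")
    if catalog_hash is None or catalog_hash != manifest_hash:
        raise CouplingError(
            "input catalog.yaml does not match manifest.json catalog_hash "
            f"({catalog_hash!r} != {manifest_hash!r})"
        )
```

If the mint stages both files in the same commit, any check of this shape always sees a matched pair, whatever hash the runner computed.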

Switch the mint and nightly-drift workflows from ubuntu-latest to the self-hosted arc-climate-ref runners, which mount the persistent dataset, software and intake-esgf caches. Set TQDM_DISABLE so the cache-warming downloads do not flood the logs. The drift job runs only on schedule/workflow_dispatch and stays guarded by the repository check, so moving it to self-hosted infra does not expose the runners to untrusted fork pull-request code.

The PR-tier regression gate replays only the cases a pull request touches, and replay rebuilds the committed bundle from the native output blobs plus the committed catalog/manifest -- `build_execution_result` never reads the input datasets. The gate was nonetheless fetching the full CMIP sample data and every provider's ESGF inputs before replaying, which dominated the job runtime (tens of minutes) for no benefit. Remove the `fetch-sample-data` and `test-cases fetch` block from the gate so it only pays for the small native blobs of the cases it actually replays. Also make the gate script runnable locally: it emits GitHub Actions log groups and annotations only under Actions, prints plain output otherwise, and accepts an optional base ref (defaulting to origin/${GITHUB_BASE_REF:-main}). A `make regression-gate` target is the convenient entry point. Update the regression-baselines reference and the testing-diagnostics how-to to document the no-input-data behaviour and the local entry point.
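The gate script itself is not shown here and may well be a shell script. The following Python sketch shows the same behaviour, assuming Actions is detected through the `GITHUB_ACTIONS` environment variable that GitHub sets on its runners. The function names are hypothetical. `::group::`, `::endgroup::` and `::error::` are GitHub's workflow-command syntax.

```python
import os
import sys
from contextlib import contextmanager
from typing import Iterator


def in_github_actions() -> bool:
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners
    return os.environ.get("GITHUB_ACTIONS") == "true"


def default_base_ref() -> str:
    # Mirrors origin/${GITHUB_BASE_REF:-main}: fall back to main when unset or empty
    return f"origin/{os.environ.get('GITHUB_BASE_REF') or 'main'}"


@contextmanager
def log_group(title: str) -> Iterator[None]:
    """Wrap output in a collapsible log group under Actions, a plain header otherwise."""
    if in_github_actions():
        print(f"::group::{title}", flush=True)
        try:
            yield
        finally:
            print("::endgroup::", flush=True)
    else:
        print(f"== {title} ==")
        yield


def report_error(message: str) -> None:
    # Under Actions this becomes an error annotation; locally it is a plain stderr line
    if in_github_actions():
        print(f"::error::{message}")
    else:
        print(f"ERROR: {message}", file=sys.stderr)
```

With this split, the replay logic stays identical between CI and a local run. Only the presentation of its output changes.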

…-baselines

* origin/main:
  chore: label change as breaking
  fix(core): make dataset hash deterministic across pandas versions
  docs(changelog): name the fragment with the PR number
  fix: mark execution failed when result ingestion fails
  docs(changelog): name the fragment with the PR number
  fix: compare dataset versions numerically when selecting latest
  chore(deps): bump tornado from 6.5.6 to 6.5.7

… merge PR #741 (merged into main) changed datasets.hash to a pandas-version-independent algorithm. Recompute the catalog_hash for the gpp-fluxnet2015, lai-avh15c1 and mrsos-wangmao cmip6 baselines with the new algorithm so each catalog.yaml _metadata.hash matches its manifest.json catalog_hash and the recomputed value, keeping the coupling gate green now that the algorithm has changed.
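PR #741's new algorithm is not described here. The following illustrates the general technique only, and is a swapped-in approach rather than necessarily what #741 does. A hash can be made independent of pandas versions by computing it over a canonical, standard-library serialisation of the catalog rows. `catalog_hash` and the row shape are assumptions.

```python
import hashlib
import json
from typing import Any


def catalog_hash(rows: list[dict[str, Any]]) -> str:
    """Hash catalog rows independently of row order, key order and library versions."""
    # Canonicalise each row: sorted keys, fixed separators, str() for non-JSON types
    canonical_rows = [
        json.dumps(row, sort_keys=True, separators=(",", ":"), default=str) for row in rows
    ]
    # Sort the serialised rows so the input order does not affect the digest
    payload = "\n".join(sorted(canonical_rows))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any hash of this kind changes value when the algorithm changes, even for identical inputs. That is why every committed `catalog_hash` had to be recomputed once the new algorithm landed.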


lewisjared added 2 commits June 18, 2026 15:03

docs(changelog): add fragment for ilamb cmip6 baselines

2b5bdd8

lewisjared temporarily deployed to native-baselines June 18, 2026 05:05 — with GitHub Actions Inactive

lewisjared added 2 commits June 18, 2026 15:22

test(ilamb): skip offline validate when native lives in the store

915a328

ci(regression): scope mint fetch to the dispatched diagnostic/test case

fa0cf92

lewisjared had a problem deploying to native-baselines June 18, 2026 05:52 — with GitHub Actions Failure

ci(regression): provision provider reference data before minting

8d90c20

lewisjared temporarily deployed to native-baselines June 18, 2026 06:34 — with GitHub Actions Inactive

github-actions Bot and others added 3 commits June 18, 2026 06:40

chore(regression): mint native baselines for ilamb

7e4ff0c

docs(changelog): add fragment for deterministic series ordering

eca9033

lewisjared temporarily deployed to native-baselines June 18, 2026 07:06 — with GitHub Actions Inactive

github-actions Bot and others added 2 commits June 18, 2026 07:53

chore(regression): mint native baselines for ilamb

c2aa0b0

ci(regression): rebase before pushing the mint commit-back

41b7c1d

lewisjared temporarily deployed to native-baselines June 18, 2026 08:00 — with GitHub Actions Inactive

chore(regression): mint native baselines for ilamb

efd5ce5

lewisjared temporarily deployed to native-baselines June 18, 2026 08:04 — with GitHub Actions Inactive

github-actions Bot and others added 2 commits June 18, 2026 08:08

chore(regression): mint native baselines for ilamb

a655ff5

lewisjared temporarily deployed to native-baselines June 18, 2026 08:28 — with GitHub Actions Inactive

chore(regression): mint native baselines for ilamb

cdc66f8

lewisjared temporarily deployed to native-baselines June 18, 2026 08:32 — with GitHub Actions Inactive

chore(regression): mint native baselines for ilamb

337798d

lewisjared temporarily deployed to native-baselines June 18, 2026 08:36 — with GitHub Actions Inactive

github-actions Bot and others added 5 commits June 18, 2026 08:40

chore(regression): mint native baselines for ilamb

e06ea60

lewisjared added 4 commits June 18, 2026 21:29

docs(changelog): add fragment for PR #742

5fe80f5

Merge remote-tracking branch 'origin/ci/faster-regression-pr-gate' into feat/regression-ilamb-baselines

975adaa

* origin/ci/faster-regression-pr-gate:
  docs(changelog): add fragment for PR #742
  ci(regression): speed up the PR gate by dropping the unused input fetch

lewisjared merged commit 608a56b into main Jun 18, 2026
27 checks passed

lewisjared deleted the feat/regression-ilamb-baselines branch June 18, 2026 11:45

lewisjared mentioned this pull request Jun 18, 2026

Add remaining ILAMB cmip6 regression baselines; stabilise output.json and tag reference series #743

Merged


lewisjared merged 24 commits into main from feat/regression-ilamb-baselines
