
Migrate ILAMB regression baselines to Framework B (cmip6 batch) #738

Merged
lewisjared merged 24 commits into main from feat/regression-ilamb-baselines on Jun 18, 2026


@lewisjared

Contributor

Description

Begins the ILAMB regression-baseline migration to the per-package (Framework B) layout — RFC 0005 PR-6a, the first of the provider migrations (ilamb → pmp → esmvaltool) following the merged baseline machinery (#724, #727, #732, #733).

This first batch adds committed baselines for three cmip6 ILAMB test cases — mrsos-wangmao, gpp-fluxnet2015, lai-avh15c1 — generated locally via ref test-cases run --force-regen. Native blobs are not yet minted (native={}); the gated regression-mint workflow populates them against R2 and is the vehicle for validating CI/local committed-bundle consistency (macOS vs Linux, gate tolerance rtol=1e-6).

Notes:

Checklist

Please confirm that this pull request has done the following:

  • Tests added — N/A; data-only, consumed by the existing parametrized test_validate_test_case_regression (previously skipped for lack of data)
  • Documentation added (where applicable)
  • Changelog item added to changelog/

First batch of Framework-B committed baselines for ILAMB,
covering the mrsos-wangmao, gpp-fluxnet2015 and lai-avh15c1 cmip6 test cases.
Native blobs are not yet minted (native={});
the gated mint workflow populates them and is the vehicle
for validating CI/local committed-bundle consistency.
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 05:05 — with GitHub Actions Inactive
@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
core 92.56% <100.00%> (+<0.01%) ⬆️
providers 91.82% <100.00%> (+0.01%) ⬆️


Files with missing lines Coverage Δ
...-core/src/climate_ref_core/metric_values/typing.py 93.84% <100.00%> (+0.09%) ⬆️
...limate-ref-ilamb/src/climate_ref_ilamb/standard.py 85.49% <100.00%> (+0.29%) ⬆️

@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 06:34 — with GitHub Actions Inactive
github-actions Bot and others added 3 commits June 18, 2026 06:40
Diagnostics may emit their series in an implementation-defined order that
differs across platforms (e.g. macOS vs the Linux CI runner). The regression
baseline comparator walks JSON arrays positionally, so an unstable order made
a committed bundle minted on one platform falsely fail the gate on another,
even when every series value agreed within tolerance.
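The failure mode described above can be illustrated with a minimal positional comparator. This is a sketch only, not the project's comparator: the function name `compare_positional` and the shape of the series entries are assumptions, and `math.isclose` stands in for whatever tolerance check the gate really uses (its `rel_tol` is symmetric, which differs slightly from NumPy-style `rtol`).

```python
import math


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare_positional(expected, actual, rtol=1e-6, path="$"):
    """Return a list of mismatch descriptions; lists are compared index by index."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        problems = []
        for key in sorted(set(expected) | set(actual)):
            if key not in expected or key not in actual:
                problems.append(f"{path}.{key}: present on one side only")
            else:
                problems += compare_positional(
                    expected[key], actual[key], rtol, f"{path}.{key}"
                )
        return problems
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path}: length {len(expected)} != {len(actual)}"]
        problems = []
        for i, (exp_item, act_item) in enumerate(zip(expected, actual)):
            problems += compare_positional(exp_item, act_item, rtol, f"{path}[{i}]")
        return problems
    if _is_number(expected) and _is_number(actual):
        if math.isclose(expected, actual, rel_tol=rtol):
            return []
        return [f"{path}: {expected} != {actual}"]
    return [] if expected == actual else [f"{path}: {expected!r} != {actual!r}"]


# Hypothetical series entries: identical content, different order.
series_a = [
    {"dimensions": {"region": "global"}, "values": [1.0, 2.0]},
    {"dimensions": {"region": "tropics"}, "values": [3.0]},
]
series_b = list(reversed(series_a))
```

With this comparator, `compare_positional(series_a, series_b)` reports mismatches even though both lists hold the same entries, which is the false failure the commit message describes.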

Sort the series by their dimensions (with index_name as a tie-breaker) at the
single serialisation point so the order is canonical everywhere. This makes
local and CI mints interchangeable and keeps series.json diffs to value
changes only.
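A minimal sketch of such a canonical ordering, assuming each series entry is a mapping with a `dimensions` dict and an `index_name` string. Those field names come from the commit message, but the exact data model and the function name `canonical_series_order` are assumptions made for illustration.

```python
def canonical_series_order(series):
    """Return the series sorted by dimensions, with index_name as a tie-breaker."""

    def sort_key(item):
        # Sort the dimension items so the key does not depend on dict insertion
        # order; cast to str so mixed value types still compare cleanly.
        dims = tuple(sorted((str(k), str(v)) for k, v in item["dimensions"].items()))
        return (dims, str(item.get("index_name", "")))

    return sorted(series, key=sort_key)


# Two hypothetical emission orders of the same three series.
emitted_on_one_platform = [
    {"dimensions": {"region": "tropics"}, "index_name": "time"},
    {"dimensions": {"region": "global"}, "index_name": "time"},
    {"dimensions": {"region": "global"}, "index_name": "month"},
]
emitted_on_another = list(reversed(emitted_on_one_platform))
```

Applying `canonical_series_order` to either list gives the same result, so a positional comparison of the serialised output no longer depends on the platform's emission order.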
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 07:06 — with GitHub Actions Inactive
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 08:00 — with GitHub Actions Inactive
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 08:04 — with GitHub Actions Inactive
github-actions Bot and others added 2 commits June 18, 2026 08:08
ILAMB's build_execution_result re-reads the per-execution scalar CSVs (via
_load_csv_and_merge) and the netCDF time-trace files to reconstruct its
metrics and series, but the CMEC output bundle only declared the plots and
HTML. So those data files were never persisted with the results and were
absent from the regression native baseline, leaving the committed bundle
impossible to replay ("No objects to concatenate").

Register the *.csv and *.nc files in the output bundle's data section so the
curated capture persists them and replay can reconstruct the execution. This
is also a real persistence fix: the scientific scalar data was not being saved
from production runs either.
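The registration step could look roughly like the sketch below. It is not the code from this PR: `build_data_section` is a hypothetical helper, and the entry fields (`filename`, `long_name`, `description`) are assumed to follow the CMEC output bundle convention rather than taken from the change itself.

```python
import tempfile
from pathlib import Path


def build_data_section(output_dir, patterns=("*.csv", "*.nc")):
    """Map each matching file (path relative to output_dir) to a bundle entry."""
    root = Path(output_dir)
    data = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            rel = path.relative_to(root).as_posix()
            data[rel] = {
                "filename": rel,
                "long_name": path.stem,
                "description": f"{path.suffix.lstrip('.')} data file",
            }
    return data


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scalars.csv").write_text("metric,value\n")
    (root / "traces").mkdir()
    (root / "traces" / "trace.nc").write_bytes(b"")
    (root / "plot.png").write_bytes(b"")  # not a data file, so not registered
    demo_section = build_data_section(root)
```

Only the CSV and netCDF files end up in `demo_section`; the plot is left to the section of the bundle that already declares plots.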
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 08:28 — with GitHub Actions Inactive
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 08:32 — with GitHub Actions Inactive
@lewisjared lewisjared temporarily deployed to native-baselines June 18, 2026 08:36 — with GitHub Actions Inactive
github-actions Bot and others added 5 commits June 18, 2026 08:40
The committed catalog.yaml _metadata.hash for gpp-fluxnet2015, lai-avh15c1 and mrsos-wangmao
no longer matched the catalog_hash recorded in each manifest.json,
so the PR coupling gate failed with "input catalog.yaml does not match manifest.json catalog_hash".
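The coupling check that failed here can be sketched as follows, assuming both files have already been parsed into dictionaries. The function name `check_catalog_coupling` is hypothetical; only the `_metadata.hash` and `catalog_hash` keys and the error text come from the commit message.

```python
def check_catalog_coupling(catalog, manifest):
    """Raise ValueError unless the catalog hash matches the manifest's catalog_hash."""
    catalog_hash = catalog.get("_metadata", {}).get("hash")
    manifest_hash = manifest.get("catalog_hash")
    if catalog_hash is None or manifest_hash is None:
        raise ValueError("missing _metadata.hash or catalog_hash")
    if catalog_hash != manifest_hash:
        raise ValueError(
            "input catalog.yaml does not match manifest.json catalog_hash"
        )
    return True


# Made-up hash values, for illustration only.
coupled_catalog = {"_metadata": {"hash": "abc123"}}
coupled_manifest = {"catalog_hash": "abc123"}
drifted_manifest = {"catalog_hash": "def456"}
```

`check_catalog_coupling(coupled_catalog, coupled_manifest)` passes, while pairing the same catalog with `drifted_manifest` raises the error the gate reported.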

datasets.hash is not reproducible across pandas versions:
the latest mint recomputed a different hash into the manifest,
while the committed catalog kept the developer-authored value.
Re-point each catalog to its manifest's hash, the value tied to the baseline that was actually minted.

The mint commit step staged manifest.json and regression/** but not catalog.yaml.
datasets.hash (catalog.yaml's _metadata.hash, recorded as the manifest's catalog_hash)
is not reproducible across pandas versions,
so the runner regenerated catalog.yaml with a new hash and wrote it into the committed manifest,
but discarded the regenerated catalog.
The committed manifest and catalog then disagreed, breaking the PR coupling gate.
Stage catalog.yaml alongside the manifest so they cannot drift apart on future mints.

Switch the mint and nightly-drift workflows from ubuntu-latest
to the self-hosted arc-climate-ref runners,
which mount the persistent dataset, software and intake-esgf caches.
Set TQDM_DISABLE so the cache-warming downloads do not flood the logs.

The drift job runs only on schedule/workflow_dispatch and stays guarded by the
repository check, so moving it to self-hosted infra does not expose the runners
to untrusted fork pull-request code.

The PR-tier regression gate replays only the cases a pull request touches,
and replay rebuilds the committed bundle from the native output blobs plus
the committed catalog/manifest -- `build_execution_result` never reads the
input datasets. The gate was nonetheless fetching the full CMIP sample data
and every provider's ESGF inputs before replaying, which dominated the job
runtime (tens of minutes) for no benefit.

Remove the `fetch-sample-data` and `test-cases fetch` block from the gate so
it only pays for the small native blobs of the cases it actually replays.

Also make the gate script runnable locally: it emits GitHub Actions log groups
and annotations only under Actions, prints plain output otherwise, and accepts
an optional base ref (defaulting to origin/${GITHUB_BASE_REF:-main}). A
`make regression-gate` target is the convenient entry point.
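The environment-dependent output described above can be sketched in Python. The real gate script is not shown in this conversation and may well be a shell script; the helper names here are hypothetical. `GITHUB_ACTIONS`, `GITHUB_BASE_REF` and the `::group::` / `::endgroup::` workflow commands are standard GitHub Actions features.

```python
import os
from contextlib import contextmanager


def running_under_actions(env=None):
    """GitHub Actions sets GITHUB_ACTIONS=true for every step it runs."""
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS") == "true"


def default_base_ref(env=None):
    """Mirror origin/${GITHUB_BASE_REF:-main}: fall back when unset or empty."""
    env = os.environ if env is None else env
    return "origin/" + (env.get("GITHUB_BASE_REF") or "main")


@contextmanager
def log_group(title, env=None, out=print):
    """Emit a collapsible log group under Actions, a plain header otherwise."""
    under_actions = running_under_actions(env)
    out(f"::group::{title}" if under_actions else f"== {title} ==")
    try:
        yield
    finally:
        if under_actions:
            out("::endgroup::")
```

Passing `env` and `out` explicitly keeps the helpers testable without touching the real environment; a local run simply sees plain headers and the `origin/main` default.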

Update the regression-baselines reference and the testing-diagnostics how-to to
document the no-input-data behaviour and the local entry point.
…-baselines

* origin/main:
  chore: label change as breaking
  fix(core): make dataset hash deterministic across pandas versions
  docs(changelog): name the fragment with the PR number
  fix: mark execution failed when result ingestion fails
  docs(changelog): name the fragment with the PR number
  fix: compare dataset versions numerically when selecting latest
  chore(deps): bump tornado from 6.5.6 to 6.5.7
… merge

PR #741 (merged into main) changed datasets.hash to a pandas-version-independent
algorithm. Recompute the catalog_hash for the gpp-fluxnet2015, lai-avh15c1 and
mrsos-wangmao cmip6 baselines with the new algorithm so each catalog.yaml
_metadata.hash matches its manifest.json catalog_hash and the recomputed value,
keeping the coupling gate green now that the algorithm has changed.
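The algorithm introduced by PR #741 is not shown in this conversation, and the sketch below is not that algorithm. It only illustrates one common way to make a dataset hash independent of library versions: hash a canonical text form of the records rather than a library-specific in-memory representation. The function name `stable_dataset_hash` and the record layout are assumptions.

```python
import hashlib
import json


def stable_dataset_hash(records):
    """Hash a list of record dicts independently of row order and key order."""
    # Serialise each record with sorted keys and fixed separators, then sort
    # the rows so neither key order nor row order affects the digest.
    canonical_rows = sorted(
        json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
        for record in records
    )
    digest = hashlib.sha256()
    for row in canonical_rows:
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical dataset records, for illustration only.
records = [
    {"variable_id": "gpp", "source_id": "MODEL-A", "version": "v1"},
    {"variable_id": "lai", "source_id": "MODEL-A", "version": "v2"},
]
reordered = [dict(reversed(list(r.items()))) for r in reversed(records)]
```

Here `stable_dataset_hash(records)` equals `stable_dataset_hash(reordered)`, while changing any field value changes the digest.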
…to feat/regression-ilamb-baselines

* origin/ci/faster-regression-pr-gate:
  docs(changelog): add fragment for PR #742
  ci(regression): speed up the PR gate by dropping the unused input fetch
@lewisjared lewisjared merged commit 608a56b into main Jun 18, 2026
27 checks passed
@lewisjared lewisjared deleted the feat/regression-ilamb-baselines branch June 18, 2026 11:45