feat(wateruse): add water-use module for the NWDC API #328

Merged: thodson-usgs merged 1 commit into DOI-USGS:main from thodson-usgs:feat/wateruse on Jun 24, 2026

@thodson-usgs (Collaborator) commented Jun 22, 2026

Summary

Adds a dataretrieval.wateruse module for retrieving USGS National Water
Availability Assessment Data Companion (NWDC) water-use estimates from
https://api.water.usgs.gov/nwaa-data/data. Estimates are modeled on a HUC12
grid and queryable by county, state, or hydrologic unit. This is the modern
replacement for the defunct legacy NWIS water-use service, so
nwis.get_water_use now points callers here.

It covers the same data as the R dataRetrieval::read_waterdata_use_data
getter, but is written to the Python package's conventions rather than ported
from the R structure.

from dataretrieval import wateruse

df, md = wateruse.get_wateruse(
    model="wu-public-supply-wd",
    variable=["pswdtot", "pswdgw", "pswdsw"],
    state="RI",
    start_date="2020-01",
    time_resolution="monthly",
)

Design notes

The NWDC is a plain CSV REST service, not an OGC API Features collection:
it has no /collections or /conformance, and its error envelope is
{"detail": ...} rather than the OGC engine's {code, description}. So it does
not use the high-level OGC path (get_ogc_data, the CQL2 byte-chunker, the
GeoJSON pager). It does reuse the engine's generic transport plumbing,
supplying only NWDC-specific strategies, and stays consistent with the package
where the shared pieces fit:

  • Returns the conventional (DataFrame, BaseMetadata) tuple.
  • Reuses utils._default_headers(), so the documented API_USGS_PAT token
    raises the NWDC rate limit just as it does for the OGC getters.
  • Raises through the shared typed DataRetrievalError taxonomy (via
    utils._raise_for_status with an injected detail extractor), surfacing the
    NWDC detail (e.g. "Invalid model name: ...") in the message.
  • Locations are idiomatic state / county / huc selectors (mirroring
    ngwmn / waterdata), each accepting a single value or a list. Since NWDC
    takes one location per request, a multi-value selector fans out — one
    request per location, run concurrently over a shared client.
  • Date / resolution params are idiomatic snake_case (start_date, end_date,
    time_resolution), mapped to the NWDC wire names internally.
  • Multi-valued variable is comma-joined into a single GET.
  • Pagination is real and handled transparently. Large areas paginate with
    an RFC 8288 Link: <...>; rel="next" header (a huc2 → 7 pages, a populous
    state → 4; small queries → a single page). wateruse drives the engine's
    generic _paginate with NWDC parse / cursor / error strategies and
    concatenates the pages.
  • huc12_id is parsed as a string so leading zeros survive.
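
The pagination bullet above can be illustrated with a minimal, stdlib-only sketch of following an RFC 8288 Link header and concatenating CSV pages. This is not the module's actual code (which drives the engine's _paginate). The names next_link, fetch_all, and PAGES, the example.invalid URLs, the value column, and the HUC12 codes are all hypothetical, and the regex handles only the simple `<url>; rel="..."` form.

```python
import csv
import io
import re
from typing import Callable, Optional

# Hypothetical stand-in for an HTTP response: (CSV body, response headers).
Response = tuple[str, dict[str, str]]

# Simplified matcher: only handles links whose first parameter is rel.
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="?([^";,]+)"?')


def next_link(headers: dict[str, str]) -> Optional[str]:
    """Return the rel="next" target of a Link header, or None if absent."""
    for url, rel in _LINK_RE.findall(headers.get("Link", "")):
        if "next" in rel.split():
            return url
    return None


def fetch_all(url: str, fetch: Callable[[str], Response]) -> list[dict[str, str]]:
    """Follow rel="next" links and concatenate the rows of every CSV page."""
    rows: list[dict[str, str]] = []
    next_url: Optional[str] = url
    while next_url is not None:
        body, headers = fetch(next_url)
        # csv.DictReader keeps every field as text, so the leading zero in
        # a huc12_id such as "010900040101" survives.
        rows.extend(csv.DictReader(io.StringIO(body)))
        next_url = next_link(headers)
    return rows


# Fake two-page service (made-up data) standing in for the live API.
PAGES: dict[str, Response] = {
    "https://example.invalid/data?page=1": (
        "huc12_id,value\n010900040101,1.5\n",
        {"Link": '<https://example.invalid/data?page=2>; rel="next"'},
    ),
    "https://example.invalid/data?page=2": (
        "huc12_id,value\n010900040102,2.5\n",
        {},
    ),
}

rows = fetch_all("https://example.invalid/data?page=1", PAGES.__getitem__)
```

Here the transport is injected as a plain callable, so the loop can be exercised offline. A pandas-based parser would need an explicit string dtype for huc12_id to get the same leading-zero behavior.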

Engine refactor

Building wateruse surfaced that it could reuse the OGC engine's transport
instead of re-implementing it — and extracting the reusable seams also
de-duplicated the engine itself. Net source ≈ −66 LOC, behavior-preserving:

  • planning._merge_response — one low-level "fold N responses into one"
    behind both pagination (_paginate) and the chunked / fan-out aggregation
    (_combine_chunk_responses), replacing two near-duplicate implementations.
  • utils.Ambient[T] — a small generic ContextVar-with-scope class that
    collapses each per-call ambient (_row_cap, _ogc_base_url, _dialect, the
    chunker's _chunked_client) from a var + hand-written @contextmanager
    setter pair into a single declaration.
  • Rate-limit correctness fix: x-ratelimit-remaining now reports the
    lowest value any concurrent sub-request saw (the quota actually left after
    a fan-out) via a shared _lowest_remaining, instead of the last-by-index —
    fixing a latent inaccuracy in the OGC chunker too.
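
As a rough illustration of the last two bullets, the sketch below shows one way a generic ContextVar-with-scope class and a lowest-remaining reduction could look. It is an assumption-laden sketch, not the code in utils.py: the method names get and scope, the row_cap instance, and the lowest_remaining function are hypothetical and chosen only for this example.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Generic, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class Ambient(Generic[T]):
    """A ContextVar bundled with a scoped setter (illustrative only)."""

    def __init__(self, name: str, default: T) -> None:
        self._var: ContextVar[T] = ContextVar(name, default=default)

    def get(self) -> T:
        """Return the value active in the current context."""
        return self._var.get()

    @contextmanager
    def scope(self, value: T) -> Iterator[T]:
        """Set the value for the duration of a with-block, then restore it."""
        token = self._var.set(value)
        try:
            yield value
        finally:
            self._var.reset(token)


# One declaration replaces a module-level var plus a hand-written setter.
row_cap: Ambient[Optional[int]] = Ambient("row_cap", None)


def lowest_remaining(values: Iterable[Optional[str]]) -> Optional[int]:
    """Smallest x-ratelimit-remaining across sub-responses.

    Responses that lacked the header are passed as None and ignored.
    """
    seen = [int(v) for v in values if v is not None]
    return min(seen) if seen else None


with row_cap.scope(100):
    inside = row_cap.get()   # value visible inside the scope
outside = row_cap.get()      # default restored after the scope exits

# Header values as concurrent sub-requests might report them (made up).
quota = lowest_remaining(["98", "95", None, "97"])
```

Taking the minimum rather than the last value by index matters because concurrent sub-requests can finish in any order, so the last response in the list is not necessarily the one that saw the least quota.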

What's included

  • dataretrieval/wateruse.py, wired into dataretrieval/__init__.py.
  • The engine refactor across ogc/{engine,planning,chunking}.py, utils.py,
    and waterdata/utils.py.
  • tests/wateruse_test.py — offline pytest-httpx coverage: single-page parse,
    string huc12_id, comma-joined variables, dropped-None params, snake_case →
    wire-name mapping, Link-header pagination, bare-host
    normalization, shared-header reuse, state/county/huc selectors + fan-out, and
    typed-error / detail handling; plus updates to tests/waterdata_* for the
    engine changes.
  • docs/source/reference/wateruse.rst + toctree entry.
  • README.md usage example and "Available Data Services" entry.
  • demos/USGS_WaterUse_Examples.ipynb — a motivating walkthrough (where
    Wisconsin's public water supply comes from, and its summer demand peak).

Verification

  • Offline suites pass — wateruse plus the OGC engine / chunking / utils suites
    the refactor touches; ruff check / ruff format / mypy --strict clean.
  • Smoke-tested against the live API: single- and multi-page queries, monthly
    and annual resolutions, paginated results byte-identical to the unpaginated
    equivalent, concurrent fan-out over multiple states, and the lowest-remaining
    rate-limit header confirmed.

🤖 Generated with Claude Code

@thodson-usgs changed the title from "feat(wateruse): add water-use module wrapping the NWDC API" to "feat(wateruse): add water-use module for the NWDC API" on Jun 22, 2026
@thodson-usgs force-pushed the feat/wateruse branch 4 times, most recently from 19105d8 to 0f20ada, on June 24, 2026 21:00

Add `dataretrieval.wateruse` for USGS National Water Availability Assessment
Data Companion (NWDC) water-use estimates — modeled on a HUC12 grid and
queryable by state, county, or hydrologic unit. This is the modern replacement
for the defunct legacy NWIS water-use service (`nwis.get_water_use` now points
callers here).

    from dataretrieval import wateruse

    df, md = wateruse.get_wateruse(
        model="wu-public-supply-wd",
        variable=["pswdtot", "pswdgw", "pswdsw"],
        state="RI",
        start_date="2020-01",
        time_resolution="monthly",
    )

The NWDC is a plain CSV REST service, not an OGC API Features collection, so the
module supplies the NWDC-specific pieces (CSV parsing, the RFC 8288 Link-header
pagination cursor, the `{detail}` error envelope, and state/county/huc location
builders) but reuses the OGC engine's generic transport rather than
re-implementing it: the shared pager (`_paginate`), the Jupyter-safe anyio sync
bridge (`_run_sync`), response/frame aggregation, and `_default_headers`. It
keeps the package conventions where they fit — a `(DataFrame, BaseMetadata)`
return, the typed `DataRetrievalError` taxonomy (surfacing the NWDC `detail`),
`API_USGS_PAT` token support, idiomatic snake_case params, and `state` /
`county` / `huc` selectors that each accept a value or a list (a list fans out
one concurrent request per location). Large areas paginate transparently.

A `FutureWarning` flags the module as experimental, since the NWDC service is
new and still changing.

Extracting the reusable engine seams also de-duplicated the engine itself
(~-66 LOC, behavior-preserving): `planning._merge_response` now backs both
pagination and fan-out aggregation; a generic `utils.Ambient[T]`
contextvar-with-scope helper collapses the per-call ambients; and
`x-ratelimit-remaining` now reports the lowest value any concurrent sub-request
saw (the quota actually left after a fan-out), fixing a latent inaccuracy in the
OGC chunker too.

Includes offline pytest-httpx coverage, a reference page, a README example, and
a demo notebook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

@thodson-usgs marked this pull request as ready for review on June 24, 2026 21:10
@thodson-usgs merged commit 4daf771 into DOI-USGS:main on Jun 24, 2026 (9 checks passed)
@thodson-usgs deleted the feat/wateruse branch on June 24, 2026 21:10