feat(waterdata): get_queryables + queryables monitor + passthrough enablement#333

Draft
thodson-usgs wants to merge 2 commits into DOI-USGS:main from thodson-usgs:feat/waterdata-queryables

@thodson-usgs (Collaborator)

Draft — held until the upstream API is ready. Several of the newly
enabled queryables are accepted by the service but don't yet round-trip in
the data (e.g. filtering get_daily(state_name="Hawaii") returns rows with no
state_name column, and not all attributes filter reliably yet). The
get_queryables helper and the monitoring test are ready now; the
queryable-enablement waits on the upstream data side.

Summary

Three related changes around the Water Data OGC API's queryables (the
properties each collection can be filtered on):

  1. waterdata.get_queryables(collection) — returns a collection's queryable
    properties as a tidy (DataFrame, BaseMetadata), one row per property with
    its type, title, and description. Lets callers discover the available
    filters programmatically.

  2. A live monitoring test: tests/waterdata_queryables_test.py compares
    each collection's advertised queryables against a committed snapshot
    (tests/data/waterdata_queryables.json, 489 properties across 11
    collections). It fails when the upstream API adds / removes / renames a
    queryable — the signal to regenerate the snapshot and enable anything new.

  3. Passthrough enablement — the OGC data getters exposed ~11 of each
    collection's ~50 queryables as named params; the rest (mostly the shared
    monitoring-location attributes — state_name, county_code, site_type,
    altitude, …, now filterable on the data endpoints) were reachable only via
    the raw filter CQL. Each OGC getter now accepts **queryables, so any
    queryable can be passed as a filter:

    # filter daily discharge by a monitoring-location attribute
    df, md = waterdata.get_daily(parameter_code="00060", state_name="Wisconsin")
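To illustrate item 1: an OGC API queryables endpoint returns a JSON Schema document whose `properties` object maps each queryable name to its schema. The sketch below flattens such a document into one row per property. `flatten_queryables` and the sample payload are hypothetical, and this is not the package's actual parsing code.

```python
from typing import Any


def flatten_queryables(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per queryable property, sorted by property name."""
    rows = []
    for name, spec in sorted(schema.get("properties", {}).items()):
        rows.append(
            {
                "name": name,
                # Any of these keys may be absent for a given property.
                "type": spec.get("type"),
                "title": spec.get("title"),
                "description": spec.get("description"),
            }
        )
    return rows


# Hypothetical payload shaped like a queryables JSON Schema response.
sample = {
    "type": "object",
    "properties": {
        "state_name": {"type": "string", "title": "State name"},
        "parameter_code": {
            "type": "string",
            "title": "Parameter code",
            "description": "Illustrative description",
        },
    },
}

rows = flatten_queryables(sample)
```

Passing these rows to a DataFrame constructor would give the tidy one-row-per-property shape described above. Missing keys come through as `None` rather than raising.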

How the passthrough works

get_daily, get_continuous, get_latest_continuous, get_latest_daily,
get_field_measurements, get_field_measurements_metadata, get_peaks,
get_channel, get_monitoring_locations, get_time_series_metadata, and
get_combined_metadata each gain **queryables. The shared
waterdata.utils._get_args flattens that kwargs dict into the request args, so a
passthrough filter is normalized (iterables → comma-joined, etc.) and sent
exactly like a named param. get_cql (the raw-CQL escape hatch) is intentionally
excluded.
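The flattening step can be sketched roughly as follows. This is not the real `_get_args`: `flatten_args` is a hypothetical stand-in, and dropping `None` values is an assumption about how unset params are handled.

```python
from collections.abc import Iterable
from typing import Any


def flatten_args(named: dict[str, Any], queryables: dict[str, Any]) -> dict[str, Any]:
    """Merge named params and passthrough filters into one request-args dict."""
    args: dict[str, Any] = {}
    for key, value in {**named, **queryables}.items():
        if value is None:
            continue  # assumption: unset params are not sent
        # Join lists, tuples, etc. with commas; leave plain strings untouched.
        if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
            value = ",".join(str(v) for v in value)
        args[key] = value
    return args


args = flatten_args(
    {"parameter_code": "00060", "limit": None},
    {"state_name": ["Wisconsin", "Iowa"]},
)
```

Here the passthrough `state_name` list ends up as the single string `"Wisconsin,Iowa"`, in the same dict and in the same form as the named `parameter_code`.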

No client-side queryable list is bundled: the service validates names itself —
an unknown queryable returns HTTP 400, surfaced as the typed
DataRetrievalError. (The committed snapshot is used only by the monitoring
test, not for runtime validation, so a stale snapshot cannot change the
package's behavior.)

Provisional — passthrough now, explicit named params later?

This PR uses a passthrough (**queryables). That decision is deliberate but
not final:

Why passthrough now

  • Compact: avoids adding ~40 near-identical params to each of 11 getters (a
    ~400-param explosion of mostly-shared location attributes).
  • Auto-tracks the API: when upstream adds a queryable, the monitoring test flags
    it and it's already usable — no per-getter code change to expose it.
  • DRY: one _get_args change enables every getter uniformly.

Why we may switch to explicit named params

  • Discoverability: explicit params show up in IDE autocomplete and
    help(get_daily); **queryables hides them.
  • Per-param docs & types: each queryable could carry its own description and
    type hint instead of one generic note.
  • Typo safety: a misspelled explicit param is a TypeError at the call site;
    a misspelled passthrough queryable is only caught at runtime as an HTTP 400.
  • Self-documenting surface: the signature would state exactly what each
    collection supports rather than "anything the service accepts."
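The typo-safety difference can be shown with two toy functions. Both getters here are hypothetical stand-ins, not the package's signatures.

```python
from typing import Any, Optional


def explicit_getter(*, state_name: Optional[str] = None) -> dict[str, Any]:
    """Toy getter with an explicit keyword parameter."""
    return {"state_name": state_name}


def passthrough_getter(**queryables: Any) -> dict[str, Any]:
    """Toy getter that accepts any keyword and forwards it as-is."""
    return dict(queryables)


try:
    explicit_getter(state_nmae="Wisconsin")  # misspelled on purpose
    explicit_result = "accepted"
except TypeError:
    explicit_result = "TypeError"

# The same misspelling passes through silently; only the service could reject it.
passthrough_result = passthrough_getter(state_nmae="Wisconsin")
```

With explicit params, Python (or a type checker) rejects the misspelling before any request is made. With the passthrough, the bad name travels to the service.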

The natural future step is to generate explicit params (with docstrings)
from the queryables snapshot, getting discoverability without hand-maintaining
~400 params. Until then, the passthrough unblocks the capability with minimal
surface area.
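One possible shape for that generation step is sketched below. It assumes each snapshot entry maps property names to JSON Schema fragments with a `type` key; the snapshot's real layout, the `render_params` helper, and the type mapping are all assumptions.

```python
# Assumed mapping from JSON Schema types to Python annotations.
JSON_TO_PY = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def render_params(properties: dict[str, dict[str, str]]) -> str:
    """Render one keyword parameter per queryable as Python source lines."""
    lines = []
    for name, spec in sorted(properties.items()):
        # Fall back to Any when the schema type is missing or unmapped.
        py_type = JSON_TO_PY.get(spec.get("type", ""), "Any")
        lines.append(f"    {name}: {py_type} | None = None,")
    return "\n".join(lines)


# Hypothetical snapshot entry for one collection.
properties = {
    "state_name": {"type": "string"},
    "altitude": {"type": "number"},
}
source = render_params(properties)
print(source)
```

The rendered lines could be spliced into a getter's signature by a code-generation script, with per-param docstrings built the same way from each property's title and description.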

Verification

  • tests/waterdata_queryables_test.py — offline get_queryables parsing /
    error tests, offline passthrough tests (the filter reaches the /items
    request, lists comma-joined), and the 11-collection live monitor. All pass.
  • ruff check / ruff format / mypy --strict clean across the package.
  • Live sanity: a normal get_daily is unchanged by the **queryables addition;
    a passthrough state_name= filter is accepted by the service (no 400).
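The comparison the live monitor performs can be sketched as a set difference between snapshot names and advertised names. `diff_queryables` and the sample names are illustrative only; the real test's structure may differ.

```python
def diff_queryables(snapshot: set[str], live: set[str]) -> dict[str, list[str]]:
    """Report queryable names that appear or disappear relative to the snapshot."""
    return {
        "added": sorted(live - snapshot),
        "removed": sorted(snapshot - live),
    }


# Hypothetical names: one queryable renamed and one newly added upstream.
snapshot = {"state_name", "county_code", "site_type"}
live = {"state_name", "county_code", "site_type_code", "altitude"}

drift = diff_queryables(snapshot, live)
# A rename shows up as one "removed" name plus one "added" name.
```

A monitor built this way would pass only when both lists are empty, so an addition, a removal, or a rename each causes a failure.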

Before merge (once upstream is ready)

  • Regenerate the snapshot if queryables changed; confirm the held queryables now
    round-trip in the data.
  • Add a NEWS.md entry.
  • Decide passthrough vs. generated explicit params per the discussion above.

🤖 Generated with Claude Code

thodson-usgs and others added 2 commits June 24, 2026 11:15
Add `waterdata.get_queryables(collection)`, returning the OGC queryable
properties of a Water Data collection (`daily`, `continuous`,
`monitoring-locations`, ...) as a tidy `(DataFrame, BaseMetadata)` — one row per
filterable property with its type, title, and description.

Add `tests/waterdata_queryables_test.py`: offline parsing / error tests plus a
live monitor that compares each collection's advertised queryables against a
committed snapshot (`tests/data/waterdata_queryables.json`). The monitor fails
when the upstream API adds / removes / renames a queryable — the signal to
regenerate the snapshot and enable any new queryables on the matching getter.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

The OGC data getters (`get_daily`, `get_continuous`, `get_peaks`, ...) exposed
~11 of each collection's ~50 queryables as named params; the rest — mostly the
shared monitoring-location attributes (`state_name`, `county_code`, `site_type`,
`altitude`, ...) now filterable on the data endpoints — were reachable only via
the raw `filter` CQL.

Accept any queryable as a passthrough kwarg: each OGC getter gains
`**queryables`, and the shared `_get_args` flattens it so an extra filter such
as `state_name="Wisconsin"` is normalized and sent exactly like a named param.
The service itself validates names (an unknown one returns HTTP 400 → typed
error), so no client-side queryable list is bundled.

The passthrough is provisional (see the PR description for the trade-off vs.
explicit per-property keyword arguments).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd