Skip to content

Bug fixes and enhancements#139

Merged
HadenSmith merged 9 commits into
mainfrom
bug-fixes-and-enhancements
Jun 17, 2026
Merged

Bug fixes and enhancements#139
HadenSmith merged 9 commits into
mainfrom
bug-fixes-and-enhancements

Conversation

@HadenSmith

Copy link
Copy Markdown
Collaborator

Summary

This PR adds new statistical functionality, improves numerical robustness, and tightens CI/test reliability across Numerics.

Changes

  • Added Fisher-Z and Yeo-Johnson link functions, including factory/controller registration and unit tests.
  • Added and refactored pivotal bootstrap support, including fit/context/diagnostic helpers, invalid-draw handling, XML documentation, and expanded tests.
  • Improved distribution and sampling robustness by using double.NegativeInfinity instead of double.MinValue sentinel values in likelihood-related paths, with added edge-case coverage.
  • Fixed time-series behavior around KNN resampling, NaN propagation in smoothing/block aggregation, culture-invariant parsing, transient download failures, and offset-bearing timestamps.
  • Updated JOSS bibliography DOI metadata.
  • Updated the snapshot workflow so it does not rerun the full test suite, and resolved MSTest analyzer warnings in the test project.

Validation

  • Added and updated unit tests for the new link functions, pivotal bootstrap, MCMC recomputation, time-series resampling, download reliability, and numerical edge cases.
  • Verified the multi-target test project build:
    • dotnet build Test_Numerics\Test_Numerics.csproj -c Release --no-restore
    • Result: 0 Warning(s), 0 Error(s)

…l-data paths. Persistence paths that relied on CurrentCulture silently corrupted numeric data on non-US locales: a USGS daily value of "1234.56" parsed by double.TryParse on de-DE returned 123456.0 (treating "." as a group

separator) rather than failing. USGS, GHCN, BOM, and the embedded Sobol resource always emit US-format numbers, so any non-US-locale user importing them today silently received NaN or wrong values.

- TimeSeriesDownload.cs: add NumberStyles + InvariantCulture / DateTimeStyles to USGS daily, USGS peak, GHCN, and BOM parsing
- TimeSeries.cs: fallback DateTime.TryParse in the XElement constructor now uses InvariantCulture (matches the "o" round-trip primary path)
- SobolSequence.cs, LinkController.cs: int.TryParse now uses InvariantCulture for embedded-resource and XML-attribute reads
- MCMCDiagnostics.cs: replace the wasteful decimal.Parse(N.ToString()) / 100M round-trip with direct double math, eliminating both the string conversion and a latent CurrentCulture dependency

Display code (Summary tables, ParameterToString, DisplayLabel, exception messages) intentionally continues to use CurrentCulture so users see numbers in their locale's format. XML/JSON serialization layers were already correct and unchanged.

Added a new method in MCMCResults so that parameter results can be recomputed if the credible interval width changes.
…silently treating it as 0. A user reported that running peaks-over-threshold analysis in RMC-BestFit on GHCN precipitation data with missing days produced wrong results when smoothing was set to a multi-day moving sum: missing days were silently treated as zero

precipitation, producing spurious sub-threshold exceedances and biased annual sums. The defect spanned MovingSum, MovingAverage, and the Sum/Average paths in CalendarYearSeries and CustomYearSeries, all of which masked NaN with 0 in the running aggregation. The defect was inconsistent within the library too: MonthlySeries and QuarterlySeries already propagated NaN naively, so the same input produced different answers depending on which block method was called.

This change adopts the convention used by every major data-analysis package
(pandas Series.rolling min_periods, bottleneck move_sum min_count, xarray
rolling, MATLAB movsum/movmean nanflag, R zoo rollsum/rollmean fallback): the
default is strict NaN propagation — any NaN in the window or block produces
NaN out. MovingSum and MovingAverage gain an optional minValidCount parameter
(default = period) that maps directly to pandas's min_periods, letting power
users opt into skip-NaN aggregation; in that mode MovingAverage divides by the
count of observed values rather than the window size, fixing the prior
low-bias. Block methods (CalendarYearSeries, CustomYearSeries, MonthlySeries,
QuarterlySeries) now consistently propagate NaN for Sum and Average, and guard
against empty blocks so series with sparse months no longer throw on
.Last(). Difference is unchanged — IEEE-754 subtraction already propagates
NaN. Adds NaN-coverage tests for moving-window, block, and POT pipelines that
were previously absent.
…x offset-bearing timestamp parsing, and stop silently masking BOM API drift. A user reported that BOM integration tests were "all passing" only because BomAvailable() was returning false on a single transient KiWIS 500/DatasourceError, which the test-side _serviceAvailability cache then memoized as "down" for the rest of the run, silently short-circuiting every BOM test. The same hazards lived in CHMN, USGS, and GHCN — none had retry on transient 5xx, all four ran every download through an IsConnectedToInternet() probe that hit USGS specifically (so a USGS slowdown looked like "no internet" to BOM/CHMN/GHCN callers), and the connectivity probe didn't even check IsSuccessStatusCode. Two correctness bugs were also lurking: BOM and CHMN real-time timestamps include explicit offsets like "+10:00" or "-08:00", and DateTime.TryParse with DateTimeStyles.None converts those to local time per Microsoft's documented behavior, shifting daily series by a calendar day on any non-AEST/non-PST machine; and the USGS peak-data parser checked segments.Count() >= 5 before indexing segments[6] for PeakStage, which would IndexOutOfRange on short historical rows. This change adds a single GetWithRetryAsync helper (three attempts, 0.5s/1s/2s backoff, retries 5xx/408/429/network errors and KiWIS DatasourceError bodies) used by all four providers, switches the global connectivity check to https://www.google.com/generate_204 with a success-status check, replaces DateTime.TryParse with DateTimeOffset.TryParse for BOM and CHMN real-time timestamps, deletes the BOM silent ts_id fallback that was quietly returning hourly data when the caller asked for daily, fixes the USGS peak bounds check, URL-encodes the spaces in the CHMN real-time URL, and gives the test-side AvailableAsync cache a 30-second TTL on negative results so a single transient flake no longer disables a provider's whole test suite.
…function calls and likelihod calls in the base classes. Improving tests to catch the sentinal returns. Improving NUTS sampler to handle double.NegativeInfinity in the gradient.
…ootstrap method, the pivotal bootstrap. Added tests for each.
…ap methods. Refactored pivotal bootstrap classes to align with existing architecture.
@HadenSmith HadenSmith merged commit 1873f8f into main Jun 17, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant