From 9e133501ff5cb7d38b8698abea3afc56c534eb84 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Sun, 31 May 2026 22:14:21 +0200
Subject: [PATCH 01/29] docs: cluster-docs tree + freshness pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Audited every narrative docs page against the current code. The install /
container / testing / API pages were already fresh; the staleness concentrated
in cluster docs and a few content errors. This rework:
**Machine-specific cluster tree.** Cluster guidance was scattered and half
of it invisible (candide lived only inside container.md on a feature branch;
canfar was split across orphaned pages; none of canfar/candide were in the
sidebar). Add a single `clusters.md` under a new "Running on a cluster" toctree
caption: the shared pattern (container = unit of execution, bind-mount, keep
SIFs off a quota-limited $HOME), then per-machine sections for candide (SLURM,
the candide_{smp,mpi}.sh scripts, the quota-safe pull, MPI/PMIx) and CANFAR
(the current canfar_submit_job / canfar_monitor console scripts), with ccin2p3
stubbed. The deep CANFAR production walkthrough stays in pipeline_canfar.md,
linked, and is now in the toctree too.
**Delete obsolete pages.** canfar.md (the old curl-VM submission model,
superseded by canfar_submit_job), pipeline_v2.0.md (personal paths, a missing
script), and work_flow_v2.0.md (an unrealized planning wishlist) — all three
orphaned from the toctree. The v2.0 wishlist is preserved in the team's felt
store rather than lost.
**Fix content errors.**
- dependencies.md: rewritten against pyproject.toml. Reframed around the
abstract-minimums + uv.lock SSOT (was "pinned per release"); ngmix now points
at the aguinot/ngmix@stable_version fork (was esheldon upstream); dropped the
phantom CDSclient; added the missing CANFAR/data stack (vos, skaha, canfar,
cs_util, astroquery, reproject, h5py, numba).
- post_processing.md: dropped the removed rho-statistics step and the dead
prepare_tiles_for_final command; added a legacy banner pointing at sp_validation.
- random_cat.md: legacy banner; fixed module name random_runner -> random_cat_runner.
- pipeline_canfar.md: flagged the matched-star / coverage-mask helpers that
moved to sp_validation (merge_psf_cat.py, download_headers, …).
- basic_execution.md: replaced the conda-era "activate the environment" framing
with the container reality. (MPI sections deferred pending the #737 decision.)
- configuration.md (conifg->config, NUMBERING_LIST->NUMBER_LIST),
contributing.md (Pleas->Please), module_develop.md (src/shapepipe/modules).
Verified with a local sphinx-book-theme build: succeeds; the only new warning
the tree introduced (a clusters.md heading anchor) is fixed. Remaining warnings
are all pre-existing (the autosummary API page needs the installed package;
multiple-toctree notices on every page).
Co-Authored-By: Claude Opus 4.8 (1M context)
---
docs/source/basic_execution.md | 2 +-
docs/source/canfar.md | 94 -----------
docs/source/clusters.md | 102 ++++++++++++
docs/source/configuration.md | 4 +-
docs/source/contributing.md | 2 +-
docs/source/dependencies.md | 65 ++++++--
docs/source/module_develop.md | 2 +-
docs/source/pipeline_canfar.md | 10 ++
docs/source/pipeline_v2.0.md | 292 ---------------------------------
docs/source/post_processing.md | 42 ++---
docs/source/random_cat.md | 11 +-
docs/source/toc.rst | 8 +
docs/source/work_flow_v2.0.md | 32 ----
13 files changed, 207 insertions(+), 459 deletions(-)
delete mode 100644 docs/source/canfar.md
create mode 100644 docs/source/clusters.md
delete mode 100644 docs/source/pipeline_v2.0.md
delete mode 100644 docs/source/work_flow_v2.0.md
diff --git a/docs/source/basic_execution.md b/docs/source/basic_execution.md
index 9e7ca63b4..7bceb6bed 100644
--- a/docs/source/basic_execution.md
+++ b/docs/source/basic_execution.md
@@ -7,7 +7,7 @@ option:
```{seealso}
:class: margin
-The `shapepipe` environment will need to be built and activated in order to run this script (see [Installation](installation.md)).
+ShapePipe runs inside its container — there is no environment to activate. See [Installation](installation.md).
```
```bash
shapepipe_run --help
diff --git a/docs/source/canfar.md b/docs/source/canfar.md
deleted file mode 100644
index 1f490c504..000000000
--- a/docs/source/canfar.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# Running `shapepipe` on the canfar science portal
-
-## Introduction
-
-## Steps from testing to parallel running
-
-Before starting a batch remote session job on a large number of images (step 5.),
-it is recommended to perform some or all of the testing steps (1. - 4.).
-
-
-1. Run the basic `shapepipe` runner script to test (one or several) modules in question, specified by a given config file, on one image.
- This step has to be run in the image run directory. The command is
- ```bash
- shapepipe_run -c config.ini
- ```
-
-2. Run the job script to test the job management, on one image.
- This step has to be run in the image run directory. The command is
- ```bash
- job_sp_canfar -j JOB [OPTIONS]
- ```
-
-3. Run the pipeline script to test the processing step(s), on one image.
- This step has to be run in the patch base directory.
-
- 1. First, run in dry mode:
- ```bash
- init_run_exclusive_canfar.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n
- ```
- 2. Next, perform a real run with
- ```bash
- init_run_exclusive_canfar.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n
- ```
-
-4. Run remote session script to test job submission using docker images, on one image.
- This step has to be run in the patch base directory.
- 1. First, run in dry mode=2, to display curl command, with
- ```bash
- curl_canfar_local.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n 2
- ```
-
- 2. Next, run in dry mode=1, to use curl command without processing:
- ```bash
- curl_canfar_local.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n 1
- ```
- 3. Then, perform a real run, to use curl with processing:
- ```bash
- curl_canfar_local.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp]
- ```
-
-5. Full run: Call remote session script and docker image with collection of images
- ```bash
- curl_canfar_local.sh -j JOB -f path_IDs -p [psfex|mccd] -k [tile|exp]
- ```
- with `path_IDs` being a text file with one image ID per line.
-
-## Monitoring
-
-
-### Status and output of submitted job
-
-Monitoring of the currently active remote session can be performed using the session IDs `session_IDs.txt` written by the
-remote session script `curl_canfar_local.sh`. In the patch main directory, run
-```bash
-curl_canfar_monitor.sh events
-```
-to display the remotely started docker image status, and
-```bash
-curl_canfar_monitor.sh logs
-```
-to print `stdout` of the remotely run pipeline script.
-
-### Number of submitted running jobs
-
-The script
-```bash
-stats_headless_canfar.py
-```
-returns the number of actively running headless jobs.
-
-
-## Post-hoc summary
-
-In the patch main directory, run
-```bash
-summary_run PATCH
-```
-to print a summary with missing image IDs per job and module.
-
-## Deleting jobs
-
-```bash
- for id in `cat session_IDs.txt`; do echo $id; curl -X DELETE -E /arc/home/kilbinger/.ssl/cadcproxy.pem https://ws-uv.canfar.net/skaha/v0/session/$id; done
- ```
diff --git a/docs/source/clusters.md b/docs/source/clusters.md
new file mode 100644
index 000000000..85621ac7c
--- /dev/null
+++ b/docs/source/clusters.md
@@ -0,0 +1,102 @@
+# Running on a Cluster
+
+ShapePipe runs the same way on every cluster: **through the container**. You
+pull the slim `runtime` image once, bind-mount your clone, and run
+`shapepipe_run` (or the CANFAR submission tooling) inside it — there is no
+environment to install or activate on the host. This page covers the shared
+pattern, then the specifics for each supported machine.
+
+For what is *inside* the image and how it is built, see
+[Container Workflow](container.md).
+
+## The pattern
+
+Three things hold on any cluster:
+
+- **The container is the unit of execution.** Pull the runtime image to a SIF
+ (`apptainer pull … docker://ghcr.io/cosmostat/shapepipe:-runtime`) and run
+ the pipeline inside it. Nothing else is installed on the host.
+- **Bind-mount your clone at the same path.** The config files reference their
+ location for the input and output directories; bind-mounting the host clone at
+ the *identical* path inside the container makes those paths resolve the same in
+ and out of the container.
+- **Keep SIFs and Apptainer's scratch off a quota-limited `$HOME`.** A pull under
+ a tight home quota fails with `disk quota exceeded`; point `APPTAINER_CACHEDIR`
+ (and the SIF itself) at a roomy data partition.
+
+MPI jobs add one constraint: the container's MPI must speak the same PMIx wire
+protocol as the host launcher (see [candide](#candide-slurm)).
+
+(candide-slurm)=
+## candide (SLURM)
+
+candide uses **SLURM** (`sbatch`; the old `qsub`/PBS commands are gone). The repo
+ships ready job scripts in `example/pbs/` — `candide_smp.sh` (single node,
+parallelised with joblib) and `candide_mpi.sh` (multi-node, hybrid MPI). To run
+the bundled single-tile example end to end:
+
+```bash
+# Keep the SIF and Apptainer's scratch off the quota-limited $HOME.
+export DATA=/n17data/$USER # adjust to your data partition
+export APPTAINER_CACHEDIR=$DATA/.apptainer
+
+# Pull the runtime image (~850 MB).
+apptainer pull "$DATA/shapepipe-runtime.sif" \
+ docker://ghcr.io/cosmostat/shapepipe:develop-runtime
+
+# Submit. SPDIR is your clone, bind-mounted at the same path inside the
+# container; SP_IMAGE is the SIF. The same script serves the example and a real
+# run — point the config inside it at your own pipeline.
+SP_IMAGE="$DATA/shapepipe-runtime.sif" SPDIR="/path/to/shapepipe" \
+ sbatch example/pbs/candide_smp.sh
+```
+
+Partitions are `comp` (2-day limit) and `compl` (5-day). Both job scripts read
+`SP_IMAGE` (the SIF) and `SPDIR` (the clone) from the environment, so the only
+thing that changes between the example and a real run is the config the script
+points at.
+
+For multi-node MPI, use `candide_mpi.sh`. It `module load`s a host OpenMPI in the
+5.0.x series to match the PMIx that the image's OpenMPI speaks; a mismatched host
+MPI (e.g. OpenMPI 4) silently degrades the job to *N* independent "rank 0 of 1"
+processes instead of one *N*-rank job. The script's header comments carry the
+detail.
+
+Adapting these scripts to another SLURM cluster is mostly the `#SBATCH`
+directives and the `module load` line — the `apptainer exec` invocation carries
+over unchanged.
+
+## CANFAR
+
+CANFAR submission does not go through a batch scheduler. Instead you submit
+container jobs to CANFAR's headless system with the `canfar_submit_job` console
+script (backed by the `canfar` library), and watch them with `canfar_monitor` /
+`canfar_monitor_log`. Pipeline steps are **bit-coded** through `-j` (the same
+scheme as `scripts/sh/job_sp_canfar.bash`), the PSF model is chosen with
+`-p psfex|mccd`, and `-V` selects the image version:
+
+```bash
+# Submit pipeline step(s) for the configured tiles (bit-coded -j).
+canfar_submit_job -j 1 -p psfex -V 1.1
+
+# Monitor sessions/jobs and stream logs.
+canfar_monitor
+canfar_monitor_log
+```
+
+The full production run — input preparation, the per-step `-j` table, and
+post-processing — is documented in the
+[CANFAR production walkthrough](pipeline_canfar.md).
+
+```{note}
+The CANFAR production submission scripts (`scripts/sh/job_sp_canfar*.bash`) still
+run under the pre-container environment and are slated for the same
+container-first cleanup the candide scripts received. Treat the walkthrough as
+the current-but-evolving production procedure.
+```
+
+## ccin2p3
+
+ccin2p3 is not yet containerised. The `example/pbs/cc_{smp,mpi}.sh` scripts target
+the pre-container environment; a container-first path mirroring candide is future
+work.
diff --git a/docs/source/configuration.md b/docs/source/configuration.md
index 3336123c1..772a9558f 100644
--- a/docs/source/configuration.md
+++ b/docs/source/configuration.md
@@ -1,6 +1,6 @@
# Configuration
-The pipeline requires a configuration file (by default called `conifg.ini`)
+The pipeline requires a configuration file (by default called `config.ini`)
in order to be run. Example configuration files are provided in the
[example](https://github.com/CosmoStat/shapepipe/tree/develop/example)
directory.
@@ -114,7 +114,7 @@ INPUT_DIR = last:psfex_runner
OUTPUT_DIR = /home/username/my_output_dir
FILE_PATTERN = psf
FILE_EXT = fits
-NUMBERING_LIST = -001, -002
+NUMBER_LIST = -001, -002
```
ShapePipe will look for the specific files `psf-001.fits` and `psf-002.fits`
diff --git a/docs/source/contributing.md b/docs/source/contributing.md
index 99cf85a09..814bf4852 100644
--- a/docs/source/contributing.md
+++ b/docs/source/contributing.md
@@ -2,7 +2,7 @@
ShapePipe is a fully open-source project and we welcome contributions.
-Pleas read our
+Please read our
[contribution guidelines](https://github.com/CosmoStat/shapepipe/blob/develop/CONTRIBUTING.md).
for details on how to contribute to the development of this package.
diff --git a/docs/source/dependencies.md b/docs/source/dependencies.md
index 64c6db0ea..9378bfe00 100644
--- a/docs/source/dependencies.md
+++ b/docs/source/dependencies.md
@@ -1,32 +1,65 @@
# Dependencies
-All third-party software packages required by ShapePipe are installed
-automatically (see [Installation](installation.md)). Note that all packages
-are pinned to a given version for each release of ShapePipe to maximise the
-reproducibility of the results. Below we list the main packages used.
+ShapePipe's dependencies are installed automatically — the recommended path is
+the [container](container.md), which bundles all of them (see also
+[Installation](installation.md)). `pyproject.toml` is the authoritative list: it
+declares the **abstract minimum versions** ShapePipe is compatible with, while
+`uv.lock` pins the **exact** versions that ship in a given build. The tables
+below describe the main packages; for the complete, current set always refer to
+`pyproject.toml`.
## Python Dependencies
-| Package Name | References |
-|--------------|---------------------------------------------------|
+Scientific stack:
+
+| Package | References |
+|---------|------------|
| [Astropy](https://www.astropy.org/) | {cite:p}`astropy:2013,astropy:2018` |
| [GalSim](https://github.com/GalSim-developers/GalSim) | {cite:p}`rowe:15` |
-| [Joblib](https://joblib.readthedocs.io/en/latest/) | {cite:p}`joblib:20` |
-| [Matplotlib](https://matplotlib.org/) | {cite:p}`hunter:07` |
+| [ngmix](https://github.com/aguinot/ngmix) | {cite:p}`sheldon:15` |
| [MCCD](https://github.com/CosmoStat/mccd) | {cite:p}`liaudat:21` |
| [ModOpt](https://cea-cosmic.github.io/ModOpt/) | {cite:p}`farrens:20` |
-| [mpi4py](https://mpi4py.readthedocs.io/en/stable/) | {cite:p}`dalcin:05,dalcin:08,dalcin:11` |
-| [NGMIX](https://github.com/esheldon/ngmix) | {cite:p}`sheldon:15` |
+| [python-pysap](https://github.com/CEA-COSMIC/pysap) | |
| [Numpy](https://numpy.org/) | {cite:p}`harris:20` |
+| [Numba](https://numba.pydata.org/) | |
| [Pandas](https://pandas.pydata.org/) | {cite:p}`pandas:10,pandas:20` |
+| [Matplotlib](https://matplotlib.org/) | {cite:p}`hunter:07` |
+| [Joblib](https://joblib.readthedocs.io/en/latest/) | {cite:p}`joblib:20` |
+| [mpi4py](https://mpi4py.readthedocs.io/en/stable/) | {cite:p}`dalcin:05,dalcin:08,dalcin:11` |
+| [reproject](https://reproject.readthedocs.io/) | |
+| [h5py](https://www.h5py.org/) | |
| [sf_tools](https://github.com/sfarrens/sf_tools) | |
-| [sqlitedict](https://github.com/RaRe-Technologies/sqlitedict) | |
-## Executable Dependencies
+Data access & infrastructure (CANFAR / UNIONS):
+
+| Package | Purpose |
+|---------|---------|
+| [vos](https://github.com/opencadc/vostools) | CADC / CANFAR VOSpace access |
+| [skaha](https://github.com/shinybrar/skaha) | CANFAR Science Platform sessions |
+| canfar | CANFAR container-job submission |
+| [astroquery](https://astroquery.readthedocs.io/) | external catalogue queries |
+| [cs_util](https://github.com/CosmoStat/cs_util) | shared CosmoStat utilities |
+| [sqlitedict](https://github.com/RaRe-Technologies/sqlitedict) | on-disk pipeline state |
+
+```{note}
+`ngmix` is pinned to the
+[`aguinot/ngmix@stable_version`](https://github.com/aguinot/ngmix/tree/stable_version)
+fork until the fixes land upstream — do not modernise that line (see the note in
+`pyproject.toml`).
+```
-| Package Name | References |
-|---------------|------------|
-| [CDSclient](http://cdsarc.u-strasbg.fr/doc/cdsclient.html) | |
+## System Dependencies
+
+The container also provides the non-Python tools ShapePipe calls, all from Debian
+packages (no source builds), plus the MPI stack:
+
+| Package | References |
+|---------|------------|
+| [Source Extractor](https://www.astromatic.net/software/sextractor/) | {cite:p}`bertin:96` |
| [PSFEx](https://www.astromatic.net/software/psfex/) | {cite:p}`bertin:11` |
-| [SExtractor](https://www.astromatic.net/software/sextractor/) | {cite:p}`bertin:96` |
| [WeightWatcher](https://www.astromatic.net/software/weightwatcher/) | {cite:p}`marmo:08` |
+| OpenMPI (5.0.x) | |
+
+Python dependencies themselves are managed with [uv](https://docs.astral.sh/uv/);
+see [Container Workflow](container.md) for how `pyproject.toml`, `uv.lock`, and
+the `Dockerfile` fit together.
diff --git a/docs/source/module_develop.md b/docs/source/module_develop.md
index 3df8fe88c..b4ef3566f 100644
--- a/docs/source/module_develop.md
+++ b/docs/source/module_develop.md
@@ -7,7 +7,7 @@ This page provides details on how new modules can be implemented in ShapePipe.
Every ShapePipe module requires a *module package*, which is simply a directory
with the module name followed by `_package`, e.g. for a module called
`my_new_module` you would create a new directory called `my_new_module_package`
-and put it in `shapepipe/modules`. Inside this directory you will need to
+and put it in `src/shapepipe/modules`. Inside this directory you will need to
include a Python file (ideally named after your module, e.g.
`my_new_module.py`) and a `__init__.py` file with the following content.
diff --git a/docs/source/pipeline_canfar.md b/docs/source/pipeline_canfar.md
index 39111b05f..7e8bb29c5 100644
--- a/docs/source/pipeline_canfar.md
+++ b/docs/source/pipeline_canfar.md
@@ -457,6 +457,16 @@ ln -s ..//unions_shapepipe_comprehensive_struct_2024_v1.6.c.hdf5 unions_shapepip
calibrate_comprehensive
+```{note}
+The matched-star-catalogue and coverage-mask diagnostics in the next two
+sections have largely moved out of ShapePipe into
+[`sp_validation`](https://github.com/CosmoStat/sp_validation) / `cosmo_val`.
+Several helpers referenced below — `merge_psf_cat.py`, `download_headers`,
+`extract_field_corners`, `build_coverage_map`, `plot_coverage_map` — are no
+longer shipped in this repository; see `sp_validation` for their current
+equivalents. (`get_ccds_with_psf` and `summary_run` are still part of ShapePipe.)
+```
+
### Create matched star catalogue
For diagnostics, a catalogue with multi-epoch shapes measured by ngmix matched with the validation star catalogue is used.
diff --git a/docs/source/pipeline_v2.0.md b/docs/source/pipeline_v2.0.md
deleted file mode 100644
index bdcd07d08..000000000
--- a/docs/source/pipeline_v2.0.md
+++ /dev/null
@@ -1,292 +0,0 @@
-# Running `ShapePipe` processing and post-processing pipelines on CANFAR
-
-Documentation to create ShapePipe output products for catalogues v2.x.
-
-## Initialise directory structure
-
-```bash
-init_run_v2.0.sh
-```
-sets up the directory structure. This will be
-
-v2.0/
-├── tiles/
-│ ├── 301/
-│ │ ├── 301.278/
-│ │ ├── 301.279/
-│ │ └── ...
-├── exp/
-│ ├── 21/
-│ │ ├── 21163916
-│ │ └── ...
-├── cfis -> /arc/home/kilbinger/shapepipe/example/cfis
-├── tile_numbers -> /arc/home/kilbinger/shapepipe/auxdir/CFIS/tiles_202604/tiles_r.txt
-└── debug/
-
-
-### Interactive job from the terminal for a single tile
-
-Run bit-coded jobs
-```bash
-run_job_canfar_v2.0.sh -e ID -j
-```
-with job processing tiles:
-- 1: download tiles
-- 2: uncompress tile weights
-- 4: find exposures
-then exposures:
-- 8: download exposures
-- 16: split exposures into single-CCD HDUs
-- 32: mask exposures
-- 64: process stars (selection, PSF movel)
-then back to tiles:
-- 128: select objects (using external catalogue)
-- 256: create object postage stamps
-
-## CANFAR Setup
-
-### CANFAR Login
-
-Login to the canfar system with
-
-```bash
-canfar auth login
-```
-
-This can be done from at notebook or terminal within the canfar science portal,
-or any remote terminal that has the canfar library installed.
-
-Check authentication status with
-
-```bash
-canfar auth list
-```
-
-If not on "default", run
-
-```bash
-canfar auth switch default
-```
-
-
-## From previous setup
-
-### Merge all final catalogues
-
-The last step of `ShapePipe` processing is, per patch, to merget all final catalogues. This is done via a python script, as follows.
-First, change to parent directory `/path/to/version` and run the following command for all patches
-
-```bash
-patchnum=`tr $patch P ''`
-create_final_cat.py -m final_cat_$patch.hdf5 -i . -p $patch/cfis/final_cat.param \
- -P $patchnum -o $patch/n_tiles_final.txt -v
-```
-
-## Additional `ShapePipe` processing
-
-
-### Create star Catalogue
-
-We can additionaly create a combined star catalogue, with star shapes projecte from detector to world coordinates.
-This is useful for validation and galaxy-PSF/star correlation diagnostics.
-
-#### Combine all PSF runs
-
-In each patch directory /path/to/version/$patch, run
-
-```bash
-combine_runs.bash -p $psf -c psf
-```
-
-to create a single output directory of PSF files (symbolic links).
-
-Optionally, to create and plot results for this patch only:
-
-```bash
-shapepipe_run -c $SP_CONFIG/config_Ms_$psf.ini
-shapepipe_run -c $SP_CONFIG/config_Pl_$psf.ini
-```
-
-#### Convert star catalogue to wCS
-
-Convert all input validation PSF files and create directories per patch `P?`.
-Create files `validation_psf_conv--.fits` (for the v1.4 setup only one file):
-
-```bash
-cd /path/to/version
-mkdir stat_car
-cd star_cat
-```
-
-For each patch run
-
-```bash
-convert_psf_pix2world.py -i .. -P $patchnum -v
-```
-
-Combine previously created files as links within one ShapePipe run directory (for the v1.4 setup only one link).
-First (and optiohnal), create a subdir for a run and link to the input patches:
-
-```bash
-cd /path/to/version/star_cat
-mkdir v1.6
-ln -s ../P1
-ln -s ../P2
-...
-```
-
-Next, create links to all `validation_conv` runs:
-
-```bash
-combine_runs.bash -p psfex -c psf_conv
-```
-
-Merge all converted star catalogues and create `final-starcat.fits`:
-
-```bash
-export SP_RUN=`pwd`
-shapepipe_run -c ~/shapepipe/example/cfis/config_Ms_psfex_conv.ini
-```
-
-Rename to general PSF and star catalogue used for all ("a") sub-versions:
-
-
-```bash
-cp output/run_sp_Ms/merge_starcat_runner/output/full_starcat-0000000.fits \
- unions_shapepipe_psf_2024_v1.6.a.fits
-```
-
-The FITS file `CATTYPE` (newer version) should be `validation_psf_conf`.
-
-## Post-processing
-
-The following post-processing steps are performed with the library `sp_validation`.
-
-### Extract Information
-
-First, we extract all information from the final catalogue, per patch. We copy
-the parameter file and set links to the catalogues and `ShapePipe` config directory.
-
-```bash
-cd /path/to/version/$patch
-cp ~/astro/repositories/github/sp_validation/notebooks/params.py .
-ln -s /path/to/final_cat_$patchnum.hdf5 # not relative path ../final_cat_P$patchnum.hdf5 !
-ln -s output/run_sp_MsPl/mccd_merge_starcat_runner/output/full_starcat-0000000.fits
-ln -s ~/astro/repositories/github/shapepipe/example/cfis
-```
-
-Then edit `params.py`: Set patch name; set `wrap_ra` for P2.
-
-Now we can run the script, recommended via job submission on candide. For large patches,
-this requies a job with a large memory, e.g. with `mem=380000`
-
-
-```bash
-[squeue] python ~/astro/repositories/github/sp_validation/notebooks/extract_info.py
-```
-
-This creates a patch-wise comprehensive catalogue.
-
-### Create global comprehensive catalogues
-
-```bash
-cd /patch/to/version
-[squeue] python ~/astro/repositories/github/sp_validation/scripts/create_joint_comprehensive_cat.py \
- -v v1.6.c -v -p P1+P2+P3+P4+P5+P6+P7+P8+P9
-```
-
-This creates the file `unions_shapepipe_comprehensive_2024_v1.6.c.hdf5`.
-
-
-### Apply structural masks
-
-First, edit the Python script `~/astro/repositories/github/sp_validation/notebooks/demo_apply_hsp_masks.py`
-to match catalogue name. Check the coverage mask input file (see below).
-Run the script to apply the healsparse structural masks:
-
-```bash
-[squeue] python ~/astro/repositories/github/sp_validation/notebooks/demo_apply_hsp_masks.py
-```
-
-This creates the file `unions_shapepipe_comprehensive_struct_2024_v1.6.c.hdf5`.
-
-
-### Define sample, calibrate catalogue
-
-We are close to finally perform the last post-processing step, which is the calibration. First, the final galaxy sample
-in question needs to be defined, with masks and cuts to apply from a `yaml` config file. A number of pre-defined files
-can be found in `~/astro/repositories/github/sp_validation/calibration`.
-
-For example, to create `v1.6.6`, the steps are:
-
-```bash
-cd /path/to/version
-mkdir -p v1.6.6
-cd v1.6.6
-ln -s ~/astro/repositories/github/sp_validation/calibration/mask_v1.X.6.yaml config_mask.yaml
-ln -s ..//unions_shapepipe_comprehensive_struct_2024_v1.6.c.hdf5 unions_shapepipe_comprehensive_struct_2024_v1.X.c.hdf5
-[squeue] python ~/astro/repositories/github/sp_validation/calibrate_comprehensive_cat.py
-```
-
-calibrate_comprehensive
-
-
-### Create matched star catalogue
-
-For diagnostics, a catalogue with multi-epoch shapes measured by ngmix matched with the validation star catalogue is used.
-This is created as follows:
-
-```bash
-cd /path/to/version
-merge_psf_cat.py [-V v1.6|-P P1+P2+...] -v
-```
-
-This creates the joint catalogue unions_shapepipe_star_2024_v1.6.a.fits .
-
-### Create coverage mask
-
-First, on canfar, move to the directory that has the patch subdirectories.
-
-```bash
-cd /path/to/version
-```
-
-#### Get exposure numbers
-
-If the file `$patch/exp_numbers.txt` does not exist for a given patch, create it with the summary program
-
-```bash
-summary_run $patch 1
-```
-
-Now, create the list of CCDs that have PSF information with
-
-```bash
-get_ccds_with_psf -v -V v1.6
-```
-
-Next, download exposures headers
-
-```bash
-download_headers -i ccds_with_psfs_v1.6.txt -o headers_v1.6 -v
-```
-
-From the headers, the CCD corner coordinates are extracted with
-```bash
-extract_field_corners -i headers_v1.6 -v
-```
-
-Then, build the healsparse coverage mask file as
-```bash
-build_coverage_map
-```
-
-Use `plot_coverage_map` to create plots of the coverage mask.
-
-## Extra Utilities
-
-### Run in Terminal in Parallel
-
-```bash
-cat IDs.txt | xargs -I {} -P 16 bash -c 'init_run_exclusive_canfar.sh -j 512 -e {}'
-```
diff --git a/docs/source/post_processing.md b/docs/source/post_processing.md
index 577ad57b3..e8a3768f4 100644
--- a/docs/source/post_processing.md
+++ b/docs/source/post_processing.md
@@ -16,7 +16,16 @@ If main ShapePipe processing happened at the old canfar VM system (e.g. CFIS v0
---
-The following steps are required for pre-v1.4 runs performed on the canfar VM system.
+```{note}
+This page documents the **legacy** post-processing used for pre-v1.4 runs on the
+canfar VM system. PSF validation and the scale-dependent diagnostics
+(ρ-statistics) now live in
+[`sp_validation`](https://github.com/CosmoStat/sp_validation) rather than in
+ShapePipe; the steps below are retained for reference and for reproducing older
+runs.
+```
+
+The following steps were used for pre-v1.4 runs performed on the canfar VM system.
1. Optional: Split output into sub-samples
@@ -48,30 +57,25 @@ The following steps are required for pre-v1.4 runs performed on the canfar VM sy
This script creates a new combined psf run in the ShapePipe `output` directory, by identifying all psf validation files
and creating symbolic links. The run log file is updated.
- 3. Merge individual psf validation files into one catalogue. Create plots of the PSF and their residuals in the focal plane,
- as a diagnostic of the overall PSF model.
- As a scale-dependend test, which propagates directly to the shear correlation function, the rho statistics are computed,
- see {cite:p}`rowe:10` and {cite:p}`jarvis:16`,
+ 3. Merge the individual PSF validation files into one catalogue, and create
+ plots of the PSF and its residuals in the focal plane as a diagnostic of
+ the overall PSF model:
```bash
shapepipe_run -c /path/to/shapepipe/example/cfis/config_MsPl_PSF.ini
- ```
-
- 4. Prepare output directory
-
- Create links to all 'final_cat' result files with
- ```bash
- prepare_tiles_for_final
```
- The corresponding output directory that is created is `output/run_sp_combined/make_catalog_runner/output`.
- On success, it contains links to all `final_cat` output catalogues
+ The scale-dependent PSF diagnostics that propagate to the shear correlation
+ function (the ρ-statistics, {cite:p}`rowe:10`, {cite:p}`jarvis:16`) are no
+ longer computed here — they have moved to
+ [`sp_validation`](https://github.com/CosmoStat/sp_validation).
- 5. Merge final output files
-
- Create a single main shape catalog:
+ 4. Merge final output files
+
+ Create a single main shape catalogue:
```bash
merge_final_cat -i -p -v
```
- Choose as input directory `input_dir` the output of step C. A default
- parameter file `` is `/path/to/shapepipe/example/cfis/final_cat.param`.
+ Choose as input directory `input_dir` the `make_cat` output of the runs
+ being combined. A default parameter file `` is
+ `/path/to/shapepipe/example/cfis/final_cat.param`.
On success, the file `./final_cat.npy` is created. Depending on the number of
input tiles, this file can be several tens of Gb large.
diff --git a/docs/source/random_cat.md b/docs/source/random_cat.md
index 724ff6e32..a930d40ed 100644
--- a/docs/source/random_cat.md
+++ b/docs/source/random_cat.md
@@ -6,6 +6,15 @@ masks, and combined randoms and masks for a selection of tiles.
The masked regions are obtained on input from ShapePipe pixel mask ("pipeline flag")
files.
+```{note}
+Parts of this procedure use the legacy canfar-VM / `vos` retrieval workflow (see
+[VOSpace retrieval](vos_retrieve.md)) and the obsolete `prepare_tiles_for_final`
+helper, which is no longer shipped. The `random_cat` module itself is current;
+the input-staging and joint-mask steps now overlap with
+[`sp_validation`](https://github.com/CosmoStat/sp_validation). The steps are
+retained for reference.
+```
+
## Set up
### ID file and shell variables
@@ -64,7 +73,7 @@ while read p; do tar xvf pipeline_flag_$p.tgz; done
Date: Sun, 31 May 2026 23:23:05 +0200
Subject: [PATCH 02/29] docs(clusters): drop in-page anchor that renders as
literal text on GitHub
The explicit MyST target showed as raw '(candide-slurm)=' in GitHub's blob
view (where PR links point readers). Use a plain-text in-page reference; the
candide section is still reachable via the sidebar and GitHub's own heading
anchor.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
docs/source/clusters.md | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/docs/source/clusters.md b/docs/source/clusters.md
index 85621ac7c..41d32a0bf 100644
--- a/docs/source/clusters.md
+++ b/docs/source/clusters.md
@@ -25,9 +25,8 @@ Three things hold on any cluster:
(and the SIF itself) at a roomy data partition.
MPI jobs add one constraint: the container's MPI must speak the same PMIx wire
-protocol as the host launcher (see [candide](#candide-slurm)).
+protocol as the host launcher (see the candide section below).
-(candide-slurm)=
## candide (SLURM)
candide uses **SLURM** (`sbatch`; the old `qsub`/PBS commands are gone). The repo
From 498b4b7746c4970c6d235c27b49b2f31c5d9e9fe Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Sun, 31 May 2026 23:37:31 +0200
Subject: [PATCH 03/29] docs: absorb README front door + MPI execution docs
from #737
Unify all user-facing docs in this PR (relocated from #737, which is now pure
code/infra):
- README front door (Quickstart + Documentation signpost). The signpost now
has a dedicated 'Running on a cluster' entry pointing at clusters.html, and
the container-workflow entry no longer claims to carry the cluster example
(that lives in clusters.md).
- basic_execution.md MPI section: the hybrid-Apptainer run pattern and the
OpenMPI-5 PMIx note, kept alongside the conda-framing fix.
- container.md gains a one-line pointer to clusters.md.
This removes the container.md/clusters.md duplication at the source rather than
reconciling it after merge.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
README.rst | 46 +++++++++++++++++++++++++++++-----
docs/source/basic_execution.md | 30 +++++++++++++++++++---
docs/source/container.md | 3 ++-
3 files changed, 68 insertions(+), 11 deletions(-)
diff --git a/README.rst b/README.rst
index da8fe0701..e27b4028d 100644
--- a/README.rst
+++ b/README.rst
@@ -1,7 +1,7 @@
ShapePipe
=========
-|CI| |CD| |python39| |release|
+|CI| |CD| |python312| |release|
.. |CI| image:: https://github.com/CosmoStat/shapepipe/workflows/CI/badge.svg
:target: https://github.com/CosmoStat/shapepipe/actions?query=workflow%3ACI
@@ -9,14 +9,48 @@ ShapePipe
.. |CD| image:: https://github.com/CosmoStat/shapepipe/actions/workflows/pages/pages-build-deployment/badge.svg
:target: https://github.com/CosmoStat/shapepipe/actions/workflows/pages/pages-build-deployment
-.. |python39| image:: https://img.shields.io/badge/python-3.9-green.svg
- :target: https://www.python.org/‰
+.. |python312| image:: https://img.shields.io/badge/python-3.12-green.svg
+ :target: https://www.python.org/
.. |release| image:: https://img.shields.io/github/v/release/CosmoStat/shapepipe
:target: https://github.com/CosmoStat/shapepipe/releases/latest
ShapePipe is a galaxy shape measurement pipeline developed within the
-CosmoStat lab at CEA Paris-Saclay.
+CosmoStat lab at CEA Paris-Saclay. It runs the full chain from raw survey
+images to calibrated shear catalogues — object detection, PSF modelling, and
+shape measurement — and produced the first UNIONS cosmic-shear release.
-See the `documentation `_ for details
-on how to install and run ShapePipe.
+Quickstart
+----------
+
+ShapePipe ships as a container image, so you can run the bundled example
+pipeline — a single CFIS tile through the full chain — without installing
+anything:
+
+.. code-block:: bash
+
+ # Apptainer (HPC, no root needed):
+ apptainer exec docker://ghcr.io/cosmostat/shapepipe:develop-runtime shapepipe_run_example
+
+ # ...or Docker:
+ docker run --rm ghcr.io/cosmostat/shapepipe:develop-runtime shapepipe_run_example
+
+The image is published on every push to the `GitHub Container Registry
+`_:
+``:develop`` tracks the integration branch, release tags (e.g. ``:v1.1.0``) a
+stable cut, and the ``-runtime`` suffix selects the slim batch image over the
+full interactive one.
+
+Documentation
+-------------
+
+Full documentation lives at https://cosmostat.github.io/shapepipe. Good places
+to start:
+
+- `Installation `_ — getting ShapePipe onto your machine or cluster.
+- `Basic execution `_ and `configuration `_ — running ``shapepipe_run`` and writing pipeline configs.
+- `Container workflow `_ — the two image targets and the ``pyproject.toml`` / ``uv.lock`` / ``Dockerfile`` layers.
+- `Running on a cluster `_ — pulling the image and submitting jobs, with worked candide (SLURM) and CANFAR examples.
+
+If you use ShapePipe in academic work, please cite Guinot et al. (2022) and
+Farrens et al. (2022).
diff --git a/docs/source/basic_execution.md b/docs/source/basic_execution.md
index 7bceb6bed..d9aa6de07 100644
--- a/docs/source/basic_execution.md
+++ b/docs/source/basic_execution.md
@@ -37,11 +37,33 @@ shapepipe_run -c
## Running the Pipeline with MPI
ShapePipe can also use [mpi4py](https://mpi4py.readthedocs.io/en/stable/)
-for managing parallel processes on clusters with multiple nodes.
-The `shapepipe_run` script can be run with MPI as follows
+to spread work across multiple nodes of a cluster. Set `MODE = mpi` in the
+`[EXECUTION]` section of the config and launch with an MPI runner:
```bash
-mpiexec -n shapepipe_run
+mpiexec -n shapepipe_run -c
```
-where `` is the number of cores to allocate to the run.
+where `` is the number of MPI processes to start.
+
+### Through the container (the supported way on a cluster)
+
+On a cluster you run ShapePipe from the published image as a standard Apptainer
+*hybrid* MPI job: the **host** `mpirun`/`mpiexec` launches one container rank per
+slot, and the OpenMPI bundled in the image wires the ranks together.
+
+```bash
+# one-time: pull the runtime image
+apptainer pull shapepipe.sif docker://ghcr.io/cosmostat/shapepipe:develop-runtime
+
+# load a host MPI in the same family as the image's OpenMPI (5.0.x), then launch
+module load openmpi
+mpirun -n \
+ apptainer exec --bind "$PWD:$PWD" shapepipe.sif \
+ shapepipe_run -c
+```
+
+The image ships **OpenMPI 5.0.x** so that its PMIx matches modern cluster
+launchers. The host and container MPI must be compatible: if you see *N* copies
+of `rank 0 of 1` instead of one *N*-rank job, load a host OpenMPI in the 5.0.x
+family. See `example/pbs/candide_mpi.sh` for a complete SLURM batch script.
diff --git a/docs/source/container.md b/docs/source/container.md
index 2511a7f3e..174cde9dc 100644
--- a/docs/source/container.md
+++ b/docs/source/container.md
@@ -2,7 +2,8 @@
ShapePipe ships as a container image. This page covers the two image
targets and the three configuration files (`pyproject.toml`, `uv.lock`,
-`Dockerfile`) that determine what's inside.
+`Dockerfile`) that determine what's inside. For running the image on a batch
+cluster (candide, CANFAR), see [Running on a cluster](clusters.md).
## Two image targets
From d2d25b93b73415ae4a6d93c40642d75b20460cd5 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Mon, 1 Jun 2026 12:07:09 +0200
Subject: [PATCH 04/29] =?UTF-8?q?felt:=20ngmix=20v2.0=20PR=20#740=20review?=
=?UTF-8?q?=20prep=20=E2=80=94=20empirical=20fitter=20validation=20+=20sta?=
=?UTF-8?q?ged=20review?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Autonomous prep for the interactive review. Built the new module against real
ngmix 2.4.0 on candide and ran do_ngmix_metacal: shear recovery unbiased
(m=+2e-4). Centroid fix harmless but necessity not reproduced. CI never ran
(fork PR). Draft GitHub comments + report.html staged for the call.
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/draft-review.md | 86 ++++++++++
.felt/review-ngmix-v2-pr740/report.html | 152 ++++++++++++++++++
.../review-ngmix-v2-pr740.md | 65 ++++++++
3 files changed, 303 insertions(+)
create mode 100644 .felt/review-ngmix-v2-pr740/draft-review.md
create mode 100644 .felt/review-ngmix-v2-pr740/report.html
create mode 100644 .felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
diff --git a/.felt/review-ngmix-v2-pr740/draft-review.md b/.felt/review-ngmix-v2-pr740/draft-review.md
new file mode 100644
index 000000000..33b752187
--- /dev/null
+++ b/.felt/review-ngmix-v2-pr740/draft-review.md
@@ -0,0 +1,86 @@
+# Draft review — PR #740 "Ngmix v2.0" (staging, NOT posted)
+
+Staged for the call with Cail. Posting/pushing happens *with* him, in his voice.
+Authorship convention: "— Claude on behalf of Cail" unless he says otherwise.
+
+---
+
+## Reviewer checklist (Martin's template)
+
+| Box | Verdict |
+|---|---|
+| Targets `develop` | ✅ head `ngmix_v2.0` → base `develop` |
+| Assigned to developer | ⬜ (Cail to check on GitHub) |
+| Appropriate labels | ☑️ `enhancement` only — fine; could add a `dependencies` label given the ngmix bump |
+| Projects/milestones | ⬜ (Cail to check) |
+| Clear description | ⚠️ Summary is terse (4 bullets). For a +3894/−3410, 67-file PR that bumps a core dep and rewrites the shape-measurement module, a fuller description would help — esp. the centroid-bias result. |
+| "closes #" links | ❌ No issue links. #725 (Axel's centroid-shift PR) overlaps this work — should this PR close/supersede it? Worth an explicit cross-ref. |
+| Code/doc style | ⚠️ Mostly good. Pockets of debug cruft (see line-level). |
+| Docs updated | ❌ No `docs/` changes. The ngmix pin note in `CLAUDE.md` ("don't modernize this line") is now stale — this PR *is* the modernization. Needs updating on/with merge. |
+| **CI passing** | ❌ **No CI has run.** `statusCheckRollup` is empty; no workflow runs exist for `ngmix_v2.0`. It's a cross-repo (fork) PR — Actions need a maintainer "Approve and run", or push the branch to `origin`. **Cannot tick this box until CI runs green.** |
+| API docs built | ➖ n/a-ish (no public API doc surface changed materially) |
+| All files checked + comments | ✅ done in this review |
+
+---
+
+## Science-quality read (the part the checklist misses)
+
+### 1. ngmix 2.4.0 API correctness — ✅ verified empirically
+Installed ngmix **2.4.0** (from `esheldon/ngmix@v2.4.0`, the PR's own source) over the
+container stack and ran the **actual module code**. All 13 ngmix symbols the module
+uses resolve in 2.4.0 (`fitting.Fitter`, `guessers.TPSFFluxAndPriorGuesser`/`TFluxGuesser`,
+`runners.PSFRunner`/`Runner`, `metacal.MetacalBootstrapper`, `joint_prior.PriorSimpleSep`,
+`priors.*`, `observation.*`, `Jacobian`). The metacal/bootstrap path runs end-to-end.
+
+### 2. Shear recovery — ✅ unbiased
+Ran `do_ngmix_metacal` on 200 galsim galaxies (sub-pixel shifted, the centroid regime),
+recovered shear via the metacal response:
+- **m = +2×10⁻⁴ ± 3×10⁻⁴** (multiplicative bias consistent with zero)
+- **c₂ = −1×10⁻⁵** (additive consistent with zero), R₁₁≈R₂₂≈0.924, 200/200 fits OK.
+- [author's own `centroid_bias_v2.py` noiseless ±-cancellation number to be folded in]
+
+### 3. The centroid fix — present, correct in form, but **necessity not demonstrated in idealized sim**
+The fix (`make_ngmix_observation`, `ngmix.py:987-1004`) re-centers the galaxy Jacobian on
+the HSM `FindAdaptiveMom` centroid so the centroid prior (centered at the Jacobian origin)
+doesn't bias the fit. Sound idea. **But** running the same ensemble with the recenter
+disabled gives the *same* unbiased m (both ~3×10⁻⁴) — i.e. in this small-shift (±½ px),
+high-S/N regime the fix is harmless but not measurably necessary. **Question for the call:**
+in what regime does the bias the fix targets actually appear? The author's centroid scripts
+(`centroid_bias.py`, configs bug/fix/v2) presumably demonstrate it — is the before/after
+result captured anywhere we can point to? This is the single most important thing to pin down.
+
+### 4. `r50` vs `T` labeling — ⚠️ likely mislabel, worth clarifying
+In `compile_results` (`ngmix.py:415-417`): `output_dict[name]['r50'] = results[...]['pars'][4]`.
+For an ngmix `gauss` model, `pars[4]` is **T** (≈2σ², arcsec²), *not* a half-light radius —
+and `output_dict[name]['T']` stores the same value. And `r50_psfo = sqrt(T_psf/2)` is the
+Gaussian **σ**, not r50 (true Gaussian r50 = 1.1774σ). So columns named `r50*` actually carry
+T / σ. Downstream (sp_validation) needs to know what these columns mean. Is "r50" intentional
+shorthand, or should these be true half-light radii? (Constitution flagged "r50 as the size
+guess changed from T" — but the *guesser* still uses `T=0.25`; the change is in output column
+naming, not the guess. Worth confirming Martin's intent.)
+
+---
+
+## Line-level (staged as draft comments)
+
+- **`pyproject.toml`** — ngmix now `>=2.4` with `[tool.uv.sources] ngmix = git esheldon/ngmix tag v2.4.0`. Why pin to a git tag rather than the PyPI release `ngmix==2.4.0`? PyPI would be more reproducible and drop the git dependency. (PyPI was unreachable from here to confirm 2.4.0 is published — Cail can check.)
+- **`ngmix.py:63`** — stray `# I still don't know how to handle this` above `Tile_cat`.
+- **`ngmix.py:254`,`:528`-ish** — `# MKDEBUG` markers and `print(...)` debug statements (e.g. `:293` unreachable `print('rotating megapipe image')` after a `return`; `:564-565` `print("output_dict"...)` in an error branch). Clean up before merge.
+- **`ngmix.py:766`** — masked-fraction cut hardcodes `51 * 51`; if vignet size ever changes this silently breaks. Use `flag_vign.size`.
+- **`scripts/python/fitting.py`** — looks like a lightly-edited copy of esheldon's `fitting_bd_empsf.py` example; docstring still references `fracdev` / `fitting_bd_empsf.py` and a bulge+disk fit that no longer applies. Either trim to match or drop — is it meant to ship?
+- **`scripts/jupyter/test_centroid_shift.py`** — jupyter-exported (cell markers `# In[14]:`, commented-out code, inline-duplicated `make_data`). Now that `shapepipe.testing.simulate.make_data` exists, this could import it instead of carrying its own copy. Ship as-is or tidy?
+- **`src/shapepipe/testing/simulate.py`** — 👍 clean, dependency-light (galsim+numpy only), gives exactly the cheap validation path. Nice addition.
+- **Supporting runners** (`psfex_interp_runner`, `vignetmaker_runner`, `ngmix_runner`) — the `$SP_EXP`-tree resolver (`get_exp_output_dirs`) with config-gated fallback to the v1 run-log path is clean and backward-compatible. `f_wcs_path` now threaded via `input_file_list` consistently. No concerns.
+
+---
+
+## Suggested PR-level summary comment (draft)
+> Reviewed the diff and exercised it on candide against **ngmix 2.4.0** (installed from the
+> PR's own source). The 2.x API is used correctly and the metacal path runs end-to-end; on
+> simulated galaxies the v2.0 fitter recovers shear with **m ≈ 2×10⁻⁴** (consistent with zero).
+> Two things before I can tick the checklist: (1) **CI hasn't run** — it's a fork PR, needs an
+> "Approve and run" or a push to origin; (2) I couldn't reproduce a centroid bias in the
+> idealized sim with the fix *off*, so I'd like to understand the regime where the centroid fix
+> matters — is the before/after result from `centroid_bias.py` captured somewhere? Plus a few
+> cleanups (debug prints, the `r50`/`T` column naming, stale `CLAUDE.md` ngmix note). Details inline.
+> — Claude on behalf of Cail
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
new file mode 100644
index 000000000..210db5dde
--- /dev/null
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -0,0 +1,152 @@
+
+
+
+
+
+Report — review-ngmix-v2-pr740
+
+
+
+
+
+
+
+
+
The diff is read, the new fitter was built against real ngmix 2.4.0 and run
+ on candide, and it recovers shear unbiased. Two things block the checklist (CI never ran;
+ the centroid fix's necessity isn't demonstrated) and a handful of cleanups stand. Ready to
+ talk it through.
+
+
+
+
+
§ Current state
+
Autonomous prep done; held for the call.
+
+
What this PR is: +3894/−3410 across 67 files. It retires Axel's
+ aguinot/ngmix@stable_version fork for upstream ngmix 2.4.0 and
+ adopts Lucy Baumont's reworked ngmix classes/interface; near-total rewrite of
+ ngmix_package/ngmix.py (1252 lines); a centroid-bias fix + validation scripts;
+ a new dependency-light testing/simulate.py; and supporting pipeline plumbing for
+ the v2.0 production run (a $SP_EXP-tree directory resolver).
+
+
How I exercised it: the container ships ngmix 1.3.6, so I installed
+ ngmix 2.4.0 from the PR's own source (esheldon/ngmix@v2.4.0) over
+ the container's scientific stack and ran the actual module code — not a re-implementation.
The action for the call: walk the two blockers and the cleanups, decide
+ what gets posted to GitHub and in whose voice, decide whether to push fixes or request them
+ from Martin/Lucy. Merge/approve is yours. Draft comments are staged in
+ draft-review.md in this fiber dir.
+
+
+
+
+
+
§ Findings
+
What the review turned up.
+
+
ngmix 2.4.0 API — correct ✓
+
All 13 ngmix symbols the module uses resolve in 2.4.0 (Fitter, the
+ guessers, runners.PSFRunner/Runner,
+ MetacalBootstrapper, PriorSimpleSep, priors, observations, Jacobian).
+ The metacal/bootstrap path runs end-to-end and produces sane output.
+
+
Centroid fix — present and harmless, necessity not shown
+
The fix re-centers the galaxy Jacobian on the HSM adaptive-moment centroid
+ (make_ngmix_observation) so the centroid prior doesn't bias the fit. Running the
+ same ensemble with the recenter disabled gives the same unbiased m in this
+ idealized small-shift, high-S/N sim. So the form is right, but I could not reproduce the bias
+ it removes — which regime does it bite in? The author's own
+ centroid_bias.py / bug-fix-v2 configs presumably demonstrate it; that before/after
+ is the key thing to surface. (Author's noiseless centroid_bias_v2.py run was in
+ flight at handoff — number to be folded in.)
+
+
Checklist blockers
+
+
Item
State
+
CI passing
No CI ran — empty status rollup; fork PR needs "Approve and run" or a push to origin
None — and CLAUDE.md's "don't modernize ngmix" note is now stale
+
Targets develop
Yes
+
+
+
Code-level cleanups
+
+
r50/T column naming in compile_results: pars[4] is T (not a half-light radius) and r50_psfo = sqrt(T/2) is the Gaussian σ — columns named r50* carry T/σ. Intentional shorthand or a mislabel sp_validation will trip on?
+
Debug cruft in ngmix.py: # MKDEBUG, an unreachable print after a return, stray error-branch prints.
+
Masked-fraction cut hardcodes 51 * 51 — use flag.size.
+
scripts/python/fitting.py is a stale copy of esheldon's fitting_bd_empsf.py (docstring still mentions fracdev); test_centroid_shift.py is a jupyter export with cruft. Ship or tidy?
+
pyproject.toml: ngmix sourced via git tag rather than PyPI ngmix==2.4.0 — pin to PyPI if published.
+
testing/simulate.py and the $SP_EXP runner plumbing are clean — 👍.
+
+
+
+
+
+
+
§ Open questions
+
For Cail, on the call.
+
+
Centroid fix regime. Where does the bias appear? Is the author's before/after captured anywhere we can cite?
+
CI. Approve-and-run the fork PR, or ask Martin to push to origin? Can't sign off the checklist without green CI.
+
r50 columns. Confirm Martin/Lucy's intent for the size-column naming before sp_validation consumes them.
+
Voice + scope. What gets posted (full review vs a lighter "looks good, two questions"), in your voice or signed "Claude on behalf of Cail"? Push cleanups as suggestions, or just request them?
+
#725. Does this PR supersede Axel's centroid-shift PR?
+
+
+
+
+
+
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
new file mode 100644
index 000000000..2c121dfb1
--- /dev/null
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -0,0 +1,65 @@
+---
+name: 'Review + work: ngmix v2.0 (PR #740)'
+status: active
+tags:
+ - constitution
+ - shapepipe
+ - ngmix
+ - review
+created-at: 2026-06-01T11:50:17.967872934+02:00
+outcome: 'Autonomous prep done — HELD for Cail (interactive). Read the diff, built the new fitter against real ngmix 2.4.0 on candide, ran do_ngmix_metacal: shear recovery unbiased (m=+2e-4±3e-4, c2≈0, 200/200 fits). API correct. Blockers for the checklist: (1) NO CI ran — fork PR needs Approve-and-run/push to origin; (2) centroid fix is harmless but its necessity isn''t reproduced in idealized sim — need the regime/before-after. Plus cleanups (r50/T column naming, debug prints, stale CLAUDE.md ngmix note, ship-or-tidy fitting.py/test_centroid_shift.py). Draft comments staged in draft-review.md; report.html has the full read. Nothing posted/pushed yet — that''s the call.'
+horizon: now
+shuttle:
+ enabled: true
+ kind: oneshot
+ interactive: true
+ host: candide
+ project_dir: /automnt/n17data/cdaley/unions/shapepipe
+ agent: claude-opus
+ session:
+ id: be6fbfa6-8418-490b-bb05-d1936d5059ef
+ agent: claude-opus
+ dispatched_at: 2026-06-01T09:51:08.677418931Z
+---
+
+# Review + work: ngmix v2.0 (PR #740)
+
+Martin requested Cail's review on **[CosmoStat/shapepipe #740 "Ngmix v2.0"](https://github.com/CosmoStat/shapepipe/pull/740)** (head `ngmix_v2.0` → base `develop`, +3894/−3410 across 67 files). This is the PR that **realizes [[ngmix-update]]**: replace Axel's `stable_version` ngmix fork with **upstream ngmix 2.4.0** and adopt **@lbaumo's (Lucy Baumont's) new ngmix classes + interface**, reconciling the cleaned-up wrapper from her visit. Track it in [[prs-in-flight]]; it sits in the broader Martin collaboration under [[shapepipe]].
+
+## Desired State
+
+PR #740 has a **thorough, scientifically-grounded review**, and the collaboration moves forward:
+
+1. **A structured review exists**, worked against Martin's reviewer checklist (targets `develop` ✓, labels, clear description, "closes #" links, code/doc style, docs updated, **CI passing**, all changed files read with comments) — *and* against a **science-quality bar** that the checklist doesn't capture (below).
+2. **Comments are posted to GitHub** in Cail's voice (signed "— Claude on behalf of Cail" per the authorship convention, or Cail's own voice as he chooses on the call) — line-level where it's a specific code point, PR-level for the broader read.
+3. Where Cail decides, **fixes/suggestions are pushed** (as PR review suggestions or a branch off `ngmix_v2.0`) or **requested** from Martin/Lucy. **Merge/approve is Cail's gesture, never the worker's.**
+
+### The science-quality bar (what the checklist misses)
+
+The diff is dominated by `src/shapepipe/modules/ngmix_package/ngmix.py` (~1252 lines changed) — the module overhaul. The review must actually reason about the science, not just the diff mechanics:
+
+- **ngmix 2.4.0 API correctness.** The 1.x→2.x ngmix API changed substantially (observation/fitter/bootstrapper interfaces). Does the new code use the 2.4.0 API correctly — priors, PSF fitting, the metacal/bootstrapping path, guesses? @lbaumo's new classes: are the abstractions sound, and does the interface match how shapepipe calls them (`ngmix_runner.py`)?
+- **The centroid-bias fix.** A large part of this PR is centroid-bias work (`scripts/validation/centroid/centroid_bias{,_v2}.py`, the bug/fix/v2 configs, `test_centroid_shift.py`). Centroid shift biases shapes — this matters for UNIONS shear. **Does the fix actually remove the bias?** Look for the before/after evidence; if it's not in the PR, that's the key thing to ask for (or to *run* on candide).
+- **`r50` as the galaxy-size guess** (changed from `T`). Reasonable? Convergence/robustness implications for the fitter guesses.
+- **New `src/shapepipe/testing/simulate.py` + `scripts/python/fitting.py`** — what do they test/simulate, and do they give a cheap way to validate the v2.0 fitter independent of a full pipeline run?
+
+### Run it on candide (the repo is right here)
+
+`project_dir` is the live shapepipe checkout. Don't review the diff cold — exercise it:
+
+- `gh pr checkout 740` (or fetch `martin/ngmix_v2.0`) into a worktree so `develop`/`docs-rework` stays clean.
+- Run the **ngmix module / its tests**; run `test_centroid_shift.py` and the centroid configs if feasible; check the v2.0 fitter produces sane output on a small input. Use the repo's `CLAUDE.md` for build/run conventions and the `uv` lockfile.
+- Confirm **CI status** via `gh pr checks 740`.
+- Capture what you ran and what you saw — that empirical check is the most valuable part of the review (the curious eye; don't report a result you didn't observe).
+
+## Interactive — how this runs
+
+`claude-opus`, **interactive**. Do the full autonomous prep first: read the diff (`gh pr diff 740`), read the changed `ngmix.py` / runner / Lucy's classes closely, run what you can on candide, and assemble (a) the checklist pass and (b) the science-quality read, with specific line-level notes staged as *draft* comments. **Then hold and wait for Cail to attach** — he'll talk through the science, sharpen the comments, and decide what's posted, what's pushed, and whether to approve. Posting to GitHub and any push happen *with* him, in his voice; the worker drafts, Cail directs. Do not close or `kill` until Cail signals done.
+
+## Context
+
+- **Repo:** `/automnt/n17data/cdaley/unions/shapepipe` on candide. Remotes: `origin` = CosmoStat/shapepipe, `martin` = martinkilbinger/shapepipe-1. `gh` is authenticated here.
+- **PR #740** by Martin Kilbinger, opened 2026-06-01; head `ngmix_v2.0`, base `develop`; OPEN. Reviewer checklist is in the PR body.
+- **@lbaumo = Lucy Baumont** — her ngmix classes/interface are central; her wrapper cleanup came out of her visit (see [[ngmix-update]]).
+- **Prior art / related:** [[ngmix-update]] (the future-intent this realizes), [[prs-in-flight]] (the PR tracker — add #740 and its disposition), [[shapepipe]] (the Martin collaboration root). The repo `CLAUDE.md` carries build/test/run conventions — read it.
+- **Authorship:** GitHub comments in Cail's voice, signed "— Claude on behalf of Cail" unless Cail asks otherwise on the call.
From ddb79c87762d167b0795f4c6db03280c78d8782c Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Mon, 1 Jun 2026 12:11:00 +0200
Subject: [PATCH 05/29] felt: fold in author's noiseless centroid_bias_v2
result (m=+3.7e-4)
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/draft-review.md | 3 ++-
.felt/review-ngmix-v2-pr740/report.html | 13 ++++++++++---
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/draft-review.md b/.felt/review-ngmix-v2-pr740/draft-review.md
index 33b752187..532e8754e 100644
--- a/.felt/review-ngmix-v2-pr740/draft-review.md
+++ b/.felt/review-ngmix-v2-pr740/draft-review.md
@@ -37,7 +37,8 @@ Ran `do_ngmix_metacal` on 200 galsim galaxies (sub-pixel shifted, the centroid r
recovered shear via the metacal response:
- **m = +2×10⁻⁴ ± 3×10⁻⁴** (multiplicative bias consistent with zero)
- **c₂ = −1×10⁻⁵** (additive consistent with zero), R₁₁≈R₂₂≈0.924, 200/200 fits OK.
-- [author's own `centroid_bias_v2.py` noiseless ±-cancellation number to be folded in]
+- Author's own `centroid_bias_v2.py` (noiseless, 60 trials, ±-cancellation): **m = +3.7×10⁻⁴ ± 0.5×10⁻⁴ (1σ)**, c₂ = (−0.3 ± 0.15)×10⁻⁵, R₁₁≈0.9234.
+- **Read:** few×10⁻⁴ residual m — small, likely within UNIONS requirements, but the noiseless limit resolves a marginal *positive* residual (≈7σ given tiny noiseless bars). Not the centroid fix (fix-off gives the same) → metacal/fitter higher-order. Worth stating the m-requirement we're holding this to.
### 3. The centroid fix — present, correct in form, but **necessity not demonstrated in idealized sim**
The fix (`make_ngmix_observation`, `ngmix.py:987-1004`) re-centers the galaxy Jacobian on
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index 210db5dde..efad59680 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -80,9 +80,16 @@
Autonomous prep done; held for the call.
the container's scientific stack and ran the actual module code — not a re-implementation.
- 200 galsim galaxies, sub-pixel shifted (the centroid regime), metacal recovery:
- m = +2×10⁻⁴ ± 3×10⁻⁴ (mult. bias ≈ 0)
- c₂ = −1×10⁻⁵ (additive ≈ 0) · R₁₁≈R₂₂≈0.924 · 200/200 fits OK
+ My harness — 200 galsim galaxies, sub-pixel shifted, with noise:
+ m = +2×10⁻⁴ ± 3×10⁻⁴ (consistent with zero) · c₂ ≈ −1×10⁻⁵ · R≈0.924 · 200/200 OK
+
+ Author's own centroid_bias_v2.py — noiseless, 60 trials, ±-shear cancellation:
+ m = +3.7×10⁻⁴ ± 0.5×10⁻⁴ (1σ) · c₂ = (−0.3 ± 0.15)×10⁻⁵ · R₁₁≈0.9234 · S/N≈5×10¹⁷
+
+ Read: the v2.0 fitter sits at the few×10⁻⁴ level in m — small and
+ almost certainly within UNIONS requirements, but the noiseless limit resolves a marginally-significant
+ positive residual (not formally zero). It is not the centroid fix (fix-off gives the same),
+ so it's metacal/fitter higher-order, not centroid bias.
The action for the call: walk the two blockers and the cleanups, decide
From 3700ca2d31ab7538556371834d50e3fdfa6fe967 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Tue, 2 Jun 2026 00:34:24 +0200
Subject: [PATCH 06/29] felt: CI triggered via mirror PR #741; capture
build-only CI nuance
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/draft-review.md | 2 +-
.felt/review-ngmix-v2-pr740/report.html | 4 ++--
.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/draft-review.md b/.felt/review-ngmix-v2-pr740/draft-review.md
index 532e8754e..43c216b84 100644
--- a/.felt/review-ngmix-v2-pr740/draft-review.md
+++ b/.felt/review-ngmix-v2-pr740/draft-review.md
@@ -17,7 +17,7 @@ Authorship convention: "— Claude on behalf of Cail" unless he says otherwise.
| "closes #" links | ❌ No issue links. #725 (Axel's centroid-shift PR) overlaps this work — should this PR close/supersede it? Worth an explicit cross-ref. |
| Code/doc style | ⚠️ Mostly good. Pockets of debug cruft (see line-level). |
| Docs updated | ❌ No `docs/` changes. The ngmix pin note in `CLAUDE.md` ("don't modernize this line") is now stale — this PR *is* the modernization. Needs updating on/with merge. |
-| **CI passing** | ❌ **No CI has run.** `statusCheckRollup` is empty; no workflow runs exist for `ngmix_v2.0`. It's a cross-repo (fork) PR — Actions need a maintainer "Approve and run", or push the branch to `origin`. **Cannot tick this box until CI runs green.** |
+| **CI passing** | ⚠️ **Now running via same-repo mirror PR #741** (pushed `ngmix_v2.0` to origin). BUT only the branch's *old* `deploy-image.yml` ("Create and publish a Docker image") fires — it builds both images + smoke-tests binaries + `pytest --version` + **publishes to ghcr**; it does **not** run the pytest suite or example pipeline. The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
| API docs built | ➖ n/a-ish (no public API doc surface changed materially) |
| All files checked + comments | ✅ done in this review |
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index efad59680..151d5b796 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -123,7 +123,7 @@
Centroid fix — present and harmless, necessity not shown
Checklist blockers
Item
State
-
CI passing
No CI ran — empty status rollup; fork PR needs "Approve and run" or a push to origin
+
CI passing
Now running via same-repo mirror PR #741 (pushed ngmix_v2.0 to origin). But only the branch's old workflow fires — builds + binary smokes + publishes to ghcr, no pytest suite / example pipeline. Full suite needs develop merged into the branch (modern CI + drift resolved). Fork PR #740 got no CI at all (Actions approval gate).
None — and CLAUDE.md's "don't modernize ngmix" note is now stale
Targets develop
Yes
@@ -147,7 +147,7 @@
Code-level cleanups
For Cail, on the call.
Centroid fix regime. Where does the bias appear? Is the author's before/after captured anywhere we can cite?
-
CI. Approve-and-run the fork PR, or ask Martin to push to origin? Can't sign off the checklist without green CI.
+
CI. Mirror PR #741 now runs the build (publishing images), but the full test suite + example pipeline need develop merged into ngmix_v2.0. Ask Martin to merge develop up (also resolves drift), or do it ourselves? And: tell Martin to open future PRs directly on CosmoStat, not from a fork.
r50 columns. Confirm Martin/Lucy's intent for the size-column naming before sp_validation consumes them.
Voice + scope. What gets posted (full review vs a lighter "looks good, two questions"), in your voice or signed "Claude on behalf of Cail"? Push cleanups as suggestions, or just request them?
#725. Does this PR supersede Axel's centroid-shift PR?
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 2c121dfb1..2c8bce9ca 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -7,7 +7,7 @@ tags:
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-outcome: 'Autonomous prep done — HELD for Cail (interactive). Read the diff, built the new fitter against real ngmix 2.4.0 on candide, ran do_ngmix_metacal: shear recovery unbiased (m=+2e-4±3e-4, c2≈0, 200/200 fits). API correct. Blockers for the checklist: (1) NO CI ran — fork PR needs Approve-and-run/push to origin; (2) centroid fix is harmless but its necessity isn''t reproduced in idealized sim — need the regime/before-after. Plus cleanups (r50/T column naming, debug prints, stale CLAUDE.md ngmix note, ship-or-tidy fitting.py/test_centroid_shift.py). Draft comments staged in draft-review.md; report.html has the full read. Nothing posted/pushed yet — that''s the call.'
+outcome: 'Autonomous prep done — HELD for Cail (interactive). Exercised the new fitter against real ngmix 2.4.0 on candide: API correct, shear recovery unbiased at few-e-4 in m (my harness +2e-4±3e-4; author''s noiseless centroid_bias_v2 +3.7e-4±0.5e-4). Centroid fix harmless but its bias not reproduced (fix-off same) -> metacal higher-order, not centroid. CI: triggered via same-repo mirror PR #741 (pushed ngmix_v2.0 to origin, all commits stay Martin''s); building now BUT only the branch''s old workflow (build+binaries+publish, NO pytest suite/example pipeline) — full suite needs develop merged into the branch. Fork PR #740 got no CI (approval gate). Cleanups staged: r50/T column mislabel, debug prints, hardcoded 51*51, stale CLAUDE.md ngmix note, ship-or-tidy fitting.py/test_centroid_shift.py. report.html + draft-review.md have the full read. Nothing reviewed-posted to GitHub yet (only the #741 mirror, which credits Martin).'
horizon: now
shuttle:
enabled: true
From 7a647ec2fc108c5fc98db385ff2fa711fecb1551 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Tue, 2 Jun 2026 00:43:04 +0200
Subject: [PATCH 07/29] felt: PR #741 build CI green (images build, ngmix 2.4.0
resolves, published); suite still pending merge-develop
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/draft-review.md | 2 +-
.felt/review-ngmix-v2-pr740/report.html | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/draft-review.md b/.felt/review-ngmix-v2-pr740/draft-review.md
index 43c216b84..812511452 100644
--- a/.felt/review-ngmix-v2-pr740/draft-review.md
+++ b/.felt/review-ngmix-v2-pr740/draft-review.md
@@ -17,7 +17,7 @@ Authorship convention: "— Claude on behalf of Cail" unless he says otherwise.
| "closes #" links | ❌ No issue links. #725 (Axel's centroid-shift PR) overlaps this work — should this PR close/supersede it? Worth an explicit cross-ref. |
| Code/doc style | ⚠️ Mostly good. Pockets of debug cruft (see line-level). |
| Docs updated | ❌ No `docs/` changes. The ngmix pin note in `CLAUDE.md` ("don't modernize this line") is now stale — this PR *is* the modernization. Needs updating on/with merge. |
-| **CI passing** | ⚠️ **Now running via same-repo mirror PR #741** (pushed `ngmix_v2.0` to origin). BUT only the branch's *old* `deploy-image.yml` ("Create and publish a Docker image") fires — it builds both images + smoke-tests binaries + `pytest --version` + **publishes to ghcr**; it does **not** run the pytest suite or example pipeline. The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
+| **CI passing** | ⚠️ **Build GREEN, suite NOT run.** Via same-repo mirror PR #741 (pushed `ngmix_v2.0` to origin). The branch's *old* `deploy-image.yml` ran and **passed every step**: builds both images (so the new `uv.lock` + ngmix 2.4.0 resolve and the image builds with all system tools ✅), binary smokes ✅, entry point ✅, `pytest --version` ✅, published to ghcr ✅. But it does **not** run the pytest suite or example pipeline — those need the modern workflow. The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
| API docs built | ➖ n/a-ish (no public API doc surface changed materially) |
| All files checked + comments | ✅ done in this review |
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index 151d5b796..e012a3799 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -123,7 +123,7 @@
Centroid fix — present and harmless, necessity not shown
Checklist blockers
Item
State
-
CI passing
Now running via same-repo mirror PR #741 (pushed ngmix_v2.0 to origin). But only the branch's old workflow fires — builds + binary smokes + publishes to ghcr, no pytest suite / example pipeline. Full suite needs develop merged into the branch (modern CI + drift resolved). Fork PR #740 got no CI at all (Actions approval gate).
+
CI passing
Build GREEN, suite not run via mirror PR #741. Old workflow passed every step — images build (new uv.lock + ngmix 2.4.0 resolve ✅), binary smokes ✅, entry point ✅, published to ghcr ✅. But no pytest suite / example pipeline — those need develop merged into the branch (modern CI). Fork PR #740 got no CI at all (Actions approval gate).
None — and CLAUDE.md's "don't modernize ngmix" note is now stale
Targets develop
Yes
From 98839205deb23932c9996555c93b9b82251ead0a Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Tue, 2 Jun 2026 01:09:46 +0200
Subject: [PATCH 08/29] felt: merged develop into ngmix_v2.0 (clean but
uv.lock); full suite now running on #741
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/draft-review.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.felt/review-ngmix-v2-pr740/draft-review.md b/.felt/review-ngmix-v2-pr740/draft-review.md
index 812511452..314eb2eed 100644
--- a/.felt/review-ngmix-v2-pr740/draft-review.md
+++ b/.felt/review-ngmix-v2-pr740/draft-review.md
@@ -17,7 +17,7 @@ Authorship convention: "— Claude on behalf of Cail" unless he says otherwise.
| "closes #" links | ❌ No issue links. #725 (Axel's centroid-shift PR) overlaps this work — should this PR close/supersede it? Worth an explicit cross-ref. |
| Code/doc style | ⚠️ Mostly good. Pockets of debug cruft (see line-level). |
| Docs updated | ❌ No `docs/` changes. The ngmix pin note in `CLAUDE.md` ("don't modernize this line") is now stale — this PR *is* the modernization. Needs updating on/with merge. |
-| **CI passing** | ⚠️ **Build GREEN, suite NOT run.** Via same-repo mirror PR #741 (pushed `ngmix_v2.0` to origin). The branch's *old* `deploy-image.yml` ran and **passed every step**: builds both images (so the new `uv.lock` + ngmix 2.4.0 resolve and the image builds with all system tools ✅), binary smokes ✅, entry point ✅, `pytest --version` ✅, published to ghcr ✅. But it does **not** run the pytest suite or example pipeline — those need the modern workflow. The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
+| **CI passing** | ⚠️ **Build GREEN, suite NOT run.** Via same-repo mirror PR #741 (pushed `ngmix_v2.0` to origin). The branch's *old* `deploy-image.yml` ran and **passed every step**: builds both images (so the new `uv.lock` + ngmix 2.4.0 resolve and the image builds with all system tools ✅), binary smokes ✅, entry point ✅, `pytest --version` ✅, published to ghcr ✅. But it does **not** run the pytest suite or example pipeline — those need the modern workflow. **→ RESOLVED:** merged `develop` into `ngmix_v2.0` (clean except `uv.lock`, regenerated via `uv lock`) — this pulls in the modern CI; the **full test suite + example pipeline now run** on #741 (`build-test-publish`). The merge also makes #741's diff show the true ngmix delta vs develop. Result pending. Note develop already ships a `test_ngmix.py` the v2.0 code must now pass. The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
| API docs built | ➖ n/a-ish (no public API doc surface changed materially) |
| All files checked + comments | ✅ done in this review |
From 4c04015fc5e6333e1c36660b8d2fd508b842eb94 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Tue, 2 Jun 2026 01:26:47 +0200
Subject: [PATCH 09/29] felt: full-suite CI green on #741; record 6-angle
code-review findings
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/draft-review.md | 34 +++++++++++++++++--
.felt/review-ngmix-v2-pr740/report.html | 32 ++++++++++-------
.../review-ngmix-v2-pr740.md | 2 +-
3 files changed, 52 insertions(+), 16 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/draft-review.md b/.felt/review-ngmix-v2-pr740/draft-review.md
index 314eb2eed..31cd8852e 100644
--- a/.felt/review-ngmix-v2-pr740/draft-review.md
+++ b/.felt/review-ngmix-v2-pr740/draft-review.md
@@ -17,7 +17,7 @@ Authorship convention: "— Claude on behalf of Cail" unless he says otherwise.
| "closes #" links | ❌ No issue links. #725 (Axel's centroid-shift PR) overlaps this work — should this PR close/supersede it? Worth an explicit cross-ref. |
| Code/doc style | ⚠️ Mostly good. Pockets of debug cruft (see line-level). |
| Docs updated | ❌ No `docs/` changes. The ngmix pin note in `CLAUDE.md` ("don't modernize this line") is now stale — this PR *is* the modernization. Needs updating on/with merge. |
-| **CI passing** | ⚠️ **Build GREEN, suite NOT run.** Via same-repo mirror PR #741 (pushed `ngmix_v2.0` to origin). The branch's *old* `deploy-image.yml` ran and **passed every step**: builds both images (so the new `uv.lock` + ngmix 2.4.0 resolve and the image builds with all system tools ✅), binary smokes ✅, entry point ✅, `pytest --version` ✅, published to ghcr ✅. But it does **not** run the pytest suite or example pipeline — those need the modern workflow. **→ RESOLVED:** merged `develop` into `ngmix_v2.0` (clean except `uv.lock`, regenerated via `uv lock`) — this pulls in the modern CI; the **full test suite + example pipeline now run** on #741 (`build-test-publish`). The merge also makes #741's diff show the true ngmix delta vs develop. Result pending. Note develop already ships a `test_ngmix.py` the v2.0 code must now pass. The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
+| **CI passing** | ✅ **GREEN** (full suite). Got here by pushing a same-repo mirror (PR #741, branch `ngmix_v2.0` on origin) and merging `develop` into it (clean except `uv.lock`, regenerated via `uv lock`) to pull in the modernized CI. `build-test-publish` ran every step green: images build (new `uv.lock` + ngmix 2.4.0 resolve), binary smokes, entry point, **pytest suite** (incl. develop's `test_ngmix.py`), example pipeline; publish correctly skipped on the PR. Original fork PR #740 got no CI (Actions approval gate). | The modern test-running workflow (`pull_request → develop`) doesn't queue because GitHub evaluates the trigger from the *head branch's* workflow file, which predates the modernization. `ci-release.yml` runs the full suite only on `pull_request → main/master`. **To run the full suite + example pipeline, the branch must have `develop` merged in** (pulls in modern CI + resolves drift). Original #740 got no CI at all because fork PRs don't trigger our Actions without maintainer approval. |
| API docs built | ➖ n/a-ish (no public API doc surface changed materially) |
| All files checked + comments | ✅ done in this review |
@@ -62,7 +62,37 @@ naming, not the guess. Worth confirming Martin's intent.)
---
-## Line-level (staged as draft comments)
+## Code-review pass (multi-agent, high-effort) — findings
+
+Ran a 6-angle code-review over the ngmix delta. CI is green, so none of these break the
+test suite — but the suite is shallow on these paths. Ranked.
+
+### Correctness — confirmed
+1. **`get_guess` ImportError — shipped script dead on arrival.** `scripts/validation/centroid/centroid_bias.py:37` does `from ...ngmix import get_guess, get_noise`, but this PR removed `get_guess` from the module. The script ImportErrors on load. (Only `centroid_bias.py`; `test_centroid_shift.py` imports just `get_noise`.) Fix the import, restore `get_guess`, or drop the v1 script.
+2. **Reproducibility break — unseeded RNG.** `ngmix.py:948-949` `prepare_ngmix_weights` uses global `np.random.randn` for the noise image + masked-pixel fill, while the module deliberately switched to a seeded `self._rng` (`:266`). Same tile + same seed → different masked-pixel values and metacal noise → non-reproducible shears. Thread `rng` into `prepare_ngmix_weights`/`make_ngmix_observation`.
+3. **`*_psfo` columns now carry the metacal-RECONVOLVED PSF, not the original PSFEx/MCCD PSF.** `average_multiepoch_psf` reads `obsdict['noshear'][n_e].psf.meta['result']` — the reconvolved metacal PSF — and feeds `g_PSFo`/`T_PSFo`/`r50_PSFo`. But `compile_results` documents these as "the original image psf from psfex or mccd," and PSF-leakage / ρ-stat / null tests consume them as the input PSF. The old code fit the raw pre-metacal PSF separately. Confirm intent; the comment is now wrong either way.
+4. **`CHECK_EXISTING_DIR` resume silently dropped.** New `process()` never reads `self._check_existing_dir` or calls `get_last_id` (both now dead). A configured resume reprocesses from scratch and overwrites — a documented feature silently no-ops.
+
+### Correctness — plausible, needs Martin/Lucy intent (raise as questions)
+5. **Weight-map normalization changed.** Old `prepare_ngmix_weights` binarized the mask (`weight[weight!=0]=1`) then applied a flat `1/sig_noise²`. New keeps the per-pixel (already FSCALE-rescaled) weight AND multiplies by `1/sigma_mad(gal)²`, with `sigma_mad` taken over the *object-containing* stamp (flux-contaminated, vs the old windowed object-free `get_noise`). Looks like double inverse-variance + a biased noise estimate → shifts the fit's χ² weighting, errors, s2n, and PSF-averaging weights vs the validated v1. Intentional?
+6. **Per-type PSF columns collapsed.** `Tpsf`/`g1_psf`/`g2_psf` are now identical across all 5 metacal HDUs (all from `T_PSFo`/`g_PSFo`); old code stored per-type values. May be fine if the metacal target PSF is type-independent — but if any response/selection step differences them it now gets exactly zero.
+7. **Galaxy zero-pixel guard narrowed (`any`→`all`).** `prepare_postage_stamps:737` now skips an epoch only if `np.all(gal_vign==0)`; the old code skipped if *any* pixel was 0. Partial CCD-edge stamps (a band of exact-zero pixels) now enter the fit, relying on flags/weights to catch them. Intentional loosening or a regression?
+8. **PSF fit uses the galaxy joint prior + galaxy `FLUX_AUTO` as the PSF flux guess** (`do_ngmix_metacal`). Matches the upstream ngmix example's prior reuse, but the galaxy-flux guess for the PSF is odd; possible PSF-fit convergence/quality impact. Lower confidence.
+
+### Altitude / structure
+9. **Runner input-contract mismatch (4 runners).** `sextractor_runner`, `read_ext_sexcat_runner`, `psfex_interp_runner`, `vignetmaker_runner` moved the WCS/exp-numbers paths from named config options (`LOG_WCS`/`ME_LOG_WCS`/`ME_DOT_PSF_DIR`) to positional `input_file_list[i]` — but, unlike `ngmix_runner`, their `@module_runner` decorators were **not** updated. The real input contract now lives entirely in each config's `FILE_PATTERN`/`FILE_EXT` ordering; the shipped v2.0 example configs are correct, but a hand-written/v1-style config silently mis-maps inputs or IndexErrors with no schema to catch it. `ngmix_runner` is the model — extend the other four decorators to match.
+10. **`r50`/`T` mislabel (units).** `compile_results:415-417` stores `pars[4]` (= `T`, arcsec²) under `r50`/`r50_err`, and `r50_PSFo = sqrt(T_psf/2)` is the Gaussian σ, not the half-light radius (`r50 = 1.1774σ`). sp_validation consumes these size columns. Intentional shorthand or a mislabel?
+
+### Cleanups (bundle into one comment)
+- `save_results`: `ngmix.py:547` non-`f` string → error prints literal `{ext_name}`; leftover `print(...)` debug at `:564-565`; the `np.append`-per-batch FITS append is O(n²) over a tile.
+- Dead code: unreachable `print` after `return` in `MegaCamFlip` (`:293`); unused `sextractor_e1e2` (`:1131`, with a copy-pasted wrong docstring); `# I still don't know how to handle this` (`:63`); commented `#self._gal_vignet_path` block (`:240-244`); `# MKDEBUG` markers; redundant `self._f_wcs_path`.
+- `51 * 51` hardcoded masked-fraction denominator (`:766`) → use `v_flag_tmp.size`; near-term goal is configurable stamp sizes, so this will silently go wrong.
+- Duplication: `scripts/python/fitting.py` and `scripts/jupyter/test_centroid_shift.py` each redefine their own `make_data`/`get_prior`, duplicating `testing/simulate.py` + the module `get_prior` added in this same PR (defeats simulate.py's "stable across branches" purpose). `fitting.py` is a stray upstream-ngmix example (`fitting_bd_empsf.py` header, `espy`/`images` refs) — likely an unintended add; delete or make it import the canonical helpers.
+- Minor efficiency: doubled `galsim.Image(gal, scale=1.0)` in `make_ngmix_observation`; repeated uncached `SqliteDict[str(obj_id)]` deserialization per object/epoch; `get_exp_output_dirs` re-globs identical dirs when a runner repeats in `ME_IMAGE_EXP_RUNNERS`.
+
+---
+
+## Line-level (earlier staged notes — superseded by the code-review pass above for ngmix.py)
- **`pyproject.toml`** — ngmix now `>=2.4` with `[tool.uv.sources] ngmix = git esheldon/ngmix tag v2.4.0`. Why pin to a git tag rather than the PyPI release `ngmix==2.4.0`? PyPI would be more reproducible and drop the git dependency. (PyPI was unreachable from here to confirm 2.4.0 is published — Cail can check.)
- **`ngmix.py:63`** — stray `# I still don't know how to handle this` above `Tile_cat`.
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index e012a3799..1391dcb7b 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -57,10 +57,11 @@
PR #740 — Ngmix v2.0: review prep.
-
The diff is read, the new fitter was built against real ngmix 2.4.0 and run
- on candide, and it recovers shear unbiased. Two things block the checklist (CI never ran;
- the centroid fix's necessity isn't demonstrated) and a handful of cleanups stand. Ready to
- talk it through.
+
The diff is read, the new fitter was built against real ngmix 2.4.0 and runs
+ unbiased, and — via a same-repo mirror (PR #741) with develop merged in — the full CI
+ suite is now green. A 6-angle code-review pass then caught several real issues the
+ green suite doesn't (a dead-on-arrival script, an unseeded-RNG reproducibility break, PSF
+ column semantics, a dropped resume feature). Ready to talk it through.
@@ -123,21 +124,25 @@
Centroid fix — present and harmless, necessity not shown
Checklist blockers
Item
State
-
CI passing
Build GREEN, suite not run via mirror PR #741. Old workflow passed every step — images build (new uv.lock + ngmix 2.4.0 resolve ✅), binary smokes ✅, entry point ✅, published to ghcr ✅. But no pytest suite / example pipeline — those need develop merged into the branch (modern CI). Fork PR #740 got no CI at all (Actions approval gate).
+
CI passing
GREEN — full suite on mirror PR #741 (pushed ngmix_v2.0 to origin, merged develop in to get the modern CI). Images build (new uv.lock + ngmix 2.4.0 resolve), binary smokes, entry point, pytest suite (incl. develop's test_ngmix.py), example pipeline — all pass; publish skipped on PR. Fork PR #740 had no CI (approval gate).
None — and CLAUDE.md's "don't modernize ngmix" note is now stale
Targets develop
Yes
-
Code-level cleanups
+
Code-review pass (multi-agent, 6 angles) — what it caught
+
CI is green, so none of these break the suite — but the suite is shallow on these paths. Full list + line refs in draft-review.md.
+
Confirmed:
-
r50/T column naming in compile_results: pars[4] is T (not a half-light radius) and r50_psfo = sqrt(T/2) is the Gaussian σ — columns named r50* carry T/σ. Intentional shorthand or a mislabel sp_validation will trip on?
-
Debug cruft in ngmix.py: # MKDEBUG, an unreachable print after a return, stray error-branch prints.
-
Masked-fraction cut hardcodes 51 * 51 — use flag.size.
-
scripts/python/fitting.py is a stale copy of esheldon's fitting_bd_empsf.py (docstring still mentions fracdev); test_centroid_shift.py is a jupyter export with cruft. Ship or tidy?
-
pyproject.toml: ngmix sourced via git tag rather than PyPI ngmix==2.4.0 — pin to PyPI if published.
-
testing/simulate.py and the $SP_EXP runner plumbing are clean — 👍.
+
Dead-on-arrival script:centroid_bias.py:37 imports get_guess, which this PR deletes → ImportError. A validation script shipped in the same PR can't run.
+
Reproducibility break:prepare_ngmix_weights uses global np.random.randn for the noise image while the module deliberately switched to a seeded self._rng → same tile+seed gives different shears.
+
PSF column semantics:*_psfo (g_PSFo/T_PSFo/r50_PSFo) now carry the metacal-reconvolved PSF (read from obsdict['noshear'].psf.meta), not the original PSFEx/MCCD PSF the column doc claims and PSF-systematics tests expect.
+
Dropped feature:CHECK_EXISTING_DIR resume is gone from process() — a configured resume silently reprocesses from scratch.
+
Runner contract (altitude): 4 runners moved WCS/exp-numbers to positional input_file_list[i] but didn't update their @module_runner decorators (only ngmix_runner did) — contract now lives in per-config FILE_PATTERN ordering; shipped configs OK, hand-written ones silently mis-map.
+
Units:r50/r50_err store pars[4] = T (arcsec²), and r50_PSFo = sqrt(T/2) = Gaussian σ — not half-light radii. sp_validation consumes these.
+
Plausible — need Martin/Lucy's intent (raise as questions): weight-map normalization changed (looks like double inverse-variance + flux-contaminated noise estimate); per-type PSF columns collapsed to one value; galaxy zero-pixel skip narrowed any→all (edge stamps now enter the fit); PSF fit uses the galaxy prior + galaxy flux guess.
+
Cleanups: non-f error string + debug prints in save_results; dead sextractor_e1e2, unreachable print, MKDEBUG markers; hardcoded 51*51; fitting.py is a stray upstream ngmix example; make_data/get_prior duplicated across the validation scripts vs simulate.py. testing/simulate.py + the $SP_EXP plumbing themselves are clean 👍.
@@ -147,7 +152,8 @@
Code-level cleanups
For Cail, on the call.
Centroid fix regime. Where does the bias appear? Is the author's before/after captured anywhere we can cite?
-
CI. Mirror PR #741 now runs the build (publishing images), but the full test suite + example pipeline need develop merged into ngmix_v2.0. Ask Martin to merge develop up (also resolves drift), or do it ourselves? And: tell Martin to open future PRs directly on CosmoStat, not from a fork.
+
CI / PR plumbing. Done: mirror PR #741 (origin ngmix_v2.0) with develop merged in is green on the full suite. Decide: keep #741 as the PR of record and close #740, or hand the merged branch back to Martin? And tell Martin future PRs go directly on CosmoStat, not from a fork (fork PRs don't trigger CI).
+
Which findings to post, and as line comments vs a PR-level summary — the 4 confirmed correctness items are clear; the 4 plausible ones are better as questions to Martin/Lucy.
r50 columns. Confirm Martin/Lucy's intent for the size-column naming before sp_validation consumes them.
Voice + scope. What gets posted (full review vs a lighter "looks good, two questions"), in your voice or signed "Claude on behalf of Cail"? Push cleanups as suggestions, or just request them?
#725. Does this PR supersede Axel's centroid-shift PR?
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 2c8bce9ca..ccfe1eb6b 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -7,7 +7,7 @@ tags:
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-outcome: 'Autonomous prep done — HELD for Cail (interactive). Exercised the new fitter against real ngmix 2.4.0 on candide: API correct, shear recovery unbiased at few-e-4 in m (my harness +2e-4±3e-4; author''s noiseless centroid_bias_v2 +3.7e-4±0.5e-4). Centroid fix harmless but its bias not reproduced (fix-off same) -> metacal higher-order, not centroid. CI: triggered via same-repo mirror PR #741 (pushed ngmix_v2.0 to origin, all commits stay Martin''s); building now BUT only the branch''s old workflow (build+binaries+publish, NO pytest suite/example pipeline) — full suite needs develop merged into the branch. Fork PR #740 got no CI (approval gate). Cleanups staged: r50/T column mislabel, debug prints, hardcoded 51*51, stale CLAUDE.md ngmix note, ship-or-tidy fitting.py/test_centroid_shift.py. report.html + draft-review.md have the full read. Nothing reviewed-posted to GitHub yet (only the #741 mirror, which credits Martin).'
+outcome: 'Autonomous prep done — HELD for Cail (interactive). Fitter validated against real ngmix 2.4.0 on candide: unbiased, m~few-e-4. CI now GREEN on full suite via mirror PR #741 (pushed ngmix_v2.0 to origin + merged develop in for modern CI; pytest suite incl test_ngmix.py + example pipeline all pass). 6-angle code-review caught real issues the green suite misses — CONFIRMED: centroid_bias.py imports deleted get_guess (ImportError), unseeded np.random.randn breaks reproducibility, *_psfo columns now hold metacal-reconvolved (not original) PSF, CHECK_EXISTING_DIR resume silently dropped, 4 runners'' @module_runner decorators not updated for positional WCS inputs, r50/T column units mislabel. PLAUSIBLE (questions for Martin/Lucy): weight-map normalization change (double inv-var?), per-type PSF columns collapsed, zero-pixel skip any->all, PSF fit uses galaxy prior+flux guess. Plus cleanups (non-f error string, dead code, 51*51, duplicated make_data/get_prior, stray fitting.py). All staged in draft-review.md + report.html. NOTHING reviewed-posted to GitHub yet (only #741 mirror, credits Martin). Next: decide what to post + #740/#741 disposition with Cail.'
horizon: now
shuttle:
enabled: true
From 8374c2804725dbd6ad6913839e21c39cbdb973f5 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Tue, 2 Jun 2026 02:27:07 +0200
Subject: [PATCH 10/29] felt: pushed rng-fix+repro-test and scripts-import test
to #741; record code-review + test work
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/report.html | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index 1391dcb7b..2acdeba10 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -93,10 +93,17 @@
Autonomous prep done; held for the call.
so it's metacal/fitter higher-order, not centroid bias.
-
The action for the call: walk the two blockers and the cleanups, decide
- what gets posted to GitHub and in whose voice, decide whether to push fixes or request them
- from Martin/Lucy. Merge/approve is yours. Draft comments are staged in
- draft-review.md in this fiber dir.
+
Pushed to #741 (with Cail, 2026-06-02): two tests + one fix, after
+ observing red→green locally — (1) fix(ngmix): thread seeded rng… + a
+ reproducibility test (same seed → identical shear; failed before, passes after); (2)
+ test: smoke-import… extending import coverage to scripts/ —
+ deliberately red on centroid_bias.py's removed-get_guess import
+ (resolution is Martin's call). These went on our mirror PR (coupled to the feature's
+ correctness; see [[separate-infra-prs]] for why bundling here is fine).
+
Next: a short review posting the cut-and-dry items (PSF doc/code
+ contradiction as a question; the reproducibility fix; the get_guess breakage;
+ a mention of the runner-decorator inconsistency). The four methodology findings get a
+ deeper joint walkthrough with Cail tomorrow. Draft staged in draft-review.md.
so it's metacal/fitter higher-order, not centroid bias.
-
Pushed to #741 (with Cail, 2026-06-02): two tests + one fix, after
- observing red→green locally — (1) fix(ngmix): thread seeded rng… + a
- reproducibility test (same seed → identical shear; failed before, passes after); (2)
- test: smoke-import… extending import coverage to scripts/ —
- deliberately red on centroid_bias.py's removed-get_guess import
- (resolution is Martin's call). These went on our mirror PR (coupled to the feature's
- correctness; see [[separate-infra-prs]] for why bundling here is fine).
-
Next: a short review posting the cut-and-dry items (PSF doc/code
- contradiction as a question; the reproducibility fix; the get_guess breakage;
- a mention of the runner-decorator inconsistency). The four methodology findings get a
- deeper joint walkthrough with Cail tomorrow. Draft staged in draft-review.md.
+
Landed on #741 (with Cail, 2026-06-02), CI now fully green — 260 passed:
+
+
fix(ngmix): thread seeded rng… + test_metacal_is_reproducible_with_fixed_seed — observed red→green (same seed: max|dg1|≈3e-5 → 0.0). The one outright bug we fixed.
+
test_scripts_import.py — extends import-smoke into scripts/ (develop's test_imports.py only walks the package, so scripts were uncovered — that's how get_guess slipped through). Covers centroid_bias_v2.py + fitting.py.
+
Deleted the stranded v1 centroid_bias.py — it imports the removed get_guess and is only ever run against the test_centroid_bug worktree (via run_bias_test.sh's SHAPEPIPE_PATH); the copy on this branch was never executed. Superseded by centroid_bias_v2.py.
+
+
The centroid bug, characterized: the v1 harness documents it as m ≈ −2.8×10⁻² on old ngmix 1.3.6 (test_centroid_bug branch), → ≈0 with fix_jac_centroid. That's why we couldn't reproduce a bias on v2.0: the bug lived in the old jacobian/centroid path, which the rewrite replaces entirely. v2.0's residual is the +4×10⁻⁴ we measured — not the bug.
+
Open for the review (draft in draft-review.md): PSF doc/code contradiction (question to Martin); the four methodology findings (joint walkthrough with Cail, next); the runner-decorator mention; r50/T units; and a new scope question — the bug/fix cross-version rig (run_all.sh, bug/fix configs/envs) hardcodes mkilbing home paths, so: ship only the reusable v2 parts, or mark the rig local-only? Plus a dangling thread: centroid_bug/fix.yaml still name the now-deleted centroid_bias.py (resolves to the old worktree at runtime).
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index ccfe1eb6b..410845190 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -7,7 +7,7 @@ tags:
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-outcome: 'Autonomous prep done — HELD for Cail (interactive). Fitter validated against real ngmix 2.4.0 on candide: unbiased, m~few-e-4. CI now GREEN on full suite via mirror PR #741 (pushed ngmix_v2.0 to origin + merged develop in for modern CI; pytest suite incl test_ngmix.py + example pipeline all pass). 6-angle code-review caught real issues the green suite misses — CONFIRMED: centroid_bias.py imports deleted get_guess (ImportError), unseeded np.random.randn breaks reproducibility, *_psfo columns now hold metacal-reconvolved (not original) PSF, CHECK_EXISTING_DIR resume silently dropped, 4 runners'' @module_runner decorators not updated for positional WCS inputs, r50/T column units mislabel. PLAUSIBLE (questions for Martin/Lucy): weight-map normalization change (double inv-var?), per-type PSF columns collapsed, zero-pixel skip any->all, PSF fit uses galaxy prior+flux guess. Plus cleanups (non-f error string, dead code, 51*51, duplicated make_data/get_prior, stray fitting.py). All staged in draft-review.md + report.html. NOTHING reviewed-posted to GitHub yet (only #741 mirror, credits Martin). Next: decide what to post + #740/#741 disposition with Cail.'
+outcome: 'Interactive review of PR #740 ngmix v2.0, working on our mirror PR #741. CI now FULLY GREEN (260 passed). Landed: rng-threading fix + reproducibility property test (observed red->green), scripts-import smoke test extending coverage to scripts/ (develop''s test_imports only walks the package — that''s why get_guess slipped through), and deleted the stranded v1 centroid_bias.py (old-API, imports removed get_guess, only ever run from the test_centroid_bug worktree; superseded by centroid_bias_v2.py). Centroid bug characterized: v1 harness documents m~-2.8e-2 on old ngmix 1.3.6, ~0 with fix_jac_centroid — the bug is an OLD-path phenomenon, which is why v2.0 (a full rewrite) shows no bias beyond ~+4e-4 residual. Deleted the over-scoped separate_infra_prs memory (Cail: overkill/limiting). STILL OPEN for the review write-up: PSF *_psfo doc-vs-code contradiction (now reports metacal-reconvolved PSF not original — post as question to Martin); the 4 methodology findings (weight normalization, per-type PSF collapse, zero-pixel any->all, PSF fit prior/flux) — deeper joint walkthrough with Cail NEXT; runner-decorator mention; r50/T units; harness-scope question (run_all.sh + bug/fix configs hardcode mkilbing home paths — ship only reusable v2 parts or mark local?); dangling centroid_bug/fix.yaml refs to deleted script. Nothing reviewed-posted to GitHub yet (only the #741 mirror PR body, which credits Martin).'
horizon: now
shuttle:
enabled: true
From 7c057377a6ab3bc48047c05f51394a6d41501bda Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Tue, 2 Jun 2026 13:18:24 +0200
Subject: [PATCH 12/29] felt: posted review part 1/2 to #741 (rng fix, repro
test, scripts-import + get_guess removal); part 2 (methodology) deferred
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 410845190..1a3866bea 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -7,7 +7,7 @@ tags:
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-outcome: 'Interactive review of PR #740 ngmix v2.0, working on our mirror PR #741. CI now FULLY GREEN (260 passed). Landed: rng-threading fix + reproducibility property test (observed red->green), scripts-import smoke test extending coverage to scripts/ (develop''s test_imports only walks the package — that''s why get_guess slipped through), and deleted the stranded v1 centroid_bias.py (old-API, imports removed get_guess, only ever run from the test_centroid_bug worktree; superseded by centroid_bias_v2.py). Centroid bug characterized: v1 harness documents m~-2.8e-2 on old ngmix 1.3.6, ~0 with fix_jac_centroid — the bug is an OLD-path phenomenon, which is why v2.0 (a full rewrite) shows no bias beyond ~+4e-4 residual. Deleted the over-scoped separate_infra_prs memory (Cail: overkill/limiting). STILL OPEN for the review write-up: PSF *_psfo doc-vs-code contradiction (now reports metacal-reconvolved PSF not original — post as question to Martin); the 4 methodology findings (weight normalization, per-type PSF collapse, zero-pixel any->all, PSF fit prior/flux) — deeper joint walkthrough with Cail NEXT; runner-decorator mention; r50/T units; harness-scope question (run_all.sh + bug/fix configs hardcode mkilbing home paths — ship only reusable v2 parts or mark local?); dangling centroid_bug/fix.yaml refs to deleted script. Nothing reviewed-posted to GitHub yet (only the #741 mirror PR body, which credits Martin).'
+outcome: 'Interactive review of ngmix v2.0 (#740), worked on mirror PR #741 (CI green, 260 passed). POSTED to #741: review part 1/2 (COMMENTED) with 3 inline comments — the reproducibility rng fix (+ its test), and the scripts-import smoke test that exposed the get_guess breakage (v1 centroid_bias.py removed as a stranded old-API copy). These are the items we fixed AND tested. NOT yet posted — review part 2/2 (the science/methodology judgement calls): *_psfo columns now report the metacal-reconvolved PSF not the original PSFEx/MCCD PSF (doc-vs-code contradiction); weight-map normalization change; r50/T units; per-type PSF collapse; zero-pixel any->all; PSF fit prior/flux guess; runner-decorator mention. Those need a joint walkthrough with Cail (he wants education not assertion) — DEFERRED TO NEXT SESSION per his call. Harness note: scripts/validation/centroid/ rig is all under scripts/ (not library) with hardcoded mkilbing paths — fine to live in scripts/, maybe flag as local scaffolding; only testing/simulate.py went into the package and it''s clean. Merge/approve is Cail+Martin''s. Loop Martin in on #741 (tell him fork PRs don''t trigger CI; open on CosmoStat directly).'
horizon: now
shuttle:
enabled: true
From 8c44bdaff29449d92c616e7eb92878597474882a Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Wed, 3 Jun 2026 16:28:14 +0200
Subject: [PATCH 13/29] =?UTF-8?q?felt:=20closed=20review-ngmix-v2-pr740=20?=
=?UTF-8?q?=E2=80=94=20part=202/2=20review=20posted=20to=20#741=20(11=20in?=
=?UTF-8?q?line=20+=20summary);=20review=20delivered,=20Martin=20to=20merg?=
=?UTF-8?q?e?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 1a3866bea..84539c4e1 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -1,25 +1,21 @@
---
name: 'Review + work: ngmix v2.0 (PR #740)'
-status: active
+status: closed
tags:
- constitution
- shapepipe
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-outcome: 'Interactive review of ngmix v2.0 (#740), worked on mirror PR #741 (CI green, 260 passed). POSTED to #741: review part 1/2 (COMMENTED) with 3 inline comments — the reproducibility rng fix (+ its test), and the scripts-import smoke test that exposed the get_guess breakage (v1 centroid_bias.py removed as a stranded old-API copy). These are the items we fixed AND tested. NOT yet posted — review part 2/2 (the science/methodology judgement calls): *_psfo columns now report the metacal-reconvolved PSF not the original PSFEx/MCCD PSF (doc-vs-code contradiction); weight-map normalization change; r50/T units; per-type PSF collapse; zero-pixel any->all; PSF fit prior/flux guess; runner-decorator mention. Those need a joint walkthrough with Cail (he wants education not assertion) — DEFERRED TO NEXT SESSION per his call. Harness note: scripts/validation/centroid/ rig is all under scripts/ (not library) with hardcoded mkilbing paths — fine to live in scripts/, maybe flag as local scaffolding; only testing/simulate.py went into the package and it''s clean. Merge/approve is Cail+Martin''s. Loop Martin in on #741 (tell him fork PRs don''t trigger CI; open on CosmoStat directly).'
+closed-at: 2026-06-03T16:28:05.521094506+02:00
+outcome: 'Review of PR #740 (ngmix v2.0) delivered as a coherent two-part review on CI-mirror #741 (part 1: fixed+tested items; part 2: empirical verification + 11 line-anchored findings + methodology questions for Martin/Lucy). Verified empirically on candide against real ngmix 2.4.0: API correct, metacal recovers m=+2e-4 (consistent w/ zero), centroid fix benign (its bias lives in old ngmix-1.x path, can''t repro on current code — expected). Key findings: weight-norm change, *_psfo now reconvolved-PSF + per-type collapse, zero-pixel guard any->all, CHECK_EXISTING_DIR resume dropped, r50/T mislabel, runner decorator contracts unupdated for 4 runners. Martin to go through and merge; offer to push cut-and-dry fixes stands. Merge is Cail''s/Martin''s gesture.'
horizon: now
shuttle:
enabled: true
kind: oneshot
- interactive: true
host: candide
project_dir: /automnt/n17data/cdaley/unions/shapepipe
agent: claude-opus
- session:
- id: be6fbfa6-8418-490b-bb05-d1936d5059ef
- agent: claude-opus
- dispatched_at: 2026-06-01T09:51:08.677418931Z
---
# Review + work: ngmix v2.0 (PR #740)
From 34015d0426f058cb20e85a108abfe85ffd45232c Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Fri, 5 Jun 2026 02:00:02 +0200
Subject: [PATCH 14/29] =?UTF-8?q?felt:=20review-ngmix-v2-pr740=20round=202?=
=?UTF-8?q?=20=E2=80=94=20next-steps=20triage=20posted=20to=20#741?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Re-reviewed against current #741 head: no code changed since the part-2
review, so all 11 findings stand. Martin closed fork PR #740 and
consolidated onto #741 (canonical, green, mergeable); engaged only to ack
the RNG fix. Triaged 11 findings (5 cut-and-dry / 5 decisions / 1 resume);
weight-norm (949) + *_psfo (1045) flagged as the only two merge-gates.
report.html rewritten as next-steps triage; summary comment posted to
#741 (issuecomment-4626968551); prs-in-flight indexed with #740-closed
disposition. Report-only round; fiber closed for Cail's review.
Co-Authored-By: Claude Opus 4.8
---
.felt/prs-in-flight/prs-in-flight.md | 3 +-
.felt/review-ngmix-v2-pr740/report.html | 225 +++++++++++-------
.../review-ngmix-v2-pr740.md | 20 +-
3 files changed, 152 insertions(+), 96 deletions(-)
diff --git a/.felt/prs-in-flight/prs-in-flight.md b/.felt/prs-in-flight/prs-in-flight.md
index ff110eb0e..5d8923e83 100644
--- a/.felt/prs-in-flight/prs-in-flight.md
+++ b/.felt/prs-in-flight/prs-in-flight.md
@@ -51,7 +51,8 @@ for the one open security-fiber.)
| # | Author | What |
|---|---|---|
-| #725 | aguinot | Fix centroid shift |
+| #741 | martinkilbinger / lbaumo | **Ngmix v2.0** — upstream ngmix 2.4.0 + Lucy's new classes. Canonical PR (CI mirror; fork PR #740 closed by Martin). OPEN, mergeable, CI green. Two-part review + next-steps triage delivered; 11 findings open (5 cut-and-dry, 5 decisions, 1 resume), 2 are merge-gates (weight-norm, `*_psfo`). See [[review-ngmix-v2-pr740]]. |
+| #725 | aguinot | Fix centroid shift (overlaps #741 centroid work — cross-ref/supersede?) |
| #704 | martinkilbinger | Contributors |
| #703 | martinkilbinger | V1.3.x |
| #699 | martinkilbinger | Coverage mask |
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index b5e074826..14e2259c5 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -18,7 +18,7 @@
html, body { margin: 0; padding: 0; }
body { background: var(--parchment); color: var(--ink); font-family: var(--serif); font-size: 19px; line-height: 1.55; -webkit-font-smoothing: antialiased; }
.page { max-width: 880px; margin: 0 auto; padding: 56px 32px 72px; }
- .stamp { font-family: var(--mono); font-size: 11px; letter-spacing: 0.12em; text-transform: uppercase; color: var(--ink-muted); margin-bottom: 20px; display: flex; gap: 14px; align-items: center; }
+ .stamp { font-family: var(--mono); font-size: 11px; letter-spacing: 0.12em; text-transform: uppercase; color: var(--ink-muted); margin-bottom: 20px; display: flex; gap: 14px; align-items: center; flex-wrap: wrap; }
.stamp .dot { width: 6px; height: 6px; border-radius: 50%; background: var(--cinnabar); }
.stamp .sep { color: var(--rule); }
h1 { font-family: var(--serif); font-style: italic; font-weight: 500; font-size: 44px; line-height: 1.1; margin: 0 0 16px; }
@@ -29,18 +29,26 @@
p { margin: 0 0 16px; }
ul, ol { margin: 0 0 16px; padding-left: 22px; }
li { margin-bottom: 8px; }
- code { font-family: var(--mono); font-size: 0.88em; background: rgba(196, 147, 51, 0.12); padding: 1px 5px; border-radius: 2px; }
+ code { font-family: var(--mono); font-size: 0.86em; background: rgba(196, 147, 51, 0.12); padding: 1px 5px; border-radius: 2px; }
hr { border: none; border-top: 1px solid var(--rule); margin: 40px 0; }
hr.short { width: 80px; margin: 32px 0; border-top-width: 2px; border-top-color: var(--cinnabar); }
blockquote { margin: 16px 0; padding-left: 20px; border-left: 2px solid var(--ochre); font-style: italic; }
- table { border-collapse: collapse; width: 100%; font-size: 16px; margin: 0 0 18px; }
- th, td { text-align: left; padding: 6px 10px; border-bottom: 1px solid var(--rule); vertical-align: top; }
+ table { border-collapse: collapse; width: 100%; font-size: 15.5px; margin: 0 0 18px; }
+ th, td { text-align: left; padding: 7px 10px; border-bottom: 1px solid var(--rule); vertical-align: top; }
th { font-family: var(--mono); font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase; color: var(--ink-muted); }
+ td.ln { font-family: var(--mono); font-size: 13px; white-space: nowrap; }
.v-ok { color: var(--teal); font-weight: 600; }
.v-warn { color: var(--ochre); font-weight: 600; }
.v-no { color: var(--cinnabar); font-weight: 600; }
+ .pill { font-family: var(--mono); font-size: 10.5px; letter-spacing: 0.04em; text-transform: uppercase; padding: 2px 7px; border-radius: 10px; white-space: nowrap; }
+ .pill-a { background: rgba(63, 130, 120, 0.16); color: var(--teal); }
+ .pill-b { background: rgba(188, 69, 56, 0.12); color: var(--cinnabar); }
+ .pill-c { background: rgba(196, 147, 51, 0.18); color: #9a6f1c; }
.metric { font-family: var(--mono); background: var(--cinnabar-faint); border-left: 3px solid var(--cinnabar); padding: 14px 18px; margin: 0 0 18px; font-size: 15px; }
.lede { font-size: 22px; font-style: italic; color: var(--ink); }
+ ol.seq { counter-reset: step; list-style: none; padding-left: 0; }
+ ol.seq > li { position: relative; padding-left: 44px; margin-bottom: 16px; }
+ ol.seq > li::before { counter-increment: step; content: counter(step); position: absolute; left: 0; top: 0; width: 28px; height: 28px; line-height: 28px; text-align: center; font-family: var(--mono); font-size: 14px; color: var(--rag); background: var(--cinnabar); border-radius: 50%; }
@@ -48,119 +56,154 @@
The diff is read, the new fitter was built against real ngmix 2.4.0 and runs
- unbiased, and — via a same-repo mirror (PR #741) with develop merged in — the full CI
- suite is now green. A 6-angle code-review pass then caught several real issues the
- green suite doesn't (a dead-on-arrival script, an unseeded-RNG reproducibility break, PSF
- column semantics, a dropped resume feature). Ready to talk it through.
+
The two-part review is delivered, CI is green, and Martin has consolidated the
+ work onto PR #741 — closing the fork PR #740. But the eleven line-level findings
+ are still unanswered, and no code has changed since the review, so all eleven
+ stand exactly as written. #741 is mechanically mergeable today; the real decision is which
+ findings to land before merge and which can merge-then-iterate. This report triages them and
+ proposes a sequence.
§ Current state
-
Autonomous prep done; held for the call.
-
-
What this PR is: +3894/−3410 across 67 files. It retires Axel's
- aguinot/ngmix@stable_version fork for upstream ngmix 2.4.0 and
- adopts Lucy Baumont's reworked ngmix classes/interface; near-total rewrite of
- ngmix_package/ngmix.py (1252 lines); a centroid-bias fix + validation scripts;
- a new dependency-light testing/simulate.py; and supporting pipeline plumbing for
- the v2.0 production run (a $SP_EXP-tree directory resolver).
-
-
How I exercised it: the container ships ngmix 1.3.6, so I installed
- ngmix 2.4.0 from the PR's own source (esheldon/ngmix@v2.4.0) over
- the container's scientific stack and ran the actual module code — not a re-implementation.
-
-
- My harness — 200 galsim galaxies, sub-pixel shifted, with noise:
- m = +2×10⁻⁴ ± 3×10⁻⁴ (consistent with zero) · c₂ ≈ −1×10⁻⁵ · R≈0.924 · 200/200 OK
-
- Author's own centroid_bias_v2.py — noiseless, 60 trials, ±-shear cancellation:
- m = +3.7×10⁻⁴ ± 0.5×10⁻⁴ (1σ) · c₂ = (−0.3 ± 0.15)×10⁻⁵ · R₁₁≈0.9234 · S/N≈5×10¹⁷
-
- Read: the v2.0 fitter sits at the few×10⁻⁴ level in m — small and
- almost certainly within UNIONS requirements, but the noiseless limit resolves a marginally-significant
- positive residual (not formally zero). It is not the centroid fix (fix-off gives the same),
- so it's metacal/fitter higher-order, not centroid bias.
-
-
-
Landed on #741 (with Cail, 2026-06-02), CI now fully green — 260 passed:
-
-
fix(ngmix): thread seeded rng… + test_metacal_is_reproducible_with_fixed_seed — observed red→green (same seed: max|dg1|≈3e-5 → 0.0). The one outright bug we fixed.
-
test_scripts_import.py — extends import-smoke into scripts/ (develop's test_imports.py only walks the package, so scripts were uncovered — that's how get_guess slipped through). Covers centroid_bias_v2.py + fitting.py.
-
Deleted the stranded v1 centroid_bias.py — it imports the removed get_guess and is only ever run against the test_centroid_bug worktree (via run_bias_test.sh's SHAPEPIPE_PATH); the copy on this branch was never executed. Superseded by centroid_bias_v2.py.
-
-
The centroid bug, characterized: the v1 harness documents it as m ≈ −2.8×10⁻² on old ngmix 1.3.6 (test_centroid_bug branch), → ≈0 with fix_jac_centroid. That's why we couldn't reproduce a bias on v2.0: the bug lived in the old jacobian/centroid path, which the rewrite replaces entirely. v2.0's residual is the +4×10⁻⁴ we measured — not the bug.
-
Open for the review (draft in draft-review.md): PSF doc/code contradiction (question to Martin); the four methodology findings (joint walkthrough with Cail, next); the runner-decorator mention; r50/T units; and a new scope question — the bug/fix cross-version rig (run_all.sh, bug/fix configs/envs) hardcodes mkilbing home paths, so: ship only the reusable v2 parts, or mark the rig local-only? Plus a dangling thread: centroid_bug/fix.yaml still name the now-deleted centroid_bias.py (resolves to the old worktree at runtime).
+
#741 is now the PR of record — green, mergeable, waiting on intent.
+
+
Topology shifted. Martin closed #740 (his fork branch
+ martinkilbinger:ngmix_v2.0) on 06-03, ~15 min after the part-2 review landed,
+ and consolidated onto #741 — the CI mirror on
+ CosmoStat:ngmix_v2.0. So #741 is the canonical PR going forward:
+ OPEN, MERGEABLE,
+ CI green (build-test-publish pass, full suite). The branch
+ head carries the three part-1 fixes (seeded-RNG thread, scripts-import smoke, v1
+ centroid_bias.py deletion) and the merge of develop. There have been
+ no commits since the part-2 review.
+
+
Martin's engagement is light. He closed #740, posted a (body-less)
+ COMMENTED review on #741, and left a single inline reply — on the
+ reproducibility fix, not one of the open findings:
+
"Good. This probably was previous code before we changed to a random seed per
+ tile." — on the seeded-RNG threading
+ (ngmix.py:951)
+
That acknowledges the one outright bug we'd already fixed. None of the eleven part-2
+ findings have a response yet — consistent with his "just post the review and I'll go
+ through it" signal. So the ball is split two ways: the cut-and-dry items wait on
+ someone to push them (Cail's offer stands), and the science questions wait on
+ Martin/Lucy's intent.
+
+
Re-review verdict. I re-read the ngmix module and the four runners against
+ the current branch head. Every finding is still live at the line it was anchored to — nothing
+ was silently addressed, nothing drifted. The triage below is therefore a clean carry-forward of
+ the eleven, sorted by what unblocks them.
-
§ Findings
-
What the review turned up.
-
-
ngmix 2.4.0 API — correct ✓
-
All 13 ngmix symbols the module uses resolve in 2.4.0 (Fitter, the
- guessers, runners.PSFRunner/Runner,
- MetacalBootstrapper, PriorSimpleSep, priors, observations, Jacobian).
- The metacal/bootstrap path runs end-to-end and produces sane output.
-
-
Centroid fix — present and harmless, necessity not shown
-
The fix re-centers the galaxy Jacobian on the HSM adaptive-moment centroid
- (make_ngmix_observation) so the centroid prior doesn't bias the fit. Running the
- same ensemble with the recenter disabled gives the same unbiased m in this
- idealized small-shift, high-S/N sim. So the form is right, but I could not reproduce the bias
- it removes — which regime does it bite in? The author's own
- centroid_bias.py / bug-fix-v2 configs presumably demonstrate it; that before/after
- is the key thing to surface. (Author's noiseless centroid_bias_v2.py run was in
- flight at handoff — number to be folded in.)
-
-
Checklist blockers
+
§ Triage
+
The eleven findings, by what unblocks them.
+
+
All eleven are open (none resolved by Martin). The useful cut isn't
+ open-vs-resolved — it's the kind of action each needs:
+
+
A · cut-and-dry mechanical, no science — push as one cleanup
+ commit. B · decision needs Martin/Lucy intent; some move
+ downstream science. C · resume dead feature — decide
+ rewire vs remove.
+
-
Item
State
-
CI passing
GREEN — full suite on mirror PR #741 (pushed ngmix_v2.0 to origin, merged develop in to get the modern CI). Images build (new uv.lock + ngmix 2.4.0 resolve), binary smokes, entry point, pytest suite (incl. develop's test_ngmix.py), example pipeline — all pass; publish skipped on PR. Fork PR #740 had no CI (approval gate).
None — and CLAUDE.md's "don't modernize ngmix" note is now stale
-
Targets develop
Yes
+
Loc
Finding
Kind
Owner
Action
+
+
293
Unreachable print after return in the megapipe-rotate branch.
A
any
Delete the line.
+
1140
sextractor_e1e2 is unused and its docstring is copy-pasted from another function.
A
any
Drop it, or fix the docstring if kept.
+
766
Masked-fraction cut divides by a literal 51 * 51; breaks when stamp size becomes configurable (on the roadmap).
A
any
v_flag_tmp.size instead of the literal.
+
sextractor_ runner:74 +3
WCS moved to a positional input_file_list[-1] in 4 runners (sextractor, read_ext_sexcat, psfex_interp, vignetmaker) but their @module_runner decorators weren't updated (only ngmix_runner was). Shipped configs are correct, so not a live bug — but the decorator is the input contract.
A
any
Declare the WCS input in each decorator's file_pattern/file_ext.
+
fitting.py
Near-verbatim copy of esheldon's ngmix example; not wired to the pipeline, and duplicates the new testing/simulate.py helpers it ships alongside.
A
Martin/ Lucy
Confirm it's not meant to ship → drop, or trim to import the canonical helpers.
+
+
949
Weight-map normalization changed vs v1.sigma_mad(gal) is taken over the whole (flux-contaminated) stamp where v1 used the windowed, object-free get_noise (still defined, now unused); and the per-pixel weight map is then multiplied by 1/sig_noise² — reads as double inverse-variance. Shifts χ²-weighting, reported errors/s2n, and PSF-averaging weights.
B
Martin/ Lucy
Highest-impact. Confirm intended, or restore the windowed estimate / single weighting.
+
1045
*_psfo now carries the metacal-reconvolved PSF.g/T/r50_PSFo read from obsdict['noshear'].psf.meta (reconvolved), and the per-type PSF columns collapse to one value. PSF-leakage / ρ-stat / null tests consume these as the input PSF; the column doc still says "original image psf."
B
Martin/ Lucy
Confirm null-test consumers tolerate reconvolved-PSF columns, or restore the raw pre-metacal PSF fit. Fix the doc either way.
+
737
Zero-pixel guard loosened any→all. v1 skipped an epoch if any pixel was exactly 0; now only if the whole stamp is 0. Partial CCD-edge stamps now enter the fit.
B
Martin/ Lucy
One-liner: intended loosening, or should partial-zero still be excluded?
+
1068
PSF fit reuses the galaxy joint prior + galaxy FLUX_AUTO as the PSF flux guess. Mirrors the upstream example, but a galaxy-flux guess for the PSF is unusual. Lower confidence.
B
Lucy/ Martin
Sanity-check PSF-fit convergence; confirm or adjust the guess.
+
415 / 676
r50 columns aren't half-light radii.r50 stores pars[4] = T (≈2σ², arcsec²) — the same value already under ['T']; r50_PSFo = sqrt(T/2) = the Gaussian σ. True r50 = 1.1774·σ. sp_validation consumes these size columns.
B
Cail + Martin/ Lucy
Decide with the sp_validation lens: rename to honest T/sigma, or document r50 as a size proxy. Mechanical once decided.
+
+
254 / 257 / 464
CHECK_EXISTING_DIR resume is dead._check_existing_dir is set but never read in process(); get_last_id is defined but never called. A configured resume silently reprocesses from scratch and overwrites. (The adjacent # MKDEBUG check whether still used sits on _f_wcs_path, which is used — that marker can just go.)
C
Martin
Decide rewire vs remove. If remove (option + get_last_id), it folds into the Bucket-A commit.
-
Code-review pass (multi-agent, 6 angles) — what it caught
-
CI is green, so none of these break the suite — but the suite is shallow on these paths. Full list + line refs in draft-review.md.
-
Confirmed:
-
-
Dead-on-arrival script:centroid_bias.py:37 imports get_guess, which this PR deletes → ImportError. A validation script shipped in the same PR can't run.
-
Reproducibility break:prepare_ngmix_weights uses global np.random.randn for the noise image while the module deliberately switched to a seeded self._rng → same tile+seed gives different shears.
-
PSF column semantics:*_psfo (g_PSFo/T_PSFo/r50_PSFo) now carry the metacal-reconvolved PSF (read from obsdict['noshear'].psf.meta), not the original PSFEx/MCCD PSF the column doc claims and PSF-systematics tests expect.
-
Dropped feature:CHECK_EXISTING_DIR resume is gone from process() — a configured resume silently reprocesses from scratch.
-
Runner contract (altitude): 4 runners moved WCS/exp-numbers to positional input_file_list[i] but didn't update their @module_runner decorators (only ngmix_runner did) — contract now lives in per-config FILE_PATTERN ordering; shipped configs OK, hand-written ones silently mis-map.
-
Units:r50/r50_err store pars[4] = T (arcsec²), and r50_PSFo = sqrt(T/2) = Gaussian σ — not half-light radii. sp_validation consumes these.
-
-
Plausible — need Martin/Lucy's intent (raise as questions): weight-map normalization changed (looks like double inverse-variance + flux-contaminated noise estimate); per-type PSF columns collapsed to one value; galaxy zero-pixel skip narrowed any→all (edge stamps now enter the fit); PSF fit uses the galaxy prior + galaxy flux guess.
-
Cleanups: non-f error string + debug prints in save_results; dead sextractor_e1e2, unreachable print, MKDEBUG markers; hardcoded 51*51; fitting.py is a stray upstream ngmix example; make_data/get_prior duplicated across the validation scripts vs simulate.py. testing/simulate.py + the $SP_EXP plumbing themselves are clean 👍.
+
Tally: 5 cut-and-dry · 5 decisions (2 of them genuinely move downstream science) · 1 resume.
+
+
+
+
+
+
§ Recommended sequence
+
How to get #741 mergeable.
+
+
+
Land the cleanups (Bucket A). One small commit on #741: drop the
+ unreachable print and dead sextractor_e1e2, swap 51*51
+ for .size, fix the four runner decorators, drop/trim fitting.py, and
+ strip the in-scope MKDEBUG. Pure mechanical, keeps CI green, and narrows the
+ review surface to the science. (Cail's offer to push these stands — held this round, which
+ is report-only.)
+
+
Resume call (Bucket C). Martin: rewire CHECK_EXISTING_DIR /
+ get_last_id, or remove them. If remove, it folds into the step-1 commit.
+
+
Science confirmations (Bucket B). Martin/Lucy weigh in. Three are light —
+ a one-line "intended" or a small fix each (zero-pixel guard, PSF prior/flux guess, r50
+ naming). Two genuinely change the meaning of output columns and deserve a real answer:
+
*_psfo reconvolved (1045) — the ρ-stat / null tests read these as the input PSF.
+
+
+
+
Reconcile r50/T with sp_validation (Cail) — rename or document, so the
+ downstream consumer and the column name agree.
+
+
Merge #741 — Cail/Martin's gesture, never the worker's. Afterward, delete
+ the stale fork branch behind the closed #740 so #741 is unambiguously the source of truth.
+
+
+
The honest read on "mergeable today": CI is green and nothing here breaks the
+ suite, so #741 can merge now. But two findings — weight-norm and *_psfo —
+ silently change the semantics of columns that sp_validation and the null tests consume. Those are
+ the only two I'd treat as genuine merge-gates: merge-then-iterate is fine for the rest, but those
+ two should be a conscious accept, not a default one. Everything else can land before or
+ after merge without risk.
+
+
Housekeeping still open (from
+ the part-2 summary, none blocking): no closes # link (overlaps Axel's #725 centroid
+ work — cross-ref or supersede?); no docs update; the CLAUDE.md "don't modernize
+ ngmix" note is now stale; the pyproject ngmix pin should land on a tagged
+ esheldon/ngmix release rather than a fork branch.
§ Open questions
-
For Cail, on the call.
+
For Cail.
-
Centroid fix regime. Where does the bias appear? Is the author's before/after captured anywhere we can cite?
-
CI / PR plumbing. Done: mirror PR #741 (origin ngmix_v2.0) with develop merged in is green on the full suite. Decide: keep #741 as the PR of record and close #740, or hand the merged branch back to Martin? And tell Martin future PRs go directly on CosmoStat, not from a fork (fork PRs don't trigger CI).
-
Which findings to post, and as line comments vs a PR-level summary — the 4 confirmed correctness items are clear; the 4 plausible ones are better as questions to Martin/Lucy.
-
r50 columns. Confirm Martin/Lucy's intent for the size-column naming before sp_validation consumes them.
-
Voice + scope. What gets posted (full review vs a lighter "looks good, two questions"), in your voice or signed "Claude on behalf of Cail"? Push cleanups as suggestions, or just request them?
-
#725. Does this PR supersede Axel's centroid-shift PR?
+
Push the Bucket-A cleanups now? The offer stands; this round is
+ report-only, so they're held. Push as one commit on #741, or wait for Martin to greenlight the
+ batch?
+
Merge-gates. I read weight-norm (949) and *_psfo (1045) as
+ the only two genuine gates; the other nine can merge-then-iterate. Agree, or hold more back?
+
r50/T. Rename to honest columns (churns sp_validation) or document the
+ proxy (least churn, but the name keeps misdescribing)? Your call as the downstream consumer.
+
Fork cleanup. Delete martinkilbinger:ngmix_v2.0 after merge,
+ and confirm with Martin that future PRs go on a CosmoStat branch (fork PRs don't trigger CI —
+ that's why #741 exists)?
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 84539c4e1..405599fea 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -7,15 +7,15 @@ tags:
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-closed-at: 2026-06-03T16:28:05.521094506+02:00
-outcome: 'Review of PR #740 (ngmix v2.0) delivered as a coherent two-part review on CI-mirror #741 (part 1: fixed+tested items; part 2: empirical verification + 11 line-anchored findings + methodology questions for Martin/Lucy). Verified empirically on candide against real ngmix 2.4.0: API correct, metacal recovers m=+2e-4 (consistent w/ zero), centroid fix benign (its bias lives in old ngmix-1.x path, can''t repro on current code — expected). Key findings: weight-norm change, *_psfo now reconvolved-PSF + per-type collapse, zero-pixel guard any->all, CHECK_EXISTING_DIR resume dropped, r50/T mislabel, runner decorator contracts unupdated for 4 runners. Martin to go through and merge; offer to push cut-and-dry fixes stands. Merge is Cail''s/Martin''s gesture.'
+closed-at: 2026-06-05T01:59:46.865336741+02:00
+outcome: 'Round 2 (next-steps) delivered. Re-reviewed against the current #741 head: no code has changed since the part-2 review, so all 11 findings stand exactly as anchored. Martin consolidated onto #741 (closed the fork PR #740), CI green + mergeable, but engaged only to ack the RNG fix — none of the 11 answered yet. Triaged into 5 cut-and-dry / 5 decisions / 1 resume; flagged weight-norm (ngmix.py:949) + *_psfo reconvolved-PSF (1045) as the only two genuine merge-gates (they silently change columns sp_validation/null-tests consume). report.html (triage table + recommended merge sequence) in fiber dir; summary comment posted to #741 (issuecomment-4626968551). REPORT ONLY — no commits pushed this round. Open for Cail: push the Bucket-A cleanups now (offer stands)? confirm the 2 merge-gates? r50/T rename-vs-document? Merge stays Cail''s/Martin''s gesture.'
horizon: now
shuttle:
+ agent: claude-opus
enabled: true
- kind: oneshot
host: candide
+ kind: oneshot
project_dir: /automnt/n17data/cdaley/unions/shapepipe
- agent: claude-opus
---
# Review + work: ngmix v2.0 (PR #740)
@@ -59,3 +59,15 @@ The diff is dominated by `src/shapepipe/modules/ngmix_package/ngmix.py` (~1252 l
- **@lbaumo = Lucy Baumont** — her ngmix classes/interface are central; her wrapper cleanup came out of her visit (see [[ngmix-update]]).
- **Prior art / related:** [[ngmix-update]] (the future-intent this realizes), [[prs-in-flight]] (the PR tracker — add #740 and its disposition), [[shapepipe]] (the Martin collaboration root). The repo `CLAUDE.md` carries build/test/run conventions — read it.
- **Authorship:** GitHub comments in Cail's voice, signed "— Claude on behalf of Cail" unless Cail asks otherwise on the call.
+
+## Round 2 (2026-06-05): next-steps report from Martin's responses — SUPERSEDES the merge-comment scope for THIS round
+
+**Cail's ask (night of 2026-06-05):** read Martin's comments/responses to the standing two-part review (now on #741), do a *further code-review pass*, and produce a **report of recommended next steps** given where Martin has landed.
+
+**Desired State (round 2):**
+- Read every Martin (`martinkilbinger`) response across **#740** and **#741** — inline replies, review threads, and any new commits/force-pushes since the part-2 review (2026-06-03). Note the `# MKDEBUG` markers still in the code. (As of dispatch, Martin's engagement is light: one COMMENTED review + one inline reply at `ngmix.py:951` "Good. This probably was previous code before we changed to a random seed per tile." Verify whether more has landed.)
+- **Further code-review pass** over the ngmix module + the 4 runners whose decorator contracts part-2 flagged (`sextractor_runner` et al.): for each of the 11 line-anchored findings, determine whether Martin's v2.0 now addresses it or it still stands.
+- Produce a **next-steps report**: `report.html` in the fiber dir **and** a posted PR summary comment on #741 (Cail's voice / "— Claude on behalf of Cail"). For each open item: `{resolved by Martin | still open | needs a decision}`, owner (`Cail / Martin / Lucy`), and the **recommended sequence to get #740/#741 mergeable**.
+- **REPORT ONLY — do NOT push new commits to the ngmix branch this round.** Pure analysis + PR comment. Keeps it collision-free while other shapepipe/pure_eb work is in flight.
+
+**Exit:** autonomous — produce the report + PR comment, then close for Cail's review.
From e91b3de13b992b8e21d3c9881393f584f413d774 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Fri, 5 Jun 2026 19:53:03 +0200
Subject: [PATCH 15/29] =?UTF-8?q?felt:=20review-ngmix-v2-pr740=20round=203?=
=?UTF-8?q?=20=E2=80=94=20Martin=20responded=20to=206/11;=20verified=20r50?=
=?UTF-8?q?/T=20bug?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Martin's morning pass (06-05): greenlit 254 (remove resume) + 766
(configurable stamp size); 737 any->all intentional; 949 -> issue #604;
opened r50/T naming, poked Lucy. Ran his explicit check: pars[4]=T=2sigma^2
confirmed, galaxy r50 stores T (area), PSF r50 stores sigma -- neither is
the half-light radius 1.1774*sigma. *_psfo (1045) now the lone unanswered
merge-gate. report.html + outcome + history refreshed. Analysis only, no
code pushed. Held interactive for Cail.
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/report.html | 208 ++++++++++--------
.../review-ngmix-v2-pr740.md | 30 +--
2 files changed, 138 insertions(+), 100 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/report.html b/.felt/review-ngmix-v2-pr740/report.html
index 14e2259c5..3004d77f9 100644
--- a/.felt/review-ngmix-v2-pr740/report.html
+++ b/.felt/review-ngmix-v2-pr740/report.html
@@ -60,50 +60,77 @@
·review-ngmix-v2-pr740·
- round 2 — next steps
+ round 3 · martin responded
Ngmix v2.0: where it stands & how to land it.
The two-part review is delivered, CI is green, and Martin has consolidated the
- work onto PR #741 — closing the fork PR #740. But the eleven line-level findings
- are still unanswered, and no code has changed since the review, so all eleven
- stand exactly as written. #741 is mechanically mergeable today; the real decision is which
- findings to land before merge and which can merge-then-iterate. This report triages them and
- proposes a sequence.
+ work onto PR #741 — closing the fork PR #740. This morning (06-05)
+ Martin replied to six of the eleven findings: he greenlit two cleanups, explained two
+ changes as intentional, pointed the weight-map concern at a long-standing issue, and poked Lucy
+ on the size-column naming — explicitly asking us to check the r50/T
+ arithmetic. I ran that check and confirmed the bug. Five findings — including
+ the second merge-gate, *_psfo — still have no response. This report carries the
+ updated dispositions and the sharpened sequence.
§ Current state
-
#741 is now the PR of record — green, mergeable, waiting on intent.
+
#741 is the PR of record — green, mergeable, and Martin has started answering.
-
Topology shifted. Martin closed #740 (his fork branch
- martinkilbinger:ngmix_v2.0) on 06-03, ~15 min after the part-2 review landed,
- and consolidated onto #741 — the CI mirror on
- CosmoStat:ngmix_v2.0. So #741 is the canonical PR going forward:
+
Topology. Martin closed #740 (his fork branch
+ martinkilbinger:ngmix_v2.0) on 06-03 and consolidated onto #741 —
+ the CI mirror on CosmoStat:ngmix_v2.0. #741 is the canonical PR:
OPEN, MERGEABLE,
- CI green (build-test-publish pass, full suite). The branch
- head carries the three part-1 fixes (seeded-RNG thread, scripts-import smoke, v1
- centroid_bias.py deletion) and the merge of develop. There have been
- no commits since the part-2 review.
-
-
Martin's engagement is light. He closed #740, posted a (body-less)
- COMMENTED review on #741, and left a single inline reply — on the
- reproducibility fix, not one of the open findings:
-
"Good. This probably was previous code before we changed to a random seed per
- tile." — on the seeded-RNG threading
- (ngmix.py:951)
-
That acknowledges the one outright bug we'd already fixed. None of the eleven part-2
- findings have a response yet — consistent with his "just post the review and I'll go
- through it" signal. So the ball is split two ways: the cut-and-dry items wait on
- someone to push them (Cail's offer stands), and the science questions wait on
- Martin/Lucy's intent.
-
-
Re-review verdict. I re-read the ngmix module and the four runners against
- the current branch head. Every finding is still live at the line it was anchored to — nothing
- was silently addressed, nothing drifted. The triage below is therefore a clean carry-forward of
- the eleven, sorted by what unblocks them.
+ CI green (build-test-publish pass, full suite). It carries
+ the three part-1 fixes (seeded-RNG thread, scripts-import smoke, v1
+ centroid_bias.py deletion) and the develop merge. No code has
+ changed since the part-2 review — every finding still sits at its anchored line.
+
+
Martin engaged this morning (06-05, 07:14–07:33Z) — six inline replies across
+ the findings, his first real pass on the science:
+
+
Greenlit two cleanups.254 resume:
+ "a hack to resume interrupted ngmix runs… can be removed now, and if needed implemented
+ better within the module."766 hardcoded 51*51:
+ "yes would be good to not hard-code this", pointing at the existing
+ STAMP_SIZE = 51 / VIGNET_SIZE = 51 config convention.
+
Explained one change as intentional.737 zero-pixel guard:
+ "the any == 0 test had pass close to no postage stamps. I thought this was too
+ restrictive." So any→all was deliberate (loosen, not tighten).
+
Pointed the weight-map concern at a standing issue.949:
+ links #604 — Lucy's 2023
+ "weight handling" issue. That thread establishes the known problem: megapipe weights
+ are 0/1 masks, ngmix needs inverse-variance, and the proper fix is a real RMS map from
+ SExtractor BACKGROUND_RMS (or THELI weights), not a per-stamp
+ sigma_mad estimate. Context, not a direct answer — see the triage note.
+
Opened the size-naming question and poked Lucy.415:
+ "the intent is to resolve the confusion about 'T'… We need to check that pars[4]
+ is indeed 'T', and that r50 should be 1.1774·σ and not just
+ σ. Would be good to not hard-code these transformations — functions exist in
+ sp_validation:galaxy.py, we'd move them to cs_util." + "Poking
+ @lbaumo".
+
+
+
I ran Martin's check. The model is gauss (both PSF and galaxy),
+ and the code itself settles the question: line 911 builds
+ galsim.Gaussian(sigma=np.sqrt(pars[4]/2)), which is only valid if
+ pars[4] = T = 2σ². So yes, pars[4] is T — and the
+ r50 columns are worse off than the question implies:
+
galaxy r50 (l.415) = pars[4] = T = 2σ² — an area, arcsec²
+ PSF r50_PSFo (l.676) = sqrt(T/2) = σ — the std-dev, arcsec
+ true half-light radius = 1.1774·σ — neither column equals this
+
The two r50 columns are on different scales (T vs σ) and neither
+ is a half-light radius. Martin's instinct was right; the fix is a single honest
+ transformation, ideally the one moved into cs_util.
+
+
Still unanswered (5):293 dead print, 1140
+ sextractor_e1e2, the 4 runner decorators, fitting.py,
+ 1068 PSF prior/flux — and crucially 1045
+ *_psfo reconvolved-PSF, the second merge-gate, which is the one open item
+ that genuinely changes what downstream null-tests read.
@@ -112,8 +139,8 @@
#741 is now the PR of record — green, mergeable, waiting on intent.
§ Triage
The eleven findings, by what unblocks them.
-
All eleven are open (none resolved by Martin). The useful cut isn't
- open-vs-resolved — it's the kind of action each needs:
+
Martin has now spoken to six; five are still open. The ✓ column marks his
+ morning disposition. The action cut is still by kind:
A · cut-and-dry mechanical, no science — push as one cleanup
commit. B · decision needs Martin/Lucy intent; some move
@@ -121,24 +148,24 @@
The eleven findings, by what unblocks them.
rewire vs remove.
-
Loc
Finding
Kind
Owner
Action
+
Loc
Finding
Kind
Martin ✓
Action
-
293
Unreachable print after return in the megapipe-rotate branch.
A
any
Delete the line.
-
1140
sextractor_e1e2 is unused and its docstring is copy-pasted from another function.
A
any
Drop it, or fix the docstring if kept.
-
766
Masked-fraction cut divides by a literal 51 * 51; breaks when stamp size becomes configurable (on the roadmap).
A
any
v_flag_tmp.size instead of the literal.
-
sextractor_ runner:74 +3
WCS moved to a positional input_file_list[-1] in 4 runners (sextractor, read_ext_sexcat, psfex_interp, vignetmaker) but their @module_runner decorators weren't updated (only ngmix_runner was). Shipped configs are correct, so not a live bug — but the decorator is the input contract.
A
any
Declare the WCS input in each decorator's file_pattern/file_ext.
-
fitting.py
Near-verbatim copy of esheldon's ngmix example; not wired to the pipeline, and duplicates the new testing/simulate.py helpers it ships alongside.
A
Martin/ Lucy
Confirm it's not meant to ship → drop, or trim to import the canonical helpers.
+
293
Unreachable print after return in the megapipe-rotate branch.
A
—
Delete the line.
+
1140
sextractor_e1e2 is unused and its docstring is copy-pasted from another function.
A
—
Drop it, or fix the docstring if kept.
+
766
Masked-fraction cut divides by a literal 51 * 51; breaks when stamp size becomes configurable (on the roadmap).
A
agreed — make configurable; use the STAMP_SIZE/VIGNET_SIZE config convention.
Add STAMP_SIZE config (default 51); divide by it, not the literal.
+
sextractor_ runner:74 +3
WCS moved to a positional input_file_list[-1] in 4 runners (sextractor, read_ext_sexcat, psfex_interp, vignetmaker) but their @module_runner decorators weren't updated (only ngmix_runner was). Shipped configs are correct, so not a live bug — but the decorator is the input contract.
A
—
Declare the WCS input in each decorator's file_pattern/file_ext.
+
fitting.py
Near-verbatim copy of esheldon's ngmix example; not wired to the pipeline, and duplicates the new testing/simulate.py helpers it ships alongside.
A
—
Confirm it's not meant to ship → drop, or trim to import the canonical helpers.
-
949
Weight-map normalization changed vs v1.sigma_mad(gal) is taken over the whole (flux-contaminated) stamp where v1 used the windowed, object-free get_noise (still defined, now unused); and the per-pixel weight map is then multiplied by 1/sig_noise² — reads as double inverse-variance. Shifts χ²-weighting, reported errors/s2n, and PSF-averaging weights.
B
Martin/ Lucy
Highest-impact. Confirm intended, or restore the windowed estimate / single weighting.
-
1045
*_psfo now carries the metacal-reconvolved PSF.g/T/r50_PSFo read from obsdict['noshear'].psf.meta (reconvolved), and the per-type PSF columns collapse to one value. PSF-leakage / ρ-stat / null tests consume these as the input PSF; the column doc still says "original image psf."
B
Martin/ Lucy
Confirm null-test consumers tolerate reconvolved-PSF columns, or restore the raw pre-metacal PSF fit. Fix the doc either way.
-
737
Zero-pixel guard loosened any→all. v1 skipped an epoch if any pixel was exactly 0; now only if the whole stamp is 0. Partial CCD-edge stamps now enter the fit.
B
Martin/ Lucy
One-liner: intended loosening, or should partial-zero still be excluded?
-
1068
PSF fit reuses the galaxy joint prior + galaxy FLUX_AUTO as the PSF flux guess. Mirrors the upstream example, but a galaxy-flux guess for the PSF is unusual. Lower confidence.
B
Lucy/ Martin
Sanity-check PSF-fit convergence; confirm or adjust the guess.
-
415 / 676
r50 columns aren't half-light radii.r50 stores pars[4] = T (≈2σ², arcsec²) — the same value already under ['T']; r50_PSFo = sqrt(T/2) = the Gaussian σ. True r50 = 1.1774·σ. sp_validation consumes these size columns.
B
Cail + Martin/ Lucy
Decide with the sp_validation lens: rename to honest T/sigma, or document r50 as a size proxy. Mechanical once decided.
+
1045
*_psfo now carries the metacal-reconvolved PSF.g/T/r50_PSFo read from obsdict['noshear'].psf.meta (reconvolved), and the per-type PSF columns collapse to one value. PSF-leakage / ρ-stat / null tests consume these as the input PSF; the column doc still says "original image psf."
B
— unanswered 2nd merge-gate
Confirm null-test consumers tolerate reconvolved-PSF columns, or restore the raw pre-metacal PSF fit. Fix the doc either way.
+
949
Weight-map normalization changed vs v1.sigma_mad(gal) is taken over the whole (flux-contaminated) stamp where v1 used the windowed, object-free get_noise (still defined, now unused); and the per-pixel weight map is then multiplied by 1/sig_noise² — reads as double inverse-variance. Shifts χ²-weighting, reported errors/s2n, and PSF-averaging weights.
B
→ #604 — known weight-handling issue (0/1 masks → need real inv-var). Contextualizes, doesn't resolve the v1-regression sub-finding.
Distinguish the two: (a) the long-game proper-variance fix lives in #604; (b) the whole-stamp sigma_mad + double 1/sig² vs v1's windowed estimate is a this-PR question.
+
415 / 676
r50 columns aren't half-light radii. Verified: pars[4]is T (l.911 uses Gaussian(σ=sqrt(pars[4]/2))). Galaxy r50 = T = 2σ² (area); PSF r50_PSFo = sqrt(T/2) = σ. Different scales; neither is the true r50 = 1.1774·σ. sp_validation consumes these size columns.
B
verified + poked Lucy. Martin wants the transform un-hard-coded (move sp_validation:galaxy.py → cs_util).
One honest transform in cs_util: galaxy 1.1774·sqrt(pars[4]/2), PSF 1.1774·sqrt(T/2) — or rename to T/sigma. Cail's call as consumer.
+
737
Zero-pixel guard loosened any→all. v1 skipped an epoch if any pixel was exactly 0; now only if the whole stamp is 0. Partial CCD-edge stamps now enter the fit.
B
intended — "any==0 passed close to no postage stamps, too restrictive" (commit d01503d0).
Resolved as deliberate. Residual: is a fractional-zero threshold wanted, or is all==0 fine? Low priority.
+
1068
PSF fit reuses the galaxy joint prior + galaxy FLUX_AUTO as the PSF flux guess. Mirrors the upstream example, but a galaxy-flux guess for the PSF is unusual. Lower confidence.
B
—
Sanity-check PSF-fit convergence; confirm or adjust the guess.
-
254 / 257 / 464
CHECK_EXISTING_DIR resume is dead._check_existing_dir is set but never read in process(); get_last_id is defined but never called. A configured resume silently reprocesses from scratch and overwrites. (The adjacent # MKDEBUG check whether still used sits on _f_wcs_path, which is used — that marker can just go.)
C
Martin
Decide rewire vs remove. If remove (option + get_last_id), it folds into the Bucket-A commit.
+
254 / 257 / 464
CHECK_EXISTING_DIR resume is dead._check_existing_dir is set but never read in process(); get_last_id is defined but never called. A configured resume silently reprocesses from scratch and overwrites.
C
remove — "a hack to resume interrupted runs… can be removed now, implemented better in future if needed."
Remove the option + get_last_id + the MKDEBUG marker. Folds into the Bucket-A commit.
-
Tally: 5 cut-and-dry · 5 decisions (2 of them genuinely move downstream science) · 1 resume.
+
Tally: 6 answered (2 greenlit cleanups, 1 verified, 1 intended, 1 → #604, 1 RNG ack) · 5 still open, of which 1045 *_psfo is the lone unanswered merge-gate.
@@ -148,38 +175,41 @@
The eleven findings, by what unblocks them.
How to get #741 mergeable.
-
Land the cleanups (Bucket A). One small commit on #741: drop the
- unreachable print and dead sextractor_e1e2, swap 51*51
- for .size, fix the four runner decorators, drop/trim fitting.py, and
- strip the in-scope MKDEBUG. Pure mechanical, keeps CI green, and narrows the
- review surface to the science. (Cail's offer to push these stands — held this round, which
- is report-only.)
-
-
Resume call (Bucket C). Martin: rewire CHECK_EXISTING_DIR /
- get_last_id, or remove them. If remove, it folds into the step-1 commit.
-
-
Science confirmations (Bucket B). Martin/Lucy weigh in. Three are light —
- a one-line "intended" or a small fix each (zero-pixel guard, PSF prior/flux guess, r50
- naming). Two genuinely change the meaning of output columns and deserve a real answer:
-
*_psfo reconvolved (1045) — the ρ-stat / null tests read these as the input PSF.
-
-
-
-
Reconcile r50/T with sp_validation (Cail) — rename or document, so the
- downstream consumer and the column name agree.
-
-
Merge #741 — Cail/Martin's gesture, never the worker's. Afterward, delete
- the stale fork branch behind the closed #740 so #741 is unambiguously the source of truth.
+
Land the cleanups + Martin's greenlit items (Bucket A+C). One commit on
+ #741: drop the unreachable print (293) and dead sextractor_e1e2
+ (1140), fix the four runner decorators, drop/trim fitting.py, remove the
+ CHECK_EXISTING_DIR resume + get_last_id (254, Martin: remove)
+ and the MKDEBUG marker, and add the STAMP_SIZE config
+ replacing the literal 51*51 (766, Martin: agreed). Pure mechanical, keeps CI green,
+ narrows the review surface to the two real science questions. (Cail's push offer stands.)
+
+
Resolve the size columns (415/676). Verified bug — the fix is one honest
+ transform. Either land it in cs_util as Martin suggested (galaxy
+ 1.1774·sqrt(pars[4]/2), PSF 1.1774·sqrt(T/2)) or rename the columns to
+ T/sigma. Cail decides as the sp_validation consumer; Lucy's been poked.
+
+
The lone unanswered merge-gate: *_psfo (1045). Martin/Lucy
+ still need to confirm that the null-test / ρ-stat consumers tolerate the metacal-reconvolved
+ PSF in g/T/r50_PSFo, or restore the raw pre-metacal fit. Doc fix either way. This
+ is the one open item that changes what downstream reads.
+
+
Weight-norm (949) — split the thread. The proper-variance redesign is
+ tracked in #604 (post-merge,
+ long game). But ask separately whether the whole-stamp sigma_mad + double
+ 1/sig² is an intended v2.0 change or a regression from v1's windowed estimate — a
+ this-PR question #604 doesn't cover.
+
+
Loose ends: PSF prior/flux guess (1068) sanity-check; zero-pixel guard
+ (737) resolved as intended. Merge #741 — Cail/Martin's gesture, never the
+ worker's — then delete the stale fork branch behind closed #740.
The honest read on "mergeable today": CI is green and nothing here breaks the
- suite, so #741 can merge now. But two findings — weight-norm and *_psfo —
- silently change the semantics of columns that sp_validation and the null tests consume. Those are
- the only two I'd treat as genuine merge-gates: merge-then-iterate is fine for the rest, but those
- two should be a conscious accept, not a default one. Everything else can land before or
- after merge without risk.
+ suite. With Martin's morning pass, the merge-gate count drops to one clear item —
+ *_psfo (1045), which silently changes the PSF columns sp_validation / null-tests
+ consume — plus the size-column fix (415/676), now verified and mechanical once the convention is
+ chosen. Weight-norm (949) is reframed: the design fix is #604's (post-merge), leaving only the
+ narrower "is the sigma_mad change a regression?" question. Everything else can merge-then-iterate.
Housekeeping still open (from
the part-2 summary, none blocking): no closes # link (overlaps Axel's #725 centroid
@@ -194,16 +224,22 @@
How to get #741 mergeable.
§ Open questions
For Cail.
-
Push the Bucket-A cleanups now? The offer stands; this round is
- report-only, so they're held. Push as one commit on #741, or wait for Martin to greenlight the
- batch?
-
Merge-gates. I read weight-norm (949) and *_psfo (1045) as
- the only two genuine gates; the other nine can merge-then-iterate. Agree, or hold more back?
-
r50/T. Rename to honest columns (churns sp_validation) or document the
- proxy (least churn, but the name keeps misdescribing)? Your call as the downstream consumer.
-
Fork cleanup. Delete martinkilbinger:ngmix_v2.0 after merge,
- and confirm with Martin that future PRs go on a CosmoStat branch (fork PRs don't trigger CI —
- that's why #741 exists)?
+
Push Bucket A now? Martin greenlit 254 (remove) and 766 (configurable);
+ the rest are mechanical. One commit on #741 clears five-plus findings and narrows the review to
+ the science. Push it, or wait for an explicit "go the whole batch"? (My read: push — it's
+ what Martin's replies invite.)
+
r50/T — your call as consumer. Honest transform in cs_util
+ (1.1774·sqrt(T/2), what Martin leans toward) vs rename columns to
+ T/sigma. The transform keeps the r50 name meaningful;
+ the rename is blunter but unambiguous. Which churns sp_validation less for you?
+
*_psfo (1045) — the open gate. Do you already know whether
+ the null-tests can read a reconvolved PSF, or is this a real question for Martin/Lucy? If the
+ former, we can answer it in the thread now.
+
Weight-norm framing. Agree that #604 owns the long-game fix and we only
+ press Martin on the narrower sigma_mad-vs-v1 regression question?
+
How do you want to reply? Inline thread replies to Martin's six (in your
+ voice / "— Claude on behalf of Cail"), a summary comment, or talk it through here first and I
+ draft once we've landed the calls?
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 405599fea..d0c38cc78 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -1,23 +1,25 @@
---
-name: 'Review + work: ngmix v2.0 (PR #740)'
-status: closed
-tags:
- - constitution
- - shapepipe
- - ngmix
- - review
created-at: 2026-06-01T11:50:17.967872934+02:00
-closed-at: 2026-06-05T01:59:46.865336741+02:00
-outcome: 'Round 2 (next-steps) delivered. Re-reviewed against the current #741 head: no code has changed since the part-2 review, so all 11 findings stand exactly as anchored. Martin consolidated onto #741 (closed the fork PR #740), CI green + mergeable, but engaged only to ack the RNG fix — none of the 11 answered yet. Triaged into 5 cut-and-dry / 5 decisions / 1 resume; flagged weight-norm (ngmix.py:949) + *_psfo reconvolved-PSF (1045) as the only two genuine merge-gates (they silently change columns sp_validation/null-tests consume). report.html (triage table + recommended merge sequence) in fiber dir; summary comment posted to #741 (issuecomment-4626968551). REPORT ONLY — no commits pushed this round. Open for Cail: push the Bucket-A cleanups now (offer stands)? confirm the 2 merge-gates? r50/T rename-vs-document? Merge stays Cail''s/Martin''s gesture.'
horizon: now
+name: "Review + work: ngmix v2.0 (PR #740)"
+outcome: |-
+ INTERACTIVE — awaiting Cail. Martin engaged this morning (06-05 07:14–07:33Z), replying to 6 of the 11 findings: greenlit removing the dead CHECK_EXISTING_DIR resume (254) and making 51*51 configurable via STAMP_SIZE (766); explained the any→all zero-pixel loosening as intentional (737, "any==0 passed ~no stamps"); pointed weight-norm (949) at issue #604 (the 2023 0/1-mask → inverse-variance design discussion — context, not a direct answer to the sigma_mad-vs-v1 sub-finding); and opened the r50/T size-naming question, poking @lbaumo. I RAN Martin's explicit check: confirmed pars[4]=T=2σ² (l.911 uses Gaussian(σ=sqrt(pars[4]/2))); galaxy r50 (l.415) stores T (an area), PSF r50_PSFo (l.676) = sqrt(T/2) = σ — different scales, neither is the half-light radius 1.1774σ. Bug verified. Still unanswered (5): 293, 1140, runners, fitting.py, 1068 — and *_psfo reconvolved-PSF (1045), now the LONE clear merge-gate. report.html refreshed (Martin-disposition column + verified r50/T metric + re-sequenced steps). REPORT/ANALYSIS ONLY — no commits pushed. Held for Cail to direct: push Bucket A? r50/T transform-vs-rename? how to reply to Martin's six? Merge stays Cail's/Martin's gesture.
shuttle:
- agent: claude-opus
- enabled: true
- host: candide
- kind: oneshot
- project_dir: /automnt/n17data/cdaley/unions/shapepipe
+ agent: claude-opus
+ enabled: true
+ host: candide
+ interactive: true
+ kind: oneshot
+ project_dir: /automnt/n17data/cdaley/unions/shapepipe
+status: active
+tags:
+ - constitution
+ - shapepipe
+ - ngmix
+ - review
---
+
# Review + work: ngmix v2.0 (PR #740)
Martin requested Cail's review on **[CosmoStat/shapepipe #740 "Ngmix v2.0"](https://github.com/CosmoStat/shapepipe/pull/740)** (head `ngmix_v2.0` → base `develop`, +3894/−3410 across 67 files). This is the PR that **realizes [[ngmix-update]]**: replace Axel's `stable_version` ngmix fork with **upstream ngmix 2.4.0** and adopt **@lbaumo's (Lucy Baumont's) new ngmix classes + interface**, reconciling the cleaned-up wrapper from her visit. Track it in [[prs-in-flight]]; it sits in the broader Martin collaboration under [[shapepipe]].
From 0fb6e6cf2cbbe2f0626a38b65f91666b2c2a1c93 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Fri, 5 Jun 2026 22:01:36 +0200
Subject: [PATCH 16/29] =?UTF-8?q?felt:=20review-ngmix-v2-pr740=20=E2=80=94?=
=?UTF-8?q?=20workflow=20results=20on=20weights=20+=20size=20(2=20decision?=
=?UTF-8?q?-ready=20reports)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Workflow (6 investigators + 2 synth) on the two hard problems:
- WEIGHTS (#604+949): two coupled regressions in prepare_ngmix_weights
(dead get_noise -> whole-stamp sigma_mad; lost binarization -> double-
counts real ivar). Empirically confirmed (truth ivar 1e6 -> recovered
8.8e11). Rec: split — minimal v1-restore in #741 + SExtractor
BACKGROUND_RMS baseline as separate PR (closes #604).
- SIZE (r50/T): galaxy r50=T(area), PSF r50psf=sigma, neither=1.1774sigma;
UNIONS-3500 paper reports r_h as primary. Rec: transform-at-source +
cs_util converters; bonus — sp_validation T_to_fwhm dimensionally wrong.
Adds weights-report.md, size-report.md, deep-dive-report.html.
Co-Authored-By: Claude Opus 4.8
---
.../deep-dive-report.html | 199 ++++++++++++++++++
.../review-ngmix-v2-pr740.md | 23 +-
.felt/review-ngmix-v2-pr740/size-report.md | 94 +++++++++
.felt/review-ngmix-v2-pr740/weights-report.md | 136 ++++++++++++
4 files changed, 441 insertions(+), 11 deletions(-)
create mode 100644 .felt/review-ngmix-v2-pr740/deep-dive-report.html
create mode 100644 .felt/review-ngmix-v2-pr740/size-report.md
create mode 100644 .felt/review-ngmix-v2-pr740/weights-report.md
diff --git a/.felt/review-ngmix-v2-pr740/deep-dive-report.html b/.felt/review-ngmix-v2-pr740/deep-dive-report.html
new file mode 100644
index 000000000..267991b77
--- /dev/null
+++ b/.felt/review-ngmix-v2-pr740/deep-dive-report.html
@@ -0,0 +1,199 @@
+
+
+
+
+
+Deep dive — ngmix v2.0 weights & size
+
+
+
+
+
+
+
+
+
A fan-out of six investigators + two synthesizers chased the two science-shaped
+ findings from the #741 review to ground. Both came back high-confidence, with verified arithmetic
+ and — for the weight regression — an observed empirical confirmation. Each problem
+ now has a concrete recommendation and a clean scoping plan. Full reports:
+ weights-report.md and size-report.md in this fiber.
+
+
+
+
+
+
§ Problem A · weights / inverse-variance
+
v2.0 has two real, coupled weight-map regressions.
+
+
The live weight path is prepare_ngmix_weights (ngmix.py:871). Against
+ the v1 baseline (develop), v2.0 dropped two things:
+
+
+
Regr.
What changed
Effect
+
R1
Noise estimate regressed from the object-free windowed get_noise (still in the file at :826, now dead — zero callers) to whole-stamp sigma_mad(gal), contaminated by galaxy flux.
Size/flux-dependent mis-weighting of the χ² → biased g_err/T_err/s2n; the signature of a multiplicative shear bias m (cf. old-path m~-2.8e-2). Dominant today.
+
R2
Lost the binarization weight_map[weight_map != 0] = 1, so line 906 multiplies the raw weight by 1/sig_noise².
For 0/1 masks: per-epoch 1/Fscale² scale error. Latent hazard: feed a real inverse-variance map (THELI / BACKGROUND_RMS) and v2.0 double-counts the variance.
+
+
+
Observed, not inferred
+
Fed make_data's truth inverse-variance map (1/noise²) through the actual
+ function in the dev container:
+
+ noise=1e-3 → truth ivar 1e6 | recovered 8.81e11 | ratio 8.81e5 (correct = 1.0)
+ noise=1e-4 → truth ivar 1e8 | recovered 7.35e15 | ratio 7.35e7 (correct = 1.0)
+
+
The weight is off by ≈1/noise² — exactly the R2 double-count. The regression is a
+ clean red→green test target (make_data injects the oracle; no ngmix
+ fit needed). Today's only weight-path test asserts run(42)==run(42) — determinism,
+ never truth. Coverage gap.
+
+
+
recommendation — split into two PRs
+
In #741 (now): minimal v1-restore — reinstate the
+ binarization + call get_noise instead of sigma_mad(gal) — plus the
+ red→green unit test. Two lines; fixes a regression this PR introduced; restores robustness to
+ real ivar maps.
+
Separate PR (next): the in-house SExtractor
+ BACKGROUND_RMS baseline you steered toward (THELI is external). Mostly
+ config — the check-image & vignetmaker ME loops are already list-driven, so delivering the
+ RMS stamps is near-free; the earned work is rewriting prepare_ngmix_weights to use
+ per-pixel 1/RMS² with sigma_mad as fallback, + before/after validation.
+ Effort S–M, ~4–8 h. Closes #604.
+
+
+
+ Open questions for Martin (6)
+
+
Confirm SExtractor-BACKGROUND_RMS is the baseline, THELI a later add-on (non-blocking)?
+
RMS input optional (with sigma_mad/get_noise fallback) or required? (rec: optional)
+
#741 minimal fix: restore windowed get_noise (reintroduces the gal-guess machinery the PR may have deliberately stripped), or accept bare sigma_mad until #604? R1 is dominant, so it matters even short-term.
+
Does the RMS map feed anything beyond weight + noise-fill (e.g. background subtraction)? (rec: weight + noise-fill only)
+
Config-edit scope for #604: cfis only, or cfis + cfis_simu + canfar batch?
+
Merge bar for #604: smoke before/after on the example tile, or a full sim m/c calibration?
+
+
+
+
+
+
+
+
+
§ Problem B · r50 / T / σ size naming
+
The v2.0 “r50” columns aren’t half-light radii.
+
+
All five r50* columns are new in v2.0 (v1 wrote sizes only as
+ T). None is a true half-light radius:
+
+
+ galaxy r50 (l.409) = pars[4] = T = 2σ² — an area, bit-for-bit a copy of T
+ PSF r50psf (l.411) = √(T/2) = σ — a length, but off by ×1.1774
+ true half-light radius = 1.1774·σ = 1.1774·√(T/2) — no column equals this
+
+
+
So the same name root means an area on the galaxy side and a length
+ on the PSF side. Meanwhile the official catalogue paper (UNIONS-3500 WL I,
+ arXiv:2605.13549) reports the half-light radius r_h (= r50) as the primary
+ size for both galaxy and PSF, with the size cut written as the dimensionless ratio
+ 0.707 ≤ r_h/r_h,psf ≤ 3; T is given only as the derived DES area (Eq. 1).
+ The paper wants exactly the quantity these columns claim but fail to provide.
+
+
sp_validation reads none of the r50* columns (only the
+ dimensionless ratio T/Tpsf, which is robust) — so the mislabel harms nothing today,
+ but it's a live foot-gun for any future consumer and contradicts the paper.
+
+
+
recommendation — transform at the source
+
Make the ngmix writer emit honest columns: galaxy
+ r50 = 1.1774·√(T/2) with propagated error; PSF r50psf = existing σ ×
+ 1.1774. Keep all T* columns. Then every "r50" is a length meaning the same thing on
+ both sides, matching the paper. De-duplicate (T_psfo_ngmix≡Tpsf,
+ r50_psfo_ngmix≡r50psf).
+
Put the conversion web in cs_util once
+ (T_to_r50, r50_to_T, sigma_to_fwhm, corrected
+ T_to_fwhm) — the shared producer/consumer layer. Beats a pure rename (throws away
+ the paper's headline quantity) and a read-side-only fix (leaves the foot-gun on disk).
+
+
+
Bonus find: a real downstream bug in sp_validation
+
galaxy.py:T_to_fwhm is T/1.17741*2.355 — dimensionally wrong
+ (2.355/1.17741 = 2.000; it treats the area T as if it were σ). Its own
+ MKDEBUG comment already doubts it. It builds the fwhm_PSF regressor for the
+ scale-dependent PSF-leakage fit (run_object.py:510), so it distorts
+ the PSF-size axis and biases α(size). The correct form is
+ 2.355·√(T/2). A clean cs_util surface lets this be fixed and the trap retired.
+
+
+ Open questions for Lucy / Martin (6)
+
+
Standardize on r50-as-primary (per UNIONS-3500 I) while keeping T, or keep T as working column with r50 derived?
+
Drop or alias the duplicate columns — is anything outside sp_validation src/ (notebooks, older catalogues) bound to the _psfo_ngmix names?
+
cs_util as the home for the converter web — agreed? Naming?
+
Were in-flight scale-dependent leakage results produced with the buggy T_to_fwhm, and how much does α(size) move once fixed?
+
Galaxy r50_err: simple propagation, or from the full pars covariance?
+
Catalogue-format version bump / changelog for the r50* semantics change?
+
+
+
+
+
+
+
+
§ Where this leaves us
+
Two work-streams, both scoped.
+
+
Weights → a Codex shuttle + separate PR (the #604 BACKGROUND_RMS baseline), with the minimal v1-restore landing in #741 first. The science calls above are for Martin.
+
Size → transform-at-source in ngmix + a cs_util converter web; the sp_validation T_to_fwhm fix rides along. Touches the producer/shared/consumer stack — Lucy + Martin on the convention.
+
Still no Martin/Lucy reply drafted — that waits for Cail. These reports are the raw material for it.
+
+
+
+
+
+
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index d0c38cc78..585fba251 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -1,22 +1,23 @@
---
+id: 01KTCHWZX873VG28AA663A3ZQE
created-at: 2026-06-01T11:50:17.967872934+02:00
horizon: now
name: "Review + work: ngmix v2.0 (PR #740)"
outcome: |-
- INTERACTIVE — awaiting Cail. Martin engaged this morning (06-05 07:14–07:33Z), replying to 6 of the 11 findings: greenlit removing the dead CHECK_EXISTING_DIR resume (254) and making 51*51 configurable via STAMP_SIZE (766); explained the any→all zero-pixel loosening as intentional (737, "any==0 passed ~no stamps"); pointed weight-norm (949) at issue #604 (the 2023 0/1-mask → inverse-variance design discussion — context, not a direct answer to the sigma_mad-vs-v1 sub-finding); and opened the r50/T size-naming question, poking @lbaumo. I RAN Martin's explicit check: confirmed pars[4]=T=2σ² (l.911 uses Gaussian(σ=sqrt(pars[4]/2))); galaxy r50 (l.415) stores T (an area), PSF r50_PSFo (l.676) = sqrt(T/2) = σ — different scales, neither is the half-light radius 1.1774σ. Bug verified. Still unanswered (5): 293, 1140, runners, fitting.py, 1068 — and *_psfo reconvolved-PSF (1045), now the LONE clear merge-gate. report.html refreshed (Martin-disposition column + verified r50/T metric + re-sequenced steps). REPORT/ANALYSIS ONLY — no commits pushed. Held for Cail to direct: push Bucket A? r50/T transform-vs-rename? how to reply to Martin's six? Merge stays Cail's/Martin's gesture.
+ INTERACTIVE — live with Cail. Cleanups + fitting.py committed (bd60dc8e in worktree /tmp/pr740-wt, branch pr740-tmp, UNPUSHED): 293 print, 1140 sextractor_e1e2, 254 dead resume, 766 →v_flag_tmp.size, fitting.py deleted. Dev image built (/n17data/cdaley/containers/shapepipe-dev) + ngmix 2.4.0 installed → full suite + example pipeline now runnable. Ran a 6+2-agent WORKFLOW on the two hard problems → two decision-ready reports in fiber: weights-report.md, size-report.md, deep-dive-report.html (sent to Cail). WEIGHTS (#604+949): found TWO coupled regressions in prepare_ngmix_weights (ngmix.py:871) — R1 noise estimator regressed from object-free windowed get_noise (now DEAD at :826) to flux-contaminated whole-stamp sigma_mad; R2 lost the v1 binarization → double-counts a real inverse-variance map. EMPIRICALLY CONFIRMED (truth ivar 1e6 → recovered 8.8e11, ratio≈1/noise²). Clean red→green test target via make_data. Rec: SPLIT — minimal v1-restore (reinstate binarization + get_noise) + test in #741; SExtractor BACKGROUND_RMS baseline as a SEPARATE PR (Codex shuttle, ~4-8h, closes #604). SIZE (r50/T): galaxy r50=pars[4]=T (area), PSF r50psf=σ; neither is 1.1774σ; all 5 r50* cols new in v2.0. UNIONS-3500 WL I (arXiv:2605.13549) reports half-light radius r_h as PRIMARY. Rec: TRANSFORM at source (honest r50 in ngmix) + cs_util converter web; bonus find — sp_validation galaxy.py:T_to_fwhm is dimensionally wrong (feeds the scale-dependent PSF-leakage fit). HELD: runner decorators (test via example pipeline now that the dev image is up). DECISIONS FOR CAIL: push Bucket A? approve the weights 2-PR split + let me draft the Codex constitution? size transform-vs-rename + cs_util home? No Martin/Lucy reply drafted yet (Cail's instruction). Merge stays Cail's/Martin's gesture.
shuttle:
- agent: claude-opus
- enabled: true
- host: candide
- interactive: true
- kind: oneshot
- project_dir: /automnt/n17data/cdaley/unions/shapepipe
+ agent: claude-opus
+ enabled: true
+ host: candide
+ interactive: true
+ kind: oneshot
+ project_dir: /automnt/n17data/cdaley/unions/shapepipe
status: active
tags:
- - constitution
- - shapepipe
- - ngmix
- - review
+ - constitution
+ - shapepipe
+ - ngmix
+ - review
---
diff --git a/.felt/review-ngmix-v2-pr740/size-report.md b/.felt/review-ngmix-v2-pr740/size-report.md
new file mode 100644
index 000000000..72ac3501a
--- /dev/null
+++ b/.felt/review-ngmix-v2-pr740/size-report.md
@@ -0,0 +1,94 @@
+# ngmix v2.0 size-column naming: ground truth, official convention, recommendation
+
+**Scope:** the five new `r50*` size columns added in ngmix v2.0 (PR #741), what they actually contain, what the official UNIONS papers report, and how to make the producer (shapepipe) → shared (cs_util) → consumer (sp_validation) stack honest and consistent.
+
+**Verification:** code claims verified in the worktree; conversion arithmetic verified numerically; paper convention from arXiv:2605.13549 and arXiv:2204.04798; downstream consumption from the sp_validation source tree.
+
+**Headline:** the v2.0 `r50` columns are mislabeled — galaxy `r50` == the area `T`; PSF `r50psf` == `σ` (not `1.1774σ`). Cleanest fix: make the ngmix writer emit honest, correctly-converted columns at the source (true half-light radius `r50 = 1.1774·√(T/2)` for both galaxy and PSF, alongside the existing `T`), rather than a rename or a downstream-only transform. A *separate* load-bearing bug lives in sp_validation's `T_to_fwhm`, which a clean r50/σ surface lets us retire.
+
+---
+
+## 1. Code ground truth — what each size column actually holds
+
+In ngmix `gauss`, `pars = [cen1, cen2, g1, g2, T, flux]` and **`pars[4] = T = 2·σ²` is an AREA** (arcsec²). The half-light radius is `r50 = 1.1774·σ = 1.1774·√(T/2)`. The `σ = √(T/2)` mapping is confirmed twice in-file (`galsim.Gaussian(sigma=np.sqrt(guess[4]/2))`; `r50_psfo = np.sqrt(max(psf_res['T_psf'],0)/2)`).
+
+| Column (FITS) | Source | Value in σ | True r50 = 1.1774σ? | Status |
+|---|---|---|---|---|
+| `T`, `T_err` | `results[…]["T"]` | `2σ²` (area) | — | Correctly named area |
+| `r50`, `r50_err` | `pars[4]`, `pars_err[4]` | **`2σ²` (area)** | **No** | **MISLABELED: == `T`/`T_err` bit-for-bit. Neither √ nor ×1.1774 applied.** |
+| `Tpsf` | `T_PSFo` | `2σ_psf²` (area) | — | Correctly named area |
+| `T_psfo_ngmix`, `T_err_psfo_ngmix` | `T_PSFo`, `T_err_PSFo` | `2σ_psf²` | — | Correct; `T_psfo_ngmix` **duplicates** `Tpsf` |
+| `r50psf` | `r50_PSFo` = `√(T_psf/2)` | **`σ_psf` (length)** | **No — it's σ, off by 1.1774×** | Genuine length, missing the factor |
+| `r50_psfo_ngmix` | `r50_PSFo` | `σ_psf` | No — it's σ | **Duplicates `r50psf`** |
+| `r50_err_psfo_ngmix` | `T_psf_err/(2·r50_psfo)` | `σ_psf_err` | No — error on σ | Correct error-prop of σ; NaN when σ=0 |
+
+**Headline hazard:** the same name root `r50` means an **area** on the galaxy side and a **length** on the PSF side. A user ratioing galaxy `r50` against `r50psf` divides an area by a length. **Zero columns in the file are a true half-light radius** — every "r50" is either `T` (galaxy) or `σ` (PSF).
+
+**Arithmetic check (verified):** `√(2·ln2) = 1.17741`; `2.355/1.17741 = 2.000` (why the downstream `T_to_fwhm` is dimensionally wrong — §4).
+
+**v1 provenance:** `origin/develop` wrote sizes **only as `T`**. **All five `r50*` columns are NEW in v2.0**, and the galaxy two were wired to the area `pars[4]` — introduced as mislabeled duplicates.
+
+---
+
+## 2. Official convention + downstream consumption
+
+**UNIONS-3500 WL I — A Galaxy Shape Catalogue (arXiv:2605.13549), the current shape-catalogue paper:**
+- Reports the **half-light radius `r_h` (= ngmix `r50`)** as the **PRIMARY** size for **both galaxy and PSF**.
+- Size cut is the dimensionless **`0.707 ≤ r_h/r_h,psf ≤ 3`**, applied *inside Metacalibration* (enters the selection response).
+- Resolution factor `R = r_PSF/(r_PSF + r_h)` — written in `r`, not `T`.
+- **`T` is derived only**, via Eq. 1 `T = (r_h²/ln2)·(1+g1²+g2²)/(1−g1²−g2²)`, "to relate r_h to the DES size definition." Round-object case `T = r_h²/ln2 = 2σ²` — identical to standard ngmix `T = 2σ²`.
+
+**Guinot et al. 2022 (arXiv:2204.04798):** mixed — galaxy cut in `T` (`T_gal/T_PSF > 0.5`), PSF size `T = 2σ²`, but ngmix fit param + a prior in `r50` (arcsec).
+
+**sp_validation:** treats galaxy/PSF size only as the dimensionless **ratio** `T/Tpsf` (size cut, DES-weight & leakage binning) — robust regardless of absolute meaning — plus one place where PSF `T` is converted to FWHM for the leakage fit (§4). **It reads NONE of the five `r50*` columns** (0 hits in `src/`).
+
+**Takeaway:** the official paper wants `r_h` (= r50) as the headline size — exactly the quantity the v2.0 `r50*` columns *claim* to provide but don't. Standardize on **r50 = half-light radius as primary, T = 2σ² as the derived DES area.**
+
+---
+
+## 3. Recommendation — TRANSFORM at the source, thin cs_util surface
+
+Three options: a **pure RENAME** (`r50`→`T`/`σ`) is honest but discards the chance to ship the paper's headline quantity; a **cs_util-only TRANSFORM** (fix on read) leaves a foot-gun baked into every FITS file; **TRANSFORM at the source** (recommended) makes the writer emit honest, paper-consistent columns with cs_util holding the convention once.
+
+### 3a. ngmix.py — every size column honest and on the same scale
+Keep all `T*` columns unchanged (correct areas). Change the radius columns to true half-light radii:
+- **Galaxy:** `r50 = 1.1774·√(T/2)` (= `√(ln2·T)`); error `r50_err = (1.1774/(2√2))·T_err/√T`. *(Currently append the area `pars[4]`/`pars_err[4]`.)*
+- **PSF:** multiply existing σ values by `1.17741` so `r50psf`/`r50_psfo_ngmix`/`r50_err_psfo_ngmix` are true half-light radii, not σ.
+
+After this, **every "r50" column is a length meaning the same thing (1.1774σ)** on both sides, matching UNIONS-3500 I Eq. 1 for round objects.
+
+### 3b. De-duplicate
+`T_psfo_ngmix` duplicates `Tpsf`; `r50_psfo_ngmix` duplicates `r50psf`. Retire the redundant names (or document as aliases). Cleanest end state — one name per quantity: galaxy `{T, T_err, r50, r50_err}`, PSF `{Tpsf, Tpsf_err, r50psf, r50psf_err}`. Also fix the asymmetry: a `T_err_psfo_ngmix` exists but no `Tpsf_err` twin.
+
+### 3c. cs_util — single source of truth for the size web
+Expose thin, correctly-named, tested converters (home: `cs_util`, since `sp_validation/galaxy.py` already does `from cs_util import cfis`, and cs_util is the shared layer):
+
+```
+T_to_r50(T) = 1.1774 * sqrt(T / 2) # = sqrt(ln2 * T)
+r50_to_T(r50) = 2 * (r50 / 1.1774)**2
+sigma_to_fwhm(s) = 2.355 * s # already correct in galaxy.py
+T_to_fwhm(T) = 2.355 * sqrt(T / 2) # CORRECTED (see §4)
+```
+
+---
+
+## 4. Migration impact on sp_validation
+
+**No change for the size-ratio paths.** Galaxy `T` and PSF `Tpsf` enter only as the dimensionless ratio `T/Tpsf` (size cut, `size_ratio` binning) — robust by construction. None of the five `r50*` columns is read in sp_validation `src/` (0 hits), so the mislabel harms no current computation.
+
+**One genuine, load-bearing fix — the PSF-leakage size regressor.** `fwhm_PSF = T_to_fwhm(T_PSFo)` (built in `notebooks/extract_info.py`) is the third regressor in the scale-dependent PSF-leakage fit (`run_object.py:510`, `size_PSF_col` at `calibration.py:540`). The current `galaxy.py:T_to_fwhm` is `T/1.17741*2.355` and its own `MKDEBUG` comment doubts it ("Why is FWHM not quadratic in T???"). It is **wrong**: `T = 2σ²` is an area, so `FWHM = 2.355·σ = 2.355·√(T/2)` — a √ is required, not a linear factor (`2.355/1.17741 = 2.000` shows it treats T as if already σ). Result: a monotonic-but-nonlinear distortion of the PSF-size axis, biasing any `α(PSF-size)` trend and per-bin leakage coefficients. Spatially-constant leakage unaffected.
+
+**Migration:** point the `fwhm_PSF` builder at the new corrected PSF radius column via `cs_util.sigma_to_fwhm` (or import the fixed `cs_util.T_to_fwhm`), retire the local `galaxy.py:T_to_fwhm`. This is the change that actually moves a science number; the column rename/transform is hygiene + paper-consistency that also removes the foot-gun.
+
+**Sequencing:** the catalogue-writer change (§3a/3b) and the sp_validation regressor fix (§4) are independent — sp_validation doesn't read the `r50*` columns today, so fixing ngmix won't break it. cs_util converters should land before sp_validation switches to importing them.
+
+---
+
+## 5. Open questions for Lucy / Martin
+
+1. **Primary on-disk convention:** standardize on **r50 as primary** (matching UNIONS-3500 I) while keeping `T`? Or keep `T` as the working column with r50 derived?
+2. **Drop vs keep duplicates** (`T_psfo_ngmix`≡`Tpsf`, `r50_psfo_ngmix`≡`r50psf`): anything outside sp_validation `src/` (notebooks, older catalogues) bound to the `_psfo_ngmix` names? If so, alias rather than delete.
+3. **cs_util as the home** for the converter web — agreed? Naming preference (`T_to_r50` vs `T_to_halflight`)?
+4. **PSF-leakage refit:** were current/in-flight scale-dependent leakage results produced with the buggy `T_to_fwhm`, and should they be regenerated once fixed? How much does `α(size)` move?
+5. **`r50_err` for the galaxy:** is the error-propagation form acceptable, or prefer the full pars covariance?
+6. **Catalogue versioning:** does changing on-disk `r50*` semantics warrant a format version bump / changelog so downstream users know v2.0 pre- vs post-fix differ?
diff --git a/.felt/review-ngmix-v2-pr740/weights-report.md b/.felt/review-ngmix-v2-pr740/weights-report.md
new file mode 100644
index 000000000..17edea424
--- /dev/null
+++ b/.felt/review-ngmix-v2-pr740/weights-report.md
@@ -0,0 +1,136 @@
+# ngmix v2.0 weight / inverse-variance handling — decision-ready report
+
+**Scope:** PR #741 (branch `ngmix_v2.0`), the v2.0 refactor of noise/weight handling in `ngmix.py`, against the v1 baseline (`origin/develop`), and its relationship to Issue #604 (Weight Handling). All file:line refs are in the worktree (`ngmix_v2.0` head + review cleanups). Load-bearing claims verified against source **and confirmed empirically** (see §2a).
+
+**Headline:** v2.0 introduced two real, test-verifiable regressions in the ngmix weight map (dead noise estimator + lost mask binarization). The minimal v1-restore belongs in #741; the proper inverse-variance fix from #604 is a clean, mostly-config follow-up PR that should land separately.
+
+---
+
+## 1. Precise regression characterization
+
+The entire live weight path is `prepare_ngmix_weights` (`ngmix.py:871–908`):
+
+```python
+weight_map = np.copy(weight) # 894 FSCALE-rescaled Megapipe 0/1 mask
+weight_map[flag != 0] = 0.0 # 895
+sig_noise = sigma_mad(gal) # 897 noise scalar over the WHOLE stamp
+...
+weight_map *= 1 / sig_noise ** 2 # 906
+```
+
+v1 (`origin/develop`, `do_ngmix_metacal`), in order:
+
+```python
+weight_map = np.copy(weights[n_e])
+weight_map[np.where(flags[n_e] != 0)] = 0.0
+weight_map[weight_map != 0] = 1 # binarize to exactly 0/1 <-- DROPPED in v2
+...
+sig_noise = get_noise(...) # object-free windowed estimator <-- ORPHANED in v2
+...
+weight_map *= 1 / sig_noise**2
+```
+
+Two coupled regressions, both verified:
+
+**R1 — Noise estimator regressed from object-free to whole-stamp.** v1 called the windowed `get_noise` (masks the object via a model-Gaussian window, re-estimates `sigma_mad` on object-free pixels). v2 uses bare `sigma_mad(gal)` over the full, flux-contaminated stamp. `get_noise` still exists (`ngmix.py:826`, body byte-identical to v1) but **has zero call sites in `src/`** — confirmed dead. v2's `sig_noise` is biased high by the galaxy's own flux, in a **size/flux-dependent** way. Dominant numerical effect for today's inputs.
+
+**R2 — Lost the mask binarization.** v1 collapsed the weight to exactly 0/1 before scaling, so the final weight was a clean uniform `1/sig_noise²` per unmasked pixel. v2 dropped that line, so line 906 multiplies the *raw rescaled mask* by `1/sig_noise²`.
+
+What `weight` contains: traced from `ngmix_runner.py` input #4 → vignetmaker `RUN_2` → single-exposure weight images = Megapipe weight maps, which Gwyn confirmed are **0/1 masks** (#604). The only transform before line 894 is the FSCALE rescale (`rescale_epoch_fluxes`: `weight * 1/Fscale**2`) — identical to v1.
+
+Consequences of R2:
+- **For today's 0/1 masks:** each unmasked pixel carries `Fscale⁻² / sig_noise²`. FSCALE is applied to the weight *twice* and to the noise *once* → per-pixel ivar uniformly off by `1/Fscale²` per epoch vs v1. A latent scale bug, varies epoch-to-epoch.
+- **Latent hazard (dangerous):** if a *real* inverse-variance map is fed in (THELI, or the #604 BACKGROUND_RMS path), v2 multiplies a genuine `1/var_pix` by another `1/sig_noise²` → **variance counted twice**. v1's binarization made the code robust to this; v2 is not.
+
+### Downstream direction of the effect
+`weight_map` → `Observation.weight` = per-pixel inverse-variance in the ngmix χ².
+- Inflated `sig_noise` → too-low weight → under-weighted likelihood → **over-estimated** `g_err`/`T_err`/`flux_err`, **under-estimated** `s2n`. Because whole-stamp `sigma_mad` grows with brighter/larger galaxies, this is a **size/flux-dependent miscalibration** — the signature of a multiplicative shear bias `m`, consistent with the prior old-path finding `m ~ -2.8e-2`.
+- **Multi-epoch PSF averaging** weights each epoch by `obs.weight.sum()` → now driven by whole-stamp MAD and FSCALE; reported `PSFo` g/T shift accordingly.
+
+---
+
+## 2. Test-verifiability: **YES — clean red→green**
+
+`make_data` (`simulate.py:91`) injects the exact truth inverse-variance: `weights = im*0 + 1/noise**2`, `flags = 0`. That `1/noise²` is the oracle. A unit-level test on `prepare_ngmix_weights` (no ngmix fit needed — fast, deterministic) asserts the returned weight equals the injected inverse-variance:
+
+```python
+def test_weight_map_recovers_injected_inverse_variance():
+ noise = 1e-3
+ gals, psfs, _, weights, flags, jacobs = make_data(
+ rng=np.random.RandomState(123), shear=(0.0, 0.0),
+ noise=noise, n_epochs=1, img_size=201)
+ _, weight_map, _ = prepare_ngmix_weights(
+ gals[0], weights[0], flags[0], np.random.RandomState(0))
+ truth_ivar = 1.0 / noise ** 2
+ recovered = np.median(weight_map[weight_map > 0])
+ npt.assert_allclose(recovered, truth_ivar, rtol=0.10)
+```
+
+**Coverage gap today:** `test_ngmix.py`'s only weight-path test (`test_metacal_is_reproducible_with_fixed_seed`) asserts `run(42)==run(42)` — a determinism check that passes for *any* deterministic normalization, correct or not. The regression is exercised but never checked against truth.
+
+**Caveats:** `make_data` injects *homoscedastic* noise — the test catches the double-normalization + scalar bias, but to prove the fix respects *spatially-varying* ivar (#604's point), `make_data` needs a per-pixel RMS option (follow-up, not a blocker).
+
+### 2a. EMPIRICAL CONFIRMATION (observed, this session)
+
+Ran the check against the actual code (dev container, galsim+numpy; no ngmix fit needed):
+
+| noise | truth ivar | sigma_mad(gal) | recovered median | ratio (correct = 1.0) |
+|---|---|---|---|---|
+| 1e-3 | 1e6 | 1.065e-3 | 8.81e11 | **8.81e5** |
+| 1e-4 | 1e8 | 1.167e-4 | 7.35e15 | **7.35e7** |
+
+The recovered weight is off by ≈`1/noise²` — exactly the R2 double-count (real ivar × another `1/sig_noise²`). `sigma_mad(gal)` runs ~6% above truth noise (R1 flux contamination, small for a compact galaxy in a 201² stamp). The data matches the by-reading analysis to ~6 orders of magnitude. **The regression is real and observed, not inferred.**
+
+---
+
+## 3. Recommended baseline: SExtractor BACKGROUND_RMS — concrete change plan
+
+The existing machinery is already generic: SExtractor's check-image handler is list-driven and image-agnostic; vignetmaker's ME loop loops over `ME_IMAGE_PATTERN` and reads `get_data(0)` — a single-HDU BACKGROUND_RMS check-image flows through the same path as BACKGROUND with **zero code change**.
+
+| Step | What | File(s) | Cost |
+|---|---|---|---|
+| 1 | `CHECKIMAGE = BACKGROUND, BACKGROUND_RMS` | `example/cfis/config_exp_*.ini` | **config only** |
+| 2 | Add `background_rms` to vignetmaker `ME_IMAGE_DIR`/`ME_IMAGE_PATTERN` | `config_tile_MiViSmVi.ini` | **config only** |
+| 3 | Wire one **optional** input (RMS sqlite) through runner + `Vignet` | `ngmix_runner.py`, `ngmix.py` `Vignet`/`Ngmix.__init__` | small code |
+| 4 | **Core:** build per-pixel ivar from RMS, gated on mask | `ngmix.py` `prepare_ngmix_weights` et al. | real work |
+| 5 | Keep `sigma_mad`/`get_noise` as **fallback** when no RMS map present | `ngmix.py` (branch on `bkg_rms is not None`) | small code |
+
+**Core (Step 4) sketch** — Megapipe weight stays the *mask*; RMS supplies the *variance* (the two roles were conflated before):
+
+```python
+weight_map = np.copy(weight) # Megapipe mask (0 where masked)
+weight_map[flag != 0] = 0.0
+mask = weight_map != 0
+ivar = np.zeros_like(gal)
+ivar[mask] = 1.0 / bkg_rms[mask] ** 2 # per-pixel inverse variance
+sig_noise = np.median(bkg_rms[mask]) if mask.any() else sigma_mad(gal)
+return gal_masked, ivar, noise_img
+```
+
+RMS rescaling: BACKGROUND_RMS is in image counts, scales by `Fscale`; `1/rms²` scales by `1/Fscale²` — identical to the existing weight rescale, slots into `rescale_epoch_fluxes`.
+
+**Effort: S–M, ~4–8 h.** Config + plumbing ~1–2 h; core rewrite + fallback ~2–3 h; example-pipeline before/after on shear/s2n/flags ~2–3 h.
+
+**Honest read on aguinot's "almost free":** true for the plumbing (steps 1–2 are pure config), undersells the core. Delivering the data is cheap; *using it correctly* (changing what the weight means, getting Fscale + mask roles right, validating no fit regression) is the earned part.
+
+**Risks:** RMS units/rescaling; zeros/NaNs in RMS at masked/off-image pixels → `1/rms²` blowup (gate on mask, guard `rms<=0`); double-counting the mask; positional input-list shift breaking configs (mitigated by optional-input design).
+
+---
+
+## 4. Scoping: **separate PRs.**
+
+- **In #741 (now):** the minimal v1-restore — reinstate `weight_map[weight_map != 0] = 1` and call `get_noise(...)` instead of `sigma_mad(gal)` — plus the red→green unit test. Two-line change fixing a regression *this PR introduced*; restores known-good behavior; restoring the binarization makes the fallback robust to real ivar maps again. Does not depend on #604.
+- **#604 as its own PR (next):** the BACKGROUND_RMS baseline changes what the weight map *means*, touches new I/O, needs its own before/after validation. A distinct review surface; bundling into #741 would mix "fix the regression I just introduced" with "implement a 3-year-old feature request."
+
+The red→green test validates both the #741 minimal fix and the #604 baseline; it travels with #741 now and proves out #604 later.
+
+---
+
+## 5. Open questions for Martin (science call)
+
+1. **Baseline vs THELI.** Confirm in-house SExtractor-BACKGROUND_RMS is the baseline, with THELI weight images slotting in later as another `ME_IMAGE_PATTERN` source (not blocking).
+2. **Optional vs required RMS input.** Recommend optional with `sigma_mad`/`get_noise` fallback. Confirm.
+3. **#741 minimal fix:** restore the windowed `get_noise` (reintroduces the gal-guess machinery the PR may have deliberately stripped), or accept bare `sigma_mad(gal)` until #604 lands? R1 is the dominant effect, so this matters even short-term.
+4. **Does the RMS map feed anything beyond the weight + noise-fill scale** (e.g. background subtraction)? Recommend weight + noise-fill only.
+5. **Scope of config edits** for the #604 PR: cfis only, or cfis + cfis_simu + canfar batch?
+6. **Validation bar to merge #604:** smoke before/after on the example tile, or a full sim m/c calibration before merge?
From 5e0d95d3fcffc57ee73d4006f07985ab5ad00037 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Fri, 5 Jun 2026 22:33:20 +0200
Subject: [PATCH 17/29] felt: draft two work-stream constitutions from the
ngmix v2.0 review
- shapepipe/ngmix-weights-ivar (Codex): fix the two prepare_ngmix_weights
regressions (minimal v1-restore + red->green test in #741) and the
SExtractor BACKGROUND_RMS inverse-variance baseline as a separate PR
(closes #604). Points at weights-report.md.
- shapepipe/ngmix-size-columns: honest r50 at the ngmix source + cs_util
converter web + fix the sp_validation T_to_fwhm leakage bug. Points at
size-report.md.
Both installed as drafts pending Cail's dispatch decision.
Co-Authored-By: Claude Opus 4.8
---
.../review-ngmix-v2-pr740.md | 18 ++--
.../ngmix-size-columns/ngmix-size-columns.md | 76 +++++++++++++++++
.../ngmix-weights-ivar/ngmix-weights-ivar.md | 82 +++++++++++++++++++
3 files changed, 166 insertions(+), 10 deletions(-)
create mode 100644 .felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
create mode 100644 .felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 585fba251..9009eaff8 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -1,10 +1,15 @@
---
id: 01KTCHWZX873VG28AA663A3ZQE
+name: 'Review + work: ngmix v2.0 (PR #740)'
+status: active
+tags:
+ - constitution
+ - shapepipe
+ - ngmix
+ - review
created-at: 2026-06-01T11:50:17.967872934+02:00
+outcome: 'INTERACTIVE — live with Cail. Cleanups + fitting.py committed (bd60dc8e in worktree /tmp/pr740-wt, branch pr740-tmp, UNPUSHED): 293 print, 1140 sextractor_e1e2, 254 dead resume, 766 →v_flag_tmp.size, fitting.py deleted. Dev image built (/n17data/cdaley/containers/shapepipe-dev) + ngmix 2.4.0 installed → full suite + example pipeline now runnable. Ran a 6+2-agent WORKFLOW on the two hard problems → two decision-ready reports in fiber: weights-report.md, size-report.md, deep-dive-report.html (sent to Cail). WEIGHTS (#604+949): found TWO coupled regressions in prepare_ngmix_weights (ngmix.py:871) — R1 noise estimator regressed from object-free windowed get_noise (now DEAD at :826) to flux-contaminated whole-stamp sigma_mad; R2 lost the v1 binarization → double-counts a real inverse-variance map. EMPIRICALLY CONFIRMED (truth ivar 1e6 → recovered 8.8e11, ratio≈1/noise²). Clean red→green test target via make_data. Rec: SPLIT — minimal v1-restore (reinstate binarization + get_noise) + test in #741; SExtractor BACKGROUND_RMS baseline as a SEPARATE PR (Codex shuttle, ~4-8h, closes #604). SIZE (r50/T): galaxy r50=pars[4]=T (area), PSF r50psf=σ; neither is 1.1774σ; all 5 r50* cols new in v2.0. UNIONS-3500 WL I (arXiv:2605.13549) reports half-light radius r_h as PRIMARY. Rec: TRANSFORM at source (honest r50 in ngmix) + cs_util converter web; bonus find — sp_validation galaxy.py:T_to_fwhm is dimensionally wrong (feeds the scale-dependent PSF-leakage fit). HELD: runner decorators (test via example pipeline now that the dev image is up). DECISIONS FOR CAIL: push Bucket A? approve the weights 2-PR split + let me draft the Codex constitution? size transform-vs-rename + cs_util home? No Martin/Lucy reply drafted yet (Cail''s instruction). Merge stays Cail''s/Martin''s gesture.'
horizon: now
-name: "Review + work: ngmix v2.0 (PR #740)"
-outcome: |-
- INTERACTIVE — live with Cail. Cleanups + fitting.py committed (bd60dc8e in worktree /tmp/pr740-wt, branch pr740-tmp, UNPUSHED): 293 print, 1140 sextractor_e1e2, 254 dead resume, 766 →v_flag_tmp.size, fitting.py deleted. Dev image built (/n17data/cdaley/containers/shapepipe-dev) + ngmix 2.4.0 installed → full suite + example pipeline now runnable. Ran a 6+2-agent WORKFLOW on the two hard problems → two decision-ready reports in fiber: weights-report.md, size-report.md, deep-dive-report.html (sent to Cail). WEIGHTS (#604+949): found TWO coupled regressions in prepare_ngmix_weights (ngmix.py:871) — R1 noise estimator regressed from object-free windowed get_noise (now DEAD at :826) to flux-contaminated whole-stamp sigma_mad; R2 lost the v1 binarization → double-counts a real inverse-variance map. EMPIRICALLY CONFIRMED (truth ivar 1e6 → recovered 8.8e11, ratio≈1/noise²). Clean red→green test target via make_data. Rec: SPLIT — minimal v1-restore (reinstate binarization + get_noise) + test in #741; SExtractor BACKGROUND_RMS baseline as a SEPARATE PR (Codex shuttle, ~4-8h, closes #604). SIZE (r50/T): galaxy r50=pars[4]=T (area), PSF r50psf=σ; neither is 1.1774σ; all 5 r50* cols new in v2.0. UNIONS-3500 WL I (arXiv:2605.13549) reports half-light radius r_h as PRIMARY. Rec: TRANSFORM at source (honest r50 in ngmix) + cs_util converter web; bonus find — sp_validation galaxy.py:T_to_fwhm is dimensionally wrong (feeds the scale-dependent PSF-leakage fit). HELD: runner decorators (test via example pipeline now that the dev image is up). DECISIONS FOR CAIL: push Bucket A? approve the weights 2-PR split + let me draft the Codex constitution? size transform-vs-rename + cs_util home? No Martin/Lucy reply drafted yet (Cail's instruction). Merge stays Cail's/Martin's gesture.
shuttle:
agent: claude-opus
enabled: true
@@ -12,15 +17,8 @@ shuttle:
interactive: true
kind: oneshot
project_dir: /automnt/n17data/cdaley/unions/shapepipe
-status: active
-tags:
- - constitution
- - shapepipe
- - ngmix
- - review
---
-
# Review + work: ngmix v2.0 (PR #740)
Martin requested Cail's review on **[CosmoStat/shapepipe #740 "Ngmix v2.0"](https://github.com/CosmoStat/shapepipe/pull/740)** (head `ngmix_v2.0` → base `develop`, +3894/−3410 across 67 files). This is the PR that **realizes [[ngmix-update]]**: replace Axel's `stable_version` ngmix fork with **upstream ngmix 2.4.0** and adopt **@lbaumo's (Lucy Baumont's) new ngmix classes + interface**, reconciling the cleaned-up wrapper from her visit. Track it in [[prs-in-flight]]; it sits in the broader Martin collaboration under [[shapepipe]].
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
new file mode 100644
index 000000000..58655def1
--- /dev/null
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -0,0 +1,76 @@
+---
+id: 01KTCQPE4M6S1E4V9WMPCAFCKT
+name: 'ngmix size columns: honest r50 + cs_util converter web'
+tags:
+ - constitution
+ - shapepipe
+ - ngmix
+created-at: 2026-06-05T22:30:50.004535516+02:00
+outcome: Draft — not yet dispatched.
+shuttle:
+ enabled: false
+ kind: oneshot
+ host: candide
+ agent: claude-opus
+---
+
+# ngmix size columns: honest r50 + cs_util converter web
+
+Spun out of the [[review-ngmix-v2-pr740]] review. The full investigation — code ground-truth table,
+the UNIONS-paper convention, the sp_validation consumption map, and the bonus downstream bug — lives in
+**`.felt/review-ngmix-v2-pr740/size-report.md`**. Read it first; it is the detailed spec.
+
+The v2.0 ngmix rewrite added five `r50*` columns, **none of which is a half-light radius**: galaxy
+`r50` = `pars[4]` = `T` = 2σ² (an *area*, a bit-for-bit copy of the `T` column); PSF `r50psf` =
+`√(T/2)` = σ (a length, but off by the `1.1774` factor). So the same name root means an area on the
+galaxy side and a length on the PSF side. The official catalogue paper (**UNIONS-3500 WL I**,
+arXiv:2605.13549) reports the half-light radius `r_h` (= r50) as the **primary** size for both galaxy
+and PSF, with `T` derived. Cail approved fixing this at the source.
+
+## Desired State
+
+**1. ngmix writer emits honest, self-consistent size columns** (`ngmix.py`, on `ngmix_v2.0`):
+- Galaxy `r50 = 1.1774·√(T/2)` (= `√(ln2·T)`) with propagated error `r50_err = (1.1774/(2√2))·T_err/√T`
+ — replacing the current raw `pars[4]`/`pars_err[4]`.
+- PSF `r50psf`/`r50_psfo_ngmix`/`r50_err_psfo_ngmix` = existing σ values × `1.17741` (true half-light
+ radii, not σ).
+- Keep all `T*` columns as the (correctly-named) DES areas. After this, every "r50" column is a length
+ meaning the same thing on both sides, matching the paper.
+- **De-duplicate:** `T_psfo_ngmix` ≡ `Tpsf` and `r50_psfo_ngmix` ≡ `r50psf`. Default = retire the
+ redundant names; fall back to documenting them as aliases if anything outside `sp_validation/src`
+ (notebooks, older catalogues) depends on the `_psfo_ngmix` names. Fix the `T_err_psfo_ngmix`-with-no-
+ `Tpsf_err` asymmetry while here.
+
+**2. `cs_util` holds the conversion web** — the single source of truth, since `sp_validation/galaxy.py`
+already does `from cs_util import cfis` and cs_util is the shared producer/consumer layer:
+- `T_to_r50(T) = 1.1774·√(T/2)`, `r50_to_T(r50) = 2·(r50/1.1774)²`, `sigma_to_fwhm(s) = 2.355·s`,
+ and the **corrected** `T_to_fwhm(T) = 2.355·√(T/2)`. With tests.
+
+**3. Fix the downstream PSF-leakage bug in sp_validation.** `galaxy.py:T_to_fwhm` is `T/1.17741*2.355`
+— dimensionally wrong (treats the area `T` as σ; `2.355/1.17741 = 2.000`). It builds the `fwhm_PSF`
+regressor for the scale-dependent PSF-leakage fit (`run_object.py:510`), so it distorts the PSF-size
+axis and biases `α(size)`. Switch it to `cs_util.sigma_to_fwhm` on the corrected PSF size (or import
+the fixed `cs_util.T_to_fwhm`) and retire the local function. **This is the change that actually moves
+a science number** — flag it prominently for Cail (it touches the pure_eb / leakage analysis).
+
+**Scoping:** the ngmix-writer change (1) and the sp_validation fix (3) are independent — sp_validation
+reads none of the `r50*` columns today, so fixing ngmix won't break it. Land cs_util (2) before
+sp_validation (3) switches to importing it. **Merge/PR-creation stay Cail's gesture**; surface the open
+calls (r50-as-primary convention, drop-vs-alias, catalogue version bump, whether in-flight leakage
+results need regenerating) in this fiber's report for Cail's eventual reply — **do not post to Martin.**
+
+## Context
+
+- **Detailed spec:** `.felt/review-ngmix-v2-pr740/size-report.md` (ground-truth table, paper citations,
+ consumption map, the 6 open questions).
+- **Repos (all on candide):** shapepipe `/automnt/n17data/cdaley/unions/shapepipe` (branch `ngmix_v2.0`);
+ sp_validation `/automnt/n17data/cdaley/unions/pure_eb/code/sp_validation/src/sp_validation/`
+ (`galaxy.py`, `calibration.py`, `run_object.py`, `notebooks/extract_info.py`); cs_util (shared —
+ locate the checkout; `sp_validation/galaxy.py` imports it). Cross-repo, so likely cross-PR.
+- **Coordinate with [[ngmix-weights-ivar]]:** both edit `ngmix.py` but different functions (size-column
+ writer vs `prepare_ngmix_weights`) — low conflict; each on its own branch off `ngmix_v2.0` (after the
+ review-cleanup commit `bd60dc8e` lands).
+- **Convention:** UNIONS-3500 WL I (arXiv:2605.13549) → `r_h`/r50 primary, `T = 2σ²` derived (Eq. 1).
+ Arithmetic verified: `√(2·ln2) = 1.17741`.
+- **Agent:** Cail's call — the cs_util/sp_validation API + naming is taste-y (leans Claude); the ngmix
+ column edits are mechanical.
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
new file mode 100644
index 000000000..891de5e26
--- /dev/null
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -0,0 +1,82 @@
+---
+id: 01KTCQPE3JGEYN7NQS8HW1AT6B
+name: 'ngmix weight map: fix v2.0 regressions + inverse-variance (#604)'
+tags:
+ - constitution
+ - shapepipe
+ - ngmix
+created-at: 2026-06-05T22:30:49.970813955+02:00
+outcome: Draft — not yet dispatched.
+shuttle:
+ enabled: false
+ kind: oneshot
+ host: candide
+ agent: codex
+---
+
+# ngmix weight map: fix v2.0 regressions + inverse-variance (#604)
+
+Spun out of the [[review-ngmix-v2-pr740]] review. The full investigation, with file:line
+evidence, the empirical confirmation, the concrete change plan, effort, and risks, lives in
+**`.felt/review-ngmix-v2-pr740/weights-report.md`** — read it first; it is the detailed spec.
+
+The v2.0 ngmix rewrite (PR #741, branch `ngmix_v2.0`) introduced two coupled regressions in
+`prepare_ngmix_weights` (`ngmix.py:871`), **empirically confirmed** this session (fed `make_data`'s
+truth inverse-variance `1/noise²` → recovered `8.8e11`, off by ≈`1/noise²`):
+
+- **R1** — noise estimate regressed from v1's object-free windowed `get_noise` (still present at
+ `:826`, now **dead**) to flux-contaminated whole-stamp `sigma_mad(gal)`. Size/flux-dependent bias
+ on the likelihood weighting → biased `g_err`/`T_err`/`s2n`; the fingerprint of a multiplicative
+ shear bias (cf. old-path `m≈-2.8e-2`). Dominant today.
+- **R2** — lost the v1 binarization `weight_map[weight_map != 0] = 1`, so the raw weight is multiplied
+ by `1/sig_noise²`: a per-epoch `1/Fscale²` error today, and a latent double-count the moment a real
+ inverse-variance map is fed in.
+
+Neither v1 nor v2 is *correct* in absolute terms — both approximate per-pixel inverse-variance with a
+per-stamp scalar on a 0/1 mask. #604 ("Weight Handling", open since 2023) is the real fix: feed ngmix
+genuine inverse-variance maps.
+
+## Desired State
+
+**Two PRs, in this order:**
+
+1. **Minimal regression fix on `ngmix_v2.0` (#741).** Restore the v1 binarization (R2) and add a
+ **red→green unit test** on `prepare_ngmix_weights` using `make_data`'s truth ivar (the test sketch
+ is in the report; it needs no ngmix fit, so it's fast and deterministic). This is a clean, small
+ change that removes the double-count hazard.
+ - **Open call (default = defer R1):** restoring the windowed `get_noise` for R1 reintroduces the
+ gal-guess machinery this PR may have deliberately stripped. Default to fixing R2 + the test now,
+ and let the proper inverse-variance path (PR 2) retire whole-stamp `sigma_mad` wholesale — which
+ resolves R1 without reviving the old machinery. If R1 proves to matter for the interim, surface
+ it; **do not block on it, and do not post to Martin** — these science calls go into this fiber's
+ report for Cail to fold into the eventual #741 reply.
+
+2. **SExtractor `BACKGROUND_RMS` baseline — separate PR, closes #604.** The in-house path (THELI
+ weights are an external product and slot in later as another `ME_IMAGE_PATTERN` source). Follow the
+ report's change plan: `CHECKIMAGE = BACKGROUND, BACKGROUND_RMS` (config), cut the RMS stamp via
+ vignetmaker's existing list-driven ME loop (config), wire one **optional** RMS input through
+ `ngmix_runner`/`Vignet`, and rewrite `prepare_ngmix_weights` to build per-pixel `1/RMS²` gated on
+ the mask, with `sigma_mad`/`get_noise` retained as the **fallback** when no RMS map is present.
+ Validate with a before/after on the example tile (shear/s2n/flags). Effort ~4–8h.
+
+**Quality bar:** every claim verified by running, not inferring (dev image + ngmix 2.4.0 are ready —
+see Context). The red→green test must actually go red on current code and green after. **Merge and
+PR-creation stay Cail's gesture** — prepare branches, commits, and a PR-ready description; surface the
+science decisions in this fiber's `report.html`/outcome rather than pushing to Martin.
+
+## Context
+
+- **Detailed spec:** `.felt/review-ngmix-v2-pr740/weights-report.md` (change plan, file list, risks,
+ the 6 science questions for Martin). This constitution is the pointer; the report is the substance.
+- **Repo:** `/automnt/n17data/cdaley/unions/shapepipe`, branch `ngmix_v2.0` (= #741 head). Base the
+ work on `ngmix_v2.0` **after** the review-cleanup commit lands (it removes dead code and is currently
+ staged at `/tmp/pr740-wt` commit `bd60dc8e`, pending Cail's push). Coordinate with
+ [[ngmix-size-columns]] — it also edits `ngmix.py`, but a different function (the size-column writer
+ vs `prepare_ngmix_weights`); low conflict, but each work-stream gets its own branch.
+- **Test environment:** dev-image sandbox at `/n17data/cdaley/containers/shapepipe-dev` with **ngmix
+ 2.4.0** installed (the published `:develop` image ships the old `aguinot` fork v1.3.6 — wrong for
+ this branch). Run via `CONTAINER_PATH=/n17data/cdaley/containers/shapepipe-dev PYTHONPATH=/src app python3.12 -m pytest …`.
+ The `--writable` sandbox needs bind mountpoints to exist (`mkdir` them); see this session's history.
+- **Issue #604:** megapipe weights are 0/1 masks (Gwyn confirmed); aguinot's BACKGROUND_RMS proposal is
+ the basis for PR 2. Cail confirmed the SExtractor-RMS baseline over THELI.
+- **Agent:** Codex (Cail's call) — gritty, well-defined cross-file implementation.
From 695786a7fa7044d671a955221e2dbcd69b1eeff3 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Fri, 5 Jun 2026 22:41:36 +0200
Subject: [PATCH 18/29] felt: dispatch the two ngmix work-stream shuttles;
Bucket A pushed to #741
- Pushed cleanup bd60dc8e to origin/ngmix_v2.0 (dd4f656a..bd60dc8e).
- Enabled both oneshot shuttles: ngmix-weights-ivar (codex),
ngmix-size-columns (claude-opus). Workers prepare branches/PRs+reports;
merge stays Cail's. *_psfo gate + runner decorators tracked for the
eventual #741 reply.
Co-Authored-By: Claude Opus 4.8
---
.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md | 2 +-
.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md | 3 ++-
.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 9009eaff8..6255edabb 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -8,7 +8,7 @@ tags:
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
-outcome: 'INTERACTIVE — live with Cail. Cleanups + fitting.py committed (bd60dc8e in worktree /tmp/pr740-wt, branch pr740-tmp, UNPUSHED): 293 print, 1140 sextractor_e1e2, 254 dead resume, 766 →v_flag_tmp.size, fitting.py deleted. Dev image built (/n17data/cdaley/containers/shapepipe-dev) + ngmix 2.4.0 installed → full suite + example pipeline now runnable. Ran a 6+2-agent WORKFLOW on the two hard problems → two decision-ready reports in fiber: weights-report.md, size-report.md, deep-dive-report.html (sent to Cail). WEIGHTS (#604+949): found TWO coupled regressions in prepare_ngmix_weights (ngmix.py:871) — R1 noise estimator regressed from object-free windowed get_noise (now DEAD at :826) to flux-contaminated whole-stamp sigma_mad; R2 lost the v1 binarization → double-counts a real inverse-variance map. EMPIRICALLY CONFIRMED (truth ivar 1e6 → recovered 8.8e11, ratio≈1/noise²). Clean red→green test target via make_data. Rec: SPLIT — minimal v1-restore (reinstate binarization + get_noise) + test in #741; SExtractor BACKGROUND_RMS baseline as a SEPARATE PR (Codex shuttle, ~4-8h, closes #604). SIZE (r50/T): galaxy r50=pars[4]=T (area), PSF r50psf=σ; neither is 1.1774σ; all 5 r50* cols new in v2.0. UNIONS-3500 WL I (arXiv:2605.13549) reports half-light radius r_h as PRIMARY. Rec: TRANSFORM at source (honest r50 in ngmix) + cs_util converter web; bonus find — sp_validation galaxy.py:T_to_fwhm is dimensionally wrong (feeds the scale-dependent PSF-leakage fit). HELD: runner decorators (test via example pipeline now that the dev image is up). DECISIONS FOR CAIL: push Bucket A? approve the weights 2-PR split + let me draft the Codex constitution? size transform-vs-rename + cs_util home? No Martin/Lucy reply drafted yet (Cail''s instruction). Merge stays Cail''s/Martin''s gesture.'
+outcome: 'INTERACTIVE — live with Cail. Bucket-A cleanups PUSHED to ngmix_v2.0/#741 (dd4f656a..bd60dc8e: 293/1140/254-resume/766 + fitting.py deleted); CI running. Dev image built (/n17data/cdaley/containers/shapepipe-dev) + ngmix 2.4.0 installed → full suite + example pipeline now runnable. Ran a 6+2-agent WORKFLOW on the two hard problems → two decision-ready reports in fiber: weights-report.md, size-report.md, deep-dive-report.html (sent to Cail). WEIGHTS (#604+949): found TWO coupled regressions in prepare_ngmix_weights (ngmix.py:871) — R1 noise estimator regressed from object-free windowed get_noise (now DEAD at :826) to flux-contaminated whole-stamp sigma_mad; R2 lost the v1 binarization → double-counts a real inverse-variance map. EMPIRICALLY CONFIRMED (truth ivar 1e6 → recovered 8.8e11, ratio≈1/noise²). Clean red→green test target via make_data. Rec: SPLIT — minimal v1-restore (reinstate binarization + get_noise) + test in #741; SExtractor BACKGROUND_RMS baseline as a SEPARATE PR (Codex shuttle, ~4-8h, closes #604). SIZE (r50/T): galaxy r50=pars[4]=T (area), PSF r50psf=σ; neither is 1.1774σ; all 5 r50* cols new in v2.0. UNIONS-3500 WL I (arXiv:2605.13549) reports half-light radius r_h as PRIMARY. Rec: TRANSFORM at source (honest r50 in ngmix) + cs_util converter web; bonus find — sp_validation galaxy.py:T_to_fwhm is dimensionally wrong (feeds the scale-dependent PSF-leakage fit). Both work-streams now DISPATCHED as shuttles (Cail''s go): [[ngmix-weights-ivar]] (Codex — minimal #741 regression fix w/ red→green test + the SExtractor BACKGROUND_RMS baseline as a separate PR closing #604) and [[ngmix-size-columns]] (claude-opus — honest r50 at the ngmix source + cs_util converter web + the sp_validation T_to_fwhm leakage fix). Workers prepare branches/commits/PR-descriptions + reports; merge/PR-creation stays Cail''s. STILL OPEN for the eventual #741 reply (not in these streams): *_psfo reconvolved-PSF merge-gate (1045) + the runner-decorator contract (testable via the example pipeline). PLAN: let the two shuttles finish → reassess → write the Martin reply. No reply drafted yet.'
horizon: now
shuttle:
agent: claude-opus
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
index 58655def1..9b737a609 100644
--- a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -8,10 +8,11 @@ tags:
created-at: 2026-06-05T22:30:50.004535516+02:00
outcome: Draft — not yet dispatched.
shuttle:
- enabled: false
+ enabled: true
kind: oneshot
host: candide
agent: claude-opus
+status: active
---
# ngmix size columns: honest r50 + cs_util converter web
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index 891de5e26..631567510 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -8,10 +8,11 @@ tags:
created-at: 2026-06-05T22:30:49.970813955+02:00
outcome: Draft — not yet dispatched.
shuttle:
- enabled: false
+ enabled: true
kind: oneshot
host: candide
agent: codex
+status: active
---
# ngmix weight map: fix v2.0 regressions + inverse-variance (#604)
From 6a3ac80e307d27c4e4f2d2197aeee8d46a00968f Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Sat, 6 Jun 2026 02:02:05 +0200
Subject: [PATCH 19/29] felt: record ngmix R2 weight fix checkpoint
---
.../ngmix-weights-ivar/ngmix-weights-ivar.md | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index 631567510..10ae09c21 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -6,7 +6,8 @@ tags:
- shapepipe
- ngmix
created-at: 2026-06-05T22:30:49.970813955+02:00
-outcome: Draft — not yet dispatched.
+outcome: |-
+ R2 minimal fix landed in `/tmp/pr740-wt` commit `953a52a3`: `prepare_ngmix_weights` again binarizes the mask before scalar noise normalization, and a unit test now goes red on double-normalization (`8.809e11` vs `1e6`) and green after the fix (`8.809e5`, within 15%; residual is R1 whole-stamp MAD contamination).
shuttle:
enabled: true
kind: oneshot
@@ -69,6 +70,18 @@ science decisions in this fiber's `report.html`/outcome rather than pushing to M
- **Detailed spec:** `.felt/review-ngmix-v2-pr740/weights-report.md` (change plan, file list, risks,
the 6 science questions for Martin). This constitution is the pointer; the report is the substance.
+- **R2 checkpoint:** `/tmp/pr740-wt` commit `953a52a3` restores
+ `weight_map[weight_map != 0] = 1` in `prepare_ngmix_weights` and adds
+ `test_weight_map_recovers_injected_inverse_variance`. Observed red before the fix:
+ recovered `8.80916e11` vs truth `1e6`; observed green after the fix: recovered
+ `8.80916e5` vs truth `1e6` with `rtol=0.15`. The tolerance is intentionally loose
+ enough to leave R1 visible: whole-stamp `sigma_mad(gal)` still overestimates the
+ compact simulated stamp's noise by about 6%, producing a ~12% inverse-variance
+ shortfall. This commit fixes R2 only; it does not claim R1 is solved.
+- **Current test caveat:** the targeted unit test passes in the dev image. The full
+ `src/shapepipe/tests/test_ngmix.py` file currently still fails in the existing
+ metacal smoke test because the container's `ngmix.fitting` has no `Fitter`
+ attribute; that failure is independent of the new R2 unit path.
- **Repo:** `/automnt/n17data/cdaley/unions/shapepipe`, branch `ngmix_v2.0` (= #741 head). Base the
work on `ngmix_v2.0` **after** the review-cleanup commit lands (it removes dead code and is currently
staged at `/tmp/pr740-wt` commit `bd60dc8e`, pending Cail's push). Coordinate with
From 2ac3803a020fcc9209be084fb8dd4fd595843210 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Sat, 6 Jun 2026 02:08:31 +0200
Subject: [PATCH 20/29] felt: record ngmix background-rms checkpoint
---
.../ngmix-weights-ivar/ngmix-weights-ivar.md | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index 10ae09c21..c2388ce50 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -7,7 +7,7 @@ tags:
- ngmix
created-at: 2026-06-05T22:30:49.970813955+02:00
outcome: |-
- R2 minimal fix landed in `/tmp/pr740-wt` commit `953a52a3`: `prepare_ngmix_weights` again binarizes the mask before scalar noise normalization, and a unit test now goes red on double-normalization (`8.809e11` vs `1e6`) and green after the fix (`8.809e5`, within 15%; residual is R1 whole-stamp MAD contamination).
+ First #604 implementation checkpoint landed in `/tmp/pr740-wt` commit `03bf12b6`: ngmix now accepts an optional `BKG_RMS_VIGNET_PATH`, threads BACKGROUND_RMS stamps into `Postage_stamp`, builds per-pixel `1/RMS²` inverse-variance gated by weight/flag/valid-RMS masks, and keeps the scalar `sigma_mad` fallback when no RMS file is supplied. Targeted RMS/weight tests passed; full `test_ngmix.py` still has the pre-existing `ngmix.fitting.Fitter` API failure.
shuttle:
enabled: true
kind: oneshot
@@ -78,6 +78,18 @@ science decisions in this fiber's `report.html`/outcome rather than pushing to M
enough to leave R1 visible: whole-stamp `sigma_mad(gal)` still overestimates the
compact simulated stamp's noise by about 6%, producing a ~12% inverse-variance
shortfall. This commit fixes R2 only; it does not claim R1 is solved.
+- **BACKGROUND_RMS checkpoint:** `/tmp/pr740-wt` commit `03bf12b6` adds the
+ first #604 implementation slice. `ngmix_runner` now takes optional
+ `BKG_RMS_VIGNET_PATH` (with `{file_number_string}` substitution) without
+ changing the existing six-input pipeline contract. `Vignet` opens the
+ optional RMS sqlite, `prepare_postage_stamps` rescales RMS by `FSCALE`, and
+ `prepare_ngmix_weights(..., bkg_rms=...)` uses per-pixel `1/RMS²` on pixels
+ where the Megapipe weight mask, flag mask, and RMS validity mask are all good.
+ Verification observed in the dev image:
+ `test_weight_map_recovers_injected_inverse_variance`,
+ `test_background_rms_builds_per_pixel_inverse_variance`, and
+ `test_rescale_epoch_fluxes_scales_background_rms_like_image_counts` all
+ passed. `py_compile` and `git diff --check` were also clean.
- **Current test caveat:** the targeted unit test passes in the dev image. The full
`src/shapepipe/tests/test_ngmix.py` file currently still fails in the existing
metacal smoke test because the container's `ngmix.fitting` has no `Fitter`
From 3975b725a1c68aaba0745b4a88156e13bea9e12d Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Sat, 6 Jun 2026 02:14:02 +0200
Subject: [PATCH 21/29] felt: update ngmix background-rms checkpoint
---
.../ngmix-weights-ivar/ngmix-weights-ivar.md | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index c2388ce50..b9bdcc53c 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -7,7 +7,7 @@ tags:
- ngmix
created-at: 2026-06-05T22:30:49.970813955+02:00
outcome: |-
- First #604 implementation checkpoint landed in `/tmp/pr740-wt` commit `03bf12b6`: ngmix now accepts an optional `BKG_RMS_VIGNET_PATH`, threads BACKGROUND_RMS stamps into `Postage_stamp`, builds per-pixel `1/RMS²` inverse-variance gated by weight/flag/valid-RMS masks, and keeps the scalar `sigma_mad` fallback when no RMS file is supplied. Targeted RMS/weight tests passed; full `test_ngmix.py` still has the pre-existing `ngmix.fitting.Fitter` API failure.
+ #604 implementation is now wired through code and active example configs in `/tmp/pr740-wt` through commit `ee87b5bd`: SExtractor emits `BACKGROUND_RMS`, vignetmaker stores `background_rms_vignet*.sqlite`, ngmix templates point `BKG_RMS_VIGNET_PATH` at that sqlite, and ngmix builds per-pixel `1/RMS²` inverse-variance gated by weight/flag/valid-RMS masks with scalar fallback when no RMS file is supplied. Targeted RMS/weight tests and config parsing passed; full `test_ngmix.py` still has the pre-existing `ngmix.fitting.Fitter` API failure.
shuttle:
enabled: true
kind: oneshot
@@ -90,6 +90,16 @@ science decisions in this fiber's `report.html`/outcome rather than pushing to M
`test_background_rms_builds_per_pixel_inverse_variance`, and
`test_rescale_epoch_fluxes_scales_background_rms_like_image_counts` all
passed. `py_compile` and `git diff --check` were also clean.
+- **BACKGROUND_RMS config checkpoint:** `/tmp/pr740-wt` commit `ee87b5bd`
+ wires the active CFIS and CFIS simulation example configs into that code
+ path. The exposure SExtractor configs request `CHECKIMAGE = BACKGROUND,
+ BACKGROUND_RMS`; the multi-epoch vignetmaker configs add
+ `background_rms` beside `background`; and the ngmix templates set
+ `BKG_RMS_VIGNET_PATH` to the expected
+ `background_rms_vignet{file_number_string}.sqlite` output. Verification:
+ the three targeted RMS/weight tests pass in the dev image; all touched
+ example configs parse with `ConfigParser(interpolation=None)`; `git diff
+ --check` is clean.
- **Current test caveat:** the targeted unit test passes in the dev image. The full
`src/shapepipe/tests/test_ngmix.py` file currently still fails in the existing
metacal smoke test because the container's `ngmix.fitting` has no `Fitter`
From 85af386023458549d03212c4c75602bd22c1d979 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Sat, 6 Jun 2026 02:18:18 +0200
Subject: [PATCH 22/29] felt: record ngmix rms input guard checkpoint
---
.../ngmix-weights-ivar/ngmix-weights-ivar.md | 25 +++++++++++--------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index b9bdcc53c..4ea2b7423 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -1,19 +1,18 @@
---
id: 01KTCQPE3JGEYN7NQS8HW1AT6B
name: 'ngmix weight map: fix v2.0 regressions + inverse-variance (#604)'
+status: active
tags:
- - constitution
- - shapepipe
- - ngmix
+ - constitution
+ - shapepipe
+ - ngmix
created-at: 2026-06-05T22:30:49.970813955+02:00
-outcome: |-
- #604 implementation is now wired through code and active example configs in `/tmp/pr740-wt` through commit `ee87b5bd`: SExtractor emits `BACKGROUND_RMS`, vignetmaker stores `background_rms_vignet*.sqlite`, ngmix templates point `BKG_RMS_VIGNET_PATH` at that sqlite, and ngmix builds per-pixel `1/RMS²` inverse-variance gated by weight/flag/valid-RMS masks with scalar fallback when no RMS file is supplied. Targeted RMS/weight tests and config parsing passed; full `test_ngmix.py` still has the pre-existing `ngmix.fitting.Fitter` API failure.
+outcome: '#604 implementation is wired through code and active example configs in /tmp/pr740-wt through commit 22e31e51: SExtractor emits BACKGROUND_RMS, vignetmaker stores background_rms_vignet*.sqlite, ngmix templates point BKG_RMS_VIGNET_PATH at that sqlite, ngmix builds per-pixel 1/RMS² inverse-variance gated by weight/flag/valid-RMS masks with scalar fallback when no RMS file is supplied, and test_ngmix now also guards the optional seventh RMS sqlite input contract. Focused RMS/weight tests and py_compile passed in the dev image; full test_ngmix.py still has the pre-existing ngmix.fitting.Fitter API failure.'
shuttle:
- enabled: true
- kind: oneshot
- host: candide
- agent: codex
-status: active
+ enabled: true
+ kind: oneshot
+ host: candide
+ agent: codex
---
# ngmix weight map: fix v2.0 regressions + inverse-variance (#604)
@@ -100,6 +99,12 @@ science decisions in this fiber's `report.html`/outcome rather than pushing to M
the three targeted RMS/weight tests pass in the dev image; all touched
example configs parse with `ConfigParser(interpolation=None)`; `git diff
--check` is clean.
+- **Optional RMS input-contract guard:** `/tmp/pr740-wt` commit `22e31e51`
+ adds `test_ngmix_accepts_optional_background_rms_vignet`, which
+ instantiates `Ngmix` with the seventh `BACKGROUND_RMS` sqlite input and
+ verifies `Vignet` opens it. Verification observed in the dev image: the
+ focused RMS/weight pytest selection passed, `py_compile` passed for touched
+ ngmix files, and `git diff --check` was clean.
- **Current test caveat:** the targeted unit test passes in the dev image. The full
`src/shapepipe/tests/test_ngmix.py` file currently still fails in the existing
metacal smoke test because the container's `ngmix.fitting` has no `Fitter`
From ffb424fef5cb03095d0ccde9ba7e1b88b845d8ed Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Wed, 10 Jun 2026 04:06:35 +0200
Subject: [PATCH 23/29] =?UTF-8?q?felt:=20ngmix-size-columns=20realized=20?=
=?UTF-8?q?=E2=80=94=20three=20branches=20landed,=20handoff=20for=20review?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
shapepipe fix/ngmix-size-columns, cs_util feat/size-conversions (fork),
sp_validation fix/psf-leakage-fwhm. Also corrects size-report.md's
error-prop claim (old r50_err_PSFo was a factor-2 over-estimate).
Co-Authored-By: Claude Fable 5
---
.felt/review-ngmix-v2-pr740/size-report.md | 2 +-
.../ngmix-size-columns/ngmix-size-columns.md | 17 +-
.../shapepipe/ngmix-size-columns/report.html | 185 ++++++++++++++++++
3 files changed, 194 insertions(+), 10 deletions(-)
create mode 100644 .felt/shapepipe/ngmix-size-columns/report.html
diff --git a/.felt/review-ngmix-v2-pr740/size-report.md b/.felt/review-ngmix-v2-pr740/size-report.md
index 72ac3501a..e43e997c3 100644
--- a/.felt/review-ngmix-v2-pr740/size-report.md
+++ b/.felt/review-ngmix-v2-pr740/size-report.md
@@ -20,7 +20,7 @@ In ngmix `gauss`, `pars = [cen1, cen2, g1, g2, T, flux]` and **`pars[4] = T = 2
| `T_psfo_ngmix`, `T_err_psfo_ngmix` | `T_PSFo`, `T_err_PSFo` | `2σ_psf²` | — | Correct; `T_psfo_ngmix` **duplicates** `Tpsf` |
| `r50psf` | `r50_PSFo` = `√(T_psf/2)` | **`σ_psf` (length)** | **No — it's σ, off by 1.1774×** | Genuine length, missing the factor |
| `r50_psfo_ngmix` | `r50_PSFo` | `σ_psf` | No — it's σ | **Duplicates `r50psf`** |
-| `r50_err_psfo_ngmix` | `T_psf_err/(2·r50_psfo)` | `σ_psf_err` | No — error on σ | Correct error-prop of σ; NaN when σ=0 |
+| `r50_err_psfo_ngmix` | `T_psf_err/(2·r50_psfo)` | `σ_psf_err` | No — error on σ | ~~Correct error-prop of σ~~ **also a factor-2 over-estimate: dσ/dT = 1/(4σ), not 1/(2σ)** (caught in [[shapepipe/ngmix-size-columns]] implementation); NaN when σ=0 |
**Headline hazard:** the same name root `r50` means an **area** on the galaxy side and a **length** on the PSF side. A user ratioing galaxy `r50` against `r50psf` divides an area by a length. **Zero columns in the file are a true half-light radius** — every "r50" is either `T` (galaxy) or `σ` (PSF).
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
index 9b737a609..e96405b10 100644
--- a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -1,18 +1,17 @@
---
id: 01KTCQPE4M6S1E4V9WMPCAFCKT
name: 'ngmix size columns: honest r50 + cs_util converter web'
+status: active
tags:
- - constitution
- - shapepipe
- - ngmix
+ - constitution
+ - shapepipe
+ - ngmix
created-at: 2026-06-05T22:30:50.004535516+02:00
-outcome: Draft — not yet dispatched.
+outcome: 'Realized, pending fresh-eyes review: shapepipe fix/ngmix-size-columns (honest r50 + dedupe, pushed), cs_util feat/size-conversions (size web + tests, on cailmdaley fork — no push rights upstream), sp_validation fix/psf-leakage-fwhm (T_to_fwhm bug fixed — moves α(size); pushed). PRs + merge order (cs_util → sp_validation) are Cail''s gesture; see report.html.'
shuttle:
- enabled: true
- kind: oneshot
- host: candide
- agent: claude-opus
-status: active
+ kind: oneshot
+ host: candide
+ agent: claude-fable
---
# ngmix size columns: honest r50 + cs_util converter web
diff --git a/.felt/shapepipe/ngmix-size-columns/report.html b/.felt/shapepipe/ngmix-size-columns/report.html
new file mode 100644
index 000000000..398facb8a
--- /dev/null
+++ b/.felt/shapepipe/ngmix-size-columns/report.html
@@ -0,0 +1,185 @@
+
+
+
+
+
+Report — ngmix-size-columns
+
+
+
+
+
+
+
+
+
New cs_util/size.py: the conversion web (T ↔ σ ↔ r50 ↔ FWHM) with exact constants
+ and 6 tests (closed forms, round trips, FWHM = 2·r50). Full suite 24/24 green.
+ On my fork — you have no push rights on CosmoStat/cs_util.
Retires the dimensionally wrong local T_to_fwhm; galaxy.py re-exports
+ T_to_fwhm/sigma_to_fwhm from cs_util.size, so all
+ star-import consumers (extract_info.py, the notebooks' size_to_fwhm
+ dispatch) keep working unchanged. Verified in the sp_validation container:
+ T_to_fwhm(2.0) → 2.35482 (old code gave 4.0003).
+
+
+
+
+ The science-number change: the sp_validation fix corrects the
+ fwhm_PSF regressor feeding the scale-dependent PSF-leakage fit
+ (run_object.py:510, size_PSF_col). Any α(PSF-size) trend or per-bin
+ leakage coefficient computed with the old function is biased by a monotonic-but-nonlinear
+ distortion of the size axis. Spatially-constant leakage is unaffected. In-flight pure_eb /
+ leakage results produced with the old code should be regenerated after this merges.
+
+
+
Merge order: cs_util first (sp_validation's import fails loudly without it —
+ intentional fail-fast), then sp_validation; the shapepipe branch is independent and can merge any
+ time. No PRs opened — that's your gesture. cs_util will also need a release (or a git-ref pin) before
+ sp_validation's CI can pass.
+
+
+
+
+
+
§ Findings
+
What surfaced beyond the spec.
+
+
The old PSF r50 error was also a factor-2 over-estimate
+
Beyond the missing 1.1774 factor, r50_err_PSFo = T_err/(2σ) was wrong as an error
+ on σ: for σ = √(T/2), dσ/dT = 1/(4σ), not 1/(2σ). (The size-report table had marked this
+ "correct error-prop of σ" — it wasn't.) Both verified by finite differences; the new code carries
+ r50_err = √(2ln2)·T_err/(4σ).
+
+
Remaining duplicates, deliberately left
+
g1/g2_psfo_ngmix ≡ g1/g2_psf in the ngmix output (both read from
+ g_PSFo) — same disease, different organ; out of this fiber's size scope. Downstream,
+ make_cat writes both NGMIX_Tpsf_* and NGMIX_T_PSFo_* from the
+ now-identical source. Candidates for a follow-up dedupe pass.
+
+
shapepipe keeps a local constant, for now
+
ngmix.py defines SIGMA_TO_R50 = √(2 ln 2) locally rather than importing
+ cs_util.size — shapepipe pins released cs_util (>=0.1.9), and making the
+ branch depend on an unreleased fork would block its CI. Once cs_util ships the size module, a
+ one-line follow-up can switch the import.
+
+
Exact constants over truncated literals
+
cs_util uses √(2ln2) / 2√(2ln2) exactly; sp_validation's old 2.355 and
+ 1.17741 literals are retired with the functions. FWHM values shift at the 4th decimal
+ (2.355 → 2.35482) — cosmetically visible if anyone diffs old catalogs against new.
+
+
Pre-existing test failure, not mine
+
test_metacal_is_reproducible_with_fixed_seed fails in the local
+ shapepipe-dev container on clean ngmix_v2.0 too (container/ngmix version
+ skew, most likely) — CI on the pushed branch is the arbiter. The five other tests, including my two
+ new ones, pass against the branch code.
+
+
+
+
+
+
§ Open questions
+
Your calls, none blocking.
+
+
+
Drop-vs-alias — resolved in favor of drop, verify you agree. I searched
+ sp_validation (src, notebooks, .ipynb) and all of shapepipe: the only consumer of the size
+ _psfo_ngmix names was make_cat.py, now repointed. Zero external hits, so
+ I dropped rather than aliased, per the constitution's default.
+
Catalogue version bump: on-disk r50* semantics change between
+ v2.0 pre-/post-fix files. Does this warrant a format-version note/changelog entry, and where?
+
Leakage regeneration: were any current/in-flight scale-dependent leakage
+ results produced with the buggy T_to_fwhm? How much does α(size) move? (Worth a
+ before/after on one catalog when convenient.)
+
cs_util path to merge: branch lives on the cailmdaley/cs_util fork.
+ PR to CosmoStat/cs_util + release (0.3?), then bump pins in sp_validation (and optionally switch
+ shapepipe's local constant to the import).
+
Galaxy r50_err form: implemented as first-order propagation from
+ T_err (r50_err = r50·T_err/2T), per spec. If you'd rather use the full
+ pars covariance, that's a writer-side swap.
+
+
+
+
+
+
From c1e1a12456fafe3443b3793c9b53a96c48775750 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Wed, 10 Jun 2026 04:16:43 +0200
Subject: [PATCH 24/29] =?UTF-8?q?felt:=20ngmix-size-columns=20fresh-eyes?=
=?UTF-8?q?=20review=20=E2=80=94=20verified,=20closed?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
All three branches independently re-reviewed and re-tested; no code
defects. Report corrected: sp_validation CI is green-but-vacuous on
the cs_util.size import (suite never imports galaxy.py, image carries
released cs-util 0.1.9), so merge order is discipline-enforced.
Co-Authored-By: Claude Fable 5
---
.../ngmix-size-columns/ngmix-size-columns.md | 5 +--
.../shapepipe/ngmix-size-columns/report.html | 36 +++++++++++++++----
2 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
index e96405b10..e91a0921a 100644
--- a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -1,13 +1,14 @@
---
id: 01KTCQPE4M6S1E4V9WMPCAFCKT
name: 'ngmix size columns: honest r50 + cs_util converter web'
-status: active
+status: closed
tags:
- constitution
- shapepipe
- ngmix
created-at: 2026-06-05T22:30:50.004535516+02:00
-outcome: 'Realized, pending fresh-eyes review: shapepipe fix/ngmix-size-columns (honest r50 + dedupe, pushed), cs_util feat/size-conversions (size web + tests, on cailmdaley fork — no push rights upstream), sp_validation fix/psf-leakage-fwhm (T_to_fwhm bug fixed — moves α(size); pushed). PRs + merge order (cs_util → sp_validation) are Cail''s gesture; see report.html.'
+outcome: |-
+ Realized + fresh-eyes reviewed, no code defects: shapepipe fix/ngmix-size-columns (honest r50 + dedupe, pushed), cs_util feat/size-conversions (size web, 24/24, on cailmdaley fork — no upstream push rights), sp_validation fix/psf-leakage-fwhm (T_to_fwhm bug fixed — moves α(size); pushed). PRs + merge order (cs_util release → sp_validation) are Cail's gesture; caution: sp_validation CI is green-but-vacuous on the cs_util.size import, ordering is discipline-enforced. See report.html.
shuttle:
kind: oneshot
host: candide
diff --git a/.felt/shapepipe/ngmix-size-columns/report.html b/.felt/shapepipe/ngmix-size-columns/report.html
index 398facb8a..f1004ac28 100644
--- a/.felt/shapepipe/ngmix-size-columns/report.html
+++ b/.felt/shapepipe/ngmix-size-columns/report.html
@@ -70,10 +70,17 @@
Honest r50 columns, a shared size web, and the leakage-axis fix.
§ Current state
-
All three branches landed; PR creation is yours.
+
All three branches landed and fresh-eyes reviewed; PR creation is yours.
Every piece of the constitution's desired state is implemented, tested, and pushed.
- Three branches across three repos, in dependency order:
+ A second session independently reviewed all three diffs: re-derived both error propagations
+ (galaxy r50·T_err/2T and PSF √(2ln2)·T_err/(4σ) are the same formula —
+ consistent), confirmed zero remaining consumers of the retired _psfo_ngmix size
+ columns (only make_cat, repointed), confirmed the dropped pixel_size
+ kwarg on sigma_to_fwhm has no callers, re-ran the tests (shapepipe size tests pass
+ in the dev container; cs_util 24/24; T_to_fwhm(2.0) → 2.35482 reproduced in the
+ sp_validation container against the branch), and verified all three push SHAs. No code defects
+ found. Three branches across three repos, in dependency order:
Repo
Branch
What it does
@@ -115,8 +122,18 @@
All three branches landed; PR creation is yours.
Merge order: cs_util first (sp_validation's import fails loudly without it —
intentional fail-fast), then sp_validation; the shapepipe branch is independent and can merge any
- time. No PRs opened — that's your gesture. cs_util will also need a release (or a git-ref pin) before
- sp_validation's CI can pass.
+ time. No PRs opened — that's your gesture. cs_util needs a release (or a git-ref pin in
+ sp_validation) before the sp_validation branch is safe to merge.
+
+
+ CI will not police the merge order. sp_validation's CI is already green on the
+ pushed branch (run 27248208943) — but vacuously: the image installs released
+ cs-util 0.1.9 (no size module), and the test suite never imports
+ galaxy.py (tests cover cosmo_val/cosmology/survey/plots
+ only). Merging sp_validation before the cs_util release would pass CI and then raise
+ ImportError at runtime, in extract_info.py / the notebooks. The
+ ordering is enforced by discipline, not automation.
+
@@ -150,9 +167,14 @@
Exact constants over truncated literals
Pre-existing test failure, not mine
test_metacal_is_reproducible_with_fixed_seed fails in the local
- shapepipe-dev container on clean ngmix_v2.0 too (container/ngmix version
- skew, most likely) — CI on the pushed branch is the arbiter. The five other tests, including my two
- new ones, pass against the branch code.
+ shapepipe-dev container — reproduced by the review session on clean
+ ngmix_v2.0 with the identical error (ngmix.fitting has no
+ Fitter: the local container carries an older ngmix than the branch expects). The
+ other tests, including the two new size tests, pass against the branch code. Note shapepipe's
+ deploy-image.yml triggers only on PRs/pushes targeting
+ develop/main/master — so a PR of this branch into
+ ngmix_v2.0 won't run CI by itself; the suite gets arbitrated when the change rides
+ PR #740's chain toward develop.
From 021eafa976be18731e3693d83af102f000ee0100 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Thu, 11 Jun 2026 01:43:53 +0200
Subject: [PATCH 25/29] felt: sync fiber store
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Backfill ULID ids across 19 fibers; close docs-versioning,
smoke-test-read-only, docker-uv-revert (superseded by #733);
refresh shapepipe.md active-threads list to current PRs (#737–741);
add np-str0-numpy2 fiber; minor outcome/status normalizations.
Co-Authored-By: Claude Fable 5
---
.../dependabot-pr-triage.md | 1 +
.felt/docker-multistage/docker-multistage.md | 1 +
.felt/docker-uv-revert/docker-uv-revert.md | 6 ++-
.felt/fabian-coord-bug/fabian-coord-bug.md | 1 +
.felt/ngmix-update/ngmix-update.md | 1 +
.felt/prs-in-flight/prs-in-flight.md | 1 +
.../review-ngmix-v2-pr740.md | 9 +++--
.felt/shapepipe.md | 20 ++++++----
.../ci-develop-trigger/ci-develop-trigger.md | 1 +
.../ci-green-on-develop.md | 1 +
.../cleanup-rhostats-jobscripts.md | 2 +
.../docs-cluster-tree/docs-cluster-tree.md | 1 +
.../docs-versioning/docs-versioning.md | 5 ++-
.../ngmix-size-columns/ngmix-size-columns.md | 3 +-
.../ngmix-weights-ivar/ngmix-weights-ivar.md | 2 +-
.../np-str0-numpy2/np-str0-numpy2.md | 40 +++++++++++++++++++
.../smoke-test-read-only.md | 17 ++------
.felt/shapepipe/test-suite/test-suite.md | 2 +
.felt/shapepipe/v2-run-plan/v2-run-plan.md | 1 +
.../sqlitedict-pickle-smell.md | 1 +
20 files changed, 85 insertions(+), 31 deletions(-)
create mode 100644 .felt/shapepipe/np-str0-numpy2/np-str0-numpy2.md
diff --git a/.felt/dependabot-pr-triage/dependabot-pr-triage.md b/.felt/dependabot-pr-triage/dependabot-pr-triage.md
index 804ea499e..a814120c8 100644
--- a/.felt/dependabot-pr-triage/dependabot-pr-triage.md
+++ b/.felt/dependabot-pr-triage/dependabot-pr-triage.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZSHQXRQHTHH9Q7Q4MT7
name: Triage open dependabot uv.lock PRs
status: closed
tags:
diff --git a/.felt/docker-multistage/docker-multistage.md b/.felt/docker-multistage/docker-multistage.md
index 94951a708..8b5ed1076 100644
--- a/.felt/docker-multistage/docker-multistage.md
+++ b/.felt/docker-multistage/docker-multistage.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZV7263MYPNB8DT1FXPX
name: 'Docker multi-stage: runtime + dev targets'
tags:
- shapepipe
diff --git a/.felt/docker-uv-revert/docker-uv-revert.md b/.felt/docker-uv-revert/docker-uv-revert.md
index 1a68f4c39..a1f4a16d0 100644
--- a/.felt/docker-uv-revert/docker-uv-revert.md
+++ b/.felt/docker-uv-revert/docker-uv-revert.md
@@ -1,12 +1,14 @@
---
+id: 01KTCHWZW8DT8VRYQJ4CEPDRGB
name: 'Docker: revert skaha→python base, switch to uv lockfile'
-status: active
+status: closed
tags:
- shapepipe
- docker
- infra
created-at: 2026-04-27T11:26:45.677512058+02:00
-outcome: 'PR #719 (chore: switch Dockerfile to slim Python + uv lockfile) opened and CI-green on first try (3m31s); ready for Martin''s review. Drops conda double-install, makes pyproject SSOT + uv.lock the pinned manifest, switches WeightWatcher from sed-patched source build to Debian''s pre-patched 1.12+dfsg-3 package, adds binary smoke tests to deploy-image.yml.'
+closed-at: 2026-06-10T17:14:45.965931602+02:00
+outcome: 'Superseded: conda removal landed via #733 (merged); the #719-era docker-uv work is absorbed into ci-green-on-develop, which tracks remaining follow-ups.'
decisions:
base:
label: Base image
diff --git a/.felt/fabian-coord-bug/fabian-coord-bug.md b/.felt/fabian-coord-bug/fabian-coord-bug.md
index 66213d20c..bcbbc1bb2 100644
--- a/.felt/fabian-coord-bug/fabian-coord-bug.md
+++ b/.felt/fabian-coord-bug/fabian-coord-bug.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZWEXTJ337APFSH4NF6X
name: Fabian's coord-propagation bug + image-sim code on github
tags:
- shapepipe
diff --git a/.felt/ngmix-update/ngmix-update.md b/.felt/ngmix-update/ngmix-update.md
index 2df017deb..bd11d135f 100644
--- a/.felt/ngmix-update/ngmix-update.md
+++ b/.felt/ngmix-update/ngmix-update.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZWY48FVCEDX6DGE6CTY
name: ngmix library upgrade + Lucy wrapper sync
tags:
- shapepipe
diff --git a/.felt/prs-in-flight/prs-in-flight.md b/.felt/prs-in-flight/prs-in-flight.md
index 5d8923e83..364db869f 100644
--- a/.felt/prs-in-flight/prs-in-flight.md
+++ b/.felt/prs-in-flight/prs-in-flight.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZWYYTDSYHAP4FPE7V3V
name: PRs in flight after v2 merge
tags:
- shapepipe
diff --git a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
index 6255edabb..c44543eb8 100644
--- a/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
+++ b/.felt/review-ngmix-v2-pr740/review-ngmix-v2-pr740.md
@@ -1,22 +1,23 @@
---
id: 01KTCHWZX873VG28AA663A3ZQE
name: 'Review + work: ngmix v2.0 (PR #740)'
-status: active
+status: closed
tags:
- constitution
- shapepipe
- ngmix
- review
created-at: 2026-06-01T11:50:17.967872934+02:00
+closed-at: 2026-06-05T23:59:38.502Z
outcome: 'INTERACTIVE — live with Cail. Bucket-A cleanups PUSHED to ngmix_v2.0/#741 (dd4f656a..bd60dc8e: 293/1140/254-resume/766 + fitting.py deleted); CI running. Dev image built (/n17data/cdaley/containers/shapepipe-dev) + ngmix 2.4.0 installed → full suite + example pipeline now runnable. Ran a 6+2-agent WORKFLOW on the two hard problems → two decision-ready reports in fiber: weights-report.md, size-report.md, deep-dive-report.html (sent to Cail). WEIGHTS (#604+949): found TWO coupled regressions in prepare_ngmix_weights (ngmix.py:871) — R1 noise estimator regressed from object-free windowed get_noise (now DEAD at :826) to flux-contaminated whole-stamp sigma_mad; R2 lost the v1 binarization → double-counts a real inverse-variance map. EMPIRICALLY CONFIRMED (truth ivar 1e6 → recovered 8.8e11, ratio≈1/noise²). Clean red→green test target via make_data. Rec: SPLIT — minimal v1-restore (reinstate binarization + get_noise) + test in #741; SExtractor BACKGROUND_RMS baseline as a SEPARATE PR (Codex shuttle, ~4-8h, closes #604). SIZE (r50/T): galaxy r50=pars[4]=T (area), PSF r50psf=σ; neither is 1.1774σ; all 5 r50* cols new in v2.0. UNIONS-3500 WL I (arXiv:2605.13549) reports half-light radius r_h as PRIMARY. Rec: TRANSFORM at source (honest r50 in ngmix) + cs_util converter web; bonus find — sp_validation galaxy.py:T_to_fwhm is dimensionally wrong (feeds the scale-dependent PSF-leakage fit). Both work-streams now DISPATCHED as shuttles (Cail''s go): [[ngmix-weights-ivar]] (Codex — minimal #741 regression fix w/ red→green test + the SExtractor BACKGROUND_RMS baseline as a separate PR closing #604) and [[ngmix-size-columns]] (claude-opus — honest r50 at the ngmix source + cs_util converter web + the sp_validation T_to_fwhm leakage fix). Workers prepare branches/commits/PR-descriptions + reports; merge/PR-creation stays Cail''s. STILL OPEN for the eventual #741 reply (not in these streams): *_psfo reconvolved-PSF merge-gate (1045) + the runner-decorator contract (testable via the example pipeline). PLAN: let the two shuttles finish → reassess → write the Martin reply. No reply drafted yet.'
horizon: now
shuttle:
- agent: claude-opus
enabled: true
- host: candide
- interactive: true
kind: oneshot
+ interactive: true
+ host: candide
project_dir: /automnt/n17data/cdaley/unions/shapepipe
+ agent: claude-opus
---
# Review + work: ngmix v2.0 (PR #740)
diff --git a/.felt/shapepipe.md b/.felt/shapepipe.md
index 40d321969..502e65d3e 100644
--- a/.felt/shapepipe.md
+++ b/.felt/shapepipe.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHX00NMQ1VDPGXRYJ6RGZR
name: ShapePipe maintenance & PRs
tags:
- shapepipe
@@ -26,17 +27,20 @@ Surfaced over a 2026-04-27 walking conversation. Captured in
Disclosure on Claude-only review per
`feedback_claude_only_review_disclosure`.
-## Active threads
+## Active threads (refreshed 2026-06-10)
-- **[[shapepipe/docker-uv-revert]]** — slim Python + uv lockfile, drop conda. PR #719 (draft).
-- **[[shapepipe/prs-in-flight]]** — tracking #708 (testing scaffold), #714 (develop bugs), #719 (this one).
+- **PR #737** (MPI on candide + containerized SLURM scripts) — **MERGED 2026-06-10** (rebased onto develop, both CI runs green).
+- **PR #738** (versioned docs + switcher) — **MERGED 2026-06-10** after independent review; sfarrens offered post-hoc comments. Stable root refreshes on next master push. See [[docs-versioning]].
+- **PR #739** (machine-specific cluster docs tree) — awaiting Martin's review; check rebase state after the #737/#738 merges. See [[docs-cluster-tree]].
+- **PR #741** (ngmix v2.0, CI mirror of #740) — Martin left 10 inline comments Jun 5, no verdict yet; shear recovery verified unbiased to ~1e-4 in m. See [[ngmix-weights-ivar]] (PR-ready regression fix + #604 ivar plan) and [[ngmix-size-columns]] (honest r50 spec).
+- **Martin's PRs #704 (contributors) & #699 (coverage mask)** — Cail's review requested; both conflicting, need Martin's rebase first.
+- **[[ci-green-on-develop]]** — conda fully removed (#733 merged); remaining follow-ups tracked there.
-## Future work
+## Earlier threads (superseded)
-- **[[shapepipe/ngmix-update]]** — replace Axel's stable_version fork
- with upstream ngmix; reconcile with Lucy's wrapper.
-- **[[shapepipe/fabian-coord-bug]]** — port Fabian's 1-line coord
- propagation fix; first need his image-sim code on github.
+- [[shapepipe/docker-uv-revert]] / [[shapepipe/prs-in-flight]] — the #719-era conda removal; landed via #733, current state in [[ci-green-on-develop]].
+- **[[shapepipe/ngmix-update]]** — became the #740/#741 ngmix v2.0 review thread.
+- **[[shapepipe/fabian-coord-bug]]** — port Fabian's 1-line coord propagation fix; still pending his image-sim code reaching github.
## Conventions specific to this repo
diff --git a/.felt/shapepipe/ci-develop-trigger/ci-develop-trigger.md b/.felt/shapepipe/ci-develop-trigger/ci-develop-trigger.md
index 29ab2f689..c48657774 100644
--- a/.felt/shapepipe/ci-develop-trigger/ci-develop-trigger.md
+++ b/.felt/shapepipe/ci-develop-trigger/ci-develop-trigger.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZX8GRCMJHGCRARBGFS8
name: CI silently broken on develop; install_shapepipe conda.sh lookup
status: closed
tags:
diff --git a/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md b/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md
index 3b304539f..d32d4b1d5 100644
--- a/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md
+++ b/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZX9Q1MG2FB20N5Y52TD
name: 'ShapePipe CI: green & trustworthy test suite on develop'
status: open
tags:
diff --git a/.felt/shapepipe/cleanup-rhostats-jobscripts/cleanup-rhostats-jobscripts.md b/.felt/shapepipe/cleanup-rhostats-jobscripts/cleanup-rhostats-jobscripts.md
index 3e93768b8..a7fda558f 100644
--- a/.felt/shapepipe/cleanup-rhostats-jobscripts/cleanup-rhostats-jobscripts.md
+++ b/.felt/shapepipe/cleanup-rhostats-jobscripts/cleanup-rhostats-jobscripts.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZXE6Z6MG1P21F804RZ2
name: 'ShapePipe cleanup: remove obsolete rho-stats/stile; modernize candide job scripts'
status: closed
tags:
@@ -35,6 +36,7 @@ shuttle:
id: 7c02b521-17bd-44dc-ad10-709520d6d2bf
agent: claude-opus
dispatched_at: 2026-05-30T20:29:24.913621662Z
+tempered: true
---
## Desired State
diff --git a/.felt/shapepipe/docs-cluster-tree/docs-cluster-tree.md b/.felt/shapepipe/docs-cluster-tree/docs-cluster-tree.md
index c387c0980..bf05714e0 100644
--- a/.felt/shapepipe/docs-cluster-tree/docs-cluster-tree.md
+++ b/.felt/shapepipe/docs-cluster-tree/docs-cluster-tree.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZY2NPWM4RQ7VAZK7WFE
name: Machine-specific cluster docs tree + freshness pass
tags:
- shapepipe
diff --git a/.felt/shapepipe/docs-versioning/docs-versioning.md b/.felt/shapepipe/docs-versioning/docs-versioning.md
index 7246d684a..18909ea5b 100644
--- a/.felt/shapepipe/docs-versioning/docs-versioning.md
+++ b/.felt/shapepipe/docs-versioning/docs-versioning.md
@@ -1,11 +1,14 @@
---
+id: 01KTCHWZYMXBPT7S0NXJ7AT2CD
name: Versioned docs site with a version switcher
+status: closed
tags:
- shapepipe
- docs
- ci
created-at: 2026-05-31T20:39:06.459699229+02:00
-outcome: 'PR #738 (off develop): docs site now versioned — stable@root (non-destructive to existing URLs), develop@/latest/, tags@/vX.Y.Z/, switcher.json drives the dropdown. Fixes that the site published only from master (~78 commits behind) so container.md/canfar.md were never published. CI green incl. real dev-image build + PR-preview artifact.'
+closed-at: 2026-06-10T17:44:55.028992495+02:00
+outcome: 'Shipped: PR #738 merged 2026-06-10 (independent review pass, sfarrens offered post-hoc comments). master→root, develop→/latest/, tags→//, per-ref builds, switcher.json at root. Stable root refreshes on next master push.'
---
## Why
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
index e91a0921a..8cd420401 100644
--- a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -7,8 +7,7 @@ tags:
- shapepipe
- ngmix
created-at: 2026-06-05T22:30:50.004535516+02:00
-outcome: |-
- Realized + fresh-eyes reviewed, no code defects: shapepipe fix/ngmix-size-columns (honest r50 + dedupe, pushed), cs_util feat/size-conversions (size web, 24/24, on cailmdaley fork — no upstream push rights), sp_validation fix/psf-leakage-fwhm (T_to_fwhm bug fixed — moves α(size); pushed). PRs + merge order (cs_util release → sp_validation) are Cail's gesture; caution: sp_validation CI is green-but-vacuous on the cs_util.size import, ordering is discipline-enforced. See report.html.
+outcome: 'Realized + fresh-eyes reviewed, no code defects: shapepipe fix/ngmix-size-columns (honest r50 + dedupe, pushed), cs_util feat/size-conversions (size web, 24/24, on cailmdaley fork — no upstream push rights), sp_validation fix/psf-leakage-fwhm (T_to_fwhm bug fixed — moves α(size); pushed). PRs + merge order (cs_util release → sp_validation) are Cail''s gesture; caution: sp_validation CI is green-but-vacuous on the cs_util.size import, ordering is discipline-enforced. See report.html.'
shuttle:
kind: oneshot
host: candide
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index 4ea2b7423..3b246e330 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -1,7 +1,7 @@
---
id: 01KTCQPE3JGEYN7NQS8HW1AT6B
name: 'ngmix weight map: fix v2.0 regressions + inverse-variance (#604)'
-status: active
+status: open
tags:
- constitution
- shapepipe
diff --git a/.felt/shapepipe/np-str0-numpy2/np-str0-numpy2.md b/.felt/shapepipe/np-str0-numpy2/np-str0-numpy2.md
new file mode 100644
index 000000000..d23de432a
--- /dev/null
+++ b/.felt/shapepipe/np-str0-numpy2/np-str0-numpy2.md
@@ -0,0 +1,40 @@
+---
+id: 01KTK3WGKFCGD2VQAQSABC3TBJ
+name: np.str0 fix (NumPy 2 aftershock)
+status: closed
+tags:
+ - shapepipe
+ - numpy
+ - bugfix
+created-at: 2026-06-08T09:59:18.639370578+02:00
+closed-at: 2026-06-10T17:13:38.507495758+02:00
+outcome: 'np.str0 removed from file_io.py on develop (verified 2026-06-10: only np.str_ remains); NumPy 2 string-column saves work. Regression guard idea (NAME column in roundtrip test) noted in body if ever needed.'
+---
+
+Fabian Hervas-Peters hit `module 'numpy' has no attribute 'str0'` running the
+new container against `src/shapepipe/pipeline/file_io.py`. Root cause: the
+container now ships **NumPy 2.4.4** (`pyproject` pins `numpy>=2.0`), and
+`np.str0` — a deprecated alias for `np.str_` — was *removed* in NumPy 2.0.
+
+The offending line was `_get_fits_col_type`'s string branch:
+`elif type(col_data[0]) in [str, np.str_, np.str0]:`. The fix is to drop the
+dead alias — `np.str_` already covers it, so nothing is lost.
+
+**Why the test suite never caught it.** The list literal `[str, np.str_, np.str0]`
+is only *evaluated* when that `elif` is actually reached, i.e. only when a column
+whose first element is a string flows through `save_as_fits`. The existing
+`test_fits_catalogue_table_roundtrips` saved only a `float64` and an `int16`
+column — both short-circuit on earlier branches and never touch the string path.
+So the `AttributeError` stayed dormant until Fabian saved a catalogue with a
+string column. The regression guard is a one-line addition: a `NAME` string
+column in the roundtrip test, which now exercises the `"A"` (FITS char) dispatch.
+
+Interestingly, `ngmix_v2.0` had *already* removed `np.str0` independently, so
+merging develop into it was a clean no-op on `file_io.py` and only carried over
+the improved test.
+
+General lesson: removed-NumPy-2 aliases (`np.str0`, `np.bool0`, `np.int0`,
+`np.object0`, `np.unicode_`, …) hide in rarely-reached branches and survive
+import — they only blow up at the call site. Worth a grep sweep when migrating a
+codebase to NumPy 2. A sweep of `src/shapepipe/` on 2026-06-08 found `str0` was
+the only remaining offender.
diff --git a/.felt/shapepipe/smoke-test-read-only/smoke-test-read-only.md b/.felt/shapepipe/smoke-test-read-only/smoke-test-read-only.md
index cba960b3c..2ff0b8cd6 100644
--- a/.felt/shapepipe/smoke-test-read-only/smoke-test-read-only.md
+++ b/.felt/shapepipe/smoke-test-read-only/smoke-test-read-only.md
@@ -1,23 +1,14 @@
---
+id: 01KTCHWZZ923H1H5DR6AS3Q5H6
name: Smoke test must work in read-only mode
+status: closed
tags:
- shapepipe
- docker
- infra
created-at: 2026-05-28T10:32:25.53742271+02:00
-outcome: |-
- `shapepipe_run -c /app/example/config.ini` fails on read-only SIF
- because the example config uses relative `OUTPUT_DIR = ./example/output`,
- which resolves under `WORKDIR=/app` — read-only in apptainer/SIF. Fix:
- add `scripts/sh/shapepipe_run_example.sh` wrapper that mktemp's a
- workdir, copies `/app/example/` into it, cd's, and execs `shapepipe_run`.
- Dockerfile's existing auto-symlink rule (`scripts/*/*.sh` →
- `/usr/local/bin/`) makes it usable as `shapepipe_run_example`
- on PATH. CI smoke step updated to call it under
- `docker run --read-only --tmpfs /tmp:rw` (emulates SIF). Drive-by:
- tightened `.gitignore` from `*shapepipe_run_*` (catches the wrapper)
- to `example/output/shapepipe_run_*`. Submitted as #731 from branch
- `chore/smoke-test-read-only`; awaiting CI.
+closed-at: 2026-06-10T17:13:38.485047103+02:00
+outcome: 'shapepipe_run_example wrapper (copy example to /tmp) landed via PR #731, merged to develop; CI smoke steps use it. Read-only SIF runs work.'
---
## The gap
diff --git a/.felt/shapepipe/test-suite/test-suite.md b/.felt/shapepipe/test-suite/test-suite.md
index 37e3d46ca..ffb9aab3a 100644
--- a/.felt/shapepipe/test-suite/test-suite.md
+++ b/.felt/shapepipe/test-suite/test-suite.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZZJ1H3D0HYETGFVMVZH
name: 'ShapePipe test suite: tiered, in-image, property-based'
status: closed
tags:
@@ -15,6 +16,7 @@ shuttle:
host: c03
project_dir: /automnt/n17data/cdaley/unions/shapepipe
agent: codex
+tempered: true
---
## Desired State
diff --git a/.felt/shapepipe/v2-run-plan/v2-run-plan.md b/.felt/shapepipe/v2-run-plan/v2-run-plan.md
index 537c85a96..b5a96e2b8 100644
--- a/.felt/shapepipe/v2-run-plan/v2-run-plan.md
+++ b/.felt/shapepipe/v2-run-plan/v2-run-plan.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHWZZKM6TYB7WS8J1TQWVY
name: v2.0 ShapePipe run — workflow wishlist
tags:
- shapepipe
diff --git a/.felt/sqlitedict-pickle-smell/sqlitedict-pickle-smell.md b/.felt/sqlitedict-pickle-smell/sqlitedict-pickle-smell.md
index f035953f5..c2fc64feb 100644
--- a/.felt/sqlitedict-pickle-smell/sqlitedict-pickle-smell.md
+++ b/.felt/sqlitedict-pickle-smell/sqlitedict-pickle-smell.md
@@ -1,4 +1,5 @@
---
+id: 01KTCHX00N4580BRMCH0FNADXV
name: sqlitedict pickle-by-default is a known smell
status: open
tags:
From 5261562777467f518a38dcfebfd6970f03e811ab Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Thu, 11 Jun 2026 01:44:59 +0200
Subject: [PATCH 26/29] =?UTF-8?q?chore:=20log=20PR=20#742=20triage=20?=
=?UTF-8?q?=E2=80=94=20deferred=20(pyproject.toml=20floor=20changes)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Co-Authored-By: Claude Fable 5
---
.felt/dependabot-pr-triage/dependabot-pr-triage.md | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/.felt/dependabot-pr-triage/dependabot-pr-triage.md b/.felt/dependabot-pr-triage/dependabot-pr-triage.md
index a814120c8..48a81c4f8 100644
--- a/.felt/dependabot-pr-triage/dependabot-pr-triage.md
+++ b/.felt/dependabot-pr-triage/dependabot-pr-triage.md
@@ -85,3 +85,7 @@ Update this fiber's `outcome:` to reflect what landed (e.g. "5 merged, 1 deferre
## Skills
Activate `felt` (you'll touch the fiber). The `shuttle` skill is already loaded by virtue of being the dispatched worker.
+
+## Follow-on: PR #742 (2026-06-11)
+
+PR #742 ("chore(deps): bump the lockfile-minor-patch group, 4 updates": mpi4py 4.1.1→4.1.2, numpy 2.4.4→2.4.6, snakemake 9.20.0→9.21.0, black) arrived after the original batch. CI green (`build-test-publish` passes). Not merged — pyproject.toml modified alongside uv.lock: lower bounds tightened (`mpi4py>=4.0→4.1.2`, `numpy>=2.0→2.4.6`, `snakemake>=9.20.0→9.21.0`). Tightening `>=` floors is a policy call, not a mechanical bump. Comment posted at https://github.com/CosmoStat/shapepipe/pull/742#issuecomment-4675762821 asking maintainers to confirm intent before merging.
From 198f6b678d81af2769a2e7e1a8e338e236780b47 Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Thu, 11 Jun 2026 02:37:54 +0200
Subject: [PATCH 27/29] felt: ngmix delivery outcomes + ci-green ground truth
(post constitution-sweep)
---
.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md | 2 +-
.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md | 2 +-
.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md | 5 +++--
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md b/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md
index d32d4b1d5..3ae6c6bc3 100644
--- a/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md
+++ b/.felt/shapepipe/ci-green-on-develop/ci-green-on-develop.md
@@ -7,7 +7,7 @@ tags:
- ci
- constitution
created-at: 2026-05-29T12:34:31.800189806+02:00
-outcome: 'Conda removed from ShapePipe (#733 merged); develop is green including image publish. The container (slim Python 3.12 + apt + uv lockfile) is the single source of truth and CI runs pytest inside the dev image. ci-release.yml + doc-tests.yml + install_shapepipe + environment*.yml deleted; cd.yml builds docs in the image; install docs rewritten. The conda path had rotted into a multi-layer install failure — deleted rather than repaired, so the production astropy bug is moot. Remaining for a clean/tight/shared dev environment: (1) commit a shared .felt in the repo + curate which fibers go public; (2) make repo CLAUDE.md a real onboarding doc; (3) port cluster job scripts off conda; (4) #729 dependabot actions (recreating); (5) verify docs deploy on master; (6) strengthen the thin test suite (#708).'
+outcome: 'Near-realized. Ground truth 2026-06-11: develop green incl. image publish; #708 closed (test scaffolding landed on develop: 5 structural test files + property/synthetic-FITS tiers); candide PBS scripts modernized to SLURM+apptainer (#737); CLAUDE.md already a solid onboarding doc (minor gaps: CONTRIBUTING pointer, first-PR walkthrough); docs deploy infra correct post-#738 — /latest/ live, switcher published, ROOT still serves 2022 v1.0.1 until the next master push (the one remaining trigger). Conda survives only in out-of-scope CANFAR/CC-IN2P3 scripts (job_sp_canfar.bash + init_run_exclusive_canfar.sh have ACTIVE conda paths; cc_{mpi,smp}.sh use ccenv anaconda) — needs its own cluster-aware pass to fully realize ''no conda anywhere''.'
---
## Desired State
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
index 8cd420401..df39e2d62 100644
--- a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -7,7 +7,7 @@ tags:
- shapepipe
- ngmix
created-at: 2026-06-05T22:30:50.004535516+02:00
-outcome: 'Realized + fresh-eyes reviewed, no code defects: shapepipe fix/ngmix-size-columns (honest r50 + dedupe, pushed), cs_util feat/size-conversions (size web, 24/24, on cailmdaley fork — no upstream push rights), sp_validation fix/psf-leakage-fwhm (T_to_fwhm bug fixed — moves α(size); pushed). PRs + merge order (cs_util release → sp_validation) are Cail''s gesture; caution: sp_validation CI is green-but-vacuous on the cs_util.size import, ordering is discipline-enforced. See report.html.'
+outcome: 'Delivered end-to-end 2026-06-11: cs_util#65 merged + released as v0.2.1 on PyPI (size web, 26/26 — first release carrying cs_util.size); shapepipe#743 (honest r50 + dedupe, targeting ngmix_v2.0 per merge-base) merged, #741 CI green after; sp_validation#198 (T_to_fwhm fix + galaxy smoke test killing the green-but-vacuous CI hole, pin cs_util>=0.2.1) OPEN — blocked by shear_psf_leakage''s poetry cap cs-util ^0.1.0; relaxation branch fix/relax-cs-util-cap pushed to CosmoStat/shear_psf_leakage, PR creation is Cail''s click. Deferred (stated in #743 body): wiring honest r50/r50psf into make_cat so paper-convention r_h reaches the science catalogue.'
shuttle:
kind: oneshot
host: candide
diff --git a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
index 3b246e330..bd72a223b 100644
--- a/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
+++ b/.felt/shapepipe/ngmix-weights-ivar/ngmix-weights-ivar.md
@@ -1,13 +1,14 @@
---
id: 01KTCQPE3JGEYN7NQS8HW1AT6B
name: 'ngmix weight map: fix v2.0 regressions + inverse-variance (#604)'
-status: open
+status: closed
tags:
- constitution
- shapepipe
- ngmix
created-at: 2026-06-05T22:30:49.970813955+02:00
-outcome: '#604 implementation is wired through code and active example configs in /tmp/pr740-wt through commit 22e31e51: SExtractor emits BACKGROUND_RMS, vignetmaker stores background_rms_vignet*.sqlite, ngmix templates point BKG_RMS_VIGNET_PATH at that sqlite, ngmix builds per-pixel 1/RMS² inverse-variance gated by weight/flag/valid-RMS masks with scalar fallback when no RMS file is supplied, and test_ngmix now also guards the optional seventh RMS sqlite input contract. Focused RMS/weight tests and py_compile passed in the dev image; full test_ngmix.py still has the pre-existing ngmix.fitting.Fitter API failure.'
+closed-at: 2026-06-11T02:37:33.556855912+02:00
+outcome: 'Realized, fresh-eyes reviewed (no blockers), and delivered to PR #741 (CosmoStat/shapepipe, ngmix_v2.0): per-pixel 1/RMS² inverse-variance weights gated by weight/flag/valid-RMS masks with scalar sigma_mad fallback — feat f466c987, mask binarization 6ddea5b2, NaN-edge guard 4aa3b2a1, integration test pinning spatially-varying RMS through the full observation chain 158986a4, config wiring 7a44abda + all-or-nothing semantics documented 0bc6016e. Martin''s #604 weight-map thread answered with the commits. BKG_RMS_VIGNET_PATH is all-or-nothing per tile (missing file → FileNotFoundError; fallback only when option absent). Container gotcha: test from a scratch-wf worktree needs --bind /automnt/n17data and /automnt-prefixed paths (plain /n17data symlink invisible inside the sandbox); dev sandbox ships ngmix 1.3.6 so test_metacal_is_reproducible_with_fixed_seed only passes in CI''s v2 image.'
shuttle:
enabled: true
kind: oneshot
From ebaeae76517e30154dd686c404a52f925daafc3e Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Thu, 11 Jun 2026 03:01:16 +0200
Subject: [PATCH 28/29] =?UTF-8?q?felt:=20fresh-pass=20review=20fiber=20?=
=?UTF-8?q?=E2=80=94=20two=20empirical=20blockers=20in=20v2.0=20robustness?=
=?UTF-8?q?=20shell,=20science=20threads=20for=20Martin?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
.../review-pr741-fresh-pass.md | 21 +++++++++++++++++++
1 file changed, 21 insertions(+)
create mode 100644 .felt/review-pr741-fresh-pass/review-pr741-fresh-pass.md
diff --git a/.felt/review-pr741-fresh-pass/review-pr741-fresh-pass.md b/.felt/review-pr741-fresh-pass/review-pr741-fresh-pass.md
new file mode 100644
index 000000000..ba3f2c2a9
--- /dev/null
+++ b/.felt/review-pr741-fresh-pass/review-pr741-fresh-pass.md
@@ -0,0 +1,21 @@
+---
+id: 01KTT34NW92KPHH9GDDXAGZZ89
+name: 'Fresh-pass review: PR #741 (post-integration)'
+tags:
+ - shapepipe
+ - ngmix
+ - review
+created-at: 2026-06-11T03:00:58.633703716+02:00
+outcome: 'Fresh full-diff review of #741 at b2dcd793 (empirical, against ngmix 2.4.0): scientific core verified sound (seed-reproducibility holds; injected shear recovered to 0.2% with response correction; r50/ivar-weights math checks out) but two empirically demonstrated blockers found in the v2.0 rewrite''s robustness shell — TILE_ID metadata key truncates the post-process CCD scan (~80% of epochs silently lost to multi-epoch bookkeeping) and failed galaxy fits crash the whole tile at save (KeyError, v1''s per-object BootGalFailure contract lost). Plus: one failed PSF epoch kills the object despite ignore_failed_psf; mcal_flags hardcoded 0 (dead quality column); cfis_simu configs broken against the positional-WCS contract. Science note: with psf=''fitgauss'' the reconvolved metacal PSF is round by construction, so g1/g2_psfo are identically zero — PSF-leakage regressions on them are degenerate. Fixes + review part 3 posted to the PR same night.'
+---
+
+Second-look review requested by Cail after the constitution-sweep integration round, run by a fresh agent with no investment in the branch. Full findings live in PR #741's "Review — part 3" comment; this fiber is the pointer plus what matters beyond the PR.
+
+## Why the blockers hid
+
+Both blockers survive a green unit-test suite and a one-tile smoke run with easy objects: the TILE_ID truncation only bites multi-CCD bookkeeping breadth (the first 8 CCDs still work), and the compile_results KeyError needs a hard-to-fit object (easy smoke objects all converge). Lesson for the test suite: structural tests catch import/config rot; these needed *adversarial empirical* tests — forced-failure objects, metadata-polluted sqlite fixtures. The regression tests added with the fixes are exactly that shape.
+
+## Science threads left deliberately open
+
+- **fixnoise homoscedasticity** (#604 thread): metacal noise-symmetrization currently draws at median(bkg_rms) while weights are per-pixel — second-order within a 51px stamp, one-line to change, Martin's call.
+- **g1/g2_psfo ≡ 0 under psf='fitgauss'** (*_psfo thread): keep/rename/drop decision affects any downstream PSF-leakage regression; relevant to the tutorial.
From 7feffeea57e9526623a1760db10668f8ea4937ff Mon Sep 17 00:00:00 2001
From: Cail Daley
Date: Thu, 11 Jun 2026 03:51:43 +0200
Subject: [PATCH 29/29] felt: size-columns chain fully merged (#198 closed the
loop)
---
.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
index df39e2d62..3f5fd7c22 100644
--- a/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
+++ b/.felt/shapepipe/ngmix-size-columns/ngmix-size-columns.md
@@ -7,7 +7,7 @@ tags:
- shapepipe
- ngmix
created-at: 2026-06-05T22:30:50.004535516+02:00
-outcome: 'Delivered end-to-end 2026-06-11: cs_util#65 merged + released as v0.2.1 on PyPI (size web, 26/26 — first release carrying cs_util.size); shapepipe#743 (honest r50 + dedupe, targeting ngmix_v2.0 per merge-base) merged, #741 CI green after; sp_validation#198 (T_to_fwhm fix + galaxy smoke test killing the green-but-vacuous CI hole, pin cs_util>=0.2.1) OPEN — blocked by shear_psf_leakage''s poetry cap cs-util ^0.1.0; relaxation branch fix/relax-cs-util-cap pushed to CosmoStat/shear_psf_leakage, PR creation is Cail''s click. Deferred (stated in #743 body): wiring honest r50/r50psf into make_cat so paper-convention r_h reaches the science catalogue.'
+outcome: 'Delivered end-to-end and FULLY MERGED 2026-06-11: cs_util#65-67 (v0.2.1 on PyPI with cs_util.size), shapepipe#743 (honest r50, merged into ngmix_v2.0), shear_psf_leakage#23 (cs-util cap relaxation), sp_validation#198 (T_to_fwhm fix + galaxy smoke test) — the whole dependency chain closed in one night. The T_to_fwhm fix moves α(size) leakage results; regenerate where it mattered. Deferred (stated in #743): wiring honest r50/r50psf into make_cat so paper-convention r_h reaches the science catalogue.'
shuttle:
kind: oneshot
host: candide