
perf(shapes): build matplotlib patches once, share across fill/outline #691

Merged
timtreis merged 2 commits into main from perf/shapes-build-patches-once on Jun 6, 2026

timtreis commented on Jun 6, 2026

What & why

The matplotlib branch of _render_shapes builds 2–3 PatchCollections (outer
outline, optional inner outline, fill). Before this change, each call to
_get_collection_shape rebuilt the same patch geometry from scratch via
GeoDataFrame.iterrows() + row.to_dict(). For shape elements this geometry
construction is the dominant render cost, and it ran once per layer.
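
For illustration, here is a minimal sketch of the two iteration styles. It assumes circles are stored in a plain pandas DataFrame with hypothetical x, y and radius columns (the real code iterates a GeoDataFrame). It also assumes the scale may arrive either as a scalar or as a one-element array. The sketch is not the PR's code:

```python
import numpy as np
import pandas as pd
from matplotlib.patches import Circle

# Hypothetical circle table (x, y, radius); the real input is a GeoDataFrame.
df = pd.DataFrame({"x": [0.0, 2.0, 5.0], "y": [1.0, 3.0, 0.5], "radius": [0.5, 1.0, 0.25]})


def build_rowwise(frame: pd.DataFrame, scale) -> list[Circle]:
    """Per-row pattern: a Series and a dict per row, scale re-resolved per shape."""
    patches = []
    for _, row in frame.iterrows():
        d = row.to_dict()
        s = float(np.asarray(scale).item())  # re-resolved on every row
        patches.append(Circle((d["x"], d["y"]), d["radius"] * s))
    return patches


def build_columnar(frame: pd.DataFrame, scale) -> list[Circle]:
    """Columnar pattern: pull whole columns out once and resolve the scale once."""
    s = float(np.asarray(scale).item())
    xs = frame["x"].to_numpy()
    ys = frame["y"].to_numpy()
    radii = frame["radius"].to_numpy() * s
    return [Circle((x, y), r) for x, y, r in zip(xs, ys, radii)]
```

Both builders return equivalent patches. The columnar one skips the per-row Series and dict construction and does the scale lookup once.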

This is the top item from the profiling write-up in #690 (the biggest single payoff).

Change

  • Factor the colour-independent geometry build into a new _build_shape_patches()
    helper and call it once in _render_shapes, sharing the result across all
    collections via a new optional prebuilt_patches= argument on
    _get_collection_shape (back-compatible — defaults to None, in which case it
    builds as before).
  • Replace the per-row iterrows()/to_dict() loop with columnar iteration, and
    resolve the scale scalar once instead of per shape.
  • _get_collection_shape's colour logic is unchanged; it now expands per-shape fill
    colours to per-patch colours (preserving MultiPolygon expansion and the single-colour
    broadcast) and assembles the PatchCollection.

Why sharing is safe: PatchCollection.set_paths does
p.get_transform().transform_path(p.get_path()), i.e. it bakes a fresh, independent
Path per collection, so building the patch list once and handing it to multiple
collections (and the existing per-collection trans.transform vertex update) does not
cross-contaminate. Verified empirically.
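
The safety argument can be checked in isolation with plain matplotlib. In the sketch below, two PatchCollections are built from the same patch objects, and a vertex update is applied to one of them. The Affine2D scale is a hypothetical stand-in for whatever per-collection transform the real code applies:

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")
from matplotlib.collections import PatchCollection
from matplotlib.patches import Circle, Rectangle
from matplotlib.transforms import Affine2D

# One list of patch objects, shared by two collections.
patches = [Circle((0.0, 0.0), 1.0), Rectangle((2.0, 2.0), 1.0, 0.5)]
fill = PatchCollection(patches, facecolors="tab:blue", edgecolors="none")
outline = PatchCollection(patches, facecolors="none", edgecolors="k")

# set_paths() stored transform_path() results, i.e. distinct Path objects.
assert fill.get_paths()[0] is not outline.get_paths()[0]

# A per-collection vertex update (hypothetical transform) touches only that collection.
before = outline.get_paths()[0].vertices.copy()
trans = Affine2D().scale(10.0)
for path in fill.get_paths():
    path.vertices = trans.transform(path.vertices)

assert np.array_equal(outline.get_paths()[0].vertices, before)
assert np.allclose(fill.get_paths()[0].vertices, before * 10.0)
```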

Correctness — byte-identical output

Rendered 16 scenarios on this branch and on main, comparing the Agg RGBA buffers
exactly (np.array_equal): all identical.

  • circles: plain / colour-literal / outline / fill+outline / scaled
  • polygons: plain / outline
  • multipolygons: plain / outline
  • categorical fill · continuous fill · categorical+outline
  • groups · groups+na_color

This covers the tricky paths: MultiPolygon → multiple patches with replicated colour,
groups filtering (fewer shapes), na_color, and centroid scaling.
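
A parity check of this kind can be sketched as follows. It assumes each scenario is wrapped in a hypothetical callable that draws onto a matplotlib Axes. The two toy plotters only mimic per-layer versus shared patch construction and are not the actual 16 scenarios:

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Circle


def render_rgba(plot_fn, dpi=100):
    """Render with Agg and return the canvas as an (H, W, 4) uint8 array."""
    fig, ax = plt.subplots(dpi=dpi)
    try:
        plot_fn(ax)
        fig.canvas.draw()
        return np.asarray(fig.canvas.buffer_rgba()).copy()
    finally:
        plt.close(fig)


def same_pixels(plot_a, plot_b):
    """True only if both renders are byte-identical (no tolerance)."""
    a, b = render_rgba(plot_a), render_rgba(plot_b)
    return a.shape == b.shape and np.array_equal(a, b)


def per_layer_plot(ax):
    # Old shape: every collection rebuilds its own patch list.
    for fc, ec in [("none", "k"), ("tab:red", "none")]:
        pats = [Circle((0.3, 0.5), 0.1), Circle((0.7, 0.5), 0.2)]
        ax.add_collection(PatchCollection(pats, facecolors=fc, edgecolors=ec))


def shared_patches_plot(ax):
    # New shape: one patch list handed to every collection.
    pats = [Circle((0.3, 0.5), 0.1), Circle((0.7, 0.5), 0.2)]
    for fc, ec in [("none", "k"), ("tab:red", "none")]:
        ax.add_collection(PatchCollection(pats, facecolors=fc, edgecolors=ec))
```

Comparing with np.array_equal rather than a tolerance-based image diff means any single changed pixel counts as a failure.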

Performance

render_shapes(..., outline_alpha=1.0) with 2000 medium-sized blob shapes (Agg, dpi 100):
763 → ~500 ms, i.e. ~35% less render time. The saving scales linearly with shape count,
so it grows in absolute terms on large datasets (shape rendering was measured at ~3 s for
8k shapes on main).
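
For reference, the ~35% figure is a reduction in wall time. Expressed as a speedup factor, it is roughly 1.5×. Both values use the rounded ~500 ms measurement, so they are approximate:

```latex
\frac{763 - 500}{763} \approx 0.345 \quad (\approx 35\%\ \text{less time}),
\qquad
\frac{763}{500} \approx 1.53 \quad (\approx 1.5\times\ \text{speedup})
```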

The datashader branch (auto-selected for >10k shapes) is untouched.

timtreis added 2 commits June 6, 2026 19:40
The matplotlib branch of _render_shapes called _get_collection_shape
separately for the outer outline, inner outline, and fill, each rebuilding
the same patch geometry from scratch via GeoDataFrame.iterrows()/to_dict().
For shape elements this geometry construction is the dominant render cost and
it ran 2-3x per plot.

Factor the colour-independent geometry build into _build_shape_patches() and
call it once in _render_shapes, passing the result to each collection via a
new optional prebuilt_patches argument. Also drop the per-row iterrows()/Series
construction in favour of columnar iteration and resolve the scale scalar once.

Output is unchanged: RGBA buffers are byte-identical to main across 16
scenarios (plain/outline/fill+outline/scaled circles, polygons, multipolygons,
categorical/continuous fill, groups, na_color). ~35% less render time on a
2k-shape outline render (763 -> ~500 ms), scaling linearly with shape count.

Apply /simplify cleanups (output byte-identical, 16-scenario parity holds):
- expand fill colours via numpy fancy-indexing instead of .tolist() + a
  Python loop (drops a dead hasattr fallback; fill_c is always an ndarray);
- normalize geometries with a single vectorized shapely.normalize call,
  falling back to per-geometry only if the bulk call rejects an input, and
  materialize the geometry array once;
- index the radius array with numpy boolean masking;
- pass the already-resolved scale scalar to _scale_pathpatch_around_centroid
  so the MultiPolygon branch doesn't re-extract it.
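
The normalization cleanup can be sketched with shapely 2.x, whose shapely.normalize accepts an array of geometries in one call. The exception type caught and the per-item fallback policy (keep an input that shapely rejects unchanged) are assumptions made for illustration. The commit only says the code falls back to per-geometry normalization when the bulk call rejects an input:

```python
import numpy as np
import shapely
from shapely.geometry import Point, Polygon


def normalize_geometries(geoms):
    """Normalize geometries with one vectorized call, falling back per item.

    Sketch only: catching TypeError and passing rejected inputs through
    unchanged are hypothetical choices, not taken from the PR.
    """
    arr = np.empty(len(geoms), dtype=object)  # materialize once as a 1-D object array
    arr[:] = list(geoms)
    try:
        return shapely.normalize(arr)  # single vectorized call over the whole array
    except TypeError:
        out = np.empty(len(arr), dtype=object)
        for i, geom in enumerate(arr):
            try:
                out[i] = shapely.normalize(geom)
            except TypeError:
                out[i] = geom  # hypothetical policy: keep the rejected input as-is
        return out
```

Normalization rewrites coordinates into a canonical order without changing the shape, so the normalized polygon is still topologically equal to the input.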
timtreis merged commit 224f069 into main on Jun 6, 2026
5 of 8 checks passed
timtreis deleted the perf/shapes-build-patches-once branch on June 6, 2026 17:57