
Add Compaction encoding for normalizing arrays to compact form #8265

Draft
joseph-isaacs wants to merge 2 commits into develop from claude/quirky-carson-1NaxF


@joseph-isaacs joseph-isaacs commented Jun 5, 2026

This is more of an idea for now.

Summary

This PR introduces the Compaction encoding, a transient array wrapper that normalizes its child into a compact canonical form when executed. This is similar to the existing Slice encoding in that it's non-serializable and exists primarily to drive execution.

The Compaction encoding handles structural normalization of various array types:

  • ListView arrays are rebuilt to be zero-copy convertible to Arrow-style ListArray (overlapping views deduplicated, leading/trailing garbage trimmed)
  • VarBinView arrays have their data buffers garbage collected
  • Dict arrays are either decoded to flat canonical form or garbage collected in place (dead values removed, codes remapped) based on a cost heuristic
  • Struct fields are recursively compacted
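The ListView case can be illustrated with a minimal, self-contained sketch. It does not use the Vortex API: `ListView`, `List`, and `compact_list_view` are hypothetical names over plain vectors, and copying each view's elements into its own contiguous range is only one reading of "overlapping views deduplicated, leading/trailing garbage trimmed". The PR's actual implementation may differ.

```rust
/// Hypothetical, simplified list-view layout (not the Vortex type):
/// list `i` is `elements[offsets[i]..offsets[i] + sizes[i]]`.
/// Views may overlap, appear out of order, or leave unreferenced gaps.
struct ListView<T> {
    elements: Vec<T>,
    offsets: Vec<usize>,
    sizes: Vec<usize>,
}

/// Arrow-style list layout: list `i` is `elements[offsets[i]..offsets[i + 1]]`.
struct List<T> {
    elements: Vec<T>,
    offsets: Vec<usize>,
}

/// Rebuild the elements so every list owns one contiguous, non-overlapping
/// range, in list order. Elements that no view references are dropped.
fn compact_list_view<T: Clone>(view: &ListView<T>) -> List<T> {
    assert_eq!(view.offsets.len(), view.sizes.len());

    let total: usize = view.sizes.iter().sum();
    let mut elements = Vec::with_capacity(total);
    let mut offsets = Vec::with_capacity(view.offsets.len() + 1);
    offsets.push(0);

    for (&start, &len) in view.offsets.iter().zip(&view.sizes) {
        // Copy this view's elements to the end of the new buffer.
        elements.extend_from_slice(&view.elements[start..start + len]);
        // The running length is the end offset of this list.
        offsets.push(elements.len());
    }

    List { elements, offsets }
}
```

In this sketch, two views that share an element each receive their own copy, so the output needs only monotonically increasing offsets and can be read as an Arrow-style list without further rewriting.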

Add a transient `Compaction` wrapper encoding that normalizes its child into
a compact canonical form when executed:

- ListView arrays are rebuilt to be zero-copy convertible to a ListArray
  (overlaps deduplicated, leading/trailing garbage trimmed).
- VarBinView buffers are garbage collected.
- Dict arrays are either decoded to flat canonical or garbage collected in
  place (dead values dropped, codes remapped), whichever is cheaper. This is
  driven by a `Dict` `CompactKernel` registered as an `execute_parent` of
  `compaction(dict(..))`.
- Struct fields are recursively compacted.
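The Dict branch can be sketched on plain vectors as follows. This is not the `CompactKernel` from the PR: `Dict`, `Compacted`, `gc_dict`, `decode_dict`, and `compact_dict` are hypothetical names, and the cost rule in `compact_dict` is a placeholder assumption, since the heuristic the PR actually uses is not described here.

```rust
/// Hypothetical dictionary array (not the Vortex type):
/// logical value `i` is `values[codes[i]]`.
#[derive(Debug, Clone, PartialEq)]
struct Dict<T> {
    codes: Vec<u32>,
    values: Vec<T>,
}

/// Result of compaction: either a flat array or a garbage-collected dict.
#[derive(Debug, Clone, PartialEq)]
enum Compacted<T> {
    Flat(Vec<T>),
    Dict(Dict<T>),
}

/// Drop values that no code references and remap codes to the new positions.
fn gc_dict<T: Clone>(dict: &Dict<T>) -> Dict<T> {
    // Mark every value that is referenced by at least one code.
    let mut live = vec![false; dict.values.len()];
    for &code in &dict.codes {
        live[code as usize] = true;
    }

    // Keep live values in their original order and record old -> new code.
    let mut remap = vec![u32::MAX; dict.values.len()];
    let mut values = Vec::new();
    for (old, value) in dict.values.iter().enumerate() {
        if live[old] {
            remap[old] = values.len() as u32;
            values.push(value.clone());
        }
    }

    let codes = dict.codes.iter().map(|&c| remap[c as usize]).collect();
    Dict { codes, values }
}

/// Decode to a flat array by looking up every code.
fn decode_dict<T: Clone>(dict: &Dict<T>) -> Vec<T> {
    dict.codes.iter().map(|&c| dict.values[c as usize].clone()).collect()
}

/// Placeholder heuristic (an assumption, not the PR's rule): if the
/// collected dictionary has at least as many values as codes, the
/// indirection saves nothing, so decode to flat.
fn compact_dict<T: Clone>(dict: &Dict<T>) -> Compacted<T> {
    let collected = gc_dict(dict);
    if collected.values.len() >= collected.codes.len() {
        Compacted::Flat(decode_dict(&collected))
    } else {
        Compacted::Dict(collected)
    }
}
```

Either branch preserves the logical contents: decoding the garbage-collected dictionary yields the same values as decoding the original.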

This extends the shallow `Canonical::compact()` idea into a recursive,
encoding-aware operation exposed via `ArrayRef::compact`.
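
To make "recursive, encoding-aware" concrete, here is a toy sketch over a hypothetical `Array` enum. It is not `ArrayRef::compact`: the enum, its variants, and the always-decode rule for dictionaries are assumptions chosen only to show how compaction can dispatch on the encoding and descend through struct fields.

```rust
/// Toy array tree (hypothetical, not the Vortex `ArrayRef`).
#[derive(Debug, Clone, PartialEq)]
enum Array {
    Primitive(Vec<i64>),
    Dict { codes: Vec<u32>, values: Vec<i64> },
    Struct(Vec<(String, Array)>),
}

/// Compact an array by dispatching on its encoding and recursing into children.
fn compact(array: &Array) -> Array {
    match array {
        // Already flat: nothing to normalize.
        Array::Primitive(values) => Array::Primitive(values.clone()),
        // Toy rule: always decode dictionaries to flat form.
        Array::Dict { codes, values } => {
            Array::Primitive(codes.iter().map(|&c| values[c as usize]).collect())
        }
        // Structs keep their shape; each field is compacted recursively.
        Array::Struct(fields) => Array::Struct(
            fields
                .iter()
                .map(|(name, child)| (name.clone(), compact(child)))
                .collect(),
        ),
    }
}
```

A shallow version would stop at the outermost array; the recursion in the `Struct` arm is what lets a nested dictionary field be normalized as well.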

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs added the "do not merge" label (Pull requests that are not intended to merge) Jun 5, 2026
@joseph-isaacs joseph-isaacs added the "changelog/feature" label (A new feature) and removed the "do not merge" label (Pull requests that are not intended to merge) Jun 5, 2026, with Claude
`compact_canonical` is private to the module, so the rustdoc link could not
resolve under -D warnings. Describe the behavior in prose instead.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

codspeed-hq Bot commented Jun 5, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 7 improved benchmarks
❌ 4 regressed benchmarks
✅ 1496 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 162 µs | 198.2 µs | -18.26% |
| Simulation | compare[15] | 120.4 µs | 146.5 µs | -17.8% |
| Simulation | compare[14] | 118 µs | 142.2 µs | -17% |
| Simulation | compare[13] | 116.1 µs | 138.5 µs | -16.14% |
| Simulation | varbinview_zip_block_mask | 3.7 ms | 2.9 ms | +27.51% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 275.3 ns | 216.9 ns | +26.89% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 336.9 ns | 278.6 ns | +20.94% |
| Simulation | bitwise_not_vortex_buffer_mut[2048] | 400.6 ns | 342.2 ns | +17.05% |
| Simulation | varbinview_zip_fragmented_mask | 6.9 ms | 6.1 ms | +13.07% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 309.6 µs | 274.3 µs | +12.88% |
| Simulation | compare[5] | 77.5 µs | 70.1 µs | +10.5% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/quirky-carson-1NaxF (1b3a817) with develop (d97d2bd)
