Fix edge cases with UTF-8 strings in ChecksumResultSet #13441

Open
Noremac201 wants to merge 1 commit into googleapis:main from Noremac201:utf8-checksum-fixes

Conversation

@Noremac201 Noremac201 commented Jun 11, 2026

Contributor
  1. Ensure a minimum capacity of 4 bytes when allocating the buffer, since 4 bytes is the maximum size of a single UTF-8 encoded character. The existing code sizes the byte buffer from the Java string length, which counts chars rather than encoded bytes and can therefore be too small to hold even one UTF-8 character.

  2. Use buffer.clear() instead of the second buffer.flip(). clear() resets the buffer's limit to its full capacity for the next iteration, whereas a second flip() shrinks the limit to the size of the previous write. This matters for strings that mix multi-byte and single-byte UTF-8 characters. A test was added that passes with clear() and fails with flip().

Fixes #13440
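
For readers without the diff at hand, the two changes can be sketched roughly as follows. This is a minimal illustration, not the actual ChecksumResultSet code: the class name, the method `encodeInChunks`, and the `MAX_BUFFER_SIZE` cap are all hypothetical, and the real implementation presumably feeds the bytes to a digest rather than collecting them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8ChunkedEncodeSketch {
  // Hypothetical cap on the reusable buffer; not taken from ChecksumResultSet.
  static final int MAX_BUFFER_SIZE = 8;

  static byte[] encodeInChunks(String value) {
    // Sizing from String.length() alone could give 0 to 3 bytes, which cannot
    // hold a 4-byte UTF-8 character, so enforce a floor of 4 (fix 1).
    int capacity = Math.max(4, Math.min(value.length(), MAX_BUFFER_SIZE));
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    CharBuffer chars = CharBuffer.wrap(value);
    CharsetEncoder encoder =
        StandardCharsets.UTF_8
            .newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    CoderResult result;
    do {
      // Encode as many whole characters as fit into the remaining space.
      result = encoder.encode(chars, buffer, true);
      buffer.flip(); // switch from writing to reading: limit = bytes written
      out.write(buffer.array(), 0, buffer.limit());
      // Fix 2: clear() restores limit = capacity. A second flip() here would
      // leave limit at the size of the previous write.
      buffer.clear();
    } while (result.isOverflow());

    // Finish the encoding operation and drain anything flush() produced.
    encoder.flush(buffer);
    buffer.flip();
    out.write(buffer.array(), 0, buffer.limit());
    return out.toByteArray();
  }
}
```

With a string such as "ab" followed by a 4-byte character, the first pass writes 2 bytes and overflows. If the limit stayed at 2, the encoder could never fit the 4-byte character and the loop would make no progress, which is consistent with the infinite loop reported in the linked issue.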

@Noremac201 Noremac201 requested review from a team as code owners June 11, 2026 18:19

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates ChecksumResultSet to ensure a minimum buffer size of 4 bytes when allocating a ByteBuffer and switches from buffer.flip() to buffer.clear() to correctly reset the buffer. It also adds new unit tests covering empty, multi-byte, and mixed UTF-8 strings. The review feedback suggests refactoring these new tests to extract the duplicate mock setup and initialization logic into a helper method, which will improve readability and maintainability.
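
The difference between the two reset calls can be seen in isolation with a small sketch of `java.nio.ByteBuffer` state. The class and method names here are hypothetical and exist only for illustration; the sequence of calls is an assumption about what a write, drain, reset cycle looks like, not a copy of the PR's code.

```java
import java.nio.ByteBuffer;

final class FlipVersusClearSketch {
  // Write `written` bytes, drain them, then call flip() a second time.
  static int limitAfterSecondFlip(int capacity, int written) {
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    buffer.put(new byte[written]); // position = written, limit = capacity
    buffer.flip();                 // position = 0, limit = written
    buffer.get(new byte[written]); // position = written (buffer drained)
    buffer.flip();                 // position = 0, limit = written (shrunk)
    return buffer.limit();
  }

  // Same sequence, but reset with clear() instead of the second flip().
  static int limitAfterClear(int capacity, int written) {
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    buffer.put(new byte[written]);
    buffer.flip();
    buffer.get(new byte[written]);
    buffer.clear();                // position = 0, limit = capacity
    return buffer.limit();
  }
}
```

For a 4-byte buffer that last held 2 bytes, the second flip() leaves only 2 bytes of writable space, while clear() makes all 4 available again.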

@Noremac201 Noremac201 force-pushed the utf8-checksum-fixes branch 3 times, most recently from b0017ca to 1a1b88e Compare June 11, 2026 19:01
…tSet

1. Ensure a minimum capacity of 4 bytes when allocating the buffer, since 4 bytes is the maximum size of a single UTF-8 encoded character. The existing code sizes the byte buffer from the Java string length, which can be too small to hold even one UTF-8 character.

2. Use buffer.clear() instead of the second buffer.flip(). This matters for strings that mix multi-byte and single-byte UTF-8 characters. A test was added that passes with clear() and fails with flip().

Fixes googleapis#13440
@Noremac201 Noremac201 force-pushed the utf8-checksum-fixes branch from 1a1b88e to 75a42cf Compare June 11, 2026 23:05
Development

Successfully merging this pull request may close these issues.

ChecksumResultset UTF-8 parsing infinite loops