fix surrogate pair byte counting in ExtendedBufferedReader array read #605

Open
digi-scrypt wants to merge 1 commit into apache:master from digi-scrypt:surrogate-byte-count

@digi-scrypt

  1. With byte tracking on (setTrackBytes plus a charset), read(char[]) sets lastChar to the last char of the buffer before counting bytes. The per-char helper reads that field instead of the char that actually precedes the current one, so a low surrogate gets paired with the wrong neighbor. The counting loop also ran to length instead of offset + length.
  2. As a result, a 4-byte character that goes through the char[] path, e.g. a multi-character delimiter containing a supplementary character, makes nextRecord throw CharacterCodingException, and getBytePosition() returns a wrong value.

The fix passes the previous char to the helper explicitly and moves the counting ahead of the lastChar update. A pair split across two buffer reads is still handled, since the first char of the second read pairs against the saved lastChar. Added a regression test next to the existing multi-character-delimiter byte-position test.
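
For reviewers, the counting order described above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the class SurrogateByteCount, the methods encodedLength and countBytes, and the NONE sentinel are hypothetical names, and wrapping the checked exception in UncheckedIOException is a simplification made only for this sketch.

```java
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Illustrative sketch only; not the actual ExtendedBufferedReader code. */
final class SurrogateByteCount {

    /** Hypothetical sentinel meaning "no char has been read yet". */
    static final int NONE = -2;

    /**
     * Encoded length of current, given the char that really precedes it.
     * A high surrogate contributes 0; the whole pair is counted when the
     * low surrogate arrives.
     */
    static int encodedLength(CharsetEncoder encoder, int previous, char current)
            throws CharacterCodingException {
        if (!Character.isSurrogate(current)) {
            return encoder.encode(CharBuffer.wrap(new char[] {current})).remaining();
        }
        if (Character.isHighSurrogate(current)) {
            return 0; // counted together with the following low surrogate
        }
        if (previous >= 0 && Character.isHighSurrogate((char) previous)) {
            char[] pair = {(char) previous, current};
            return encoder.encode(CharBuffer.wrap(pair)).remaining();
        }
        return 0; // unpaired low surrogate: this sketch simply skips it
    }

    /**
     * Counts the encoded bytes of buf[offset .. offset + length), pairing the
     * first char against lastChar (the final char of the previous read).
     * The caller would update its own lastChar only after this returns.
     */
    static long countBytes(char[] buf, int offset, int length, int lastChar, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        long bytes = 0;
        int previous = lastChar;
        try {
            // Note the upper bound: offset + length, not length.
            for (int i = offset; i < offset + length; i++) {
                bytes += encodedLength(encoder, previous, buf[i]);
                previous = buf[i];
            }
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e); // simplification for the sketch
        }
        return bytes;
    }
}
```

Carrying previous through the loop is what keeps a low surrogate paired with its real neighbor, and seeding it with the saved lastChar is what covers a pair split across two reads.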
