Skip to content

fix: avoid quadratic re-scan of comments after a value#1689

Open
hjanuschka wants to merge 1 commit into
open-source-parsers:masterfrom
hjanuschka:fix-quadratic-comment-rescan
Open

fix: avoid quadratic re-scan of comments after a value#1689
hjanuschka wants to merge 1 commit into
open-source-parsers:masterfrom
hjanuschka:fix-quadratic-comment-rescan

Conversation

@hjanuschka

@hjanuschka hjanuschka commented Jun 13, 2026

Copy link
Copy Markdown

OurReader::readComment() decides whether a comment attaches to the previous value (commentAfterOnSameLine) by scanning from the end of that value to the comment via containsNewLine(lastValueEnd_, commentBegin). lastValueEnd_ only advances when a new value is read, so a long run of comments after a value (error recovery, or a value followed by many comments) re-scans the same growing prefix for every comment -- O(n^2).

A comment can only be on the same line as the last value when no newline separates them, and that gap only grows as more comments are consumed, so it never needs re-examining after the first comment. Setting lastValueHasAComment_ after the first comment makes the rest skip the scan. Output is unchanged.

Found via a jsoncpp_fuzzer timeout (Chromium issue 521541633): a 400KB input scanned 2.24 GB across 8384 containsNewLine calls and took ~18s. After the fix: 122 KB / 1 call, ~56ms.

Adds CharReaderTest/parseCommentsAfterValueNotQuadratic (a value followed by many trailing comments, bounded parse time): fails multi-second without the fix, passes in ms with it. Existing tests unaffected.

OurReader::readComment() decides whether a comment should be attached to
the previous value (commentAfterOnSameLine) by scanning the input from the
end of that value up to the comment with containsNewLine(). lastValueEnd_
only advances when a new value is read, so a long run of comments after a
value (e.g. during error recovery, or a value followed by many comments)
made every comment re-scan the same growing prefix, giving O(n^2) parse
time. A jsoncpp_fuzzer testcase took ~18s for a 400KB input.

A comment can only ever be on the same line as the last value if no
newline separates them, and the gap to inspect only grows as further
comments are consumed, so once the gap has been examined for the first
comment it never needs to be examined again. Mark lastValueHasAComment_
after the first comment following a value so subsequent comments skip the
scan. Parsing the testcase drops from ~18s to ~56ms with identical output.

Add a regression test that parses a value followed by a large number of
trailing comments and requires it to complete well under a generous time
bound.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant