[rust-compiler] Tolerate unknown statement kinds at the AST boundary #36705
Closed
poteto wants to merge 1 commit into
Babel can emit statement kinds that the typed AST does not model (the `todo-ts-*` fixtures pin three TS module-interop forms). Deserialization previously failed the whole file on the first such node, whereas the TS reference compiles the file and leaves the statement alone.

`Statement` gains a final `#[serde(untagged)] Unknown(UnknownStatement)` variant carrying the complete raw node. Deserialization is hand-written and dispatches modeled `type` tags through a `KnownStatement` helper, so a malformed modeled node still errors with its precise field-level message instead of degrading to `Unknown`; only genuinely unmodeled tags take the catch-all.

The TS reference reaches its equivalent default case only via `assertExhaustive` (Babel's types are closed), so reaching it crashes. Here unmodeled syntax is reachable by construction, so it degrades instead: top-level statements are preserved verbatim through re-serialization, and function-body occurrences record the standard `UnsupportedSyntax` bailout with an `UnsupportedNode` instruction carrying the raw node.

A `known_statements!` macro is the single source for the dispatch enum, its `From` mapping, and the tag list, so those three cannot drift. A variant added to `Statement` but not to the macro is the one remaining silent gap; this is documented on the variant. `UnknownStatement` caches `BaseNode` for position helpers; the scoped `with_raw_mut` mutator refreshes the cache and rejects mutations that strip `type`, so the two views cannot desync.

Program-level analyses treat `Unknown` explicitly: the gating reference-before-declaration scan walks the raw node for identifier references (an `export = X` does reference `X`), and the prefilter and return-analysis arms are deliberately inert. The SWC/OXC reverse converters emit a deliberate runtime tripwire (a `throw` in generated code) for the arms that stay unreachable until the next slice, when the SWC forward conversion stops rewriting these statements to `EmptyStatement`.
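The tag dispatch and the macro described above can be sketched without serde. This is a minimal, std-only sketch under stated assumptions: `Value` is a hypothetical stand-in for `serde_json::Value`, `EmptyStatement` and `ExpressionStatement` are placeholder modeled kinds, and `deserialize_statement`, `parse_known`, `type_tag`, `parse_empty`, and `parse_expression` are illustrative names, not the PR's code (which implements `Deserialize` by hand). It shows the two properties the commit message relies on: one macro invocation produces the dispatch enum, the `From` mapping, and the tag list together, and only tags absent from that list fall through to `Unknown`.

```rust
// Hypothetical names throughout; std-only stand-in for a hand-written serde Deserialize.
use std::collections::BTreeMap;

/// Hypothetical stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Str(String),
    Obj(BTreeMap<String, Value>),
}

#[derive(Debug, PartialEq)]
pub struct DeError(pub String);

// Placeholder modeled kinds; the real AST models many more.
#[derive(Debug, PartialEq)]
pub struct EmptyStatement;
#[derive(Debug, PartialEq)]
pub struct ExpressionStatement { pub expression: Value }

/// Catch-all payload: the complete raw node, kept untouched for re-serialization.
#[derive(Debug, PartialEq)]
pub struct UnknownStatement { pub raw: Value }

#[derive(Debug, PartialEq)]
pub enum Statement {
    EmptyStatement(EmptyStatement),
    ExpressionStatement(ExpressionStatement),
    /// Must stay last. A variant added above but not to `known_statements!`
    /// has no entry in KNOWN_TAGS, so its nodes silently land here.
    Unknown(UnknownStatement),
}

/// Single source for the dispatch enum, its `From` mapping, and the tag list.
macro_rules! known_statements {
    ($($tag:ident => $parse:path),* $(,)?) => {
        #[derive(Debug, PartialEq)]
        pub enum KnownStatement { $($tag($tag)),* }

        impl From<KnownStatement> for Statement {
            fn from(known: KnownStatement) -> Self {
                match known { $(KnownStatement::$tag(s) => Statement::$tag(s)),* }
            }
        }

        pub const KNOWN_TAGS: &[&str] = &[$(stringify!($tag)),*];

        /// `None` = unmodeled tag; `Some(Err(..))` = modeled but malformed.
        fn parse_known(tag: &str, node: &Value) -> Option<Result<KnownStatement, DeError>> {
            $(if tag == stringify!($tag) {
                return Some($parse(node).map(KnownStatement::$tag));
            })*
            None
        }
    };
}

known_statements! {
    EmptyStatement => parse_empty,
    ExpressionStatement => parse_expression,
}

fn parse_empty(_node: &Value) -> Result<EmptyStatement, DeError> {
    Ok(EmptyStatement)
}

fn parse_expression(node: &Value) -> Result<ExpressionStatement, DeError> {
    let expression = match node { Value::Obj(map) => map.get("expression"), _ => None };
    match expression {
        Some(expr) if matches!(expr, Value::Obj(_)) => Ok(ExpressionStatement { expression: expr.clone() }),
        Some(_) => Err(DeError("ExpressionStatement.expression: expected a node object".into())),
        None => Err(DeError("ExpressionStatement: missing field `expression`".into())),
    }
}

/// A missing or non-string `type`, or a non-object node, is always an error.
fn type_tag(node: &Value) -> Result<&str, DeError> {
    let Value::Obj(map) = node else {
        return Err(DeError("statement: expected an object".into()));
    };
    match map.get("type") {
        Some(Value::Str(tag)) => Ok(tag.as_str()),
        Some(_) => Err(DeError("statement: `type` must be a string".into())),
        None => Err(DeError("statement: missing field `type`".into())),
    }
}

pub fn deserialize_statement(node: Value) -> Result<Statement, DeError> {
    let tag = type_tag(&node)?.to_owned();
    let parsed = parse_known(&tag, &node);
    match parsed {
        // Modeled tag: strict parse, so a malformed node keeps its field-level error.
        Some(result) => result.map(Statement::from),
        // Genuinely unmodeled tag: keep the whole raw node.
        None => Ok(Statement::Unknown(UnknownStatement { raw: node })),
    }
}
```

With this shape, a node tagged `ExpressionStatement` but missing `expression` returns that field-level error rather than becoming `Unknown`, while a `TSExportAssignment` node comes back as `Statement::Unknown` holding the untouched input.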
Deserialization now materializes a `serde_json::Value` per statement before typed parsing. The cost is one move-based tree rebuild per nesting level at a one-time boundary; the previous derive also buffered every node through serde's internal `Content` to read the tag, so the delta is allocation shape, not asymptotics.

Verified: ast unit tests including malformed/edge cases, a lowering integration test pinning the function-body bailout, `round_trip` green on the three fixtures, scoped and full Babel e2e green on all three with events parity, and `cargo test --workspace` green.

The scope-resolution half of `test-babel-ast.sh` is green on this stack's base but remains red corpus-wide on the pr-36173 tip: that tip's node-ID migration removed position-based keying, while `babel-ast-to-json.mjs` still emits offset-based scope JSON. That generator gap needs its own fix before this stack rebases onto the tip.

`rust-port-0001-babel-ast.md`'s no-catch-all policy is amended to document `Statement` as the deliberate exception.
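Returning to the `UnknownStatement` design described earlier in the commit message, the cache-sync invariant and the raw-node identifier walk can be sketched together. This is a hypothetical, std-only sketch: `Value`, the two-field `BaseNode`, `UnknownStatement::new`, `check_type`, and `referenced_identifiers` are illustrative names, and committing a mutated copy is only one way to get reject-on-strip semantics; the real `with_raw_mut` may work differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical std-only stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Num(u64),
    Str(String),
    Arr(Vec<Value>),
    Obj(BTreeMap<String, Value>),
}

/// Hypothetical two-field subset of Babel's BaseNode, read by position helpers.
#[derive(Debug, Clone, PartialEq)]
pub struct BaseNode {
    pub start: Option<u64>,
    pub end: Option<u64>,
}

impl BaseNode {
    fn from_raw(raw: &Value) -> Self {
        let num = |key: &str| match raw {
            Value::Obj(map) => match map.get(key) {
                Some(Value::Num(n)) => Some(*n),
                _ => None,
            },
            _ => None,
        };
        BaseNode { start: num("start"), end: num("end") }
    }
}

/// The raw node must stay an object with a string `type`.
fn check_type(raw: &Value) -> Result<(), String> {
    match raw {
        Value::Obj(map) if matches!(map.get("type"), Some(Value::Str(_))) => Ok(()),
        _ => Err("raw node must be an object with a string `type`".to_string()),
    }
}

#[derive(Debug)]
pub struct UnknownStatement {
    raw: Value,     // private: only reachable through the accessors below
    base: BaseNode, // cache derived from `raw`
}

impl UnknownStatement {
    pub fn new(raw: Value) -> Result<Self, String> {
        check_type(&raw)?;
        let base = BaseNode::from_raw(&raw);
        Ok(UnknownStatement { raw, base })
    }
    pub fn raw(&self) -> &Value { &self.raw }
    pub fn base(&self) -> &BaseNode { &self.base }

    /// Scoped mutation (hypothetical strategy): edit a copy, commit it only if
    /// `type` survives, and refresh the cached `BaseNode` in the same step.
    pub fn with_raw_mut<R>(&mut self, f: impl FnOnce(&mut Value) -> R) -> Result<R, String> {
        let mut draft = self.raw.clone();
        let out = f(&mut draft);
        check_type(&draft)?; // on error, `raw` and `base` are left untouched
        self.base = BaseNode::from_raw(&draft);
        self.raw = draft;
        Ok(out)
    }
}

/// Collects the `name` of every `Identifier` in a raw subtree.
/// Naive: binding positions (e.g. `lib` in `import lib = require(...)`)
/// are collected too, so this over-approximates the references.
pub fn referenced_identifiers(node: &Value, out: &mut Vec<String>) {
    match node {
        Value::Obj(map) => {
            if let (Some(Value::Str(ty)), Some(Value::Str(name))) = (map.get("type"), map.get("name")) {
                if ty == "Identifier" {
                    out.push(name.clone());
                }
            }
            for child in map.values() {
                referenced_identifiers(child, out);
            }
        }
        Value::Arr(items) => {
            for child in items {
                referenced_identifiers(child, out);
            }
        }
        _ => {}
    }
}
```

For an `export = X` node the walk yields `["X"]`, which is the reference the gating scan needs to see. Because every `Identifier` is collected, a declared name would be reported as well; whether the real scan filters binding positions is not stated in this PR.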
poteto (author):
Closing: ported directly to the umbrella PR branch (rust-research, #36173) along with the rest of this stack.
Summary
Babel can emit statement kinds the typed AST does not model; #36704 pins three TS module-interop forms. Deserialization previously failed the whole file on the first such node, while the TS reference compiles the file and leaves the statement alone.
- `Statement` gains a final `#[serde(untagged)] Unknown(UnknownStatement)` variant carrying the complete raw Babel node. Deserialization is hand-written and dispatches modeled `type` tags through a `KnownStatement` helper enum, so a malformed modeled node still errors with its precise field-level message instead of degrading to `Unknown`; only genuinely unmodeled tags take the catch-all. A `known_statements!` macro is the single source for the dispatch enum, its `From` mapping, and the tag list, so the three cannot drift from each other.
- Top-level statements are preserved verbatim through re-serialization; function-body occurrences record the standard `UnsupportedSyntax` bailout with an `UnsupportedNode` instruction carrying the raw node. The TS reference reaches its equivalent default case only via `assertExhaustive`, which Babel's closed types make unreachable; in Rust, unmodeled syntax is reachable by construction, so it degrades per the fault-tolerance model instead of crashing.
- Program-level analyses treat `Unknown` explicitly: the gating reference-before-declaration scan walks the raw node for identifier references (an `export = X` does reference `X`), and the prefilter and return-analysis arms are deliberately inert.
- The SWC/OXC reverse converters emit a deliberate runtime tripwire (a `throw` in generated code) instead of silently dropping unknown nodes. The SWC forward path is fixed in the next PR of this stack.
- `rust-port-0001-babel-ast.md`'s no-catch-all policy is amended to document `Statement` as the single deliberate exception.

Perf note: deserialization now materializes a `serde_json::Value` per statement before typed parsing. The marginal cost is a move-based tree rebuild at a one-time boundary; the previous derive also buffered every node through serde's internal `Content` to read the tag, so the delta is allocation shape, not asymptotics.

Test plan
- `cargo test --workspace` green; unit tests cover program-level and nested-in-function round trips with reserialize equality, known-tag non-shadowing, precise malformed-node errors, missing/non-string/non-object `type`, scoped raw mutation, and a lowering integration test pinning the function-body bailout shape
- … `import lib = require("shared-runtime")` is preserved verbatim next to real memoization)
- `test-babel-ast.sh` (round-trip and scope-resolution tests) fully green on this stack

Stack
pr-36173

Merge order: bottom up.