Skip to content

Improve ref.eq optimization#8805

Merged
tlively merged 5 commits into
mainfrom
optimize-instructions-order-2932
Jun 5, 2026
Merged

Improve ref.eq optimization#8805
tlively merged 5 commits into
mainfrom
optimize-instructions-order-2932

Conversation

@tlively
Copy link
Copy Markdown
Member

@tlively tlively commented Jun 5, 2026

OptimizeInstructions would previously optimize ref.eq by checking if its operands were equal and foldable. If so, it would be replaced with the dropped operands (if they had effects that needed preserving) followed by a constant 1. Improve this by dropping only one of the operands; if they are equal and foldable, then there is no need to keep both around. When the operands do not have effects, this leaves more work for Vacuum than the old optimization did, but that's fine.

Also newly optimize when the operands are equal but not foldable. In this case they can still be dropped and followed by a constant 1.

Finally, replace the EffectAnalyzer invalidates call in areConsecutiveInputsEqualAndFoldable with an orderedBefore call. Add tests using release/acquire operations to show that the direction of the effect comparison is correct.

tlively added 2 commits June 4, 2026 21:10
OptimizeInstruction's `areConsecutiveInputsEqual` function finds the effects that are executed in between the left-hand and right-hand expressions and checks that they cannot affect the value of the right-hand expression or its operands. However, it did not previously handle the case where the left-hand expression itself changes the value of the operands flowing into the right-hand expression. This can happen in the case of calls to idempotent functions, where the first call changes a global that is passed as an argument to the second call.
OptimizeInstructions would previously optimize ref.eq by checking if its operands were equal and foldable. If so, it would be replaced with the dropped operands (if they had effects that needed preserving) followed by a constant 1. Improve this by dropping only one of the operands; if they are equal and foldable, then there is no need to keep both around. When the operands do not have effects, this leaves more work for Vacuum than the old optimization did, but that's fine.

Also newly optimize when the operands are equal but not foldable. In this case they can still be dropped and followed by a constant 1.

Finally, replace the EffectAnalyzer `invalidates` call in `areConsecutiveInputsEqualAndFoldable` with an `orderedBefore` call. Add tests using release/acquire operations to show that the direction of the effect comparison is correct.
@tlively tlively requested a review from a team as a code owner June 5, 2026 04:38
@tlively tlively requested review from kripken and removed request for a team June 5, 2026 04:38
ShallowEffectAnalyzer parentEffects(
getPassOptions(), *getModule(), call);
if (parentEffects.invalidates(childEffects)) {
if (parentEffects.orderedBefore(childEffects)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (parentEffects.orderedBefore(childEffects)) {
// Check if the first parent is ordered before the second child (in the
// example above, the first call vs the second global.get).
if (parentEffects.orderedBefore(childEffects)) {

(call $acquire-mem (struct.get $struct 0 (local.get $s)))
)
)
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the other two testcases, of calls to one each of acquire/release?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which additional test cases you have in mind. There are lots of potential combinations of acquire/release/seqcst/unordered memory/struct/array loads and stores that we could add. But the two here are sufficient to show that our orderedBefore check is going in the correct direction, so I don't think we need any more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that we have

    (ref.eq
      (call $acquire-mem (struct.get $struct 0 (local.get $s)))
      (call $acquire-mem (struct.get $struct 0 (local.get $s)))
    )

    (ref.eq
      (call $release-mem (struct.get $struct 0 (local.get $s)))
      (call $release-mem (struct.get $struct 0 (local.get $s)))
    )

and we could add

    (ref.eq
      (call $release-mem (struct.get $struct 0 (local.get $s)))
      (call $acquire-mem (struct.get $struct 0 (local.get $s)))
    )

    (ref.eq
      (call $acquire-mem (struct.get $struct 0 (local.get $s)))
      (call $release-mem (struct.get $struct 0 (local.get $s)))
    )

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the LHS and RHS are not the same there so we would never get as far as checking for effect ordering.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, yeah.

Base automatically changed from consecutive-inputs-equal-left-right-interference to main June 5, 2026 17:44
@tlively tlively merged commit f58a196 into main Jun 5, 2026
16 checks passed
@tlively tlively deleted the optimize-instructions-order-2932 branch June 5, 2026 17:49
@kripken
Copy link
Copy Markdown
Member

kripken commented Jun 5, 2026

I'm seeing fuzzer errors after this landed, due to the new testcase it/passes/optimize-instructions-global-effects-idempotent.wast. I reduced one to

(module
 (type $0 (shared (struct (field (mut i32)))))
 (type $1 (func (param i32) (result eqref)))
 (type $2 (func (param (ref $0)) (result i32)))
 (type $3 (func))
 (global $global$0 i31ref (ref.i31
  (i32.const 0)
 ))
 (global $global$1 (mut i32) (i32.const 0))
 (export "no-fold_invoker" (func $2))
 (@binaryen.idempotent)
 (func $0 (param $0 i32) (result eqref)
  (if
   (i32.eqz
    (global.get $global$1)
   )
   (then
    (unreachable)
   )
  )
  (global.set $global$1
   (i32.const 0)
  )
  (global.get $global$0)
 )
 (func $1 (param $0 (ref $0)) (result i32)
  (drop
   (call $0
    (i32.const 0)
   )
  )
  (drop
   (call $0
    (i32.const 0)
   )
  )
  (i32.const 0)
 )
 (func $2
  (global.set $global$1
   (i32.const 1)
  )
  (drop
   (call $1
    (struct.new_default $0)
   )
  )
 )
)
$ bin/wasm-opt out/test/w.wasm -all --fuzz-exec  --monomorphize
[fuzz-exec] export no-fold_invoker
[trap unreachable]
[fuzz-exec] export no-fold_invoker
[fuzz-exec] comparing no-fold_invoker
[fuzz-exec] optimization passes changed results

Possibly this new testcase uncovers and idempotency bug? Or perhaps it does not define a truly idempotent function, which can break the fuzzer?

@kripken
Copy link
Copy Markdown
Member

kripken commented Jun 5, 2026

Smaller example:

(module
 (type $0 (func))
 (global $global$0 (mut i32) (i32.const 0))
 (export "func_55_invoker" (func $1))
 (@binaryen.idempotent)
 (func $0 (type $0)
  (if
   (i32.eqz
    (global.get $global$0)
   )
   (then
    (unreachable)
   )
  )
  (global.set $global$0
   (i32.const 0)
  )
 )
 (func $1 (type $0)
  (global.set $global$0
   (i32.const 1)
  )
  (call $0)
  (call $0)
 )
)

Breaks on bin/wasm-opt t.wat -all --fuzz-exec -O2

@tlively
Copy link
Copy Markdown
Member Author

tlively commented Jun 5, 2026

Thanks, will investigate. The functions are correctly idempotent AFAICT.

@kripken
Copy link
Copy Markdown
Member

kripken commented Jun 5, 2026

Oh actually you just need to skip this test in scripts/test/fuzzing.py like the others, see the comment there

    # We cannot fuzz semantics-altering intrinsics, as when we optimize the
    # behavior changes.

Tricky to remember this as it isn't the case for js.called...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants