Improve ref.eq optimization#8805
Conversation
OptimizeInstruction's `areConsecutiveInputsEqual` function finds the effects that are executed in between the left-hand and right-hand expressions and checks that they cannot affect the value of the right-hand expression or its operands. However, it did not previously handle the case where the left-hand expression itself changes the value of the operands flowing into the right-hand expression. This can happen in the case of calls to idempotent functions, where the first call changes a global that is passed as an argument to the second call.
OptimizeInstructions would previously optimize ref.eq by checking if its operands were equal and foldable. If so, it would be replaced with the dropped operands (if they had effects that needed preserving) followed by a constant 1. Improve this by dropping only one of the operands; if they are equal and foldable, then there is no need to keep both around. When the operands do not have effects, this leaves more work for Vacuum than the old optimization did, but that's fine. Also newly optimize when the operands are equal but not foldable. In this case they can still be dropped and followed by a constant 1. Finally, replace the EffectAnalyzer `invalidates` call in `areConsecutiveInputsEqualAndFoldable` with an `orderedBefore` call. Add tests using release/acquire operations to show that the direction of the effect comparison is correct.
…optimize-instructions-order-2932
| ShallowEffectAnalyzer parentEffects( | ||
| getPassOptions(), *getModule(), call); | ||
| if (parentEffects.invalidates(childEffects)) { | ||
| if (parentEffects.orderedBefore(childEffects)) { |
There was a problem hiding this comment.
| if (parentEffects.orderedBefore(childEffects)) { | |
| // Check if the first parent is ordered before the second child (in the | |
| // example above, the first call vs the second global.get). | |
| if (parentEffects.orderedBefore(childEffects)) { |
| (call $acquire-mem (struct.get $struct 0 (local.get $s))) | ||
| ) | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Maybe add the other two testcases, of calls to one each of acquire/release?
There was a problem hiding this comment.
I'm not sure which additional test cases you have in mind. There are lots of potential combinations of acquire/release/seqcst/unordered memory/struct/array loads and stores that we could add. But the two here are sufficient to show that our orderedBefore check is going in the correct direction, so I don't think we need any more.
There was a problem hiding this comment.
I mean that we have
(ref.eq
(call $acquire-mem (struct.get $struct 0 (local.get $s)))
(call $acquire-mem (struct.get $struct 0 (local.get $s)))
)
(ref.eq
(call $release-mem (struct.get $struct 0 (local.get $s)))
(call $release-mem (struct.get $struct 0 (local.get $s)))
)
and we could add
(ref.eq
(call $release-mem (struct.get $struct 0 (local.get $s)))
(call $acquire-mem (struct.get $struct 0 (local.get $s)))
)
(ref.eq
(call $acquire-mem (struct.get $struct 0 (local.get $s)))
(call $release-mem (struct.get $struct 0 (local.get $s)))
)
?
There was a problem hiding this comment.
But the LHS and RHS are not the same there so we would never get as far as checking for effect ordering.
|
I'm seeing fuzzer errors after this landed, due to the new testcase (module
(type $0 (shared (struct (field (mut i32)))))
(type $1 (func (param i32) (result eqref)))
(type $2 (func (param (ref $0)) (result i32)))
(type $3 (func))
(global $global$0 i31ref (ref.i31
(i32.const 0)
))
(global $global$1 (mut i32) (i32.const 0))
(export "no-fold_invoker" (func $2))
(@binaryen.idempotent)
(func $0 (param $0 i32) (result eqref)
(if
(i32.eqz
(global.get $global$1)
)
(then
(unreachable)
)
)
(global.set $global$1
(i32.const 0)
)
(global.get $global$0)
)
(func $1 (param $0 (ref $0)) (result i32)
(drop
(call $0
(i32.const 0)
)
)
(drop
(call $0
(i32.const 0)
)
)
(i32.const 0)
)
(func $2
(global.set $global$1
(i32.const 1)
)
(drop
(call $1
(struct.new_default $0)
)
)
)
)Possibly this new testcase uncovers and idempotency bug? Or perhaps it does not define a truly idempotent function, which can break the fuzzer? |
|
Smaller example: (module
(type $0 (func))
(global $global$0 (mut i32) (i32.const 0))
(export "func_55_invoker" (func $1))
(@binaryen.idempotent)
(func $0 (type $0)
(if
(i32.eqz
(global.get $global$0)
)
(then
(unreachable)
)
)
(global.set $global$0
(i32.const 0)
)
)
(func $1 (type $0)
(global.set $global$0
(i32.const 1)
)
(call $0)
(call $0)
)
)Breaks on |
|
Thanks, will investigate. The functions are correctly idempotent AFAICT. |
|
Oh actually you just need to skip this test in Tricky to remember this as it isn't the case for js.called... |
OptimizeInstructions would previously optimize ref.eq by checking if its operands were equal and foldable. If so, it would be replaced with the dropped operands (if they had effects that needed preserving) followed by a constant 1. Improve this by dropping only one of the operands; if they are equal and foldable, then there is no need to keep both around. When the operands do not have effects, this leaves more work for Vacuum than the old optimization did, but that's fine.
Also newly optimize when the operands are equal but not foldable. In this case they can still be dropped and followed by a constant 1.
Finally, replace the EffectAnalyzer
invalidatescall inareConsecutiveInputsEqualAndFoldablewith anorderedBeforecall. Add tests using release/acquire operations to show that the direction of the effect comparison is correct.