fix(tracing): reset jit_trace_num on PHP 8.5 sandbox bailout by Leiyks · Pull Request #3973 · DataDog/dd-trace-php

Leiyks · 2026-06-10T12:26:16Z

Description

Draft / hypothesis fix — CI is the experiment. Fixes the PHP 8.5-only ASAN SEGV in tests/opcache/jit_trace_num_bailout.phpt (the regression test added by #3964). The crash reproduces across multiple branches/PRs, always on the ASAN Opcache tests: [8.5] job, never on 8.0–8.4.

Root cause (high confidence on category, medium on exact state)

fix(tracing): sandbox should save/restore jit_trace_num #3964 added save/restore of EG(jit_trace_num) across a caught sandbox bailout, mirroring what zend_call_function() does (Zend/zend_execute_API.c — it only ever save/restores EG(jit_trace_num) for user functions). This is correct and sufficient on 8.0–8.4.
PHP 8.5's IR-based tracing JIT made zend_jit_trace_exit() less tolerant of a stale trace id: it removed the GCC_GLOBAL_REGS guard around the exception/opline correction (php-src ext/opcache/jit/zend_jit_trace.c ~8773-8779), so after a caught bailout the resumed caller reaches the side-exit deopt path much more readily and dereferences &zend_jit_traces[EG(jit_trace_num)].
On the sandbox bailout-catch path the captured jit_trace_num can itself be stale (the hook may have been entered while already inside a trace), so simply restoring it re-arms the crash on 8.5.

ASAN evidence

SEGV READ on 0x000000000078  (zero page)
#0 (/dev/zero (deleted)+0x8000942)   <-- JIT buffer, unsymbolized
#1 zend_execute Zend/zend_vm_execute.h
r8=0x0  r12=0x0   (null base, deref at +0x78)
r14, r15 valid    (frame pointer / opline intact)

Valid FP + a null variable-register dereferenced at +0x78 = a side-exit restoring the wrong variable stack map, i.e. stale trace metadata. No core dump was captured by CI (artifacts/core_dumps/ is empty), so the crashing JIT frame can't be symbolized from CI data alone.

Fix

On PHP 8.5+ only, reset EG(jit_trace_num) to 0 (the "not currently in a trace" invariant the resumed interpreter expects) on the bailout-catch path instead of restoring the captured value. opcache exposes no symbol to reset JIT_G(tracing) from another extension, so this EG-level reset is the reachable equivalent. zai_sandbox_engine_state_restore() is only ever called from zai_sandbox_bailout(), so 8.0–8.4 behaviour is untouched.

Verification

This PR's ASAN Opcache tests: [8.5] job is the verifier. If green, hypothesis confirmed.
A local debug-zts-asan PHP 8.5 gdb reproduction is being run in parallel to confirm the exact faulting access (restored stack map + EG(jit_trace_num)) and validate "zero vs restore".

cc the author of #3964 — this builds directly on that fix and may warrant a more principled upstream/opcache-level reset of the trace-recording state.

The #3964 regression test tests/opcache/jit_trace_num_bailout.phpt crashes with an ASAN SEGV only on PHP 8.5 (bad JIT side-exit reading a stale trace stack map). PHP 8.5's IR-based tracing JIT made zend_jit_trace_exit() less tolerant of a stale EG(jit_trace_num) after a caught sandbox bailout, so restoring the captured (possibly already-stale) trace id re-arms the crash. zend_call_function() only ever save/restores EG(jit_trace_num), and opcache exposes no symbol to reset JIT_G(tracing) from another extension, so on 8.5 we reset jit_trace_num to 0 on the bailout-catch path -- the 'not in a trace' invariant the resumed interpreter expects. <=8.4 behaviour is unchanged.

datadog-official · 2026-06-10T12:35:20Z

Tests

✨ Fix all issues with BitsAI

⚠️ Warnings

🚦 15 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | appsec integration tests: [test8.4-release]
🧪 1 Test failed

All test failures are known flaky.
❄️ Known flaky: extended heartbeat re-emits configuration, dependencies and integrations() from com.datadog.appsec.php.integration.TelemetryExtendedHeartbeatTests (Fix with Cursor)
java.lang.AssertionError: phpredis not emitted via app-started/app-integrations-change; saw: []. Expression: (phpredis in flushed). Values: flushed = []

java.lang.AssertionError: phpredis not emitted via app-started/app-integrations-change; saw: []. Expression: (phpredis in flushed). Values: flushed = []
	at org.codehaus.groovy.runtime.InvokerHelper.createAssertError(InvokerHelper.java:416)
	at com.datadog.appsec.php.integration.TelemetryExtendedHeartbeatTests.extended heartbeat re-emits configuration, dependencies and integrations(TelemetryExtendedHeartbeatTests.groovy:70)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Not introduced in this PR.
DataDog/apm-reliability/dd-trace-php | ASAN Opcache tests: [8.5]

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.3]
View all 15 failed jobs.

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 54.12% (-0.03%)

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 44fa3af | Docs | Datadog PR Page | Give us feedback!}

pr-commenter · 2026-06-10T14:11:05Z

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-10 14:08:59

Comparing candidate commit 44fa3af in PR branch leiyks/fix-ci-jit-trace-num-bailout-85 with baseline commit a87192b in branch master.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 192 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

🟩 = significantly better candidate vs. baseline
🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:MessagePackSerializationBench/benchMessagePackSerialization

🟥 execution_time [+7.035µs; +8.825µs] or [+6.921%; +8.682%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

🟥 execution_time [+6.850µs; +8.030µs] or [+6.481%; +7.597%]

Merge branch 'master' into leiyks/fix-ci-jit-trace-num-bailout-85

44fa3af

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tracing): reset jit_trace_num on PHP 8.5 sandbox bailout#3973

fix(tracing): reset jit_trace_num on PHP 8.5 sandbox bailout#3973
Leiyks wants to merge 2 commits into
masterfrom
leiyks/fix-ci-jit-trace-num-bailout-85

Leiyks commented Jun 10, 2026

Uh oh!

datadog-official Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

pr-commenter Bot commented Jun 10, 2026

Explanation

More details about the CI and significant changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Leiyks commented Jun 10, 2026

Description

Root cause (high confidence on category, medium on exact state)

ASAN evidence

Fix

Verification

Uh oh!

datadog-official Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

pr-commenter Bot commented Jun 10, 2026

Benchmarks [ tracer ]

Explanation

More details about the CI and significant changes

scenario:MessagePackSerializationBench/benchMessagePackSerialization

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

datadog-official Bot commented Jun 10, 2026 •

edited

Loading