
Fix DagFileProcessorManager silent hang on DB lock contention #68118

Open
Subham-KRLX wants to merge 2 commits into apache:main from Subham-KRLX:fix-dag-processor-lock-timeout


Subham-KRLX commented Jun 6, 2026

Instead of risky thread-based heartbeats, this PR adds a native with_db_lock_timeout context manager around the blocking deactivate_stale_dags and deactivate_deleted_dags updates. If a lock timeout occurs, it safely rolls back, logs a warning, and skips the iteration so that the main loop and heartbeat() can continue uninterrupted.
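
For readers who want a picture of the idea, here is a minimal sketch of a per-dialect lock-timeout context manager. It is not the code in this PR: the names `lock_timeout_statements` and `with_db_lock_timeout_sketch`, the `execute` callable, and the 5-second default are all assumptions made for illustration. The SQL itself uses PostgreSQL's `lock_timeout` setting and MySQL's `innodb_lock_wait_timeout` variable.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Callable, Iterator


def lock_timeout_statements(dialect: str, seconds: int) -> tuple[list[str], list[str]]:
    """Return (setup, teardown) SQL for a dialect; both empty if unsupported.

    Hypothetical helper, not part of Airflow.
    """
    seconds = int(seconds)  # never interpolate untrusted text into SQL
    if dialect == "postgresql":
        # SET LOCAL only lasts until the current transaction ends,
        # so nothing has to be restored afterwards.
        return [f"SET LOCAL lock_timeout = '{seconds}s'"], []
    if dialect == "mysql":
        # The session variable outlives the transaction, so reset it
        # to the global default on the way out.
        return (
            [f"SET SESSION innodb_lock_wait_timeout = {seconds}"],
            ["SET SESSION innodb_lock_wait_timeout = DEFAULT"],
        )
    return [], []


@contextmanager
def with_db_lock_timeout_sketch(
    execute: Callable[[str], object], dialect: str, seconds: int = 5
) -> Iterator[None]:
    """Apply a lock wait timeout around the wrapped block (illustrative only)."""
    setup, teardown = lock_timeout_statements(dialect, seconds)
    for sql in setup:
        execute(sql)
    try:
        yield
    finally:
        for sql in teardown:
            execute(sql)
```

With SQLAlchemy, `execute` could plausibly be something like `lambda sql: session.execute(text(sql))` and `dialect` could come from `session.get_bind().dialect.name`; taking a plain callable here keeps the sketch free of third-party imports.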

closes: #68101


Was generative AI tooling used to co-author this PR?
  • Yes: Claude (for PR description and code research)

Subham-KRLX force-pushed the fix-dag-processor-lock-timeout branch 3 times, most recently from 77d3f99 to 7b665dd, on June 6, 2026 06:04
Comment thread airflow-core/src/airflow/utils/sqlalchemy.py

Copilot AI left a comment


Pull request overview

This PR aims to prevent the DagFileProcessorManager main loop from silently hanging during startup when DB lock contention blocks the deactivate_stale_dags / deactivate_deleted_dags update paths, by applying a per-session lock wait timeout and gracefully skipping the iteration on lock-timeout errors.

Changes:

  • Added a with_db_lock_timeout() SQLAlchemy utility to apply per-dialect lock wait timeouts (PostgreSQL/MySQL).
  • Wrapped deactivate_stale_dags() and deactivate_deleted_dags() DB updates with the lock-timeout context manager and added rollback + warning on lock-timeout OperationalError.
  • Added a unit test covering lock-timeout handling for deactivate_stale_dags().
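
The rollback-and-skip behaviour described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `is_lock_timeout` and `run_or_skip_on_lock_timeout` are hypothetical names, and the detection assumes the wrapped driver error is reachable as `exc.orig` (as on SQLAlchemy's `DBAPIError`), carries a SQLSTATE in `pgcode` or `sqlstate` for PostgreSQL drivers, or carries the MySQL error number as `args[0]` (mysqlclient/PyMySQL style). SQLSTATE `55P03` (`lock_not_available`) and MySQL error 1205 (`ER_LOCK_WAIT_TIMEOUT`) are the codes those servers report for lock wait timeouts.

```python
import logging

log = logging.getLogger(__name__)

PG_LOCK_NOT_AVAILABLE = "55P03"  # PostgreSQL SQLSTATE for a lock timeout
MYSQL_LOCK_WAIT_TIMEOUT = 1205   # MySQL ER_LOCK_WAIT_TIMEOUT


def is_lock_timeout(exc: BaseException) -> bool:
    """Best-effort check for a lock-timeout error (hypothetical helper)."""
    # Unwrap a SQLAlchemy-style wrapper if present, else inspect exc itself.
    orig = getattr(exc, "orig", None) or exc
    sqlstate = getattr(orig, "pgcode", None) or getattr(orig, "sqlstate", None)
    if sqlstate == PG_LOCK_NOT_AVAILABLE:
        return True
    args = getattr(orig, "args", ())
    return bool(args) and args[0] == MYSQL_LOCK_WAIT_TIMEOUT


def run_or_skip_on_lock_timeout(session, update, error_type=Exception) -> bool:
    """Run update(session); on a lock timeout, roll back, warn, and skip.

    Returns True if the update ran, False if it was skipped.
    Any other error is re-raised unchanged.
    """
    try:
        update(session)
    except error_type as exc:
        if not is_lock_timeout(exc):
            raise
        session.rollback()
        log.warning("Lock timeout during DB update; skipping this iteration: %s", exc)
        return False
    return True
```

In real use, `error_type` would presumably be narrowed to `sqlalchemy.exc.OperationalError` so that unrelated exceptions never reach the lock-timeout check.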

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| airflow-core/src/airflow/utils/sqlalchemy.py | Adds with_db_lock_timeout() context manager to apply DB lock wait timeouts. |
| airflow-core/src/airflow/dag_processing/manager.py | Uses the lock-timeout wrapper and handles lock-timeout errors to avoid processor hangs. |
| airflow-core/tests/unit/dag_processing/test_manager.py | Adds a regression test for lock-timeout handling in deactivate_stale_dags(). |

Comment thread airflow-core/src/airflow/utils/sqlalchemy.py
Comment thread airflow-core/tests/unit/dag_processing/test_manager.py
Comment thread airflow-core/src/airflow/dag_processing/manager.py
Subham-KRLX force-pushed the fix-dag-processor-lock-timeout branch from 7b665dd to 1611f16 on June 6, 2026 10:34
Comment thread airflow-core/src/airflow/utils/sqlalchemy.py
Development

Successfully merging this pull request may close these issues.

DagFileProcessorManager silently hangs on DB lock contention during startup — no log, no timeout, no recovery

4 participants