Documentation for ResumableJobMixin and resumable tasks #68136
Open
amoghrajesh wants to merge 1 commit into
eladkal reviewed Jun 6, 2026
.. versionadded:: 3.2.0

.. versionchanged:: 3.3.0
Contributor
This breaks the current structure: the
.. versionadded:: 3.2.0
is related to the paragraph below the new addition.
jroachgolf84 reviewed Jun 6, 2026
The trade-offs are:

* A Triggerer component must be running. Deployments that do not include a Triggerer cannot use this pattern.
* Writing a custom deferrable operator requires implementing a separate ``Trigger`` class in
Collaborator
Suggested change
- * Writing a custom deferrable operator requires implementing a separate ``Trigger`` class in
+ * Writing a custom deferrable operator requires implementing a ``Trigger`` class in
process_files_dag()

This pattern works without any additional work, just plain old context. The state store is just
Collaborator
Suggested change
- This pattern works without any additional work, just plain old context. The state store is just
+ This pattern works without any additional work, relying only on ``context``. The state store is just
:class:`~airflow.sdk.ResumableJobMixin` is a mixin for operators that submit long-running jobs
to an external system and poll for its completion. It makes the operator crash-safe by persisting
the external job identifier to the task state store before polling begins. If the worker is restarted
Collaborator
Suggested change
- the external job identifier to the task state store before polling begins. If the worker is restarted
+ the external job identifier to task state store before polling begins. If the worker is restarted
dabla approved these changes Jun 6, 2026
The trade-offs are:

* A Triggerer component must be running. Deployments that do not include a Triggerer cannot use this pattern.
* Writing a custom deferrable operator requires implementing a separate ``Trigger`` class in
Contributor
Or: requires implementing a dedicated Trigger class...
ephraimbuddy approved these changes Jun 6, 2026
closes: #67706
What?
Operators that submit work to external systems such as Spark, BigQuery, EMR, or Kubernetes share a common failure mode: the worker holds its slot for the full polling duration, and if the worker crashes the task retries from scratch, often submitting a duplicate job. Airflow 3.3 introduces `ResumableJobMixin` to solve the crash-and-duplicate problem, and the task state store enables the broader checkpoint-and-resume pattern. Neither had documentation, and there was no guidance on when to reach for these over deferrable operators or async tasks.

Current behaviour

The only comparison page (`task-sdk/docs/deferred-vs-async-operators.rst`) covered deferrable and async operators but made no mention of resumable tasks. `ResumableJobMixin` had no user-facing documentation at all. Users experiencing repeated job submission failures had no clear path to a solution.

Proposed change

- `airflow-core/docs/core-concepts/resumable-tasks.rst` - a decision guide covering all three patterns (deferrable, resumable, async) with trade-off descriptions, a three-way comparison table, and a general checkpoint example using `task_store` directly.
- `task-sdk/docs/resumable-job-mixin.rst` - a reference page for `ResumableJobMixin` covering the interface, method descriptions with inline examples, the retry flow, the pre-submit crash window limitation, and the `external_id_key` renaming warning.
- `task-sdk/docs/deferred-vs-async-operators.rst` - updated with a `versionchanged:: 3.3.0` note pointing to both new pages.
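
For readers new to the pattern, here is a minimal, framework-free sketch of the crash-and-duplicate problem and the checkpoint-and-resume fix described under "What?". It does not use the Airflow API. `FakeExternalSystem`, `WorkerCrash`, `run_task`, and the `external_job_id` key are hypothetical names invented for illustration, and a plain `dict` stands in for the task state store. The actual `ResumableJobMixin` interface is documented in the pages this PR adds.

```python
class FakeExternalSystem:
    """Hypothetical stand-in for an external service such as Spark or EMR."""

    def __init__(self):
        self.jobs = {}

    def submit(self):
        # Each call creates a new job, which is what makes duplicates costly.
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = "RUNNING"
        return job_id

    def finish(self, job_id):
        self.jobs[job_id] = "SUCCEEDED"

    def status(self, job_id):
        return self.jobs[job_id]


class WorkerCrash(Exception):
    """Simulates the worker dying while it polls."""


def run_task(system, state_store, crash_while_polling=False):
    # On a retry, reuse the persisted job ID instead of submitting again.
    job_id = state_store.get("external_job_id")
    if job_id is None:
        job_id = system.submit()
        # Persist the ID before polling begins so a later crash is recoverable.
        state_store["external_job_id"] = job_id

    if crash_while_polling:
        raise WorkerCrash("worker lost during polling")

    system.finish(job_id)  # simulate the external job completing
    return system.status(job_id)


system = FakeExternalSystem()
store = {}  # assumption: a dict plays the role of the task state store

try:
    run_task(system, store, crash_while_polling=True)  # first attempt crashes
except WorkerCrash:
    pass

result = run_task(system, store)  # retry reattaches to the same job
```

After the retry, `system.jobs` still contains a single job, because the second attempt found the saved identifier and skipped the submit step. Without the state store lookup, the retry would have submitted a second job.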