-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deferrable sensors can implement sensor timeout #33718
Conversation
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
@dstandish and team - I see this issue was auto-closed and now reopened. My team is struggling with the same issue, so we'd be happy to help/test in any way possible as needed - let us know ! |
3a5a800
to
2fe05c8
Compare
Hi @robg-eb, thanks for your interest. Yes, your feedback / review / testing would be welcome. I just rebased it. |
@dstandish - To clarify, is this PR ready to test as-is? |
Right. In short what this does is, now when trigger times out we raise TaskDeferralTimeout instead of the generic TaskDeferralError. And in BaseSensorOperator, we reraise this as AirflowSensorTimeout which has special meaning (results in immediately fail and no more retries. At least that's my understanding after dusting this off just now :) |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
bcb4f06
to
cfaf095
Compare
@robg-eb looked at your comment
I don't understand. What do you mean "an interruption does happen"? |
21e660f
to
39095d5
Compare
Is this one ready to review and we think we can merge it in? Also I think it relates with our discussions on making deferrable the "default" for Airflow 3 and it is part of the discussion "Do we actually pay enough attention for deferrable timeouts and faiilure scenarios to make it "first-class" replacement for other types of sensors - so maybe we should prioritize this one. I saw a number of people commenting before, but I am not sure what the status is after so many back/forth and long conversations. .. So maybe we can somewhat restart this one and get more people who commented before and understand more in detail about the problems? cc: @eladkal Pinging those who were active here Also @thesuperzapper -> re: #36090 (comment) which seems very much relevant to this one, as I think we need to agree on strategy of how to treat "exceptional" cases for deferrable operators in a consistent way. I personally know too little in this area to make a meaningful (and correct) feedback, But I have a feeling this and #36090 would need to be addressed if we want to seriously continue discussion started here https://lists.apache.org/thread/3m7vjwcbvodnhrklo69s3j8s8pp7nm6o |
8db9672
to
eae169d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
But we must make sure the core part of this PR is released in 2.11 otherwise we will have compatibility problems with the provider part cc @ephraimbuddy @utkarsharma2
The goal here is to ensure behavioral parity w.r.t. sensor timeouts between deferrable and non-deferrable sensor operators. With non-deferrable sensors, if there's a sensor timeout, the task fails without retry. But currently, with deferrable sensors, that does not happen. Since there's already a "timeout" capability on triggers, we can use this for sensor timeout. Essentially all that was missing was the ability to distinguish between trigger timeouts and other trigger errors. With this capability, base sensor can distinguish between the two, and reraise deferral timeouts as sensor timeouts. So, here we add a new exception type, TaskDeferralTimeout, which base sensor reraises as AirflowSensorTimeout. Then, to take advantage of this feature, a sensor need only ensure that its timeout is passed when deferring. For convenience, we update the task deferred exception signature to take int and float in addition to timedelta, since that's how `timeout` attr is defined on base sensor. But we do not change the exception attribute type. In order to keep this PR focused, this PR only updates one sensor to use the timeout functionality, namely, time delta sensor. Other sensors will have to be done as followups.
4fb41b1
to
502c0fc
Compare
The goal here is to ensure behavioral parity w.r.t. sensor timeouts between deferrable and non-deferrable sensor operators. With non-deferrable sensors, if there's a sensor timeout, the task fails without retry. But currently, with deferrable sensors, that does not happen. Since there's already a "timeout" capability on triggers, we can use this for sensor timeout. Essentially all that was missing was the ability to distinguish between trigger timeouts and other trigger errors. With this capability, base sensor can distinguish between the two, and reraise deferral timeouts as sensor timeouts. So, here we add a new exception type, TaskDeferralTimeout, which base sensor reraises as AirflowSensorTimeout. Then, to take advantage of this feature, a sensor need only ensure that its timeout is passed when deferring. For convenience, we update the task deferred exception signature to take int and float in addition to timedelta, since that's how `timeout` attr is defined on base sensor. But we do not change the exception attribute type. In order to keep this PR focused, this PR only updates one sensor to use the timeout functionality, namely, time delta sensor. Other sensors will have to be done as followups.
The goal here is to ensure behavioral parity w.r.t. sensor timeouts between deferrable and non-deferrable sensor operators. With non-deferrable sensors, if there's a sensor timeout, the task fails without retry. But currently, with deferrable sensors, that does not happen. Since there's already a "timeout" capability on triggers, we can use this for sensor timeout. Essentially all that was missing was the ability to distinguish between trigger timeouts and other trigger errors. With this capability, base sensor can distinguish between the two, and reraise deferral timeouts as sensor timeouts. So, here we add a new exception type, TaskDeferralTimeout, which base sensor reraises as AirflowSensorTimeout. Then, to take advantage of this feature, a sensor need only ensure that its timeout is passed when deferring. For convenience, we update the task deferred exception signature to take int and float in addition to timedelta, since that's how `timeout` attr is defined on base sensor. But we do not change the exception attribute type. In order to keep this PR focused, this PR only updates one sensor to use the timeout functionality, namely, time delta sensor. Other sensors will have to be done as followups.
The goal here is to allow behavioral parity w.r.t. sensor timeouts between deferrable and non-deferrable sensor operators.
With non-deferrable sensors, if there's a sensor timeout, the task fails without retry. But currently, with deferrable sensors, that does not happen.
Since there's already a "timeout" capability on triggers, we can use this for sensor timeout. Essentially all that was missing was the ability to distinguish between trigger timeouts and other trigger errors. With this capability, base sensor can distinguish between the two, and reraise deferral timeouts as sensor timeouts.
So, here we add a new exception type, TaskDeferralTimeout, which base sensor reraises as AirflowSensorTimeout. Then, to take advantage of this feature, a sensor need only ensure that its timeout is passed when deferring. For convenience, we update the task deferred exception signature to take int and float in addition to timedelta, since that's how
timeout
attr is defined on base sensor. But we do not change the exception attribute type.In order to keep this PR focused, this PR only updates one sensor to use the timeout functionality, namely, time delta sensor. Other sensors will have to be done as followups.
Old description below ⬇️
Alternative to #32990
resolves #32638
This is a less invasive approach. Essentially, what we do here is, update
BaseOperator.resume_execution
so that when trigger times out then it raises special exceptionAirflowDeferralTimeout
.Then,
BaseSensorOperator.resume_execution
, we reraiseAirflowDeferralTimeout
as aAirflowSensorTimeout
.So, if a sensor resumes from a timed-out deferral, then it's interpreted as a sensor timeout.
All that is required is for a sensor to add a timeout to the deferral.
Example logs: