-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used #33118
Conversation
@@ -20,21 +20,19 @@ package org.apache.spark.scheduler | |||
import java.util.Properties | |||
import java.util.concurrent.{CountDownLatch, TimeUnit} | |||
import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong, AtomicReference} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break scala style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible, but this looks like an overkill a little to me. Why don't we update the config doc instead? I believe that's more proper and correct location to do this.
cc @mridulm
@dongjoon-hyun There are two ongoing correctness PR's which require protocol changes (for push based shuffle) in order to fix the issues. I am expecting them to be reviewed and merged before 3.2. This PR would be merged only if those other two pr's (#33078 and #33034) do not make it into 3.2 Do we want to make this WIP to convey this intent Dongjoon ? |
@dongjoon-hyun @mridulm I have updated the docs of the configs on both the client and sever as well with |
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks for the excellent suggestion @dongjoon-hyun ! |
cc @Ngone51 FYI |
Yeah, +1 for landing this to branch-3.2 only |
only few days left for the branchcut :-). |
Makes sense @dongjoon-hyun @mridulm @HyukjinKwon. Will close this PR and create it against 3.2 once it is cut. |
I'm thinking that if we dynamically disable push-based shuffle when indeterminated stage retries (e.g., via job properties), users are still safe to try it? (Note that push-based shuffle is already disabled in the case of multiple yarn attempts in master branch) |
@Ngone51 The problem is that fixing the two correctness pending issues requires protocol changes - which means serde issues if 3.3 (where this is fixed) tries to work with 3.2 (if it is released without the fixes) - client/ESS combinations (3.3 client with 3.2 ESS and 3.2 client with 3.3 ESS in future). |
I see. Make sense to me. |
This is WIP because #33118 (comment)
What changes were proposed in this pull request?
Push-based shuffle is partially merged in apache master but some of the tasks are still incomplete. Since 3.2 is going to cut soon, we will not be able to get the pending tasks reviewed and merged. Few of the pending tasks make protocol changes to the push-based shuffle protocols, so we would like to prevent users from enabling push-based shuffle both on the client and the server until push-based shuffle implementation is complete.
We can prevent push-based shuffle to be used by throwing UnsupportedOperationException both on the client and the server when the user tries to enable it.
Why are the changes needed?
The change is needed to prevent users from trying out push-based shuffle until it is complete.
Does this PR introduce any user-facing change?
Yes. It will prevent users to try out push-based shuffle until it is complete.
How was this patch tested?
It is a straightforward change that prevents users from enable push-based shuffle so haven't added a UT for it.