-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used #33329
Conversation
Can one of the admins verify this patch? |
@@ -2079,7 +2079,7 @@ package object config { | |||
"conjunction with the server side flag spark.shuffle.server.mergedShuffleFileManagerImpl " + | |||
"which needs to be set with the appropriate " + | |||
"org.apache.spark.network.shuffle.MergedShuffleFileManager implementation for push-based " + | |||
"shuffle to be enabled") | |||
"shuffle to be enabled. Push-based shuffle is not yet supported.") | |||
.version("3.1.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This configuration is added at 3.1.0 and do we want to add comments like Push-based shuffle is not yet supported
at 3.1.3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 3.1, multiple critical pieces of push-based shuffle were not there, for example SPARK-32922, SPARK-32921, and SPARK-32920. So, even if anyone even tried to turn it on, it would just not work.
However, in 3.2 we are just missing 2 correctness fixes which involve modifying push-based shuffle protocols: #33078 and #33034. Without these 2 push-based shuffle still works but this is what we want to prevent.
Given this I don't think it makes any difference if we add these comments to branches of version < 3.2. WDYT?
I'm okay with disabling this, but I'm wondering what is the different from branch-3.1 because branch-3.1 also has the same configuration. If we want to disable this explicitly, it might be enough to comment on configuration doc. For the other Exceptions and assertions looks overkill. In any way, I'll leave this to the other committers. |
Yeah, I think we won't necessarily have to make it failed when it's enabled. I believe it's fine to explicitly document that this feature is unstable, and either correctness or backward compatibility isn't guaranteed at this moment - to be clear, is it not usable at all? In fact, I guess we haven't even properly documented this yet (?). |
@HyukjinKwon, @dongjoon-hyun Fixing the pending jira's would require protocol changes - and shuffle protocol does not gracefully handle incompatible changes. More details here and here. I asked @otterc to keep this PR ready (as WIP) just in case we want to take the path of disabling push based shuffle entirely in 3.2.x line in case correctness fixes cant be resolved before GA - we should not be merging this unless we are sure the other two fixes wont make it through, and we want to disable it. Thoughts ? |
+CC @Ngone51 |
I am fine either way. I was just thinking that it's fine to have some incompatibility or breaking stuff as long as we document so and it's disabled by default (since this push based shuffle is brand new, unstable and not really exposed to users yet in fact). |
Being conservative sounds fine to me too since we're in core. I will leave it to you @mridulm. |
I care more about the correctness issue here. So I feel it's safer to disallow use. |
@otterc Can you please update to later branch please ? |
Closing this PR as it is no longer relevant ! |
This is WIP because #33118 (comment)
What changes were proposed in this pull request?
Push-based shuffle is partially merged in apache master but some of the tasks are still incomplete. Since 3.2 is going to cut soon, we will not be able to get the pending tasks reviewed and merged. Few of the pending tasks make protocol changes to the push-based shuffle protocols, so we would like to prevent users from enabling push-based shuffle both on the client and the server until push-based shuffle implementation is complete.
We can prevent push-based shuffle to be used by throwing UnsupportedOperationException both on the client and the server when the user tries to enable it.
Why are the changes needed?
The change is needed to prevent users from trying out push-based shuffle until it is complete.
Does this PR introduce any user-facing change?
Yes. It will prevent users to try out push-based shuffle until it is complete.
How was this patch tested?
It is a straightforward change that prevents users from enable push-based shuffle so haven't added a UT for it.