Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used #33329

Closed
wants to merge 3 commits into from

Conversation

otterc
Copy link
Contributor

@otterc otterc commented Jul 13, 2021

This is WIP because #33118 (comment)

What changes were proposed in this pull request?
Push-based shuffle is partially merged in apache master but some of the tasks are still incomplete. Since 3.2 is going to cut soon, we will not be able to get the pending tasks reviewed and merged. Few of the pending tasks make protocol changes to the push-based shuffle protocols, so we would like to prevent users from enabling push-based shuffle both on the client and the server until push-based shuffle implementation is complete.
We can prevent push-based shuffle to be used by throwing UnsupportedOperationException both on the client and the server when the user tries to enable it.

Why are the changes needed?
The change is needed to prevent users from trying out push-based shuffle until it is complete.

Does this PR introduce any user-facing change?
Yes. It will prevent users to try out push-based shuffle until it is complete.

How was this patch tested?
It is a straightforward change that prevents users from enable push-based shuffle so haven't added a UT for it.

@github-actions github-actions bot added the CORE label Jul 13, 2021
@otterc
Copy link
Contributor Author

otterc commented Jul 13, 2021

cc. @mridulm @Ngone51 @Victsm

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@@ -2079,7 +2079,7 @@ package object config {
"conjunction with the server side flag spark.shuffle.server.mergedShuffleFileManagerImpl " +
"which needs to be set with the appropriate " +
"org.apache.spark.network.shuffle.MergedShuffleFileManager implementation for push-based " +
"shuffle to be enabled")
"shuffle to be enabled. Push-based shuffle is not yet supported.")
.version("3.1.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration is added at 3.1.0 and do we want to add comments like Push-based shuffle is not yet supported at 3.1.3?

Copy link
Contributor Author

@otterc otterc Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 3.1, multiple critical pieces of push-based shuffle were not there, for example SPARK-32922, SPARK-32921, and SPARK-32920. So, even if anyone even tried to turn it on, it would just not work.
However, in 3.2 we are just missing 2 correctness fixes which involve modifying push-based shuffle protocols: #33078 and #33034. Without these 2 push-based shuffle still works but this is what we want to prevent.

Given this I don't think it makes any difference if we add these comments to branches of version < 3.2. WDYT?

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used Jul 13, 2021
@dongjoon-hyun
Copy link
Member

I'm okay with disabling this, but I'm wondering what is the different from branch-3.1 because branch-3.1 also has the same configuration. If we want to disable this explicitly, it might be enough to comment on configuration doc. For the other Exceptions and assertions looks overkill. In any way, I'll leave this to the other committers.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 14, 2021

Yeah, I think we won't necessarily have to make it failed when it's enabled. I believe it's fine to explicitly document that this feature is unstable, and either correctness or backward compatibility isn't guaranteed at this moment - to be clear, is it not usable at all? In fact, I guess we haven't even properly documented this yet (?).

@mridulm
Copy link
Contributor

mridulm commented Jul 14, 2021

@HyukjinKwon, @dongjoon-hyun Fixing the pending jira's would require protocol changes - and shuffle protocol does not gracefully handle incompatible changes. More details here and here.

I asked @otterc to keep this PR ready (as WIP) just in case we want to take the path of disabling push based shuffle entirely in 3.2.x line in case correctness fixes cant be resolved before GA - we should not be merging this unless we are sure the other two fixes wont make it through, and we want to disable it.

Thoughts ?

@mridulm
Copy link
Contributor

mridulm commented Jul 14, 2021

+CC @Ngone51

@HyukjinKwon
Copy link
Member

I am fine either way. I was just thinking that it's fine to have some incompatibility or breaking stuff as long as we document so and it's disabled by default (since this push based shuffle is brand new, unstable and not really exposed to users yet in fact).

@HyukjinKwon
Copy link
Member

Being conservative sounds fine to me too since we're in core. I will leave it to you @mridulm.

@Ngone51
Copy link
Member

Ngone51 commented Jul 15, 2021

I care more about the correctness issue here. So I feel it's safer to disallow use.

@mridulm
Copy link
Contributor

mridulm commented Jul 21, 2021

@otterc Can you please update to later branch please ?

@mridulm
Copy link
Contributor

mridulm commented Sep 20, 2021

Closing this PR as it is no longer relevant !

@mridulm mridulm closed this Sep 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants