[WIP][SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used #33118

Closed
wants to merge 2 commits into from


otterc
Contributor

@otterc otterc commented Jun 28, 2021

This is WIP because #33118 (comment)

What changes were proposed in this pull request?

Push-based shuffle is partially merged into Apache master, but some of the tasks are still incomplete. Since the 3.2 branch is going to be cut soon, we will not be able to get the pending tasks reviewed and merged in time. A few of the pending tasks change the push-based shuffle protocol, so we would like to prevent users from enabling push-based shuffle on both the client and the server until the implementation is complete.
We can prevent push-based shuffle from being used by throwing UnsupportedOperationException on both the client and the server when the user tries to enable it.
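
A minimal sketch of the kind of guard described above, assuming the client-side switch is a boolean setting named `spark.shuffle.push.enabled`. The `PushShuffleGuard` object and its `validate` method are hypothetical names used only for illustration; they are not taken from this patch.

```scala
// Hypothetical guard, not the code in this PR: reject any configuration
// that tries to turn push-based shuffle on.
object PushShuffleGuard {
  // Assumed name of the client-side switch.
  val PushEnabledKey = "spark.shuffle.push.enabled"

  // Throws if the given settings enable push-based shuffle; otherwise returns normally.
  def validate(conf: Map[String, String]): Unit = {
    val enabled = conf.get(PushEnabledKey).exists(_.trim.equalsIgnoreCase("true"))
    if (enabled) {
      throw new UnsupportedOperationException(
        "Push-based shuffle is not yet supported.")
    }
  }
}
```

Failing fast at configuration time, rather than silently ignoring the flag, makes it obvious to the user that the feature is unavailable in this release.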

Why are the changes needed?

The change is needed to prevent users from trying out push-based shuffle until it is complete.

Does this PR introduce any user-facing change?

Yes. It will prevent users from trying out push-based shuffle until it is complete.

How was this patch tested?

It is a straightforward change that prevents users from enabling push-based shuffle, so I haven't added a UT for it.

@github-actions github-actions bot added the CORE label Jun 28, 2021
@@ -20,21 +20,19 @@ package org.apache.spark.scheduler
import java.util.Properties
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong, AtomicReference}

Member


This will break Scala style.

Contributor Author


I just fixed it.

Member

@dongjoon-hyun dongjoon-hyun left a comment


It's possible, but this looks like a bit of overkill to me. Why don't we update the config doc instead? I believe that's the more proper and correct place to do this.

cc @mridulm

@mridulm
Contributor

mridulm commented Jun 28, 2021

@dongjoon-hyun There are two ongoing correctness PRs that require protocol changes (for push-based shuffle) in order to fix the issues. I am expecting them to be reviewed and merged before 3.2.
But in case they do not make it into 3.2, we would need to make sure that a future 3.3 ESS/client does not interact badly with a 3.2 ESS/client.

This PR would be merged only if those other two PRs (#33078 and #33034) do not make it into 3.2.
Thoughts?

Do we want to mark this as WIP to convey this intent, Dongjoon?

@otterc
Contributor Author

otterc commented Jun 28, 2021

@dongjoon-hyun @mridulm I have also updated the docs of the configs on both the client and the server with the note:
Push-based shuffle is not yet supported.
I will change this to WIP.

@otterc otterc changed the title from "[SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used" to "[WIP][SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used" Jun 28, 2021
@AmplabJenkins

Can one of the admins verify this patch?

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for the detailed information. Are we going to merge this to master and revert it after the branch-3.2 cut? I believe we should not land this kind of PR in the master branch. Instead, we can have this in branch-3.2 if needed for those correctness issues. WDYT, @mridulm and @otterc?

@mridulm
Contributor

mridulm commented Jun 30, 2021

That makes sense, thanks for the excellent suggestion @dongjoon-hyun!
Let us close this PR @otterc and create it against branch-3.2 when it has been cut. We can merge it if we determine that the two correctness fixes can't make it into 3.2.

@HyukjinKwon
Member

cc @Ngone51 FYI

@HyukjinKwon
Member

Yeah, +1 for landing this to branch-3.2 only

@HyukjinKwon
Member

Only a few days left until the branch cut :-).

@otterc
Contributor Author

otterc commented Jun 30, 2021

Makes sense @dongjoon-hyun @mridulm @HyukjinKwon. Will close this PR and create it against 3.2 once it is cut.

@otterc otterc closed this Jun 30, 2021
@Ngone51
Member

Ngone51 commented Jun 30, 2021

I'm wondering: if we dynamically disable push-based shuffle when an indeterminate stage retries (e.g., via job properties), would it still be safe for users to try it?

(Note that push-based shuffle is already disabled in the case of multiple YARN attempts in the master branch.)
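
The idea of falling back dynamically could be sketched roughly as follows. This is only an illustration of the decision rule being proposed: `StageAttempt`, `PushShuffleDecision`, and `usePush` are hypothetical names that do not exist in Spark, and the sketch assumes attempt numbers start at 0.

```scala
// Hypothetical model of a stage attempt; not a Spark class.
final case class StageAttempt(isIndeterminate: Boolean, attemptNumber: Int)

object PushShuffleDecision {
  // Use push-based shuffle only when the user asked for it and the stage is
  // not a retry of an indeterminate stage (attempt numbers start at 0).
  def usePush(userEnabled: Boolean, stage: StageAttempt): Boolean = {
    val indeterminateRetry = stage.isIndeterminate && stage.attemptNumber > 0
    userEnabled && !indeterminateRetry
  }
}
```

A rule like this addresses correctness within a single release, but it does not by itself cover compatibility between releases that speak different protocol versions.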

@mridulm
Contributor

mridulm commented Jun 30, 2021

@Ngone51 The problem is that fixing the two pending correctness issues requires protocol changes. That means serde issues if 3.3 (where this is fixed) tries to work with 3.2 (if it is released without the fixes) in mixed client/ESS combinations (a 3.3 client with a 3.2 ESS, or a 3.2 client with a 3.3 ESS, in the future).
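
To make the serde concern concrete, here is a toy illustration of how an older peer can fail on a message introduced by a newer release. It is not Spark's actual shuffle wire format: the one-byte tag layout, the message names, and `decodeOld` are all assumptions made up for this sketch.

```scala
// Toy wire format, NOT Spark's actual shuffle protocol: the first byte is a
// message-type tag and the remaining bytes are the payload.
object ToyProtocol {
  sealed trait Message
  final case class PushBlock(payload: Vector[Byte]) extends Message
  final case class FinalizeMerge(payload: Vector[Byte]) extends Message

  // Tags known to the older release.
  val PushBlockTag: Byte = 0
  val FinalizeMergeTag: Byte = 1
  // Tag introduced by a later release (hypothetical).
  val NewFixTag: Byte = 2

  // Decoder as shipped in the older release: it has no case for NewFixTag.
  def decodeOld(bytes: Array[Byte]): Either[String, Message] = {
    if (bytes.isEmpty) Left("empty message")
    else bytes.head match {
      case PushBlockTag => Right(PushBlock(bytes.tail.toVector))
      case FinalizeMergeTag => Right(FinalizeMerge(bytes.tail.toVector))
      case other => Left(s"Unknown message type: $other")
    }
  }
}
```

In this sketch, a newer client that sends a message tagged `NewFixTag` to the older decoder gets an error instead of a decoded message, which is the kind of mixed-version failure that blocking the feature in the earlier release avoids.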

@Ngone51
Member

Ngone51 commented Jun 30, 2021

I see. Makes sense to me.
