
[SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks #29855

Closed


@Victsm Victsm commented Sep 23, 2020

What changes were proposed in this pull request?

This is the first patch for SPIP SPARK-30602 for push-based shuffle.
Summary of changes:

  • Introduce a new API in ExternalBlockStoreClient to push blocks to a remote shuffle service.
  • Leverage the streaming upload functionality in SPARK-6237 to enable ExternalBlockHandler to delegate the handling of block push requests to MergedShuffleFileManager.
  • Propose the API for MergedShuffleFileManager, which defines the core logic on the shuffle service side for handling block push requests. The actual implementation of this API is deferred to a later PR to restrict the size of this PR.
  • Introduce OneForOneBlockPusher to enable pushing blocks to remote shuffle services in the shuffle RPC layer.
  • Add new protocols in the shuffle RPC layer to support these functionalities.
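As a rough illustration of the push model above, here is a minimal Python sketch (not Spark's actual Java implementation; the block-id format and the routing rule are assumptions for illustration) of how a block pusher might group map-output blocks by the remote shuffle service responsible for merging them:

```python
# Illustrative sketch only: group shuffle blocks by the remote shuffle service
# that should merge them, routing each reduce partition consistently to one
# merger so all blocks of a partition end up at the same service.
def group_blocks_by_merger(blocks, mergers):
    """blocks: list of (shuffle_id, map_id, reduce_id) tuples.
    mergers: list of "host:port" strings for the available shuffle services."""
    groups = {}
    for shuffle_id, map_id, reduce_id in blocks:
        # Consistent routing per reduce partition (a stand-in for the real policy).
        dest = mergers[reduce_id % len(mergers)]
        groups.setdefault(dest, []).append(
            "shufflePush_%d_%d_%d" % (shuffle_id, map_id, reduce_id))
    return groups
```

In the actual change, the push itself goes through OneForOneBlockPusher and the new shuffle RPC protocols; this sketch only shows the grouping idea.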

Why are the changes needed?

Refer to the SPIP in SPARK-30602

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests.
The reference PR with the consolidated changes covering the complete implementation is also provided in SPARK-30602.
We have already verified the functionality and the improved performance as documented in the SPIP doc.

Lead-authored-by: Min Shen mshen@linkedin.com
Co-authored-by: Chandni Singh chsingh@linkedin.com
Co-authored-by: Ye Zhou yezhou@linkedin.com

Victsm and others added 15 commits September 18, 2020 21:39
The following changes are included in this patch.
In addition, fixed a potential block duplicate issue when speculative execution is enabled, and improved test coverage.

commit 7e134c2b75c8882474a67c15036087eb8a02caef
Author: Chandni Singh <chsingh@linkedin.com>
Date:   Tue Apr 21 21:54:46 2020 -0700

    LIHADOOP-52202 Create merge_manager under app local dirs

    RB=2069854
    G=superfriends-reviewers
    R=mshen,yezhou
    A=chsingh

commit 63dcda9309fc06c5f1fb6e7268df1d7416db49c7
Author: Chandni Singh <chsingh@linkedin.com>
Date:   Wed Apr 1 12:15:21 2020 -0700

    LIHADOOP-51889 Divide remote fetches into smaller chunks

    RB=2029681

commit 35298465155ec10e6ee2caf1adc0e78717dc6fed
Author: Chandni Singh <chsingh@linkedin.com>
Date:   Thu Mar 19 17:47:40 2020 -0700

    LIHADOOP-51889 Writing last chunk offsets to merge index file

    RB=2016700
    BUG=LIHADOOP-51889
    G=superfriends-reviewers
    R=mshen
    A=mshen

commit bbb53ec0fdfa0aebda954ede17a9f6e217607a53
Author: Min Shen <mshen@linkedin.com>
Date:   Thu Dec 19 08:46:34 2019 -0800

    Shuffle server and client properly handles merged block fetch failure.
    Use file length as merged shuffle block size when serving merged shuffle block.

commit 6718074c6a6a98b1d66d4fdff6bf08fb266ce32e
Author: Min Shen <mshen@linkedin.com>
Date:   Mon Nov 18 14:19:20 2019 -0800

    Netty protocol for DAGScheduler control message

commit 52e4dfade2e004fbc39fc60937342a9a57872680
Author: Min Shen <mshen@linkedin.com>
Date:   Sun Sep 8 18:44:09 2019 -0700

    Netty protocol for remote block push, pass 3

commit e9db4cc1ae56e9722a598b0011a10e55e84bf19c
Author: Min Shen <mshen@linkedin.com>
Date:   Thu Sep 5 18:29:24 2019 -0700

    Netty protocol for remote block push, pass 2

commit 7627ecf54292edda4a133e596f53306e7af76100
Author: Min Shen <mshen@linkedin.com>
Date:   Fri Aug 30 08:54:08 2019 -0700

    Netty protocol for remote block push, pass 1
RB=2096937
G=spark-reviewers
R=chsingh,mshen
A=mshen
…ResolverSuite

RB=2101153
BUG=LIHADOOP-53438
G=spark-reviewers
R=mshen,yezhou
A=yezhou
RB=2104829
BUG=LIHADOOP-53496
G=spark-reviewers
R=yezhou,mshen
A=mshen
…les in NM

RB=2130238
BUG=LIHADOOP-53700
G=spark-reviewers
R=mshen,chsingh
A=chsingh
… service is unable to create them

RB=2146753
BUG=LIHADOOP-53940
G=spark-reviewers
R=mshen,yezhou
A=mshen,yezhou
…l dirs provided to executor and the shuffle service and not log all exceptions at error/warning level

RB=2152736
BUG=LIHADOOP-53496,LIHADOOP-54059
G=spark-reviewers
R=yezhou,mshen
A=mshen
RB=2166324
BUG=LIHADOOP-54379
G=spark-reviewers
R=yezhou,mshen
A=mshen
RB=2166258
BUG=LIHADOOP-54370
G=spark-reviewers
R=mshen,yezhou
A=mshen
… a shuffle chunk fails

RB=2203642
BUG=LIHADOOP-52494
G=spark-reviewers
R=yzhou,mshen,vsowrira
A=mshen
RB=2253833
G=spark-reviewers
R=mshen,vsowrira,mmuralid,yezhou
A=mshen
…al host with a consistent view of app local dirs among different executors

RB=2261073
BUG=LIHADOOP-55315
G=spark-reviewers
R=chsingh,mshen,vsowrira,mmuralid
A=mmuralid,chsingh
… local dirs update in shuffle service. Also fixing a memory leak.

RB=2281730
BUG=LIHADOOP-55654
G=spark-reviewers
R=vsowrira,chsingh,mshen
A=vsowrira,chsingh

Victsm commented Sep 23, 2020

A few clarifications on this PR:
The entire netty RPC layer change for push-based shuffle is ~4000 LOC in our current implementation. We plan to break it down into 3 PRs for easier review:

  • The first, this PR, focuses on the foundation for supporting the block push functionality.
  • The second PR will provide the actual implementation of MergedShuffleFileManager, as well as the integration with YARNShuffleService.
  • The third PR will provide the read-path implementation supporting fetching a merged shuffle file as a sequence of chunks.

In addition, there is some additional refactoring we could do with this PR.
For example, we reuse RetryingBlockFetcher and BlockFetchingListener for block push as well, which makes their names no longer appropriate.
We didn't make that change in this PR so as to reduce the number of files we touch and keep the review easier.
We can either send out a separate PR just for this refactoring or update this PR, depending on the reviewers' preference.
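The reuse being described, a single retry loop shared by block fetch and block push, can be sketched in Python as follows (names and behavior are illustrative, not Spark's RetryingBlockFetcher itself):

```python
def transfer_with_retry(start_transfer, should_retry, max_retries=3):
    """Generic retry loop usable for both block fetch and block push.

    start_transfer: callable performing one transfer attempt.
    should_retry: callable deciding whether a given failure is retriable.
    """
    attempt = 0
    while True:
        try:
            return start_transfer()
        except Exception as e:
            attempt += 1
            if attempt > max_retries or not should_retry(e):
                raise  # retries exhausted, or a non-retriable failure
```

The same loop serves both directions; only `start_transfer` and the retry policy differ, which is why a fetch-specific name is misleading.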

Also, regarding the approach to encode the PushBlockStream header into the exception message to be returned to the client, this is to minimize changes to the existing Netty protocol, so that it's easier to introduce such a change to clusters actively using the existing protocol.
We are open to suggestions.
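To make the encoding idea concrete, here is a small Python sketch (the message texts here are hypothetical, not Spark's actual constants) of embedding the pushed block's identity in the error message on the server and recovering it on the client:

```python
# Hypothetical failure-reason suffix appended by the server; the real text
# lives in Spark's shuffle RPC layer.
TOO_LATE_SUFFIX = " is received after merged shuffle is finalized"

def make_push_failure_message(block_id, reason_suffix):
    # Server side: embed the pushed block's id in the exception message, so the
    # client learns which block failed without any new wire-protocol message.
    return block_id + reason_suffix

def parse_failed_block(message, pushed_block_ids):
    # Client side: recover the failed block id from the opaque error string.
    for block_id in pushed_block_ids:
        if message.startswith(block_id):
            return block_id
    return None
```

This keeps the existing Netty protocol unchanged at the cost of string-based error inspection, which is the trade-off discussed in the review below.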


@otterc otterc left a comment


Looks good! Thanks @Victsm


mridulm commented Sep 23, 2020

ok to test


mridulm commented Sep 23, 2020

+CC @jiangxb1987


mridulm commented Sep 23, 2020

+CC @attilapiros, @mccheah


mridulm commented Sep 23, 2020

Something seems off about the jenkins test. @shaneknapp can you please take a look ?


mridulm commented Sep 23, 2020

+CC @tgravescs


Victsm commented Sep 23, 2020

Fixed the Java style issue and the one unit-test failure.
Test build should be clean now.


SparkQA commented Sep 23, 2020

Test build #129045 has finished for PR 29855 at commit 2bdf800.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Sep 24, 2020

Test build #129046 has finished for PR 29855 at commit 3e9e9e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34306/


SparkQA commented Oct 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34306/


SparkQA commented Oct 12, 2020

Test build #129699 has finished for PR 29855 at commit f016b39.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```java
    ErrorHandler.BlockPushErrorHandler.TOO_LATE_MESSAGE_SUFFIX))));
assertFalse(handler.shouldRetryError(new RuntimeException(new ConnectException())));
assertTrue(handler.shouldRetryError(new RuntimeException(new IllegalArgumentException(
    ErrorHandler.BlockPushErrorHandler.BLOCK_APPEND_COLLISION_DETECTED_MSG_PREFIX))));
```
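The classification these assertions exercise can be sketched as follows (a Python sketch with hypothetical message texts; the real constants live in ErrorHandler.BlockPushErrorHandler):

```python
# Hypothetical message fragments standing in for Spark's constants.
TOO_LATE_MESSAGE_SUFFIX = "received after merged shuffle is finalized"
BLOCK_APPEND_COLLISION_PREFIX = "Couldn't find an opportunity to write block"

def should_retry_error(message):
    """Decide whether a failed block push is worth retrying, based only on
    the error message string (mirroring the string-based checks above)."""
    if message is None:
        return False
    # A push arriving after the merge was finalized is pointless to retry.
    if TOO_LATE_MESSAGE_SUFFIX in message:
        return False
    # Connection-level failures are not retried by this handler.
    if "java.net.ConnectException" in message:
        return False
    # Everything else, including an append collision, is considered retriable.
    return True
```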
@Ngone51 Ngone51 Oct 13, 2020


I'm wondering why this error is retriable. My understanding is that this error is returned when another attempt has already written (or is currently writing; not sure if that case is also included) the same shuffle block. So is the retry meant to handle the case where the writing attempt fails to write the block completely?

@Victsm (author) replied:

It's not the same block, but another block belonging to the same shuffle partition.

@Ngone51 replied:

Ah, sorry. I do mean the same shuffle partition rather than the same block.

WDYT?

@Victsm (author) replied:

It's retriable because this block hasn't been appended to the merged shuffle file and the merge operation hasn't been finalized yet.

@Victsm (author) replied:

When we append a block to the merged shuffle file, we either append it completely or end up effectively writing nothing to the file.
If the first attempt failed because of a collision, then that block effectively hasn't been appended to the file yet, which makes it retriable.
The 2nd PR, to be sent out soon, will include more details on this part.

@Ngone51 replied:

If the first attempt failed because of collision, then that block effectively hasn't been appended to the file yet, which makes it retriable.

What makes the first attempt fail because of a collision? In my understanding, there are two possibilities:

  1. the same partition has already been merged by another task attempt
  2. the same partition is being merged by another task attempt

For case 1, do we still need to retry? If we do retry in this case, doesn't it return BLOCK_APPEND_COLLISION_DETECTED_MSG_PREFIX again?

For case 2, I think it may make sense to retry in case that attempt doesn't merge the partition successfully in the end.

@Victsm (author) replied:

We actually distinguish between a block duplication and a block collision on the server side.

Block duplication is when the exact same shuffle partition block gets pushed by different executors, due to speculation or maybe a task retry.
The server side is able to tell when block duplication happens, whether one duplicate block is sent after the first has been successfully merged or both blocks are received at the same time.
For a duplicate block, the server will respond with success to the client, so the client won't retry sending it.
In the case of speculation, when 2 clients might be sending the same block at the same time, the server will respond with success to one of the two and let the other write; if that write fails, the corresponding client will retry if the failure is retriable.

On the other hand, a block collision is not about the exact same shuffle partition block, but 2 different blocks belonging to the same shuffle partition being sent to the same shuffle service at the same time.
Since the shuffle service needs to append one block completely before appending the content of the next block belonging to the same shuffle partition, when these blocks arrive at one shuffle service at the same time, we encounter a block collision.
A block collision might not immediately lead to a collision failure being sent back to the client, since the server will buffer the blocks for a short period of time and make a few attempts before giving up.
When the shuffle service gives up, the client will receive the collision failure.
Receiving the collision failure is an indication that this block hasn't been merged yet, and thus it's OK to retry.
Of course, it's entirely possible that by the time the retry happens, a speculative task has already pushed the block and successfully merged it.
If that's the case, the retry would be treated as a block duplication instead of a block collision, and the client will receive a success response.

I hope this serves as an overview of what's to come in the next PR.

@Ngone51 replied:

I see. Thanks for the detailed explanation.
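The duplication-vs-collision semantics described in this thread can be modeled with a toy Python sketch (an illustration of the described behavior only, not the MergedShuffleFileManager implementation to come in the next PR; all names are hypothetical):

```python
class PartitionMerger:
    """Toy model of the server-side decision: ack duplicates with success,
    report a collision when a different block of the same partition is
    already being appended."""

    def __init__(self):
        self.merged = set()   # block ids already appended to the merged file
        self.active = {}      # partition -> block id currently being appended

    def receive(self, partition, block_id):
        if block_id in self.merged:
            return "SUCCESS"          # duplicate: already merged, just ack it
        if partition in self.active:
            if self.active[partition] == block_id:
                return "SUCCESS"      # duplicate push racing with itself
            return "COLLISION"        # different block, same partition: retriable
        self.active[partition] = block_id
        return "APPENDING"

    def finish(self, partition, block_id):
        # The block was appended completely; the partition is free again.
        self.active.pop(partition, None)
        self.merged.add(block_id)
```

A retried block that was merged by a speculative task in the meantime hits the `SUCCESS` branch, matching the duplication handling described above.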


Victsm commented Oct 13, 2020

Thanks for the additional review comments from @jiangxb1987 @Ngone51. I have now resolved all pending issues.

Ping @attilapiros @tgravescs @mridulm to see if there are any additional concerns on the PR and whether we can get a +1.


Victsm commented Oct 13, 2020

The most recent test failure does not seem related to this patch.


Ngone51 commented Oct 14, 2020

LGTM


mridulm commented Oct 14, 2020

Thanks for the review @Ngone51 !
+CC @tgravescs, @attilapiros, @jiangxb1987, @otterc any additional comments? Or are all concerns resolved?


otterc commented Oct 14, 2020

Looks good to me.


SparkQA commented Oct 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34371/


SparkQA commented Oct 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34371/


SparkQA commented Oct 15, 2020

Test build #129765 has finished for PR 29855 at commit 2c95f18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros

I do not like the error message check (the string contains) but as it is promised to be addressed in a follow-up PR (#29855 (comment)), it is fine for now.

LGTM

@tgravescs

changes look fine


mridulm commented Oct 15, 2020

Thanks for the reviews @attilapiros, @tgravescs, @otterc, @jiangxb1987, @Ngone51 Merging to master.

@asfgit asfgit closed this in 82eea13 Oct 15, 2020

mridulm commented Oct 15, 2020

Thanks for working on this @Victsm ! Looking forward to the next set of patches on push based shuffle :-)

@dongjoon-hyun

Thank you all!

asfgit pushed a commit that referenced this pull request Jul 26, 2021
…ush operations

### What changes were proposed in this pull request?
This is a follow-up to #29855 according to the [comments](https://github.com/apache/spark/pull/29855/files#r505536514)
In this PR, the following changes are made:

1. A new `BlockPushingListener` interface is created specifically for block push. The existing `BlockFetchingListener` interface is left as is, since it might be used by external shuffle solutions. These 2 interfaces are unified under `BlockTransferListener` to enable code reuse.
2. `RetryingBlockFetcher`, `BlockFetchStarter`, and `RetryingBlockFetchListener` are renamed to `RetryingBlockTransferor`, `BlockTransferStarter`, and `RetryingBlockTransferListener` respectively. This makes their names more generic to be reused across both block fetch and push.
3. Comments in `OneForOneBlockPusher` are further clarified to better explain how we handle retries for block push.

### Why are the changes needed?
To make code cleaner without sacrificing backward compatibility.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests.

Closes #33340 from Victsm/SPARK-32915-followup.

Lead-authored-by: Min Shen <mshen@linkedin.com>
Co-authored-by: Min Shen <victor.nju@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
zhouyejoe pushed a commit to linkedin/spark that referenced this pull request Aug 3, 2021
domybest11 pushed a commit to domybest11/spark that referenced this pull request Jun 15, 2022
wangyum pushed a commit that referenced this pull request May 26, 2023
wangyum pushed a commit that referenced this pull request May 26, 2023