[SPARK-26604][CORE] Clean up channel registration for StreamManager #23521

Closed
wants to merge 4 commits into from

Conversation

viirya
Member

@viirya viirya commented Jan 11, 2019

What changes were proposed in this pull request?

Currently, when TransportRequestHandler.processStreamRequest processes a stream request, the stream id is not registered with the current channel in the stream manager. It should be, so that if the channel is terminated we can also remove the streams associated with its stream requests.

This also cleans up channel registration in StreamManager. Since StreamManager itself doesn't register channels and only OneForOneStreamManager does, this removes registerChannel from StreamManager. When OneForOneStreamManager registers a stream, it also registers the channel for that stream.
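
A minimal sketch of the idea described above, not the actual Spark code. `FakeChannel`, `SketchStreamManager`, and the `String` buffers are hypothetical stand-ins for Netty's `Channel`, `OneForOneStreamManager`, and `ManagedBuffer`. The sketch assumes the channel is handed over at the moment the stream is registered, so that a terminated connection can drop all of its streams.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a network channel (the real code uses io.netty.channel.Channel).
final class FakeChannel {
  final String name;
  FakeChannel(String name) { this.name = name; }
}

// Simplified sketch of a stream manager that ties each stream to one channel at registration.
final class SketchStreamManager {
  // Per-stream state: the remaining buffers plus the single channel allowed to read them.
  private static final class StreamState {
    final Iterator<String> buffers;
    final FakeChannel associatedChannel;
    StreamState(Iterator<String> buffers, FakeChannel channel) {
      this.buffers = buffers;
      this.associatedChannel = channel;
    }
  }

  private final AtomicLong nextStreamId = new AtomicLong(0);
  private final Map<Long, StreamState> streams = new ConcurrentHashMap<>();

  // Registering a stream also records its channel, so no separate registerChannel call exists.
  long registerStream(Iterator<String> buffers, FakeChannel channel) {
    long streamId = nextStreamId.getAndIncrement();
    streams.put(streamId, new StreamState(buffers, channel));
    return streamId;
  }

  // Called when a channel closes: drop every stream that was associated with it.
  void connectionTerminated(FakeChannel channel) {
    streams.entrySet().removeIf(e -> e.getValue().associatedChannel == channel);
  }

  // Test helper: number of streams still tracked.
  int numStreamStates() {
    return streams.size();
  }
}
```

Registering two streams on one channel and a third on another, then calling `connectionTerminated` for the first channel, should leave only the third stream tracked.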

How was this patch tested?

Existing tests.

@viirya
Member Author

viirya commented Jan 11, 2019

cc @cloud-fan

@SparkQA

SparkQA commented Jan 11, 2019

Test build #101089 has finished for PR 23521 at commit 6e35249.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -127,6 +127,7 @@ private void processStreamRequest(final StreamRequest req) {
    ManagedBuffer buf;
    try {
      buf = streamManager.openStream(req.streamId);
      streamManager.registerChannel(channel, req.streamId);
Contributor

I'm wondering when we should do this. There are many kinds of requests, and currently only the chunk fetch request does it.

Member Author

Stream requests and chunk fetch requests are recorded in the stream manager. Registering the channel for them makes sure they are removed from the stream manager when the channel becomes inactive. For other types of requests, I didn't find that they are recorded like that.

@SparkQA

SparkQA commented Jan 11, 2019

Test build #101090 has finished for PR 23521 at commit ba9c27e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

   * of the stream. This is similar to {@link #registerChannel(Channel, long)} method, but the
   * <code>streamId</code> argument is for the stream in response to a stream() request.
   */
  public void registerChannel(Channel channel, String streamId) { }
Contributor

Now things get tricky here. There are 2 different kinds of stream requests:

  1. to download jars and other files
  2. to fetch data blocks (introduced in [SPARK-19659] Fetch big blocks to disk when shuffle-read. #16989)

For which one do we need to register the channel?

Member Author

  1. For stream requests that fetch data blocks, the streams are registered by the RPC handler, so registering the channels lets us remove the registered streams when the channels become inactive.
  2. For stream requests that download jars and files, there is no such stream registration.

Contributor

Are you saying TransportRequestHandler.processStreamRequest is only used to handle stream requests that fetch blocks?

Member Author

TransportRequestHandler.processStreamRequest is used for both, but the streams are not registered there. They are registered by NettyBlockRpcServer when it processes an OpenBlocks message.
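
The distinction drawn in this thread can be sketched roughly as follows. This is an illustration under assumptions, not Spark's implementation: `SketchManager`, `FileSketchManager`, and `BlockSketchManager` are hypothetical names, `Object` stands in for the channel, and `String` for the served data. The point is only that a manager serving files by name keeps no per-connection state, while a manager serving pre-registered block streams has state that must be dropped when the channel closes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical base class: openStream serves data, connectionTerminated is a cleanup hook.
abstract class SketchManager {
  abstract String openStream(String streamId);
  void connectionTerminated(Object channel) { }  // default: nothing to clean up
}

// File/jar downloads: looked up by name on demand, with no per-connection state.
final class FileSketchManager extends SketchManager {
  private final Map<String, String> files = new ConcurrentHashMap<>();

  void addFile(String name, String contents) {
    files.put("/files/" + name, contents);
  }

  @Override
  String openStream(String streamId) {
    return files.get(streamId);
  }
}

// Block fetches: a stream is registered first (by an RPC handler), then opened by id.
final class BlockSketchManager extends SketchManager {
  private final Map<String, Object> channelByStream = new ConcurrentHashMap<>();
  private final Map<String, String> dataByStream = new ConcurrentHashMap<>();
  private long nextId = 0;

  synchronized String registerStream(String data, Object channel) {
    String id = Long.toString(nextId++);
    dataByStream.put(id, data);
    channelByStream.put(id, channel);
    return id;
  }

  @Override
  String openStream(String streamId) {
    return dataByStream.get(streamId);
  }

  // Remove every stream whose registered channel is the one that just closed.
  @Override
  void connectionTerminated(Object channel) {
    channelByStream.entrySet().removeIf(e -> {
      if (e.getValue() != channel) {
        return false;
      }
      dataByStream.remove(e.getKey());
      return true;
    });
  }

  int numStreamStates() {
    return dataByStream.size();
  }
}
```

In this sketch, terminating a connection leaves the file manager untouched but empties the block manager of any streams registered for that channel.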

Contributor

registerStream exists only in OneForOneStreamManager. It's super weird that registerChannel needs to be called after registerStream, yet registerChannel lives in the parent StreamManager.

Member Author

Hmm, I agree. I think it is also good to clean this up a bit. I will do a change later.
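
To illustrate why the two-step API discussed in this thread is fragile, here is a small sketch under stated assumptions. `TwoStepManager` is a hypothetical class (not Spark's), and `Object` stands in for the channel. It models a design where the stream is registered first and the channel is attached by a separate, later call. A stream whose channel is never attached cannot be matched by `connectionTerminated`, so it stays tracked.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical "before" shape: stream registration and channel registration are separate calls.
final class TwoStepManager {
  private static final class State {
    // Stays null until registerChannel is called for this stream.
    volatile Object associatedChannel;
  }

  private final Map<Long, State> streams = new ConcurrentHashMap<>();
  private long nextId = 0;

  // Step 1: register the stream, without knowing which channel will read it.
  synchronized long registerStream() {
    long id = nextId++;
    streams.put(id, new State());
    return id;
  }

  // Step 2: attach the channel later; easy to forget on some request paths.
  void registerChannel(Object channel, long streamId) {
    State s = streams.get(streamId);
    if (s != null) {
      s.associatedChannel = channel;
    }
  }

  // Only streams with a matching channel are removed; unattached streams are never matched.
  void connectionTerminated(Object channel) {
    streams.values().removeIf(s -> s.associatedChannel == channel);
  }

  int numStreamStates() {
    return streams.size();
  }
}
```

If two streams are registered but `registerChannel` is called for only one of them, terminating the connection removes that one and leaves the other behind. Passing the channel into the stream registration call removes this ordering dependency.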

@viirya
Member Author

viirya commented Jan 14, 2019

@cloud-fan Since StreamManager itself doesn't register channels and only OneForOneStreamManager does, I removed registerChannel from StreamManager. When OneForOneStreamManager serves a chunk or stream request, it registers the channel for the stream.

@viirya viirya changed the title from "[SPARK-26604][CORE] Register channel for stream request" to "[SPARK-26604][CORE] Clean up channel registration for StreamManager" Jan 14, 2019
@SparkQA

SparkQA commented Jan 14, 2019

Test build #101187 has finished for PR 23521 at commit 9082f01.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 14, 2019

Test build #101188 has started for PR 23521 at commit 6b028a9.

@shaneknapp
Contributor

test this please

@SparkQA

SparkQA commented Jan 14, 2019

Test build #101198 has finished for PR 23521 at commit 6b028a9.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

   * Associates a stream with a single client connection, which is guaranteed to be the only reader
   * of the stream. Once the connection is closed, the stream will never be used again, enabling
   * cleanup by `connectionTerminated`.
   */
  public void registerChannel(Channel channel, long streamId) {
Contributor

Can this be private?

@@ -42,9 +42,10 @@
   * The returned ManagedBuffer will be release()'d after being written to the network.
   *
   * @param streamId id of a stream that has been previously registered with the StreamManager.
   * @param channel The connection used to serve chunk request.
Contributor

Let's say more about this parameter, especially how it should be used. IIUC we need to track the channel's state and do some cleanup if the channel becomes inactive.

@SparkQA

SparkQA commented Jan 15, 2019

Test build #101211 has finished for PR 23521 at commit 53e9c6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jan 15, 2019

@cloud-fan Moved channel registration to where we register the stream. A few tests are modified.

@SparkQA

SparkQA commented Jan 15, 2019

Test build #101243 has finished for PR 23521 at commit be01666.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 15, 2019

Test build #101244 has finished for PR 23521 at commit 7c1e13d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jan 15, 2019

Oops...

      this.appId = appId;
      this.buffers = Preconditions.checkNotNull(buffers);
      this.associatedChannel = channel;
Contributor

associatedChannel can be final now.

Member Author

Done.

    return myStreamId;
  }

  @VisibleForTesting
  public int streamStateSize() {
Contributor

nit: numStreamStates

Member Author

Changed.

@SparkQA

SparkQA commented Jan 15, 2019

Test build #101245 has finished for PR 23521 at commit 8cd705e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 15, 2019

Test build #101251 has finished for PR 23521 at commit 7eed779.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jan 15, 2019

retest this please.

@SparkQA

SparkQA commented Jan 15, 2019

Test build #101269 has finished for PR 23521 at commit 7eed779.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in cf133e6 Jan 16, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Now in `TransportRequestHandler.processStreamRequest`, when a stream request is processed, the stream id is not registered with the current channel in stream manager. It should do that so in case of that the channel gets terminated we can remove associated streams of stream requests too.

This also cleans up channel registration in `StreamManager`. Since `StreamManager` doesn't register channel but only `OneForOneStreamManager` does it, this removes `registerChannel` from `StreamManager`. When `OneForOneStreamManager` goes to register stream, it will also register channel for the stream.

## How was this patch tested?

Existing tests.

Closes apache#23521 from viirya/SPARK-26604.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@felixcheung
Member

I think we need to backport this into branch-2.4

@viirya
Member Author

viirya commented Mar 7, 2019

If it can't be directly merged to branch-2.4, I can make a PR for it.

@abellina
Contributor

abellina commented Mar 7, 2019

I am taking a look, but I may not have a PR up until later today. It's mostly clean, but the 2.4 branch doesn't have the ChunkFetchRequestHandler, and then there's one extra registerChannel that needs to go away, plus testing. @viirya if you want to put up a PR sooner, please feel free.

abellina pushed a commit to abellina/spark that referenced this pull request Mar 7, 2019
@abellina
Contributor

abellina commented Mar 7, 2019

#24013

vanzin pushed a commit that referenced this pull request Mar 8, 2019
…treamManager

## What changes were proposed in this pull request?

This is mostly a clean backport of #23521 to branch-2.4

## How was this patch tested?

I've tested this with a hack in `TransportRequestHandler` that forces `ChunkFetchRequest` to get dropped.

I then made a number of `ExternalShuffleClient.fetchChunk` requests (each sends `OpenBlocks`, then `ChunkFetchRequest`) and closed my test harness. A later heap dump shows that the `StreamState` references are unreachable.

I haven't run this through the unit test suite yet, but I'm doing that now. I wanted to get this up since I think folks are waiting on it for 2.4.1.

Closes #24013 from abellina/SPARK-26604_cherry_pick_2_4.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Alessandro Bellina <abellina@yahoo-inc.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
vanzin pushed a commit that referenced this pull request Mar 8, 2019
(cherry picked from commit 216eeec)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
zhongjinhan pushed a commit to zhongjinhan/spark-1 that referenced this pull request Sep 3, 2019
(cherry picked from commit 216eeec)
prakharjain09 pushed a commit to prakharjain09/spark that referenced this pull request Nov 29, 2019
(cherry picked from commit cf133e6)
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
(cherry picked from commit cf133e6)

RB=1586979
BUG=LIHADOOP-44658
G=superfriends-reviewers
R=mshen,yezhou,fli,edlu
A=mshen
@viirya viirya deleted the SPARK-26604 branch December 27, 2023 18:36
6 participants