[Bug] client memory leak with netty #1355 #1359

Closed
3 tasks done
zuston opened this issue Dec 7, 2023 · 8 comments

Comments

@zuston
Member

zuston commented Dec 7, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

image

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@rickyma
Contributor

rickyma commented Dec 19, 2023

I've encountered this problem too. Any ideas to fix this? @zuston

@zuston
Member Author

zuston commented Dec 19, 2023

> I've encountered this problem too. Any ideas to fix this? @zuston

I can't reproduce this case. Do you want to fix it? @rickyma

@rickyma
Contributor

rickyma commented Dec 19, 2023

I can reproduce this issue when we are doing stress testing, but I don't know how to fix it. It seems to be a problem with the underlying code handling in Netty. @zuston

jerqi pushed a commit that referenced this issue Jan 19, 2024
…1455)

### What changes were proposed in this pull request?

The current logic releases the `ByteBuf` only when `msg.body() == null`. However, when `msg.body() != null`, `msg.body().byteBuf()` returns `NettyManagedBuffer.EMPTY_BUFFER` and the `ByteBuf` is not released, so memory leaks every time an RPC response from the ShuffleServer is decoded.
Over time, if the Spark job runs long enough and issues enough requests, this causes a significant memory leak on the client side (Spark executor).
The other code changes are mainly for readability and added protection, and do not cause any side effects.
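
The release condition described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the code from the patch: `Frame`, `Message`, `EMPTY_BODY`, `decodeBuggy`, `decodeFixed`, and `liveAfter` are all hypothetical names, `Frame` merely imitates a reference-counted buffer such as Netty's `ByteBuf`, and `EMPTY_BODY` plays the role of an empty placeholder body that does not hold on to the frame. The actual fix in the repository may be structured differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the leak; every name here is hypothetical, not Uniffle or Netty API.
final class LeakSketch {

    // Stand-in for a reference-counted buffer (in the spirit of Netty's ByteBuf).
    static final class Frame {
        static final AtomicInteger LIVE = new AtomicInteger(); // frames not yet released
        private int refCnt = 1;

        Frame() {
            LIVE.incrementAndGet();
        }

        void release() {
            if (refCnt <= 0) {
                throw new IllegalStateException("frame already released");
            }
            if (--refCnt == 0) {
                LIVE.decrementAndGet();
            }
        }
    }

    // Stand-in for a decoded message whose body may be null or an empty placeholder.
    static final class Message {
        private final Object body;

        Message(Object body) {
            this.body = body;
        }

        Object body() {
            return body;
        }
    }

    // Placeholder body that carries no data and does not reference the frame.
    static final Object EMPTY_BODY = new Object();

    // Buggy pattern: the frame is released only when the message has no body at all.
    static void decodeBuggy(Frame frame, Message msg) {
        if (msg.body() == null) {
            frame.release();
        }
    }

    // One possible fix in this model: also release when the body is the empty placeholder.
    static void decodeFixed(Frame frame, Message msg) {
        if (msg.body() == null || msg.body() == EMPTY_BODY) {
            frame.release();
        }
    }

    // Decodes n messages with the given body and returns how many frames stay unreleased.
    static int liveAfter(boolean fixed, Object body, int n) {
        Frame.LIVE.set(0);
        for (int i = 0; i < n; i++) {
            Frame frame = new Frame();
            Message msg = new Message(body);
            if (fixed) {
                decodeFixed(frame, msg);
            } else {
                decodeBuggy(frame, msg);
            }
        }
        return Frame.LIVE.get();
    }

    public static void main(String[] args) {
        System.out.println("buggy, empty body: " + liveAfter(false, EMPTY_BODY, 100));
        System.out.println("fixed, empty body: " + liveAfter(true, EMPTY_BODY, 100));
    }
}
```

In this model the buggy decoder leaves one unreleased frame per decoded response, which matches the slow, request-proportional growth described above, while the fixed decoder leaves none.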

### Why are the changes needed?

To fix the memory leak issue in the Netty client when decoding RPC responses.
For [#1359](#1359)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Reran successfully, with no more Netty leak logs.
@rickyma
Contributor

rickyma commented Feb 11, 2024

I think we can close this now. @zuston

@rickyma
Contributor

rickyma commented May 9, 2024

@jerqi @zuston We can close this.

@zuston
Member Author

zuston commented May 9, 2024

> @jerqi @zuston We can close this.

Is everything OK?

@rickyma
Contributor

rickyma commented May 9, 2024

> > @jerqi @zuston We can close this.
>
> Is everything OK?

Yes.

@zuston
Member Author

zuston commented May 10, 2024

> > > @jerqi @zuston We can close this.
> >
> > Is everything OK?
>
> Yes.

Thanks for your effort.

@zuston zuston closed this as completed May 10, 2024