[#1755] fix(spark): Avoid task failure of inconsistent record number #1756
Nice catch, although we've never encountered this issue in prod.
LGTM. Left a comment.
partitionList.sort(
    Comparator.comparingInt(o -> buffers.get(o) == null ? 0 : buffers.get(o).getMemoryUsed())
        .reversed());
if (bufferSpillRatio != 1.0) {
We don't need this line; we already have `if (Double.compare(bufferSpillRatio, 1.0) < 0) {`.
More validation mechanisms should be added.
LGTM, merged to master & branch 0.9.
The cherry-pick to branch 0.9 failed.
@zuston Can we open a new PR against branch 0.9? It is a critical fix.
…umber (apache#1756)

### What changes were proposed in this pull request?

1. When the spill ratio is `1.0`, the target spill size calculation is skipped to avoid a potential race condition, since `usedBytes` and `inSendBytes` are not thread safe. This guarantees that all data is flushed to the shuffle server at the end of the task.
2. Add a check for data remaining in the `bufferManager`'s buffers.

### Why are the changes needed?

Due to apache#1670, partial data held by the bufferManager is not flushed to shuffle servers in some corner cases. Thanks to apache#1558, this makes the task fail fast rather than silently lose data.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.
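For readers following the discussion, here is a minimal sketch of the selection logic described above. It is not the actual Uniffle code: the class `SpillSelectionSketch`, the `Buffer` stand-in, the method names, and the target-size formula `(usedBytes - inSendBytes) * bufferSpillRatio` are all assumptions made for illustration. Only the sort comparator and the `Double.compare(bufferSpillRatio, 1.0) < 0` guard come from the diff and review comment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SpillSelectionSketch {

  /** Hypothetical stand-in for a per-partition writer buffer. */
  static final class Buffer {
    private final int memoryUsed;

    Buffer(int memoryUsed) {
      this.memoryUsed = memoryUsed;
    }

    int getMemoryUsed() {
      return memoryUsed;
    }
  }

  /**
   * Returns the partitions whose buffers should be spilled.
   * The target-size formula below is an assumption, not the project's formula.
   */
  static List<Integer> selectPartitionsToSpill(
      Map<Integer, Buffer> buffers, double bufferSpillRatio, long usedBytes, long inSendBytes) {
    List<Integer> partitionList = new ArrayList<>(buffers.keySet());

    // Ratio of 1.0 (or more): skip the size calculation and spill everything,
    // so the result never depends on counters that may be read mid-update.
    if (Double.compare(bufferSpillRatio, 1.0) >= 0) {
      return partitionList;
    }

    // Partial spill: take the largest buffers first until the target is reached.
    long targetSpillSize = (long) ((usedBytes - inSendBytes) * bufferSpillRatio);
    partitionList.sort(
        Comparator.comparingInt(
                (Integer p) -> buffers.get(p) == null ? 0 : buffers.get(p).getMemoryUsed())
            .reversed());

    List<Integer> selected = new ArrayList<>();
    long picked = 0;
    for (Integer partition : partitionList) {
      if (picked >= targetSpillSize) {
        break;
      }
      Buffer buffer = buffers.get(partition);
      if (buffer == null) {
        continue;
      }
      selected.add(partition);
      picked += buffer.getMemoryUsed();
    }
    return selected;
  }

  /** Hypothetical end-of-task check: fail fast if any buffer was left unflushed. */
  static void checkNoRemainingBuffers(Map<Integer, Buffer> buffers) {
    if (!buffers.isEmpty()) {
      throw new IllegalStateException(
          "Buffers remain for " + buffers.size() + " partition(s) after the final flush");
    }
  }
}
```

With a ratio below `1.0` the result depends on the two byte counters, which is where a stale read could leave data behind; with `1.0` the counters are never consulted, and the remaining-buffer check turns any leftover data into an explicit failure.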