[#1594] improvement(client):support generating larger block size during shuffle map task by spill partial partitions data #1670
Conversation
@rickyma Could you help me review this pull request?
Could you paste some test results to the community for this feature?
Overall LGTM. Left some minor comments.
List<Integer> partitionList =
    new ArrayList<Integer>() {
      {
        addAll(buffers.keySet());
I think this can be simplified as:
List<Integer> partitionList = new ArrayList<>(buffers.keySet());
      }
    };
if (bufferSpillRatio < 1.0) {
  Collections.sort(
This can be simplified as:
partitionList.sort(Comparator.comparingInt(o -> buffers.get(o) == null ? 0 : buffers.get(o).getMemoryUsed()).reversed());
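Taken together, the two suggestions reduce the selection setup to roughly the following sketch (not the actual patch; the Buffer interface stands in for the real writer-buffer type, and comparingLong is used in case getMemoryUsed() returns a long):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class PartitionOrdering {
  // Stand-in for the real buffer type; only the accessor used by the sort matters here.
  interface Buffer {
    long getMemoryUsed();
  }

  // Combine both review suggestions: copy the key set, then sort descending by buffered bytes
  // so the head of the list frees the most memory when spilled.
  static List<Integer> sortedPartitions(Map<Integer, Buffer> buffers, double bufferSpillRatio) {
    List<Integer> partitionList = new ArrayList<>(buffers.keySet());
    if (bufferSpillRatio < 1.0) {
      partitionList.sort(
          Comparator.comparingLong(
                  (Integer p) -> buffers.get(p) == null ? 0L : buffers.get(p).getMemoryUsed())
              .reversed());
    }
    return partitionList;
  }
}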
@@ -316,6 +350,10 @@ public synchronized List<ShuffleBlockInfo> clear() {
    + dataSize
    + "], memoryUsed["
    + memoryUsed
    + "],number of blocks["
Nit, add a space here, for a better log output:
],number of blocks[
-> ], number of blocks[
@@ -316,6 +350,10 @@ public synchronized List<ShuffleBlockInfo> clear() {
    + dataSize
    + "], memoryUsed["
    + memoryUsed
    + "],number of blocks["
    + result.size()
    + "],flush ratio["
Nit, add a space here, for a better log output:
],flush ratio[
-> ], flush ratio[
LOG.info(
    String.format(
        "ShuffleBufferManager spill for buffer size exceeding spill threshold,"
            + "usedBytes[%d],inSendListBytes[%d],spill size threshold[%d]",
Nit, a better log output:
" usedBytes[%d], inSendListBytes[%d], spill size threshold[%d]",
LGTM.
LGTM.
LGTM. Although I think if you want to achieve a bigger block size, maybe a temporary executor-side local file could be implemented to store the task's partition shuffle data.
…1756)

What changes were proposed in this pull request?
1. When the spill ratio is `1.0`, the calculation of the target spill size is skipped to avoid a potential race condition, since `usedBytes` and `inSendBytes` are not thread safe. This guarantees that all data is flushed to the shuffle server at the end of the task.
2. Add a check on the `bufferManager`'s remaining buffer.

Why are the changes needed?
Due to #1670, the partial data held by the bufferManager may not be flushed to the shuffle servers in some corner cases. Thanks to #1558, this makes the task fail fast rather than silently lose data.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
Existing tests.
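A rough illustration of that first point (a hedged sketch, not the project's actual code; names follow the log fields quoted above):

class SpillTarget {
  // Hypothetical helper: decide how many bytes a spill should target for a given ratio.
  static long targetSpillSize(double bufferSpillRatio, long usedBytes, long inSendListBytes) {
    if (bufferSpillRatio >= 1.0) {
      // Ratio 1.0: spill everything; never derive a partial target from the
      // concurrently updated counters, so no data is left behind at task end.
      return Long.MAX_VALUE;
    }
    // The counters may change while being read, so a partial target is only a best-effort estimate.
    return (long) ((usedBytes - inSendListBytes) * bufferSpillRatio);
  }
}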
What changes were proposed in this pull request?
When spilling shuffle data, we only spill the reduce partitions whose data holds the majority of the space.
So, in each spill, the WriteBufferManager.clear() method needs one extra step: sort the to-be-spilled buffers by size and select the top-N buffers to spill, as sketched below.
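A minimal, self-contained sketch of that selection step (hypothetical names; the real logic lives in WriteBufferManager.clear() and works on the writer buffers directly):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SpillSelection {
  // Pick the partitions holding the most buffered bytes until the spill target is reached.
  static List<Integer> pickPartitionsToSpill(Map<Integer, Long> bytesPerPartition, long targetSpillSize) {
    List<Integer> partitions = new ArrayList<>(bytesPerPartition.keySet());
    // Largest partitions first, so spilling few partitions frees the most memory
    // and each spilled partition yields a larger block on the shuffle server.
    partitions.sort(Comparator.comparingLong((Integer p) -> bytesPerPartition.get(p)).reversed());
    List<Integer> selected = new ArrayList<>();
    long accumulated = 0L;
    for (Integer p : partitions) {
      selected.add(p);
      accumulated += bytesPerPartition.get(p);
      if (accumulated >= targetSpillSize) {
        break; // top-N reached: enough bytes selected for this spill
      }
    }
    return selected;
  }
}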
Why are the changes needed?
related feature #1594
Does this PR introduce any user-facing change?
No.
How was this patch tested?
new UTs.