[ISSUE-475][Improvement] It's unnecessary to use ConcurrentHashMap for "partitionToBlockIds" in RssShuffleWriter #480
Conversation
…r partitionToBlockIds in RssShuffleWriter Signed-off-by: Jifu Zhang <jiafu.zhang@intel.com>
…r partitionToBlockIds in RssShuffleWriter replaced concurrenthashmap with hashmap for local variable in ShuffleWriteClientImpl Signed-off-by: Jifu Zhang <jiafu.zhang@intel.com>
@advancedxy @jerqi please help review.
client-spark/spark3/src/main/java/org/apache/spark/shuffle/writer/RssShuffleWriter.java
Codecov Report
@@             Coverage Diff              @@
##             master     #480      +/-   ##
============================================
- Coverage     58.78%   58.77%   -0.02%
  Complexity     1704     1704
============================================
  Files           206      206
  Lines         11471    11468       -3
  Branches       1024     1024
============================================
- Hits           6743     6740       -3
  Misses         4317     4317
  Partials        411      411
…r partitionToBlockIds in RssShuffleWriter applied changes to spark2 module Signed-off-by: Jifu Zhang <jiafu.zhang@intel.com>
@@ -259,7 +259,7 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
   }

     // maintain the count of blocks that have been sent to the server
-    Map<Long, AtomicInteger> blockIdsTracker = Maps.newConcurrentMap();
+    Map<Long, AtomicInteger> blockIdsTracker = Maps.newHashMap();
This variable will be accessed by multiple threads in sendShuffleDataAsync.
No, it's not shared, since a new instance is created each time you call the method.
I overlooked the CompletableFuture part inside sendShuffleDataAsync. Let me roll back the change.
> No, it's not shared, since a new instance is created each time you call the method.

You can see the following for more details:

incubator-uniffle/client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java (line 172 in f4048fc):

    serverToBlockIds.get(ssi).forEach(block -> blockIdsTracker.get(block).incrementAndGet());
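For context, a hedged sketch of the access pattern being pointed at: the names serverToBlockIds, blockIdsTracker, and dataTransferPool come from this thread, but the surrounding types are simplified stand-ins, not the real ShuffleWriteClientImpl code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in: servers are plain Strings here, and the actual
// send logic is elided; only the threading shape is illustrated.
class SendSketch {
  void dispatch(Map<String, List<Long>> serverToBlockIds,
                Map<Long, AtomicInteger> blockIdsTracker,
                ExecutorService dataTransferPool) {
    for (String server : serverToBlockIds.keySet()) {
      // This lambda runs on a dataTransferPool thread, so blockIdsTracker
      // is read, and its counters updated, from threads other than the
      // caller's thread.
      CompletableFuture.runAsync(
          () -> serverToBlockIds.get(server).forEach(
              block -> blockIdsTracker.get(block).incrementAndGet()),
          dataTransferPool);
    }
  }
}
```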
rolled back.
I went through the logic and didn't find any update to "blockIdsTracker" in the main thread (correct me if I am wrong) after the "sendShuffleDataAsync" call, which runs asynchronously in the thread pool "dataTransferPool". According to the BlockingQueue javadoc (a BlockingQueue is used internally by the thread pool): "...actions in a thread prior to placing an object into a BlockingQueue happen-before actions subsequent to the access or removal of that element from the BlockingQueue in another thread."
So I think we don't need a ConcurrentHashMap for "blockIdsTracker". And since "AtomicInteger" is used as the value type of "blockIdsTracker", that is enough to make the updated counts visible to other threads in later code.
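To make the happens-before argument concrete, here is a self-contained sketch (generic Java, not Uniffle code) of the same pattern: a plain HashMap is fully populated before any task is submitted, and only the AtomicInteger values are mutated concurrently afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class HappensBeforeDemo {
  public static void main(String[] args) throws Exception {
    // Map structure is fully built on a single thread, BEFORE any submission.
    Map<Long, AtomicInteger> tracker = new HashMap<>();
    for (long blockId = 0; blockId < 100; blockId++) {
      tracker.put(blockId, new AtomicInteger(0));
    }

    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<?>> futures = new ArrayList<>();
    for (long blockId = 0; blockId < 100; blockId++) {
      final long id = blockId;
      // Task submission happens-before task execution, so the fully
      // populated map is safely visible to whichever pool thread runs this.
      futures.add(pool.submit(() -> tracker.get(id).incrementAndGet()));
    }
    for (Future<?> f : futures) {
      f.get(); // task actions happen-before the return of get()
    }
    pool.shutdown();
    System.out.println(tracker.get(0L).get()); // prints 1
  }
}
```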
You're right, but it's safer to use ConcurrentHashMap. If we modify this logic one day, we could forget to change this type back to ConcurrentHashMap. If you still think it's worthwhile to change the type, I think we could add some comments to explain why we don't use ConcurrentHashMap and remind us of this point.
I just changed it back to HashMap, with comments explaining the reason. And from the code-logic perspective, we are unlikely to insert/delete entries after dispatching the data to sendShuffleDataAsync.
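The resulting line presumably looks something like this sketch; the exact comment wording in the commit is an assumption on my part.

```java
// Deliberately a plain HashMap: all entries are inserted here, before any
// work is handed to dataTransferPool. The pool's internal BlockingQueue
// publishes the fully built map to the worker threads (happens-before), and
// those threads only update the AtomicInteger values, never the map's
// structure. Switch back to Maps.newConcurrentMap() if entries are ever
// added or removed after dispatch.
Map<Long, AtomicInteger> blockIdsTracker = Maps.newHashMap();
```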
…r partitionToBlockIds in RssShuffleWriter rolled back changes to blockIdsTracker in ShuffleWriteClientImpl since it will be referenced later in different threads Signed-off-by: Jifu Zhang <jiafu.zhang@intel.com>
LGTM, thanks @zjf2012
…r partitionToBlockIds in RssShuffleWriter hashmap is good here since no delete/insert to the tracker in other threads Signed-off-by: Jifu Zhang <jiafu.zhang@intel.com>
Seems reasonable to me.
@jerqi please take another look
LGTM, thanks @zjf2012 @advancedxy, merged.
What changes were proposed in this pull request?
Replaced some unnecessary ConcurrentHashMap usages with HashMap.
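For illustration, the shape of the change: the field name comes from the PR title, while the declaration and value type shown below are assumptions rather than a verbatim diff.

```java
// Before: a thread-safe map, although only the writer's own thread mutates it.
// Map<Integer, List<Long>> partitionToBlockIds = Maps.newConcurrentMap();

// After: a plain HashMap, avoiding the unneeded synchronization overhead.
Map<Integer, List<Long>> partitionToBlockIds = Maps.newHashMap();
```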
Why are the changes needed?
Improve performance: a plain HashMap avoids the synchronization overhead of ConcurrentHashMap when the map is not actually shared across threads.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested with a repartition workload.