-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] add a configuration to control shuffle data flush #445
Conversation
if (pickedFlushSize > expectedFlushSize) { | ||
LOG.info("Finish flush pick with {} bytes", pickedFlushSize); | ||
break; | ||
if (size > this.shuffleFlushThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If max size in pickedShuffle
< shuffleFlushThreshold
, the flushing mechanism is invalid? Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... This shuffleFlushThreshold
is intended to be buffering small I/Os.
If it's a bad configurations, the configurations should be adjusted. Or do you have other ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no need to do such change. If too much small IO happened, we'd better to expand the cluster to distribute pressure.
Besides, the small IO could be observed by the metrics of event_size_threshold_level4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no need to do such change. If too much small IO happened, we'd better to expand the cluster to distribute pressure.
How to expand the cluster? more shuffle servers? There would be limited number of shuffle server and you cannot scale to many shuffle servers for one application.
This configurations is specially tailed for HDD only shuffle servers. I think it's necessary to reduce small I/Os to HDDs as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event_size_threshold_level4
do you mean event_size_threshold_level1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Got your thought. But I consider an extreme case, if the all buffers size are 1M and memory reaches the high-watermark. After that, the later sending data requests are not the existing buffers' partition, whether the memory's data will be flushed?
In this case, will dead-lock happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Got your thought. But I consider an extreme case, if the all buffers size are 1M and memory reaches the high-watermark. After that, the later sending data requests are not the existing buffers' partition, whether the memory's data will be flushed?
In this case, will dead-lock happen?
If misconfigured, say shuffle memory's high watermark: 1G, then all buffer sizes are 1M or less than 1M, then yes, it could be possible all data stays in memory. But like you said, it's an extremely case, and in practice shuffle data must varies differently.
I don't think we need to consider such cases, there are other ways to deal such case: lower down the threshold size for lower high watermark. Do you have other concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, It could be a bit more adaptive to select shuffleIds to flush... such as if it doesn't free any memory, then just ignore the threshold and re-select the shuffle ids... But I think it's a bit of overkill/over engineering in this stage to introduce such flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emm.. If having this possibility, we should take consideration in our design instead of hoping not happening in users environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emm.. If having this possibility, we should take consideration in our design instead of hoping not happening in users environment.
😂. The option is the safe bet, and the user/admin should take responsibility for what they are configuring. Take the
SERVER_BUFFER_CAPACITY
and low/high watermark for analog:
- do we need to take considerations when user misconfigured buffer_capacity to some extremely small value, say 1M.
- do we need to take considerations when low/high are equal, which makes flush process one shuffle one time
Did you occurred anything similar in your prod envs?
This is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @advancedxy @zuston
LOG.info("Finish flush pick with {} bytes", pickedFlushSize); | ||
break; | ||
if (size > this.shuffleFlushThreshold) { | ||
pickedFlushSize += size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will it happen if shuffle server only have small ios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shuffle server will not reached to high watermark and thus shuffle data is in almost in memory, which is effectively an in-memory shuffle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the shuffle server reached the high watermark, but there only are small I/Os, what will it happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've discussed this situation with @zuston. In practice, the cases you described should not happen.
However since both of you raised similar concern, let me addressed it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, please take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the behavior different from origin one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that you change the flush strategy instead of adding an extra configuration. Could we keep the origin behavior by the configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that you change the flush strategy instead of adding an extra configuration. Could we keep the origin behavior by the configuration?
No, the new behavior is controlled by an extra configuration. The origin behavior is not changed by default unless admin/op specifically set the flush threshold.
Codecov Report
@@ Coverage Diff @@
## master #445 +/- ##
============================================
+ Coverage 58.65% 58.70% +0.04%
- Complexity 1647 1650 +3
============================================
Files 199 199
Lines 11201 11214 +13
Branches 996 998 +2
============================================
+ Hits 6570 6583 +13
Misses 4237 4237
Partials 394 394
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
LGTM, let @zuston take another look. |
Gently ping @zuston |
f62494d
to
6c1e04a
Compare
What changes were proposed in this pull request?
Add a configuration to control which kind of shuffle buffers should be flushed
Why are the changes needed?
To reduce potential small I/Os to local persistent storages. This should close #419
Does this PR introduce any user-facing change?
End user: No
Shuffle Server: a new configuration
How was this patch tested?
add UT