feat(bandwidth_scheduler) - distribute remaining bandwidth #12682
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12682 +/- ##
==========================================
+ Coverage 70.56% 70.63% +0.07%
==========================================
Files 847 848 +1
Lines 172735 173651 +916
Branches 172735 173651 +916
==========================================
+ Hits 121897 122667 +770
- Misses 45737 45855 +118
- Partials 5101 5129 +28
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
LGTM though I don't think I fully understand it.
pub fn distribute_remaining_bandwidth(
    sender_budgets: &ShardIndexMap<Bandwidth>,
    receiver_budgets: &ShardIndexMap<Bandwidth>,
    is_link_allowed: impl Fn(ShardIndex, ShardIndex) -> bool,
Personally I consider using lambdas like this an anti-pattern. This one is borderline because it's very localized and there is only one caller. In my experience, debugging any issues across the closure is really hard. Is there any other neat way to implement this logic? Perhaps you can define a trait here, have the bandwidth scheduler implement it, and pass it in? Or pull this logic into the bandwidth scheduler?
Agree with Wac but no hard preferences.
I can change it to is_link_allowed: ShardLinkMap<bool>. It'll probably be faster as well.
I used a lambda because that was the easiest way to do it, and I felt it was clear enough.
I'd like to be able to write independent unit tests, so I'm trying to avoid tight integration with the rest of the bandwidth scheduler.
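For what it's worth, the trait alternative suggested above could look roughly like the sketch below. This is hypothetical: LinkPolicy, AllowAll, and count_allowed_links are illustrative names, not from the codebase.

```rust
// Hypothetical sketch of the trait-based alternative to passing a closure.
// None of these names come from the actual nearcore code.
type ShardIndex = usize;

trait LinkPolicy {
    fn is_link_allowed(&self, sender: ShardIndex, receiver: ShardIndex) -> bool;
}

/// Example policy that allows every link; the real scheduler would implement
/// LinkPolicy with its own state instead of capturing it in a closure.
struct AllowAll;

impl LinkPolicy for AllowAll {
    fn is_link_allowed(&self, _sender: ShardIndex, _receiver: ShardIndex) -> bool {
        true
    }
}

/// Stand-in for the scheduler logic: counts allowed links among the shards.
fn count_allowed_links(policy: &impl LinkPolicy, num_shards: ShardIndex) -> usize {
    let mut count = 0;
    for s in 0..num_shards {
        for r in 0..num_shards {
            if policy.is_link_allowed(s, r) {
                count += 1;
            }
        }
    }
    count
}

fn main() {
    // 4 shards with every link allowed: 4 * 4 = 16 links.
    assert_eq!(count_allowed_links(&AllowAll, 4), 16);
}
```

A trait keeps the call site debuggable (you can name and test the policy type), at the cost of a bit more ceremony than a closure.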
let mut sender_infos: ShardIndexMap<Info> = ShardIndexMap::new(shard_layout);
let mut receiver_infos: ShardIndexMap<Info> = ShardIndexMap::new(shard_layout);

for shard in shard_layout.shard_indexes() {
mini nit: shard_index
struct Info {
    bandwidth_left: Bandwidth,
    links_num: u64,
}

impl Info {
    fn average_link_bandwidth(&self) -> Bandwidth {
        if self.links_num == 0 {
            return 0;
        }
        self.bandwidth_left / self.links_num
    }

    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_num + self.bandwidth_left % self.links_num
    }
}
nit: Add comments please
}

fn link_proposition(&self) -> Bandwidth {
    self.bandwidth_left / self.links_num + self.bandwidth_left % self.links_num
What is the point of the second term? To assign more bandwidth to links earlier on the priority list?
I think it's to evenly distribute bandwidth_left. Say bandwidth_left is 13 and links_num is 3, so the split should be (5,4,4).
I think it's to evenly distribute bandwidth_left. Say bandwidth_left is 13 and links_num is 3, so the split should be (5,4,4)
That was the original intention: I wanted to make sure that all of bandwidth_left is distributed, even if it doesn't divide cleanly. But now that I think about it, it doesn't really make sense; the last division will always be by 1, so there will be no undistributed bandwidth. Maybe there was a reason for it in an earlier version? I don't remember 🤔
Good point! I'll remove the modulo.
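To make the (5,4,4) example above concrete, here's a tiny standalone sketch of the formula, assuming Bandwidth is a u64 alias as in the scheduler code; the grant loop is only an illustration of dividing one budget across links, not the actual scheduler logic.

```rust
// Standalone sketch of the link_proposition formula under discussion.
// `Bandwidth` as a u64 alias matches the scheduler code; the loop below
// just repeatedly grants a proposition and moves to the next link.
type Bandwidth = u64;

fn link_proposition(bandwidth_left: Bandwidth, links_num: u64) -> Bandwidth {
    bandwidth_left / links_num + bandwidth_left % links_num
}

fn main() {
    let mut bandwidth_left: Bandwidth = 13;
    let mut links_num: u64 = 3;
    let mut grants = Vec::new();
    while links_num > 0 {
        let grant = link_proposition(bandwidth_left, links_num);
        grants.push(grant);
        bandwidth_left -= grant;
        links_num -= 1;
    }
    // 13 over 3 links: the modulo puts the extra byte on the first link.
    assert_eq!(grants, vec![5, 4, 4]);
    assert_eq!(bandwidth_left, 0);
}
```

This also shows why the modulo is unnecessary for full distribution: the final iteration divides by 1, so nothing can be left over either way.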
Hmm, removing the modulo slightly increased link imbalance in one of the tests (1.03 -> 1.05). I don't have a good explanation for why that happened; the slight changes from the modulo shouldn't matter there. I guess it's just variance. I adjusted the test to allow the higher imbalance.
Took a quick look and everything looks fine
    bandwidth_grants
}

struct Info {
nit: How's the name BandwidthInfo or BudgetInfo? Fine with just Info as well.
How about EndpointInfo? It's information about one end of a shard link, either a sender or a receiver; I think that could be called an endpoint.
The struct is used only in this function, so I didn't spend too much time thinking about a beautiful name.
After the bandwidth scheduler processes bandwidth requests, there's usually some leftover budget for sending and receiving.
Let's grant more bandwidth to use up these remaining budgets; it's wasteful not to use them.
The algorithm for distributing remaining bandwidth is a bit magical. I don't have an easy way to explain why it works well; it was developed by intuition and trying different things. I can prove that it's safe and fast, but proving fairness and high utilization is much harder.
The intuition is that when a shard can send out X bytes of data and there are Y links on which things could be sent, the shard should send about X/Y bytes of data on each link. But it can't just send out X/Y; it has to make sure that everything stays within the receiver's budget. The sender and receiver both calculate how much they could send, and then the minimum of the two values is granted on the link.
Senders and receivers are sorted in order of increasing budgets. Processing them in this order gives the guarantee that all senders processed later will send at least as much as the one being processed right now. This means that we can grant remaining_bandwidth/remaining_senders and be sure that utilization will be high. The algorithm is safe because it never grants more than remaining_bandwidth, which ensures that the grants stay under the budget. I don't have a super clear explanation for it, but I think the important thing is that it's safe and behaves well in practice. I ran 10 million randomized tests and the algorithm always achieved 99% bandwidth utilization when all links are allowed. When some links are not allowed, the utilization is lower, but it still stays above 75%. Having disallowed links makes the problem much harder; it becomes more like a maximum flow/matching problem. I'd say that these results are good enough for the bandwidth scheduler.
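The scheme described above can be sketched end to end. This is a simplified illustration, not the nearcore implementation: plain Vecs stand in for ShardIndexMap, the modulo term is dropped per the review discussion, and the per-iteration sort order is approximated by sorting once up front.

```rust
use std::cmp::min;

type Bandwidth = u64;

struct Info {
    bandwidth_left: Bandwidth,
    links_num: u64,
}

impl Info {
    // Per-link share. With the modulo term removed, the proposition is just
    // the floor of the remaining budget divided by the remaining links.
    fn link_proposition(&self) -> Bandwidth {
        if self.links_num == 0 { 0 } else { self.bandwidth_left / self.links_num }
    }
}

/// Grant min(sender share, receiver share) on every allowed link,
/// processing endpoints in order of increasing per-link budget.
fn distribute_remaining_bandwidth(
    sender_budgets: &[Bandwidth],
    receiver_budgets: &[Bandwidth],
    is_link_allowed: impl Fn(usize, usize) -> bool,
) -> Vec<Vec<Bandwidth>> {
    let n = sender_budgets.len();
    let m = receiver_budgets.len();

    // Each endpoint starts with its leftover budget and its allowed-link count.
    let mut senders: Vec<Info> = (0..n)
        .map(|s| Info {
            bandwidth_left: sender_budgets[s],
            links_num: (0..m).filter(|&r| is_link_allowed(s, r)).count() as u64,
        })
        .collect();
    let mut receivers: Vec<Info> = (0..m)
        .map(|r| Info {
            bandwidth_left: receiver_budgets[r],
            links_num: (0..n).filter(|&s| is_link_allowed(s, r)).count() as u64,
        })
        .collect();

    // Smallest per-link budgets first, so endpoints processed later can
    // always afford at least as much per link as the current one.
    let mut sender_order: Vec<usize> = (0..n).collect();
    sender_order.sort_by_key(|&s| senders[s].link_proposition());
    let mut receiver_order: Vec<usize> = (0..m).collect();
    receiver_order.sort_by_key(|&r| receivers[r].link_proposition());

    let mut grants = vec![vec![0; m]; n];
    for &s in &sender_order {
        for &r in &receiver_order {
            if !is_link_allowed(s, r) {
                continue;
            }
            // The minimum of the two propositions keeps both endpoints
            // within budget, which is what makes the algorithm safe.
            let grant = min(senders[s].link_proposition(), receivers[r].link_proposition());
            grants[s][r] = grant;
            senders[s].bandwidth_left -= grant;
            senders[s].links_num -= 1;
            receivers[r].bandwidth_left -= grant;
            receivers[r].links_num -= 1;
        }
    }
    grants
}

fn main() {
    // Two senders, two receivers, all links allowed.
    let grants = distribute_remaining_bandwidth(&[100, 50], &[80, 60], |_, _| true);
    // Safety check: no sender exceeds its budget.
    let sent: Vec<Bandwidth> = grants.iter().map(|row| row.iter().sum()).collect();
    assert!(sent[0] <= 100 && sent[1] <= 50);
    println!("{:?}", grants);
}
```

On the small example in main, every grant stays within both endpoints' budgets, matching the safety argument above; fairness and utilization are the parts that only the randomized tests really exercise.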
This is the last feature that I'd like to add to the bandwidth scheduler before release.