
feat(bandwidth_scheduler) - distribute remaining bandwidth #12682

Merged
merged 10 commits into from
Jan 8, 2025

Conversation

jancionear
Contributor

After the bandwidth scheduler processes bandwidth requests, there's usually some leftover budget for sending and receiving.
Let's grant more bandwidth to use up these remaining budgets; it's wasteful not to use them.

The algorithm for distributing remaining bandwidth is a bit magical. I don't have an easy way to explain why it works well; it was developed by intuition and by trying different things. I can prove that it's safe and fast, but proving fairness and high utilization is much harder.

The intuition is that when a shard can send out X bytes of data and there are Y links on which things could be sent, the shard should send about X/Y bytes of data on each link. But it can't just send out X/Y; it has to make sure that everything stays within the receiver's budget. The sender and receiver both calculate how much they could send, and then the minimum of the two values is granted on the link.
Senders and receivers are sorted in order of increasing budgets. Processing them in this order gives the guarantee that all senders processed later will send at least as much as the one being processed right now. This means that we can grant remaining_bandwidth/remaining_senders and be sure that utilization will be high.
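
To make this concrete, here is a simplified, self-contained sketch of that idea. It is illustrative only, not the actual scheduler code: it ignores disallowed links and original shard ordering, and the names (Endpoint, distribute_evenly) are made up.

type Bandwidth = u64;

struct Endpoint {
    bandwidth_left: Bandwidth,
    links_left: u64,
}

impl Endpoint {
    /// Spread whatever budget is left evenly over the links that still need a grant.
    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_left
    }
}

/// Returns grants[sender][receiver]. Endpoints are processed in order of increasing
/// budget, so every endpoint processed later can afford at least the current grant.
fn distribute_evenly(
    sender_budgets: &[Bandwidth],
    receiver_budgets: &[Bandwidth],
) -> Vec<Vec<Bandwidth>> {
    let mut senders: Vec<Endpoint> = sender_budgets
        .iter()
        .map(|&b| Endpoint { bandwidth_left: b, links_left: receiver_budgets.len() as u64 })
        .collect();
    let mut receivers: Vec<Endpoint> = receiver_budgets
        .iter()
        .map(|&b| Endpoint { bandwidth_left: b, links_left: sender_budgets.len() as u64 })
        .collect();
    // Smallest budgets first.
    senders.sort_by_key(|s| s.bandwidth_left);
    receivers.sort_by_key(|r| r.bandwidth_left);

    let mut grants = vec![vec![0; receivers.len()]; senders.len()];
    for (si, sender) in senders.iter_mut().enumerate() {
        for (ri, receiver) in receivers.iter_mut().enumerate() {
            // Grant the minimum of what the sender and the receiver can afford,
            // so neither budget is ever exceeded.
            let grant = std::cmp::min(sender.link_proposition(), receiver.link_proposition());
            sender.bandwidth_left -= grant;
            sender.links_left -= 1;
            receiver.bandwidth_left -= grant;
            receiver.links_left -= 1;
            grants[si][ri] = grant;
        }
    }
    grants
}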

The algorithm is safe because it never grants more than remaining_bandwidth, which ensures that the grants stay under the budget.

I don't have a super clear explanation for it, but I think the important thing is that it's safe and behaves well in practice. I ran 10 million randomized tests and the algorithm always achieved 99% bandwidth utilization when all links are allowed. When some links are not allowed, the utilization is lower, but it still stays above 75%. Having disallowed links makes the problem much harder; it becomes more like a maximum flow/matching problem. I'd say that these results are good enough for the bandwidth scheduler.

This is the last feature that I'd like to add to the bandwidth scheduler before release.

@jancionear jancionear requested a review from a team as a code owner January 3, 2025 16:29

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 98.28767% with 5 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (21b5109) to head (f0b29ef).
Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
...ntime/runtime/src/bandwidth_scheduler/scheduler.rs 90.90% 2 Missing and 2 partials ⚠️
...me/src/bandwidth_scheduler/distribute_remaining.rs 99.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12682      +/-   ##
==========================================
+ Coverage   70.56%   70.63%   +0.07%     
==========================================
  Files         847      848       +1     
  Lines      172735   173651     +916     
  Branches   172735   173651     +916     
==========================================
+ Hits       121897   122667     +770     
- Misses      45737    45855     +118     
- Partials     5101     5129      +28     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (?)
genesis-check 1.36% <0.00%> (-0.01%) ⬇️
linux 69.23% <97.94%> (-0.03%) ⬇️
linux-nightly 70.22% <98.28%> (+0.05%) ⬆️
pytests 1.66% <0.00%> (-0.01%) ⬇️
sanity-checks 1.47% <0.00%> (-0.01%) ⬇️
unittests 70.47% <98.28%> (+0.07%) ⬆️
upgradability 0.20% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

@wacban wacban left a comment

LGTM though I don't think I fully understand it.

pub fn distribute_remaining_bandwidth(
    sender_budgets: &ShardIndexMap<Bandwidth>,
    receiver_budgets: &ShardIndexMap<Bandwidth>,
    is_link_allowed: impl Fn(ShardIndex, ShardIndex) -> bool,
Contributor

Personally I consider using lambdas like this an anti-pattern. This one is borderline because it's very localized and it's only called in one place. In my experience, debugging any issues across the closure is really hard. Is there any other neat way to implement this logic? Perhaps you can define a trait here, have the bandwidth scheduler implement it and pass it here? Or pull this logic into the bandwidth scheduler?

Contributor

Agree with Wac but no hard preferences.

Contributor Author

@jancionear jancionear Jan 7, 2025

I can change it to is_link_allowed: ShardLinkMap<bool>. It'll probably be faster as well.
I used a lambda because that was the easiest way to do it, and I felt like it was clear enough.

I'd like to be able to write independent unit tests, so I'm trying to avoid tight integration with the rest of the bandwidth scheduler.
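
For reference, here is a minimal self-contained sketch of the trait-based alternative discussed above. All names here (LinkAllowance, AllowedLinks) are made up for illustration and don't correspond to actual nearcore types.

// Hypothetical trait that the bandwidth scheduler could implement.
trait LinkAllowance {
    fn is_link_allowed(&self, sender: usize, receiver: usize) -> bool;
}

// Backed by a precomputed matrix, similar in spirit to a ShardLinkMap<bool>,
// which also avoids calling through a closure on every link.
struct AllowedLinks {
    num_shards: usize,
    allowed: Vec<bool>, // row-major: index = sender * num_shards + receiver
}

impl LinkAllowance for AllowedLinks {
    fn is_link_allowed(&self, sender: usize, receiver: usize) -> bool {
        self.allowed[sender * self.num_shards + receiver]
    }
}

// The distribution function could then take `&impl LinkAllowance` instead of
// `impl Fn(ShardIndex, ShardIndex) -> bool`, keeping unit tests independent.
fn count_allowed_links(links: &impl LinkAllowance, num_shards: usize) -> usize {
    let mut count = 0;
    for sender in 0..num_shards {
        for receiver in 0..num_shards {
            if links.is_link_allowed(sender, receiver) {
                count += 1;
            }
        }
    }
    count
}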

let mut sender_infos: ShardIndexMap<Info> = ShardIndexMap::new(shard_layout);
let mut receiver_infos: ShardIndexMap<Info> = ShardIndexMap::new(shard_layout);

for shard in shard_layout.shard_indexes() {
Contributor

mini nit: shard_index

Comment on lines 88 to 104
struct Info {
    bandwidth_left: Bandwidth,
    links_num: u64,
}

impl Info {
    fn average_link_bandwidth(&self) -> Bandwidth {
        if self.links_num == 0 {
            return 0;
        }
        self.bandwidth_left / self.links_num
    }

    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_num + self.bandwidth_left % self.links_num
    }
}
Contributor

nit: Add comments please

    }

    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_num + self.bandwidth_left % self.links_num
Contributor

What is the point of the second term? To assign more bandwidth to links earlier on the priority list?

Contributor

I think it's to evenly distribute bandwidth_left. Say bandwidth_left is 13 and links_num is 3, so the split should be (5,4,4)

Contributor Author

I think it's to evenly distribute bandwidth_left. Say bandwidth_left is 13 and links_num is 3, so the split should be (5,4,4)

That was the original intention; I wanted to make sure that all of bandwidth_left is distributed, even if it doesn't divide cleanly. But now that I think about it, it doesn't really make sense: the last division will always be by 1, so there will be no undistributed bandwidth. Maybe there was a reason for it in an earlier version? I don't remember 🤔

Good point! I'll remove the modulo.
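
For illustration, here is a toy sketch (not scheduler code) that walks the links in order and takes bandwidth_left / links_num, optionally plus the modulo term; both variants end up distributing all 13 bytes from the example above, since the last division is always by 1.

// Toy illustration of the (13, 3) example; not the scheduler implementation.
fn split(mut bandwidth_left: u64, links_num: u64, with_modulo: bool) -> Vec<u64> {
    let mut grants = Vec::new();
    for links_left in (1..=links_num).rev() {
        let mut grant = bandwidth_left / links_left;
        if with_modulo {
            grant += bandwidth_left % links_left;
        }
        bandwidth_left -= grant;
        grants.push(grant);
    }
    grants
}

fn main() {
    assert_eq!(split(13, 3, true), vec![5, 4, 4]);  // with the modulo term
    assert_eq!(split(13, 3, false), vec![4, 4, 5]); // without it, nothing is lost either
}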

Contributor Author

@jancionear jancionear Jan 7, 2025

Hmm, removing the modulo slightly increased link imbalance in one of the tests (1.03 -> 1.05). I don't have a good explanation for why it happened; the slight changes from the modulo shouldn't matter there. I guess it's just variance. I adjusted the test to allow the higher imbalance.

Contributor

@shreyan-gupta shreyan-gupta left a comment

Took a quick look and everything looks fine

    bandwidth_grants
}

struct Info {
Contributor

nit: How's the name BandwidthInfo or BudgetInfo? Fine with just Info as well

Contributor Author

How about EndpointInfo? It's information about one end of a shard link, either a sender or a receiver; I think that could be called an endpoint?
The struct is used only in this function, so I didn't spend too much time thinking about a beautiful name.

@jancionear jancionear added this pull request to the merge queue Jan 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 8, 2025
@jancionear jancionear added this pull request to the merge queue Jan 8, 2025
Merged via the queue into master with commit 7e3d46c Jan 8, 2025
28 checks passed
@jancionear jancionear deleted the jan_distribute_remaining branch January 8, 2025 16:01