
feat(bandwidth_scheduler) - distribute remaining bandwidth #12682

Merged
merged 10 commits into from
Jan 8, 2025

Conversation

jancionear
Contributor

After the bandwidth scheduler processes bandwidth requests, there's usually some leftover budget for sending and receiving.
Let's grant more bandwidth to use up these remaining budgets; it's wasteful not to use them.

The algorithm for distributing remaining bandwidth is a bit magical. I don't have an easy way to explain why it works well; it was developed by intuition and by trying different things. I can prove that it's safe and fast, but proving fairness and high utilization is much harder.

The intuition is that when a shard can send out X bytes of data and there are Y links on which things could be sent, the shard should send about X/Y bytes of data on each link. But it can't just send out X/Y; it has to make sure that everything stays within the receiver's budget. The sender and receiver both calculate how much they could send, and then the minimum of the two values is granted on the link.
Senders and receivers are sorted in order of increasing budgets. Processing them in this order gives the guarantee that all senders processed later will send at least as much as the one being processed right now. This means that we can grant remaining_bandwidth/remaining_senders and be sure that utilization will be high.
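
To make this concrete, here is a simplified, self-contained sketch of that idea. It is illustrative only, not the actual scheduler code: it ignores disallowed links and original shard ordering, and the names (Endpoint, distribute_evenly) are made up.

type Bandwidth = u64;

struct Endpoint {
    bandwidth_left: Bandwidth,
    links_left: u64,
}

impl Endpoint {
    /// Spread whatever budget is left evenly over the links that still need a grant.
    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_left
    }
}

/// Returns grants[sender][receiver]. Endpoints are processed in order of increasing
/// budget, so every endpoint processed later can afford at least the current grant.
fn distribute_evenly(
    sender_budgets: &[Bandwidth],
    receiver_budgets: &[Bandwidth],
) -> Vec<Vec<Bandwidth>> {
    let mut senders: Vec<Endpoint> = sender_budgets
        .iter()
        .map(|&b| Endpoint { bandwidth_left: b, links_left: receiver_budgets.len() as u64 })
        .collect();
    let mut receivers: Vec<Endpoint> = receiver_budgets
        .iter()
        .map(|&b| Endpoint { bandwidth_left: b, links_left: sender_budgets.len() as u64 })
        .collect();
    // Smallest budgets first.
    senders.sort_by_key(|s| s.bandwidth_left);
    receivers.sort_by_key(|r| r.bandwidth_left);

    let mut grants = vec![vec![0; receivers.len()]; senders.len()];
    for (si, sender) in senders.iter_mut().enumerate() {
        for (ri, receiver) in receivers.iter_mut().enumerate() {
            // Grant the minimum of what the sender and the receiver can afford,
            // so neither budget is ever exceeded.
            let grant = std::cmp::min(sender.link_proposition(), receiver.link_proposition());
            sender.bandwidth_left -= grant;
            sender.links_left -= 1;
            receiver.bandwidth_left -= grant;
            receiver.links_left -= 1;
            grants[si][ri] = grant;
        }
    }
    grants
}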

The algorithm is safe because it never grants more than remaining_bandwidth, which ensures that the grants stay under the budget.

I don't have a super clear explanation for it, but I think the important thing is that it's safe and behaves well in practice. I ran 10 million randomized tests and the algorithm always achieved 99% bandwidth utilization when all links are allowed. When some links are not allowed, the utilization is lower, but it still stays above 75%. Having disallowed links makes the problem much harder; it becomes more like a maximum flow/matching problem. I'd say that these results are good enough for the bandwidth scheduler.

This is the last feature that I'd like to add to the bandwidth scheduler before release.

@jancionear jancionear requested a review from a team as a code owner January 3, 2025 16:29

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 98.28767% with 5 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (21b5109) to head (f0b29ef).
Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
...ntime/runtime/src/bandwidth_scheduler/scheduler.rs 90.90% 2 Missing and 2 partials ⚠️
...me/src/bandwidth_scheduler/distribute_remaining.rs 99.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12682      +/-   ##
==========================================
+ Coverage   70.56%   70.63%   +0.07%     
==========================================
  Files         847      848       +1     
  Lines      172735   173651     +916     
  Branches   172735   173651     +916     
==========================================
+ Hits       121897   122667     +770     
- Misses      45737    45855     +118     
- Partials     5101     5129      +28     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (?)
genesis-check 1.36% <0.00%> (-0.01%) ⬇️
linux 69.23% <97.94%> (-0.03%) ⬇️
linux-nightly 70.22% <98.28%> (+0.05%) ⬆️
pytests 1.66% <0.00%> (-0.01%) ⬇️
sanity-checks 1.47% <0.00%> (-0.01%) ⬇️
unittests 70.47% <98.28%> (+0.07%) ⬆️
upgradability 0.20% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

@wacban wacban left a comment

LGTM though I don't think I fully understand it.

pub fn distribute_remaining_bandwidth(
    sender_budgets: &ShardIndexMap<Bandwidth>,
    receiver_budgets: &ShardIndexMap<Bandwidth>,
    is_link_allowed: impl Fn(ShardIndex, ShardIndex) -> bool,
Contributor

Personally I consider using lambdas like this an anti-pattern. This one is borderline because it's very localized and it's only called in one place. In my experience, debugging any issues across the closure is really hard. Is there any other neat way to implement this logic? Perhaps you can define a trait here, have the bandwidth scheduler implement it and pass it here? Or pull this logic into the bandwidth scheduler?

Contributor

Agree with Wac but no hard preferences.

Contributor Author

@jancionear jancionear Jan 7, 2025

I can change it to is_link_allowed: ShardLinkMap<bool>. It'll probably be faster as well.
I used a lambda because that was the easiest way to do it, and I felt like it was clear enough.

I'd like to be able to write independent unit tests, so I'm trying to avoid tight integration with the rest of the bandwidth scheduler.
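
For reference, here is a minimal self-contained sketch of the trait-based alternative discussed above. All names here (LinkAllowance, AllowedLinks) are made up for illustration and don't correspond to actual nearcore types.

// Hypothetical trait that the bandwidth scheduler could implement.
trait LinkAllowance {
    fn is_link_allowed(&self, sender: usize, receiver: usize) -> bool;
}

// Backed by a precomputed matrix, similar in spirit to a ShardLinkMap<bool>,
// which also avoids calling through a closure on every link.
struct AllowedLinks {
    num_shards: usize,
    allowed: Vec<bool>, // row-major: index = sender * num_shards + receiver
}

impl LinkAllowance for AllowedLinks {
    fn is_link_allowed(&self, sender: usize, receiver: usize) -> bool {
        self.allowed[sender * self.num_shards + receiver]
    }
}

// The distribution function could then take `&impl LinkAllowance` instead of
// `impl Fn(ShardIndex, ShardIndex) -> bool`, keeping unit tests independent.
fn count_allowed_links(links: &impl LinkAllowance, num_shards: usize) -> usize {
    let mut count = 0;
    for sender in 0..num_shards {
        for receiver in 0..num_shards {
            if links.is_link_allowed(sender, receiver) {
                count += 1;
            }
        }
    }
    count
}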

let mut sender_infos: ShardIndexMap<Info> = ShardIndexMap::new(shard_layout);
let mut receiver_infos: ShardIndexMap<Info> = ShardIndexMap::new(shard_layout);

for shard in shard_layout.shard_indexes() {
Contributor

mini nit: shard_index

Comment on lines 88 to 104
struct Info {
    bandwidth_left: Bandwidth,
    links_num: u64,
}

impl Info {
    fn average_link_bandwidth(&self) -> Bandwidth {
        if self.links_num == 0 {
            return 0;
        }
        self.bandwidth_left / self.links_num
    }

    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_num + self.bandwidth_left % self.links_num
    }
}
Contributor

nit: Add comments please

    }

    fn link_proposition(&self) -> Bandwidth {
        self.bandwidth_left / self.links_num + self.bandwidth_left % self.links_num
Contributor

What is the point of the second term? To assign more bandwidth to links earlier on the priority list?

Contributor

I think it's to evenly distribute bandwidth_left. Say bandwidth_left is 13 and links_num is 3, so the split should be (5,4,4)

Contributor Author

I think it's to evenly distribute bandwidth_left. Say bandwidth_left is 13 and links_num is 3, so the split should be (5,4,4)

That was the original intention; I wanted to make sure that all of bandwidth_left is distributed, even if it doesn't divide cleanly. But now that I think about it, it doesn't really make sense: the last division will always be by 1, so there will be no undistributed bandwidth. Maybe there was a reason for it in an earlier version? I don't remember 🤔

Good point! I'll remove the modulo.
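
For illustration, here is a toy sketch (not scheduler code) that walks the links in order and takes bandwidth_left / links_num, optionally plus the modulo term; both variants end up distributing all 13 bytes from the example above, since the last division is always by 1.

// Toy illustration of the (13, 3) example; not the scheduler implementation.
fn split(mut bandwidth_left: u64, links_num: u64, with_modulo: bool) -> Vec<u64> {
    let mut grants = Vec::new();
    for links_left in (1..=links_num).rev() {
        let mut grant = bandwidth_left / links_left;
        if with_modulo {
            grant += bandwidth_left % links_left;
        }
        bandwidth_left -= grant;
        grants.push(grant);
    }
    grants
}

fn main() {
    assert_eq!(split(13, 3, true), vec![5, 4, 4]);  // with the modulo term
    assert_eq!(split(13, 3, false), vec![4, 4, 5]); // without it, nothing is lost either
}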

Contributor Author

@jancionear jancionear Jan 7, 2025

Hmm, removing the modulo slightly increased link imbalance in one of the tests (1.03 -> 1.05). I don't have a good explanation for why it happened; the slight changes from the modulo shouldn't matter there. I guess it's just variance. I adjusted the test to allow the higher imbalance.

Contributor

@shreyan-gupta shreyan-gupta left a comment

Took a quick look and everything looks fine

    bandwidth_grants
}

struct Info {
Contributor

nit: How's the name BandwidthInfo or BudgetInfo? Fine with just Info as well

Contributor Author

How about EndpointInfo? It's information about one end of a shard link, either a sender or a receiver; I think that could be called an endpoint?
The struct is used only in this function, so I didn't spend too much time thinking about a beautiful name.

@jancionear jancionear added this pull request to the merge queue Jan 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 8, 2025
@jancionear jancionear added this pull request to the merge queue Jan 8, 2025
Merged via the queue into master with commit 7e3d46c Jan 8, 2025
28 checks passed
@jancionear jancionear deleted the jan_distribute_remaining branch January 8, 2025 16:01