[Merged by Bors] - use try_send to replace send.await, unbounded channel should always be sendable, this improves performance #7745
Looks good: are there any measurable changes on |
Makes sense to me. This synchronously attempts to push onto the internal ConcurrentQueue without yielding to the async executor. This would put contention only on the channel and not the executor. Nice work investigating this! This does make the task now a fully synchronous one: it never yields. There may be some headroom to cut in looking for a way to mix synchronous and asynchronous tasks. This likely doesn't have much of an impact on our current examples, as the slowdown is likely contention dependent, but this will change as we add more systems to the engine. |
Here is a benchmark I just ran on my Windows gaming box. The code that prints these percentile values is at https://github.com/shuoli84/bevy/blob/executor_channel_send_opt_bench/examples/animation/animated_fox.rs#L143 PR:
2023-02-20T02:40:26.324662Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Windows 11 Pro", kernel: "22621", cpu: "Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz", core_count: "8", memory: "15.9 GiB" }
2023-02-20T02:40:26.342248Z INFO bevy_input::gamepad: Gamepad { id: 0 } Connected
2023-02-20T02:40:26.437791Z INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:40:26.438019Z INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:40:26.438227Z INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:40:26.438567Z INFO naga::back::spv::writer: Skip function Some("skin_model")
2023-02-20T02:40:26.690288Z INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:40:26.690418Z INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:40:26.690631Z INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:40:31.342537Z INFO animated_fox::diagnostic: min:531us p50:1149us p90:2863us max:7079us
2023-02-20T02:40:36.348494Z INFO animated_fox::diagnostic: min:546us p50:1192us p90:2790us max:7709us
2023-02-20T02:40:41.349897Z INFO animated_fox::diagnostic: min:618us p50:1160us p90:2884us max:7431us
2023-02-20T02:40:46.353189Z INFO animated_fox::diagnostic: min:564us p50:1291us p90:2803us max:7010us
2023-02-20T02:40:51.357689Z INFO animated_fox::diagnostic: min:615us p50:1220us p90:2820us max:7082us
2023-02-20T02:40:56.357926Z INFO animated_fox::diagnostic: min:512us p50:1185us p90:2770us max:7269us
2023-02-20T02:41:01.364985Z INFO animated_fox::diagnostic: min:562us p50:1136us p90:2808us max:7297us
2023-02-20T02:41:06.365066Z INFO animated_fox::diagnostic: min:599us p50:1142us p90:2670us max:7168us
2023-02-20T02:41:11.367200Z INFO animated_fox::diagnostic: min:607us p50:1320us p90:2792us max:7316us
2023-02-20T02:41:16.368412Z INFO animated_fox::diagnostic: min:591us p50:1142us p90:2777us max:7399us
2023-02-20T02:41:21.368891Z INFO animated_fox::diagnostic: min:497us p50:1186us p90:2824us max:6994us
2023-02-20T02:41:26.373178Z INFO animated_fox::diagnostic: min:585us p50:1164us p90:2696us max:7166us
2023-02-20T02:41:31.373573Z INFO animated_fox::diagnostic: min:554us p50:1150us p90:2709us max:7335us
2023-02-20T02:41:36.373897Z INFO animated_fox::diagnostic: min:630us p50:1376us p90:2839us max:6938us
2023-02-20T02:41:41.374206Z INFO animated_fox::diagnostic: min:537us p50:1189us p90:2797us max:7133us
2023-02-20T02:41:46.374365Z INFO animated_fox::diagnostic: min:539us p50:1179us p90:2761us max:6970us
2023-02-20T02:41:51.374704Z INFO animated_fox::diagnostic: min:449us p50:772us p90:1836us max:7043us
Baseline:
2023-02-20T02:45:27.962135Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Windows 11 Pro", kernel: "22621", cpu: "Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz", core_count: "8", memory: "15.9 GiB" }
2023-02-20T02:45:27.977233Z INFO bevy_input::gamepad: Gamepad { id: 0 } Connected
2023-02-20T02:45:28.014561Z INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:45:28.014794Z INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:45:28.015004Z INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:45:28.028847Z INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:45:28.029065Z INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:45:28.029261Z INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:45:28.029582Z INFO naga::back::spv::writer: Skip function Some("skin_model")
2023-02-20T02:45:32.977173Z INFO animated_fox::diagnostic: min:608us p50:1192us p90:2877us max:7172us
2023-02-20T02:45:37.980240Z INFO animated_fox::diagnostic: min:623us p50:1254us p90:2841us max:6966us
2023-02-20T02:45:42.980550Z INFO animated_fox::diagnostic: min:597us p50:1461us p90:2887us max:7528us
2023-02-20T02:45:47.985037Z INFO animated_fox::diagnostic: min:560us p50:1286us p90:2841us max:7105us
2023-02-20T02:45:52.990842Z INFO animated_fox::diagnostic: min:599us p50:1240us p90:2806us max:7415us
2023-02-20T02:45:57.998088Z INFO animated_fox::diagnostic: min:666us p50:1366us p90:2895us max:7102us
2023-02-20T02:46:03.002441Z INFO animated_fox::diagnostic: min:558us p50:1419us p90:2862us max:7019us
2023-02-20T02:46:08.002955Z INFO animated_fox::diagnostic: min:638us p50:1490us p90:2871us max:6909us
2023-02-20T02:46:13.010620Z INFO animated_fox::diagnostic: min:636us p50:1382us p90:2939us max:7283us
2023-02-20T02:46:18.014608Z INFO animated_fox::diagnostic: min:545us p50:1345us p90:2854us max:7284us
2023-02-20T02:46:23.017687Z INFO animated_fox::diagnostic: min:670us p50:1424us p90:2832us max:6996us
2023-02-20T02:46:28.022218Z INFO animated_fox::diagnostic: min:667us p50:1555us p90:2832us max:7142us
2023-02-20T02:46:33.023185Z INFO animated_fox::diagnostic: min:624us p50:1565us p90:2974us max:7097us
2023-02-20T02:46:38.025517Z INFO animated_fox::diagnostic: min:605us p50:1687us p90:2900us max:6848us
2023-02-20T02:46:43.026081Z INFO animated_fox::diagnostic: min:671us p50:1358us p90:2846us max:6832us
2023-02-20T02:46:48.026145Z INFO animated_fox::diagnostic: min:664us p50:1326us p90:2822us max:7321us
2023-02-20T02:46:53.026370Z INFO animated_fox::diagnostic: min:557us p50:1374us p90:2911us max:7007us
2023-02-20T02:46:58.026779Z INFO animated_fox::diagnostic: min:653us p50:1587us p90:2878us max:7326us
2023-02-20T02:47:03.033789Z INFO animated_fox::diagnostic: min:524us p50:1297us p90:2806us max:7150us |
Eyeballing the benchmark above, p50 looks better, p90 looks slightly better, and max is about the same? |
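For readers who want to reproduce min/p50/p90/max lines like the ones in these logs, here is a minimal sketch of one way to summarize a window of frame times. It assumes nearest-rank percentiles over a sorted sample window; `FrameStats`, `percentile` and `summarize` are hypothetical names, and the linked `animated_fox.rs` code may compute these values differently.

```rust
use std::time::Duration;

/// Summary of one window of frame times (hypothetical helper type).
struct FrameStats {
    min: Duration,
    p50: Duration,
    p90: Duration,
    max: Duration,
}

/// Nearest-rank percentile over an already sorted, non-empty slice.
fn percentile(sorted: &[Duration], pct: usize) -> Duration {
    // rank = ceil(pct / 100 * n), at least 1, then converted to a 0-based index.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank.max(1) - 1]
}

/// Returns `None` for an empty window, otherwise min/p50/p90/max.
fn summarize(samples: &[Duration]) -> Option<FrameStats> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    Some(FrameStats {
        min: sorted[0],
        p50: percentile(&sorted, 50),
        p90: percentile(&sorted, 90),
        max: sorted[sorted.len() - 1],
    })
}

fn main() {
    // Ten synthetic frame times: 100us, 200us, ..., 1000us.
    let samples: Vec<Duration> = (1..=10u64)
        .map(|i| Duration::from_micros(i * 100))
        .collect();
    let s = summarize(&samples).expect("window is not empty");
    // prints "min:100us p50:500us p90:900us max:1000us"
    println!(
        "min:{}us p50:{}us p90:{}us max:{}us",
        s.min.as_micros(),
        s.p50.as_micros(),
        s.p90.as_micros(),
        s.max.as_micros()
    );
}
```

Comparing p50 and p90 between two runs this way is still an eyeball comparison; both runs need to cover the same scene and a similar number of frames per window.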
ping |
This LGTM, but I'd like to get another expert's opinion here before merging.
ping~ The change is safe: try_send fails only if the channel is full or closed, and an unbounded channel is never full, as stated in https://docs.rs/async-channel/latest/async_channel/struct.Sender.html#method.is_full. |
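The two failure modes mentioned above can be illustrated with a small sketch. It uses `std::sync::mpsc` as a stand-in so that it runs without extra crates; this is an assumption for illustration only, since the PR itself uses `async_channel`, whose error type and method names differ. `describe` is a hypothetical helper.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Maps a try_send result to a short label (hypothetical helper).
fn describe<T>(result: Result<(), TrySendError<T>>) -> &'static str {
    match result {
        Ok(()) => "sent",
        Err(TrySendError::Full(_)) => "full",
        Err(TrySendError::Disconnected(_)) => "closed",
    }
}

fn main() {
    // Bounded channel with room for exactly one message.
    let (tx, _rx) = sync_channel::<u32>(1);
    println!("{}", describe(tx.try_send(1))); // prints "sent"
    println!("{}", describe(tx.try_send(2))); // prints "full"

    // Bounded channel whose receiver is already gone.
    let (tx, rx) = sync_channel::<u32>(1);
    drop(rx);
    println!("{}", describe(tx.try_send(3))); // prints "closed"

    // Unbounded channel: sending never reports "full" while the receiver lives.
    let (tx, rx) = channel::<u32>();
    for i in 0..10_000 {
        tx.send(i).expect("receiver is still alive");
    }
    println!("{}", rx.try_iter().count()); // prints "10000"
}
```

With an unbounded channel only the "closed" case remains, which is why a non-blocking send is safe as long as the receiver outlives the senders.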
A thought came up: this used to be a bounded channel to minimize allocator use by said channel during the course of using the executor, which is why the await is there, but it seems like that might have disappeared during stageless. Might be worth testing if bringing back that bounded channel (with a capacity based on the total number of systems) will work with |
Change makes sense to me. I see a smaller but noticeable change on empty_systems. busy_systems, contrived, and many_foxes are all within noise for me.
group main-for-try-send try-send
----- ----------------- --------
busy_systems/01x_entities_03_systems 1.00 30.2±1.08µs ? ?/sec 1.01 30.5±0.80µs ? ?/sec
busy_systems/01x_entities_06_systems 1.09 47.3±0.77µs ? ?/sec 1.00 43.2±0.88µs ? ?/sec
busy_systems/01x_entities_09_systems 1.02 65.9±1.96µs ? ?/sec 1.00 64.5±0.73µs ? ?/sec
busy_systems/01x_entities_12_systems 1.02 82.3±1.60µs ? ?/sec 1.00 81.1±1.10µs ? ?/sec
busy_systems/01x_entities_15_systems 1.00 99.4±3.30µs ? ?/sec 1.00 99.7±6.23µs ? ?/sec
busy_systems/02x_entities_03_systems 1.01 44.8±1.38µs ? ?/sec 1.00 44.5±1.02µs ? ?/sec
busy_systems/02x_entities_06_systems 1.00 74.7±1.35µs ? ?/sec 1.01 75.1±3.72µs ? ?/sec
busy_systems/02x_entities_09_systems 1.00 105.2±2.06µs ? ?/sec 1.01 106.1±8.94µs ? ?/sec
busy_systems/02x_entities_12_systems 1.01 132.1±2.66µs ? ?/sec 1.00 131.0±2.88µs ? ?/sec
busy_systems/02x_entities_15_systems 1.00 160.2±1.54µs ? ?/sec 1.00 159.4±2.79µs ? ?/sec
busy_systems/03x_entities_03_systems 1.00 57.7±0.85µs ? ?/sec 1.03 59.2±1.51µs ? ?/sec
busy_systems/03x_entities_06_systems 1.00 101.8±1.40µs ? ?/sec 1.03 105.2±1.40µs ? ?/sec
busy_systems/03x_entities_09_systems 1.02 149.5±7.04µs ? ?/sec 1.00 146.2±3.83µs ? ?/sec
busy_systems/03x_entities_12_systems 1.00 181.6±2.73µs ? ?/sec 1.02 185.8±4.07µs ? ?/sec
busy_systems/03x_entities_15_systems 1.00 226.7±13.24µs ? ?/sec 1.01 228.3±3.48µs ? ?/sec
busy_systems/04x_entities_03_systems 1.00 70.1±1.22µs ? ?/sec 1.02 71.6±1.40µs ? ?/sec
busy_systems/04x_entities_06_systems 1.01 128.7±4.14µs ? ?/sec 1.00 127.2±5.93µs ? ?/sec
busy_systems/04x_entities_09_systems 1.00 178.7±2.76µs ? ?/sec 1.04 185.5±7.96µs ? ?/sec
busy_systems/04x_entities_12_systems 1.00 232.6±7.49µs ? ?/sec 1.02 237.8±4.35µs ? ?/sec
busy_systems/04x_entities_15_systems 1.00 289.0±11.74µs ? ?/sec 1.03 296.3±8.86µs ? ?/sec
busy_systems/05x_entities_03_systems 1.04 83.3±1.28µs ? ?/sec 1.00 80.5±2.34µs ? ?/sec
busy_systems/05x_entities_06_systems 1.05 154.8±2.74µs ? ?/sec 1.00 146.9±1.35µs ? ?/sec
busy_systems/05x_entities_09_systems 1.05 219.0±4.51µs ? ?/sec 1.00 208.6±3.86µs ? ?/sec
busy_systems/05x_entities_12_systems 1.05 286.1±8.59µs ? ?/sec 1.00 271.9±5.12µs ? ?/sec
busy_systems/05x_entities_15_systems 1.01 354.1±8.31µs ? ?/sec 1.00 351.4±11.61µs ? ?/sec
contrived/01x_entities_03_systems 1.04 27.1±1.07µs ? ?/sec 1.00 26.0±0.93µs ? ?/sec
contrived/01x_entities_06_systems 1.08 36.5±0.58µs ? ?/sec 1.00 33.9±0.97µs ? ?/sec
contrived/01x_entities_09_systems 1.03 48.2±1.86µs ? ?/sec 1.00 46.8±0.95µs ? ?/sec
contrived/01x_entities_12_systems 1.01 57.8±0.96µs ? ?/sec 1.00 57.0±1.22µs ? ?/sec
contrived/01x_entities_15_systems 1.06 71.5±8.13µs ? ?/sec 1.00 67.8±1.87µs ? ?/sec
contrived/02x_entities_03_systems 1.06 32.3±0.92µs ? ?/sec 1.00 30.4±0.91µs ? ?/sec
contrived/02x_entities_06_systems 1.00 51.1±1.04µs ? ?/sec 1.04 53.4±4.80µs ? ?/sec
contrived/02x_entities_09_systems 1.06 69.8±1.92µs ? ?/sec 1.00 65.8±1.65µs ? ?/sec
contrived/02x_entities_12_systems 1.01 82.3±3.38µs ? ?/sec 1.00 81.7±0.82µs ? ?/sec
contrived/02x_entities_15_systems 1.02 98.9±1.89µs ? ?/sec 1.00 97.0±0.93µs ? ?/sec
contrived/03x_entities_03_systems 1.00 38.2±0.74µs ? ?/sec 1.01 38.5±1.64µs ? ?/sec
contrived/03x_entities_06_systems 1.00 58.4±0.71µs ? ?/sec 1.05 61.4±1.25µs ? ?/sec
contrived/03x_entities_09_systems 1.00 80.4±1.01µs ? ?/sec 1.04 83.3±1.95µs ? ?/sec
contrived/03x_entities_12_systems 1.00 105.2±2.57µs ? ?/sec 1.01 105.8±2.45µs ? ?/sec
contrived/03x_entities_15_systems 1.00 127.0±3.00µs ? ?/sec 1.01 127.8±5.86µs ? ?/sec
contrived/04x_entities_03_systems 1.06 48.9±2.82µs ? ?/sec 1.00 45.9±0.97µs ? ?/sec
contrived/04x_entities_06_systems 1.03 74.9±1.93µs ? ?/sec 1.00 72.6±1.86µs ? ?/sec
contrived/04x_entities_09_systems 1.07 107.1±6.36µs ? ?/sec 1.00 100.3±2.62µs ? ?/sec
contrived/04x_entities_12_systems 1.00 129.1±3.09µs ? ?/sec 1.02 132.2±7.67µs ? ?/sec
contrived/04x_entities_15_systems 1.00 154.8±4.68µs ? ?/sec 1.01 155.9±3.50µs ? ?/sec
contrived/05x_entities_03_systems 1.02 52.6±1.96µs ? ?/sec 1.00 51.6±1.06µs ? ?/sec
contrived/05x_entities_06_systems 1.06 85.8±5.89µs ? ?/sec 1.00 80.9±0.84µs ? ?/sec
contrived/05x_entities_09_systems 1.04 117.1±4.09µs ? ?/sec 1.00 112.4±2.93µs ? ?/sec
contrived/05x_entities_12_systems 1.00 151.0±3.90µs ? ?/sec 1.05 158.1±10.09µs ? ?/sec
contrived/05x_entities_15_systems 1.00 181.1±7.14µs ? ?/sec 1.02 185.1±9.66µs ? ?/sec
empty_systems/000_systems 1.00 4.9±0.21ns ? ?/sec 1.05 5.2±0.29ns ? ?/sec
empty_systems/001_systems 1.00 10.1±0.40µs ? ?/sec 1.03 10.5±0.44µs ? ?/sec
empty_systems/002_systems 1.00 14.4±0.82µs ? ?/sec 1.00 14.3±0.46µs ? ?/sec
empty_systems/003_systems 1.00 15.4±0.23µs ? ?/sec 1.01 15.6±0.75µs ? ?/sec
empty_systems/004_systems 1.01 15.5±0.17µs ? ?/sec 1.00 15.4±0.33µs ? ?/sec
empty_systems/005_systems 1.03 16.0±0.39µs ? ?/sec 1.00 15.5±0.27µs ? ?/sec
empty_systems/010_systems 1.01 17.5±0.30µs ? ?/sec 1.00 17.4±0.50µs ? ?/sec
empty_systems/015_systems 1.05 22.4±0.39µs ? ?/sec 1.00 21.3±0.53µs ? ?/sec
empty_systems/020_systems 1.06 24.3±0.33µs ? ?/sec 1.00 23.0±0.39µs ? ?/sec
empty_systems/025_systems 1.01 26.6±0.79µs ? ?/sec 1.00 26.4±0.80µs ? ?/sec
empty_systems/030_systems 1.04 29.2±0.62µs ? ?/sec 1.00 28.1±0.66µs ? ?/sec
empty_systems/035_systems 1.03 31.4±0.67µs ? ?/sec 1.00 30.6±0.57µs ? ?/sec
empty_systems/040_systems 1.04 34.5±0.66µs ? ?/sec 1.00 33.2±0.68µs ? ?/sec
empty_systems/045_systems 1.03 36.9±0.62µs ? ?/sec 1.00 35.8±0.49µs ? ?/sec
empty_systems/050_systems 1.02 38.6±1.02µs ? ?/sec 1.00 37.7±0.96µs ? ?/sec
empty_systems/055_systems 1.00 41.0±0.75µs ? ?/sec 1.00 41.1±0.89µs ? ?/sec
empty_systems/060_systems 1.05 45.5±1.76µs ? ?/sec 1.00 43.5±1.06µs ? ?/sec
empty_systems/065_systems 1.05 47.7±0.61µs ? ?/sec 1.00 45.5±1.12µs ? ?/sec
empty_systems/070_systems 1.04 50.0±0.74µs ? ?/sec 1.00 47.9±1.08µs ? ?/sec
empty_systems/075_systems 1.04 52.4±0.53µs ? ?/sec 1.00 50.5±0.70µs ? ?/sec
empty_systems/080_systems 1.11 58.6±0.99µs ? ?/sec 1.00 53.0±0.63µs ? ?/sec
empty_systems/085_systems 1.06 60.1±0.78µs ? ?/sec 1.00 56.9±0.71µs ? ?/sec
empty_systems/090_systems 1.07 63.9±1.54µs ? ?/sec 1.00 59.9±1.50µs ? ?/sec
empty_systems/095_systems 1.07 65.5±0.81µs ? ?/sec 1.00 61.5±0.54µs ? ?/sec
empty_systems/100_systems 1.06 68.6±1.49µs ? ?/sec 1.00 64.6±0.93µs ? ?/sec
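As a reading aid for the table above: the first number in each column is the time relative to the fastest entry in that row, so 1.00 marks the faster side. Below is a minimal sketch of that arithmetic, assuming the ratio is mean time divided by the row's fastest mean time; `relative` is a hypothetical helper, and because the printed times are already rounded, recomputed ratios may differ from the table in the last digit.

```rust
/// Ratio of each mean time to the fastest one in the row, rounded to two decimals.
fn relative(times_us: &[f64]) -> Vec<f64> {
    let fastest = times_us.iter().cloned().fold(f64::INFINITY, f64::min);
    times_us
        .iter()
        .map(|t| (t / fastest * 100.0).round() / 100.0)
        .collect()
}

fn main() {
    // empty_systems/080_systems: main-for-try-send 58.6µs, try-send 53.0µs
    println!("{:?}", relative(&[58.6, 53.0])); // prints "[1.11, 1.0]"
    // busy_systems/01x_entities_06_systems: 47.3µs vs 43.2µs
    println!("{:?}", relative(&[47.3, 43.2])); // prints "[1.09, 1.0]"
}
```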
bors r+ |
#7745) …e sendable, this improves performance

# Objective

- From previous experience, `.await` is not free. Profiling I did half a year ago also showed that bevy's multithreaded executor spends a lot of cycles on ArcWaker.

## Solution

- This PR replaces `sender.send().await` with `sender.try_send()` to cut some future/await cost.

Benchmarked on `empty system`:

```bash
➜ critcmp send_base send_optimize
group                        send_base                        send_optimize
-----                        ---------                        -------------
empty_systems/000_systems    1.01   2.8±0.03ns ? ?/sec        1.00   2.8±0.02ns ? ?/sec
empty_systems/001_systems    1.00   5.9±0.21µs ? ?/sec        1.01   5.9±0.23µs ? ?/sec
empty_systems/002_systems    1.03   6.4±0.26µs ? ?/sec        1.00   6.2±0.19µs ? ?/sec
empty_systems/003_systems    1.01   6.5±0.17µs ? ?/sec        1.00   6.4±0.20µs ? ?/sec
empty_systems/004_systems    1.03   7.0±0.24µs ? ?/sec        1.00   6.8±0.18µs ? ?/sec
empty_systems/005_systems    1.04   7.4±0.35µs ? ?/sec        1.00   7.2±0.21µs ? ?/sec
empty_systems/010_systems    1.00   9.0±0.28µs ? ?/sec        1.00   9.1±0.80µs ? ?/sec
empty_systems/015_systems    1.01  10.9±0.36µs ? ?/sec        1.00  10.8±1.29µs ? ?/sec
empty_systems/020_systems    1.12  12.7±0.67µs ? ?/sec        1.00  11.3±0.37µs ? ?/sec
empty_systems/025_systems    1.12  14.6±0.39µs ? ?/sec        1.00  13.0±1.02µs ? ?/sec
empty_systems/030_systems    1.12  16.2±0.39µs ? ?/sec        1.00  14.4±0.37µs ? ?/sec
empty_systems/035_systems    1.19  18.2±0.97µs ? ?/sec        1.00  15.3±0.48µs ? ?/sec
empty_systems/040_systems    1.12  20.6±0.58µs ? ?/sec        1.00  18.3±1.87µs ? ?/sec
empty_systems/045_systems    1.18  22.7±0.57µs ? ?/sec        1.00  19.2±0.46µs ? ?/sec
empty_systems/050_systems    1.03  21.9±0.92µs ? ?/sec        1.00  21.3±0.96µs ? ?/sec
empty_systems/055_systems    1.13  25.7±1.00µs ? ?/sec        1.00  22.8±0.50µs ? ?/sec
empty_systems/060_systems    1.35  30.0±2.57µs ? ?/sec        1.00  22.2±1.04µs ? ?/sec
empty_systems/065_systems    1.28  31.7±0.76µs ? ?/sec        1.00  24.8±0.79µs ? ?/sec
empty_systems/070_systems    1.33  36.8±10.37µs ? ?/sec       1.00  27.6±0.55µs ? ?/sec
empty_systems/075_systems    1.25  38.0±0.83µs ? ?/sec        1.00  30.3±0.63µs ? ?/sec
empty_systems/080_systems    1.33  41.7±1.22µs ? ?/sec        1.00  31.4±1.01µs ? ?/sec
empty_systems/085_systems    1.27  45.6±2.54µs ? ?/sec        1.00  35.8±4.06µs ? ?/sec
empty_systems/090_systems    1.29  48.3±5.33µs ? ?/sec        1.00  37.6±5.32µs ? ?/sec
empty_systems/095_systems    1.16  45.7±0.97µs ? ?/sec        1.00  39.4±2.75µs ? ?/sec
empty_systems/100_systems    1.14  49.5±4.26µs ? ?/sec        1.00  43.5±1.06µs ? ?/sec
```
Pull request successfully merged into main. Build succeeded:
|
# Objective

This is a follow-up to #7745. An unbounded `async_channel` occasionally allocates whenever it exceeds the capacity of the current buffer in its internal linked list. This is avoidable.

This also used to be a bounded channel before stageless, which was introduced in #4919.

## Solution

Use a bounded channel to avoid allocations on system completion. This shouldn't conflict with #7745, as it's impossible for the scheduler to exceed the channel capacity, even if somehow every system completed at the same time.
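The capacity argument can be sketched as follows: if the channel holds as many slots as there are systems, and each system reports completion at most once per run, a non-blocking send cannot hit a full channel even when every system finishes before the receiver drains anything. This is an illustration under stated assumptions, not the executor's actual code: it uses `std::sync::mpsc::sync_channel` as a stand-in for a bounded `async_channel`, plain threads as stand-ins for systems, and a hypothetical `run_systems` helper.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs `num_systems` stand-in "systems" on threads. Each reports its index
/// exactly once via a non-blocking send. Returns the sorted completed indices.
fn run_systems(num_systems: usize) -> Vec<usize> {
    // Capacity equals the number of systems, so the buffer is allocated once.
    let (tx, rx) = sync_channel::<usize>(num_systems);

    let handles: Vec<_> = (0..num_systems)
        .map(|index| {
            let tx = tx.clone();
            thread::spawn(move || {
                // At most `num_systems` messages are ever in flight,
                // so this cannot fail with "full".
                tx.try_send(index)
                    .expect("capacity equals the number of systems");
            })
        })
        .collect();
    drop(tx);

    // Worst case for the buffer: every system completes before anything is received.
    for handle in handles {
        handle.join().expect("system thread panicked");
    }

    let mut completed: Vec<usize> = rx.try_iter().collect();
    completed.sort();
    completed
}

fn main() {
    println!("{:?}", run_systems(4)); // prints "[0, 1, 2, 3]"
}
```

The same reasoning would break if a system could report more than once before the receiver drains the channel, so the one-message-per-system assumption matters.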