Add missing inline annotations to Cell #48905
We're seeing some odd performance problems when using incremental compilation, where `Rc` pointers were actually slower than `Arc` pointers (the problem goes away when using non-incremental compilation). I haven't been able to build rustc locally to verify that this fixes it, but these missing inline annotations seem to be the only thing that could affect performance (to this extent).

```
test vector_push_back        ... bench:  11,668,015 ns/iter (+/- 772,861)
test vector_push_back_mut    ... bench:   1,423,771 ns/iter (+/- 22,011)
test vector_push_back_mut_rc ... bench:   1,181,765 ns/iter (+/- 123,724)
test vector_push_back_rc     ... bench:  17,141,746 ns/iter (+/- 203,048)
```

(Source and non-incremental benchmarks: orium/rpds#7 (comment))
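For readers wondering why `Cell` would matter for `Rc` at all: `Rc` keeps its reference counts in `Cell<usize>`, while `Arc` uses `AtomicUsize`. The sketch below is only an illustration of that difference; `LocalCount` and `SharedCount` are hypothetical stand-ins, not the actual libstd types, and the code makes no claim about what the compiler does with it in any particular mode.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the non-atomic count an `Rc`-style pointer keeps.
struct LocalCount {
    strong: Cell<usize>,
}

impl LocalCount {
    fn new() -> Self {
        LocalCount { strong: Cell::new(1) }
    }

    // Each increment is a `Cell::get` followed by a `Cell::set`.
    // If those calls are not inlined, a plain add becomes two function calls.
    #[inline]
    fn inc(&self) -> usize {
        let n = self.strong.get() + 1;
        self.strong.set(n);
        n
    }
}

// Hypothetical stand-in for the atomic count an `Arc`-style pointer keeps.
struct SharedCount {
    strong: AtomicUsize,
}

impl SharedCount {
    fn new() -> Self {
        SharedCount { strong: AtomicUsize::new(1) }
    }

    // A single atomic read-modify-write; returns the new count.
    #[inline]
    fn inc(&self) -> usize {
        self.strong.fetch_add(1, Ordering::Relaxed) + 1
    }
}

fn main() {
    let local = LocalCount::new();
    let shared = SharedCount::new();
    for _ in 0..3 {
        local.inc();
        shared.inc();
    }
    println!(
        "local = {}, shared = {}",
        local.strong.get(),
        shared.strong.load(Ordering::Relaxed)
    );
}
```

When the `Cell` accessors are inlined, the non-atomic version should be at least as cheap as the atomic one, which is why an `Rc` that benchmarks slower than `Arc` points at calls that were not inlined.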
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
@bors try Trying to get a compiled libstd, and going to compare the benchmark result with the un-inlined version as wanted in #42716 (comment). |
☀️ Test successful - status-travis |
Benchmark result based on Marwes/rpds@e482d5a, first one is without
$ cargo +87344aa59af2ebb868253228e2b558d701573dff bench vector
...
test vector_drop_last ... bench: 10,546,095 ns/iter (+/- 35,342)
test vector_drop_last_mut ... bench: 768,934 ns/iter (+/- 2,609)
test vector_get ... bench: 322,871 ns/iter (+/- 621)
test vector_iterate ... bench: 144,636 ns/iter (+/- 2,131)
test vector_push_back ... bench: 11,965,770 ns/iter (+/- 100,901)
test vector_push_back_mut ... bench: 1,576,195 ns/iter (+/- 41,488)
test vector_push_back_mut_rc ... bench: 1,371,017 ns/iter (+/- 42,805)
test vector_push_back_rc ... bench: 7,474,717 ns/iter (+/- 120,604)
...
$ cargo +3a1fd611fc2ad3085a9298b46a85cd055724c45e bench vector
...
test vector_drop_last ... bench: 10,771,358 ns/iter (+/- 19,155)
test vector_drop_last_mut ... bench: 777,418 ns/iter (+/- 28,679)
test vector_get ... bench: 322,735 ns/iter (+/- 713)
test vector_iterate ... bench: 144,504 ns/iter (+/- 568)
test vector_push_back ... bench: 12,188,718 ns/iter (+/- 51,151)
test vector_push_back_mut ... bench: 1,567,152 ns/iter (+/- 34,553)
test vector_push_back_mut_rc ... bench: 1,367,240 ns/iter (+/- 39,629)
test vector_push_back_rc ... bench: 7,471,407 ns/iter (+/- 32,327)
...
I cannot see any significant change in timing. Also, contrary to OP's measurement, the You may install the toolchains yourself to check:
|
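To put a number on "no significant change", the two runs above can be compared benchmark by benchmark. This is just a small sketch that computes the relative change in ns/iter; the figures are copied from the two listings, and the helper name `percent_change` is made up for the illustration.

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// (benchmark, ns/iter in the first run, ns/iter in the second run),
/// copied from the two `cargo bench vector` listings.
const RUNS: [(&str, f64, f64); 8] = [
    ("vector_drop_last", 10_546_095.0, 10_771_358.0),
    ("vector_drop_last_mut", 768_934.0, 777_418.0),
    ("vector_get", 322_871.0, 322_735.0),
    ("vector_iterate", 144_636.0, 144_504.0),
    ("vector_push_back", 11_965_770.0, 12_188_718.0),
    ("vector_push_back_mut", 1_576_195.0, 1_567_152.0),
    ("vector_push_back_mut_rc", 1_371_017.0, 1_367_240.0),
    ("vector_push_back_rc", 7_474_717.0, 7_471_407.0),
];

fn main() {
    for (name, before, after) in RUNS.iter() {
        // Positive means the second run was slower, negative means faster.
        println!("{:<26} {:+.2}%", name, percent_change(*before, *after));
    }
}
```

On these figures every benchmark moves by less than 3% in either direction between the two toolchains.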
Getting an error when trying to download the artifact, unfortunately (I guess it does not include Windows binaries, and the Linux subsystem does not work either since I haven't managed to get OpenSSL installed in it...).
@kennytm Did you run with If this slowdown is due to these missing inline annotations and that is acceptable, then I'd assume that we should remove the annotations from the other functions instead to keep things consistent? |
@Marwes Unfortunately, only a 64-bit Linux build is available before merging 😞 You can force a Linux build with:
I've retried using
So there is a 7% improvement in the |
:/ The only other real difference is that Line 1349 in 2f0e6a3
|
I've checked a debug build (opt-level=0) as well; no improvement is seen, because
I'm now going to defer this to someone from the libs team to decide whether to merge or not. r? @BurntSushi Summary: 7% improvement in release mode with incremental. No improvement otherwise. |
@alexcrichton has opinions on inline annotations, so I defer to him. :-) |
Thanks for the PR! I think though that this PR may not be necessary in the sense that incremental compilation in release mode is not at all tuned for performance. These methods are all inlined as necessary (due to them being generic) so the extra Due to how codegen units are partitioned and how we do incremental compilation, though, you may not be able to inline these across codegen units in incremental mode. This will eventually be fixed if we implement incremental ThinLTO, but that has yet to be done at this point! In that sense I would be inclined to not merge this PR as extra |
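A rough illustration of the generic versus non-generic distinction made in the comment above. `Slot` and `bump` are hypothetical names invented for this sketch; it only shows where the annotation would or would not be expected to matter under that explanation, not what rustc does in any given mode.

```rust
// Generic type: its methods are instantiated (monomorphized) in whichever
// crate uses them, so the body is already available to that crate's
// optimizer even without `#[inline]`.
pub struct Slot<T> {
    value: T,
}

impl<T: Copy> Slot<T> {
    pub fn new(value: T) -> Self {
        Slot { value }
    }

    // No annotation here: under the explanation above, adding one would
    // not change whether a downstream crate can inline this.
    pub fn get(&self) -> T {
        self.value
    }
}

// Non-generic function: compiled once in the defining crate, so `#[inline]`
// is what makes its body available for inlining in other crates.
#[inline]
pub fn bump(n: usize) -> usize {
    n + 1
}

fn main() {
    let slot = Slot::new(41usize);
    println!("{}", bump(slot.get()));
}
```

Neither case guarantees inlining across codegen units within one crate, which is the situation the comment describes for incremental mode.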
I didn't think to check incremental + debug but it appears that the problem also occurs there
The added inline annotations here do not seem to help much though, so it seems prudent to just close this (though I'd argue that the inline annotations that exist should be removed in that case, as there is no reason to single out these 4 methods as the only ones without inline annotations). |
Er wait is that running benchmarks in debug mode? If so we've never optimized for that, so I wouldn't put much weight in those numbers! |
The ones in the first comment were release + incremental, the ones in #48905 (comment) were debug (opt-level = 0) + incremental. While I don't expect the debug mode code to be fast, I wanted to point out that it is rather unexpected that the Still, that might be fine. |
@Marwes would you be ok closing this or would you still prefer to land this? |