-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use TrustedRandomAccess for loop desugaring #93243
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
edabd37
to
54b2de5
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 8c229b8fec45c0a6baac50735ea8ec2a2cffe2fb with merge 1e754880c626b06734cc4147827bfd63e31e9a97... |
☀️ Try build successful - checks-actions |
Queued 1e754880c626b06734cc4147827bfd63e31e9a97 with parent 84322ef, future comparison URL. |
Finished benchmarking commit (1e754880c626b06734cc4147827bfd63e31e9a97): comparison url. Summary: This change led to very large relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
Huh. That's actually not too terrible. I think this could go in and be built upon. |
|
||
#[unstable(feature = "trusted_random_access", issue = "none")] | ||
#[doc(hidden)] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray blank line?
#[unstable(feature = "trusted_random_access", issue = "none")] | ||
#[doc(hidden)] | ||
|
||
pub trait IntoIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid a trait named IntoIterator
that's not the actual trait -- it's really confusing, particularly for later PRs which'll touch just parts of this file and so won't see the RealIntoIterator import.
|
||
#[unstable(feature = "trusted_random_access", issue = "none")] | ||
// #[cfg_attr(not(bootstrap), lang = "loop_desugar")] | ||
#[cfg_attr(not(bootstrap), lang = "into_iter")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, this into_iter feels iffy -- would be better to call it shim_into_iter or similar.
@@ -2,10 +2,10 @@ error[E0277]: the size for values of type `dyn Iterator<Item = &'a mut u8>` cann | |||
--> $DIR/issue-20605.rs:2:17 | |||
| | |||
LL | for item in *things { *item = 0 } | |||
| ^^^^^^^ expected an implementor of trait `IntoIterator` | |||
| ^^^^^^^ expected an implementor of trait `iter::loop_desugar::IntoIterator` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a pretty unfortunate diagnostics regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named the trait and method the way they are so that the diagnostics look at least mostly right. If I rename them then this would get worse.
The implementation here is remarkably simple for what it does, and the next steps look quite appealing -- there's a whole swath of code that this could clean up, presumably. That said, the compile time and diagnostics regressions look pretty concerning, it seems like a lot of real-world crates are regressing. The wins are mostly in the smaller stress tests from what I can see, though I haven't attempted a thorough evaluation. It sounds like you roughly expect that the diagnostics regression will go away within a few subsequent PRs? FWIW, I may not be the best reviewer and/or would need to do some educational reading of the various Iterator extension traits we've grown -- it feels like there's quite a bit of complexity particularly around TRA(NC) and I'm not sure that I'm currently well-versed enough to evaluate tradeoffs here appropriately. I at least would find it useful to see a higher-level summary of the situation with Iterator, TRA(NC), Zip specialization etc, and then a summary of the different approaches we could take (and roughly the set of things we end up with at each end), along with the motivation from a performance perspective for adding TRA and similar things. As-is, it's difficult for me to participate in / evaluate the tradeoffs being made in PRs like this because without that high level roadmap I feel like I'm only seeing a small part of the maze and guessing at the rest. (If there's someone who is up to speed on all of this and you want to r? them, that's also fine, though I think ideally we want this kind of documentation in the long run regardless.) |
No, I could use some advice how to fix them. I only massaged this PR to keep them as small as you see here, previous iterations were worse.
Well, it's fairly mixed but also real crates showing improvements e.g
The extension traits interact in a few places but none of the changes I have described would interfere with that. So this is on TRA and Zip only. Well, maybe it'll touch
Zip's unsafe specializations are complex due to the way it's currently built (using unchecked accesses in the single-step methods and working hard to preserve side-effects). This has lead to unsoundness and other issues before (#85873 #85863 #86443 #82282 #80670 #82291 #81740 ...) and requires a pile of safety requirements. So the goal is simplification while retaining or possibly even improving runtime performance. Imo TRA is one of the more radioactive traits we currently have in the standard library. The introduction (#33090) predates perf reports so there wasn't an initial measurement and its use has extended over time anyway, so it's unclear what the cumulative performance benefit is. Microbenchmarks did show measurable improvements whenever it was added to another adapter. We could try ripping it out see how much regresses if that helps? But the compiler isn't the most Zip-heavy code, so maybe perf.rlo isn't the best reference. #53340 #44424 are examples of user-code that needed zip specialization to gain performance. Options that have been suggested:
I'm the one driving this, I don't think anyone else has a roadmap and I have outlined my rough next steps in the PR description. Where do I need to fill in details?
Maybe yaahc? She wanted to learn about other iterator traits (#87667) to be able to better review them. But considering the age of that PR it seems she's busy. Other than that I only know about alumni or non-team members who are familiar with this stuff. |
Assuming this is a good idea, I wonder if it should be a more primary thing? Like if |
That could happen separately. Whether with this specialization hack or changing the desugaring in some other way to drive loops via TRA does enable the same cleanups. Some changes may need pre- and post-actions, with the specialization they could be mapped to |
I think it probably makes sense to have a separate write up and/or one on this PR that discusses the limitations imposed on us due to for loops using IntoIterator::into_iter on its argument, and possible paths forward. I think just ad-hoc replacing it with this PR's approach is probably not something I at least would be willing to r+ but if we had a plan going forward I think that would make more me amenable. This somewhat touches on e.g. aspirations for making std less coupled to unstable rustc, where this kind of desugaring change seems to go the other way. Plus, if there are performance benefits to not using for loops, perhaps those should be otherwise exposed somehow as well. |
⌛ Trying commit aec6846 with merge 3ddb857a0f23a17a811588ce61b0d1b89f2abfd6... |
☀️ Try build successful - checks-actions |
Queued 3ddb857a0f23a17a811588ce61b0d1b89f2abfd6 with parent 4ca19e0, future comparison URL. |
Finished benchmarking commit (3ddb857a0f23a17a811588ce61b0d1b89f2abfd6): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Footnotes |
The regressions in debug builds are due to more llvm-IR being emitted. I think I can shave off some of that, but not all. |
Finished benchmarking commit (3ddb857a0f23a17a811588ce61b0d1b89f2abfd6): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Footnotes |
…al the need for pre/post loop calls The loop setup and cleanup functions are only required for some specific iterators. To avoid emitting IR when they're not needed even in debug mode we need to use the type system instead of const-DCE because the constants would only be known after monomorphization.
This reduces the amount of IR generated when post-loop cleanup is needed
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit ffff194 with merge b7aa46da46e95cf0f692a7576bc9b9ad742224ab... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
☀️ Try build successful - checks-actions |
Queued b7aa46da46e95cf0f692a7576bc9b9ad742224ab with parent 27af517, future comparison URL. |
Finished benchmarking commit (b7aa46da46e95cf0f692a7576bc9b9ad742224ab): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Footnotes |
perfbot sets to waiting on reviewer, but I suspect that's not actually true. FWIW probably a good next step once you're ready from my perspective would be a libs-api and maybe libs nomination -- the first for the user-facing diagnostics impact we expect, and the second for implementation complexity judgement call. Maybe just cc'ing another reviewer would also work, though, as a lightweight alternative (or discussion on #t-libs or so). |
@the8472 any update on this? |
Early perf numbers looked good but they only covered the case of using TRA for loop iteration, none of the TRA-unsafety-reductions. For the reductions a bunch of helper traits and methods are needed which are bad for compile times and debug build sizes. It might be possible that tweaking the helpers a bit could result in less compile overhead but I have another angle I'll try before that. So I'll close this for now. |
The idea behind this PR is to change
for _ in
to use TrustedRandomAccessNoCoerce (TRANC) which means the optimizer immediately sees see that they're counted loops instead of only having the possibility of seeing that after inlining. This mostly alleviates the need to have single-step methods such asZip::next
use TRA but may also enable some other external iterations to optimize more like internal iteration.The possible steps after this change:
Zip::fold
specializationtry_fold
,by_ref
andvec::IntoIter
for Drop types. These were previously blocked on the cleanup method not being feasible with single-step methods, which this change would eliminateSo the primary goal of this PR is to open up the path to complexity reduction in Zip and potential performance gains by bringing counted loops to more iterators. This PR on its own is not expected to improve performance significantly due to the extra indirection in the loop desugaring.
Prior Zulip discussion outlining the idea.
Some of the diagnostics may need fixes.
Since this uses TRA in more places this also contains the commits from #92566 for perf testing (the first two commits), but I can remove them again if we want them to land separately.