Optimise Joiner to preallocate correct-sized StringBuilder #7532
Comments
Hi, Mark. I remember looking into this once a little, so I've tried to refresh my memory and then done some additional research.
One huge complication here is that we'd really like users to use
Another complication is that I don't have a good feel for the cost of garbage. While resizing obviously also costs us CPU from having to copy from one array to the next larger array, it would be nice to understand if the allocations along the way are hurting us much. But I don't know good rules of thumb for that. And then of course we're back to Android: We don't have easy access to profiling data from all Google Android apps the way that we do for our server apps, and that's unfortunate because Android is still the place that we worry relatively more about allocations.
Now, given all that, if you were to come to us with a CL to optimize
Even at that point, there's at least one more complication: Thanks in large part to Guava itself, we see a small but non-negligible number of cases in which users create lists/collections/iterables that act like
So: My hope would be that the biggest durable bang for our buck would be to look into using the JDK's joining functionality more, and maybe even optimizing it if we find it to be a hot spot, whether on the server or on Android. But if you can point to an optimization that helps enough today, we could conceivably have a look.
Thank you, Chris, for this extremely thoughtful response. I agree. With the presence of standard library options, which in the near-ish future will be present on the minimum SDK for most Android apps, the benefit of optimising this is reduced. Thanks also for mentioning the transformed-list case. I didn't realise that, you're right: it's important. For a good optimization here, you'd want to optimise a type that takes a
Right, yes, I'm still trying to develop this feel. Certainly in tight loops in Android the garbage costs us. But otherwise, yeah we do have a thread-local allocation buffer, to mitigate some of the cost. And I think it's reasonable to say no until we have a profile showing a problem. I took a quick look at some recent Google Maps profiles and while Joiner is present, it is almost all in one usage that we've fixed. See internal bug b/381147963#comment7, I can't share the profiles publicly. I think we can close this for now. Sorry for the noise!
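As a rough illustration of the transformed-list case discussed above: the sketch below builds a lazily transformed view (a hypothetical stand-in written with `AbstractList`, not Guava's own implementation) and counts how often the transform function runs when a join first sums the lengths and then appends. All names here (`TransformedViewDemo`, `lazyView`, `callsForTwoPassJoin`) are made up for this example.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.RandomAccess;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class TransformedViewDemo {
  private TransformedViewDemo() {}

  /** Hypothetical lazy view: every get() re-applies the function to the source element. */
  static <F, T> List<T> lazyView(List<F> source, Function<? super F, ? extends T> fn) {
    class View extends AbstractList<T> implements RandomAccess {
      @Override
      public T get(int index) {
        return fn.apply(source.get(index));
      }

      @Override
      public int size() {
        return source.size();
      }
    }
    return new View();
  }

  /** Runs a two-pass (sum lengths, then append) join and returns how often the function ran. */
  static int callsForTwoPassJoin(List<Integer> numbers) {
    AtomicInteger calls = new AtomicInteger();
    List<String> view =
        lazyView(
            numbers,
            n -> {
              calls.incrementAndGet();
              return "item" + n;
            });

    // Pass 1: sum the lengths so the builder can be presized.
    int total = 0;
    for (int i = 0; i < view.size(); i++) {
      total += view.get(i).length();
    }

    // Pass 2: append the elements. Each get() transforms the element again.
    StringBuilder sb = new StringBuilder(total);
    for (int i = 0; i < view.size(); i++) {
      sb.append(view.get(i));
    }
    return calls.get();
  }
}
```

For a source list of three elements the function runs six times, so a presizing pass doubles the transformation work for views like this. A function that is not deterministic could also return strings of a different length on the second pass.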
No trouble at all! This issue is what led me to realize that
Err, wait, no, I got mixed up between calling the There are still Things We Could Do, whether using
I'm wrong again! While I don't see it in lists like https://developer.android.com/studio/write/java11-minimal-support-table, it is indeed always desugared and has been for years. Maybe it's not in the list because it's not a "Java 11" API, having been present since Java 8? But that's sad that that apparently means that those lists of APIs are not a complete list of desugared APIs....
Just for fun, note that my earlier approach failed for the additional reason that Android's [edit: not that I'm likely to employ any
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+. See discussion in #7532. RELNOTES=n/a PiperOrigin-RevId: 701952500
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.
Implementation strategy:
- No matter what we do, the JDK is going to allocate a `String[]` to guard (presumably) against concurrent mutation. (Paging [Frozen Arrays](https://openjdk.org/jeps/8261007)?)
- Our main ability to optimize around that is to convince the JDK that it can preallocate an array of the proper size. Since the JDK [won't do that in its `Iterable` overload](https://bugs.openjdk.org/browse/JDK-8305774), we want to pass it an array.
- Sadly, the JDK requires that the array be of type `CharSequence[]`, so we can't reuse the result of `toArray()`. Thus, if we want to avoid allocating `toArray()` output _on top of_ a `CharSequence[]` and a `String[]`, we have to write directly to a `CharSequence[]` during iteration over the array, with the array pre-sized based on the collection size (but without relying on that size to exactly match what we get from the iteration).
See discussion in #7532.
RELNOTES=n/a
PiperOrigin-RevId: 701952500
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.
Implementation strategy:
- No matter what we do, the JDK is going to allocate a `String[]` to guard (presumably) against concurrent mutation. (Paging [Frozen Arrays](https://openjdk.org/jeps/8261007)?)
- Our main ability to optimize around that is to convince the JDK that it can preallocate an array of the proper size. Since the JDK [won't do that in its `Iterable` overload](https://bugs.openjdk.org/browse/JDK-8305774), we want to pass it an array.
- Sadly, the JDK requires that the array be of type `CharSequence[]`, so we can't reuse the result of `toArray()`. Thus, if we want to avoid allocating `toArray()` output _on top of_ a `CharSequence[]` and a `String[]`, we have to write directly to a `CharSequence[]` during iteration over the array, with the array pre-sized based on the collection size (but without relying on that size to exactly match what we get from the iteration).
See discussion in #7532.
RELNOTES=n/a
PiperOrigin-RevId: 704698587
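The implementation strategy in the commit message above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code that was actually committed: `ArrayJoinSketch` and its methods are hypothetical names, and the only JDK calls relied on are `String.join(CharSequence, CharSequence...)` and `Arrays.copyOf`.

```java
import java.util.Arrays;
import java.util.Collection;

final class ArrayJoinSketch {
  private ArrayJoinSketch() {}

  /** Hypothetical helper: copies the elements into a CharSequence[] and hands it to the JDK. */
  static String join(CharSequence separator, Collection<?> parts) {
    // Presize from size(), but treat that value only as a hint.
    CharSequence[] elements = new CharSequence[parts.size()];
    int count = 0;
    for (Object part : parts) {
      if (count == elements.length) {
        // The iteration produced more elements than size() promised, so grow.
        elements = Arrays.copyOf(elements, Math.max(8, elements.length * 2));
      }
      elements[count++] = toCharSequence(part);
    }
    if (count != elements.length) {
      // The iteration produced fewer elements than allocated, so trim.
      elements = Arrays.copyOf(elements, count);
    }
    return String.join(separator, elements);
  }

  /** Assumption for this sketch: null elements are rejected with a NullPointerException. */
  private static CharSequence toCharSequence(Object part) {
    return (part instanceof CharSequence) ? (CharSequence) part : part.toString();
  }
}
```

In the common case where `size()` matches the iteration, the only array allocated by this sketch is the single `CharSequence[]`; any further copies are made inside the JDK.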
API(s)
How do you want it to be improved?
Would love to preallocate the size of the StringBuilder's backing char[] by using the StringBuilder(int) constructor after summing the size of the strings passed in.
Now this might only be feasible if:
- the input is `instanceof List` and `instanceof RandomAccess`, so we can pre-sum the sizes without allocating another Iterator (although you could argue that allocating one more iterator might be worth saving the extra `char[]` allocations)
- the elements are `String` objects, not other `Object`s that we have to `toString`, because the `toString` the first run through the loop might allocate or give different responses from the second time through the loop.
Maybe these limitations make it "too hard". I'd previously done something similar for protobuf: https://github.com/protocolbuffers/protobuf/blob/525e16a8757802cb0dfcef064d88144a38d2b595/java/core/src/main/java/com/google/protobuf/AbstractMessageLite.java#L352
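The proposal above could be sketched as follows, restricted to the two conditions listed. This is a hypothetical helper written for illustration, not Guava's `Joiner` and not a proposed patch. It assumes the elements are non-null `String`s and falls back to the JDK's `String.join` whenever the fast path does not apply.

```java
import java.util.List;
import java.util.RandomAccess;

final class PresizedJoin {
  private PresizedJoin() {}

  /** Hypothetical helper: joins with a StringBuilder presized to the exact output length. */
  static String join(String separator, Iterable<String> parts) {
    if (!(parts instanceof List) || !(parts instanceof RandomAccess)) {
      // No cheap indexed access, so skip the pre-sum pass entirely.
      return String.join(separator, parts);
    }
    List<String> list = (List<String>) parts;
    int n = list.size();
    if (n == 0) {
      return "";
    }

    // Pass 1: sum the lengths by index, so no Iterator is allocated.
    long total = (long) separator.length() * (n - 1);
    for (int i = 0; i < n; i++) {
      total += list.get(i).length();
    }
    if (total > Integer.MAX_VALUE) {
      // Too large to presize; let the JDK path deal with it.
      return String.join(separator, parts);
    }

    // Pass 2: append into a builder that should never need to grow.
    StringBuilder sb = new StringBuilder((int) total);
    for (int i = 0; i < n; i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(list.get(i));
    }
    return sb.toString();
  }
}
```

For example, `PresizedJoin.join(", ", Arrays.asList("a", "b", "c"))` returns `"a, b, c"`. If the list changes between the two passes, the builder simply grows as usual, so the presizing is an optimisation rather than something correctness depends on.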
Why do we need it to be improved?
I was surprised that Joiner was dynamically resizing the backing array. It makes performance worse.
Example
Current Behavior
Makes a `StringBuilder` that dynamically starts off at size 16 and then reallocates and grows and copies into a new array. Then copies into a String in `toString`, which probably trims the array down to the right size (copying again).
Desired Behavior
Minimise allocations. Allocate the right size char[] up front so it can be used directly by the String without another copy.
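To make the difference between the current and desired behaviour observable, here is a small sketch that counts how often a builder's capacity changes while appending. `BuilderGrowthDemo` is a hypothetical name. The exact growth steps are a JDK implementation detail, so the sketch only reports how many changes occurred, not the sizes.

```java
final class BuilderGrowthDemo {
  private BuilderGrowthDemo() {}

  /** Appends piece to sb the given number of times and counts how often capacity() changed. */
  static int countCapacityChanges(StringBuilder sb, String piece, int repeats) {
    int changes = 0;
    int last = sb.capacity();
    for (int i = 0; i < repeats; i++) {
      sb.append(piece);
      if (sb.capacity() != last) {
        // The backing array was replaced by a larger one.
        changes++;
        last = sb.capacity();
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    // Default builder: starts with capacity 16 and has to grow several times.
    System.out.println(countCapacityChanges(new StringBuilder(), "0123456789", 100));
    // Presized builder: 100 * 10 = 1000 chars fit without any growth.
    System.out.println(countCapacityChanges(new StringBuilder(1000), "0123456789", 100));
  }
}
```

Each capacity change corresponds to a new backing array plus a copy of everything appended so far, which is the cost the presized constructor avoids.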
Concrete Use Cases
Don't have one. Haven't looked at the profiler yet to see how much of a problem this is. We just spotted this in internal cl/700560986 when optimising something similar.
Checklist
I agree to follow the code of conduct.
I have read and understood the contribution guidelines.
I have read and understood Guava's philosophy, and I strongly believe that this proposal aligns with it.