Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise Joiner to preallocate correct-sized StringBuilder #7532

Closed
3 tasks done
mhansen opened this issue Nov 27, 2024 · 6 comments
Closed
3 tasks done

Optimise Joiner to preallocate correct-sized StringBuilder #7532

mhansen opened this issue Nov 27, 2024 · 6 comments
Labels
P4 no SLO package=base type=enhancement Make an existing feature better

Comments

@mhansen
Copy link

mhansen commented Nov 27, 2024

API(s)

com.google.common.base.Joiner

How do you want it to be improved?

Would love to preallocate the size of the StringBuilder's backing char[] by using the StringBuilder(int) constructor after summing the size of the strings passed in.

Now this might only be feasible if:

  • The Iterable passed in is instanceof List and instanceof RandomAccess so we can pre-sum the sizes without allocating another Iterator (although you could argue that allocating one more iterator might be worth saving the extra char[] allocations)
  • The passed in Iterable is only full of String objects, not other Objects that we have to toString, because the toString the first run through the loop might allocate or give different responses from the second time through the loop.

Maybe these limitations make it "too hard". I'd previously done something similar for protobuf: https://github.com/protocolbuffers/protobuf/blob/525e16a8757802cb0dfcef064d88144a38d2b595/java/core/src/main/java/com/google/protobuf/AbstractMessageLite.java#L352

Why do we need it to be improved?

I was surprised that Joiner was dynamically resizing the backing array. It makes performance worse.

Example

Joiner.on(", ", "Foo", "Bar", "Baz", "Qux", "etc");

Current Behavior

Makes a StringBuilder that dynamically starts off at size 16 and then reallocates and grows and copies into a new array. Then copies into a String in toString, which probably trims the array down to the right size (copying again).

Desired Behavior

Minimise allocations. Allocate the right size char[] up front so it can be used directly by the String without another copy.

Concrete Use Cases

Don't have one. Haven't looked at the profiler yet to see how much of a problem this is. We just spotted this in internal cl/700560986 when optimising something similar.

Checklist

@mhansen mhansen added the type=enhancement Make an existing feature better label Nov 27, 2024
@cpovirk
Copy link
Member

cpovirk commented Nov 27, 2024

Hi, Mark. I remember looking into this once a little, so I've tried to refresh my memory and then done some additional research.

One huge complication here is that we'd really like users to use String.join, StringJoiner, and Collectors.joining. (Then there is the additional complication that the first requires API Level 26 and the others require 24 or opt-in library desugaring :)) That's partially because we want to encourage use of the standard library over preexisting Guava APIs when we can just on principle, but it's also because both of those classes can have recourse to JDK-internal fast paths that we could never compete with (though they could do still better, too, and I'm confused about what happened to this JDK9-only(??) optimization). [edit: Wait, no, that particular patch isn't needed anymore because StringJoiner now delegates to String.join, which has a similar optimization.] (But I also note the complication that the JDK methods are restricted to CharSequence inputs, whereas Joiner will accept any Object, albeit in a case that you aren't proposing to optimize, anyway....)

Another complication is that I don't have a good feel for the cost of garbage. While resizing obviously also costs us CPU from having to copy from one array to the next larger array, it would be nice to understand if the allocations along the way are hurting us much. But I don't know good rules of thumb for that. And then of course we're back to Android: We don't have easy access to profiling data from all Google Android apps the way that we do for our server apps, and that's unfortunate because Android is still the place that we worry relatively more about allocations.

Now, given all that, if you were to come to us with a CL to optimize Joiner and a profile from an Android app that supports API Level 25 or less and that shows practical benefit, then we might still go for it. But I think we'd need that kind of hard data.

Even at that point, there's at least one more complication: Thanks in large part to Guava itself, we see a small but non-negligible number of cases in which users create lists/collections/iterables that act like Lists.transform and friends. In those cases, each access to an element performs the transformation again, and that could include constructing the string from scratch. (It can also include operations like filtering, at least for a plain Collection, so we can't necessarily even estimate the string length based on the number of elements.) We really wish that such collections didn't exist, but given that we do, we'd have to be careful that our optimizations don't significantly regress such use cases as we micro-optimize the much more common case. (One option would be to optimize only the case of array inputs and perhaps some well-known classes, like getClass() == ArrayList.class and the immutable collections (though the latter is complicated by the fact that those collections live in collect while Joiner lives in base... and maybe also by ContiguousSet??).)

So: My hope would be that the biggest durable bang for our buck would be to look into using the JDK's joining functionality more—and maybe even optimizing it if we find it to be a hot spot, whether on the server or on Android. But if you can point to an optimization that helps enough today, we could conceivably have a look.

@mhansen
Copy link
Author

mhansen commented Nov 27, 2024

Thank you Chris for this extremely thoughtful response. I agree.

With the presence of standard library options, which in the near-ish future will be present on the minimum SDK for most android apps, the benefit of optimising this is reduced.

Thanks also for mentioning the transformed-list case. I didn't realise that, you're right: it's important. For a good optimization here, you'd want to optimise a type that takes a String[]. Maybe we could add an overload for join(String[]) (I suppose that would be used in preference to join(Object[]) at runtime), but it's probably not worth it.

Another complication is that I don't have a good feel for the cost of garbage.

Right, yes, I'm still trying to develop this feel. Certainly in tight loops in Android the garbage costs us. But otherwise, yeah we do have a thread-local allocation buffer, to mitigate some of the cost.

And I think it's reasonable to say no until we have a profile showing a problem. I took a quick look at some recent Google Maps profiles and while Joiner is present, it is almost all in one usage that we've fixed. See internal bug b/381147963#comment7, I can't share the profiles publicly.

I think we can close this for now. Sorry for the noise!

@mhansen mhansen closed this as completed Nov 27, 2024
@cpovirk
Copy link
Member

cpovirk commented Dec 2, 2024

No trouble at all! This issue is what led me to realize that String.join and StringJoiner, which weren't particularly optimized as of Java 11, did get optimized by Java 21. And even then, I'd initially thought that we'd have a hard time using them because of the complexities of Joiner (as we touched on above), like different null handling and accepting Iterable, as well as the fact that the optimizations didn't exist under older versions of Java. But I think I've convinced myself that it's actually pretty easy to at least improve on our current performance for Iterable, even if we could imagine pursuing further improvements with further effort. I've started playing around in cl/701952500....

@cpovirk
Copy link
Member

cpovirk commented Dec 2, 2024

Err, wait, no, I got mixed up between calling the Iterable overload (which, per JDK-8305774 and like StringJoiner, allocates a potentially unnecessarily large array, especially given that we don't want to allocate at all by the time we're passing an immutable Iterable<CharSequence>!) and calling the array/varargs overload (which is better but which doesn't let me play my trick of calling toArray on an input collection and then overwriting its elements in place, since the result of that has to be an Object[] in general, not a CharSequence[]...).

There are still Things We Could Do, whether using StringJoiner, adding an overload for List<? extends CharSequence> (since Iterable<? extends CharSequence> would clash), doing more work ourselves, or something else. But I'm back to thinking that this would require some actual thought. Not that that will stop me from at least seeing where an internal List special case might lead....

@cpovirk
Copy link
Member

cpovirk commented Dec 2, 2024

Then there is the additional complication that [String.join] requires API Level 26

I'm wrong again! While I don't see it in lists like https://developer.android.com/studio/write/java11-minimal-support-table, it is indeed always desugared and has been for years. Maybe it's not in the list because it's not a "Java 11" API, having been present since Java 8? But that's sad that that apparently means that those lists of APIs are not a complete list of desugared APIs....

@cpovirk
Copy link
Member

cpovirk commented Dec 2, 2024

Just for fun, note that my earlier approach failed for the additional reason that Android's Arrays.asList(...).toArray() is/was broken in the way that the JDK's used to be. So I couldn't use that approach for the Android flavor, and (assuming that the comment there is accurate) I couldn't use it for the JRE flavor, either, until we someday drop support for Java 8.

[edit: not that I'm likely to employ any String.join approach in our Android flavor, since Android doesn't have the same optimizations there as the JDK]

copybara-service bot pushed a commit that referenced this issue Dec 9, 2024
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.

See discussion in #7532.

RELNOTES=n/a
PiperOrigin-RevId: 701952500
copybara-service bot pushed a commit that referenced this issue Dec 9, 2024
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.

See discussion in #7532.

RELNOTES=n/a
PiperOrigin-RevId: 701952500
copybara-service bot pushed a commit that referenced this issue Dec 9, 2024
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.

See discussion in #7532.

RELNOTES=n/a
PiperOrigin-RevId: 701952500
copybara-service bot pushed a commit that referenced this issue Dec 10, 2024
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.

Implementation strategy:

- No matter what we do, the JDK is going to allocate a `String[]` to guard (presumably) against concurrent mutation. (Paging [Frozen Arrays](https://openjdk.org/jeps/8261007)?)
- Our main ability to optimize around that is to convince the JDK that it can preallocate an array of the proper size. Since the JDK [won't do that in its `Iterable` overload](https://bugs.openjdk.org/browse/JDK-8305774), we want to pass it an array.
- Sadly, the JDK requires that the array be of type `CharSequence[]`, so we can't reuse the result of `toArray()`. Thus, if we want to avoid allocating `toArray()` output _on top of_ a `CharSequence[]` and a `String[]`, we have to write directly to a `CharSequence[]` during iteration over the array, with the array pre-sized based on the collection size (but without relying on that size to exactly match what we get from the iteration).

See discussion in #7532.

RELNOTES=n/a
PiperOrigin-RevId: 701952500
copybara-service bot pushed a commit that referenced this issue Dec 10, 2024
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.

Implementation strategy:

- No matter what we do, the JDK is going to allocate a `String[]` to guard (presumably) against concurrent mutation. (Paging [Frozen Arrays](https://openjdk.org/jeps/8261007)?)
- Our main ability to optimize around that is to convince the JDK that it can preallocate an array of the proper size. Since the JDK [won't do that in its `Iterable` overload](https://bugs.openjdk.org/browse/JDK-8305774), we want to pass it an array.
- Sadly, the JDK requires that the array be of type `CharSequence[]`, so we can't reuse the result of `toArray()`. Thus, if we want to avoid allocating `toArray()` output _on top of_ a `CharSequence[]` and a `String[]`, we have to write directly to a `CharSequence[]` during iteration over the array, with the array pre-sized based on the collection size (but without relying on that size to exactly match what we get from the iteration).

See discussion in #7532.

RELNOTES=n/a
PiperOrigin-RevId: 701952500
copybara-service bot pushed a commit that referenced this issue Dec 10, 2024
…s://bugs.openjdk.org/browse/JDK-8265237) for JDK 17+.

Implementation strategy:

- No matter what we do, the JDK is going to allocate a `String[]` to guard (presumably) against concurrent mutation. (Paging [Frozen Arrays](https://openjdk.org/jeps/8261007)?)
- Our main ability to optimize around that is to convince the JDK that it can preallocate an array of the proper size. Since the JDK [won't do that in its `Iterable` overload](https://bugs.openjdk.org/browse/JDK-8305774), we want to pass it an array.
- Sadly, the JDK requires that the array be of type `CharSequence[]`, so we can't reuse the result of `toArray()`. Thus, if we want to avoid allocating `toArray()` output _on top of_ a `CharSequence[]` and a `String[]`, we have to write directly to a `CharSequence[]` during iteration over the array, with the array pre-sized based on the collection size (but without relying on that size to exactly match what we get from the iteration).

See discussion in #7532.

RELNOTES=n/a
PiperOrigin-RevId: 704698587
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 no SLO package=base type=enhancement Make an existing feature better
Projects
None yet
Development

No branches or pull requests

2 participants