Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add HTTP/2 enabled transport as default transport #979

Merged
merged 23 commits into from
Oct 4, 2024

Conversation

jonathanedey
Copy link
Contributor

This changes the default HttpTransport to an HTTP/2 enabled Apache 5 HttpClient.

@jonathanedey jonathanedey added release-note release:stage Stage a release candidate labels Aug 20, 2024
Comment on lines 32 to 34
// Sets a 1 second delay before response
private static final String DELAY_URL = "https://nghttp2.org/httpbin/delay/1";
private static final String POST_URL = "https://nghttp2.org/httpbin/post";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test the timeouts I used this live server to simulate delay which is not ideal. What are your thoughts on this?

@emindeniz99
Copy link

emindeniz99 commented Aug 20, 2024

Thank you for your efforts. I think, if a new transport is implemented and used instead of tested built-in google http client, they should be very well tested. we can copy test cases from this not merged PR to this PR in order to test new http2transport layer.

https://github.com/googleapis/google-http-java-client/pull/1960/files#diff-f6e7e17e639b08337930ac900026abfe1531b7c7e9bea3b6866835526ed1fa81

googleapis/google-http-java-client#1960

@emindeniz99
Copy link

It fixes one of the big problem with current state of sdk but it is could be efficient if thread per request structure of library is changed. CallOp causes blocking a thread per request.

This is better, it does not open many sockets, etc. HTTP2 is used. It solves our main problem.
But the fcm library creates a thread for each request, for example sending 6 batch requests with each batch consisting of 500 device tokens. 6*500= 3k threads will be created.
As I know, each thread requires 1 MB memory, this is another problem.
Copied from: #834 (comment)

@lahirumaramba lahirumaramba self-assigned this Aug 20, 2024
@jonathanedey
Copy link
Contributor Author

Hi @emindeniz99, Thanks for your patience on this!

Went ahead and mirrored most of those tests here.

The issue of threads is something we have been looking into along side http2. The main culprit being the default ThreadExecutor (although it reuses idle threads) has no thread limit which causes more harm than good. We're planning to address this in a separate PR.

Ideally we would want to make full use of Apache HttpClient 5's async clients and use CompletableFutures instead of a thread per request. However from my understanding this requires the google http client to also support async clients or an effort on our side to modify their classes to do so.

This transport in itself is a bandaid solution until async clients required for HTTP/2 are supported in google http client.

@honorhs
Copy link

honorhs commented Aug 27, 2024

Thank you for your efforts in providing the HTTP/2 functionality.
It would be beneficial to have a report comparing the performance of the version with HTTP/2 applied and the previous version. This could include metrics such as system resources and processing completion time, under the assumption of sending a large number of push messages

Copy link
Member

@lahirumaramba lahirumaramba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I added a few comments. We can also discuss offline
Thanks!

import org.apache.hc.core5.http.nio.AsyncEntityProducer;
import org.apache.hc.core5.http.nio.DataStreamChannel;

@SuppressWarnings("deprecation")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which deprecation is this? Maybe we should add a note

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the google api client's StreamingContent. Removed the suppression to match other files where we don't suppress this warning.

ByteBuffer getBytebuf() {
return bytebuf;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

content.writeTo(baos);
} catch (IOException e) {
writeFuture.completeExceptionally(e);
// failed(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we remove this now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been uncommented. Fixed that and fixed an issue where exceptions were not being re-thrown.

new PoolingAsyncClientConnectionManager();
connectionManager.setMaxTotal(100);
connectionManager.setDefaultMaxPerRoute(100);
connectionManager.closeIdle(TimeValue.of(30, TimeUnit.SECONDS));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a note on how we decided on these limits? 100 max connections and 30 seconds timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I changed the settings here to match the google api client's apache http transport and added a comment stating that.

Also correctly limited the max concurrent streams to 100 and added a comment on the reasoning for this.

Also added more additional comments throughout this file.

ImmutableMap.<String, Object>of("foo", "bar");

// Sets a 5 second delay before response
private static final String DELAY_URL = "https://nghttp2.org/httpbin/delay/5";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a note explaining why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -88,6 +88,6 @@ public static JsonFactory getDefaultJsonFactory() {
}

public static HttpTransport getDefaultTransport() {
return Utils.getDefaultTransport();
return new ApacheHttp2Transport();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we dropping the google api client DefaultTransport and moving to the new apache client? This change affects the core SDK right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't require core SDK changes since we are still using a base HttpTransport and developers can still pass in their custom transports instead. However since the underlying transport is changing, there are still minor side effects in how requests are sent under the hood.

Keys ones being:

  • Timeouts
  • Low level request exceptions

For non http/2 backend service endpoints, requests will fallback to use http/1. As we discussed, we can decide on whether this should replace the current default transport or simply expose the new http/2 transport for developers to add manually, after we verify proxy functionality.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i build this yesterday and we are testing it in production .. :) since we completely overlooked the change.
had conflicts with pushy dependencies, so this was the last option.

im basically using your apachehttp2transport with more connections. any tip on max output settings? the 9.3.0 with sendEachForMulticastAsync required to use semaphores or it crashed our push server.

we are a soccer company and need to send out like ~80k pushes regulary as fasts as possible.
this solution seem to be stable in terms of pushes send & received, but processing seems kinda slow. and its crazy hard to get settings and processing queues correct.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i build this yesterday and we are testing it in production .. :) since we completely overlooked the change. had conflicts with pushy dependencies, so this was the last option.

im basically using your apachehttp2transport with more connections. any tip on max output settings? the 9.3.0 with sendEachForMulticastAsync required to use semaphores or it crashed our push server.

we are a soccer company and need to send out like ~80k pushes regulary as fasts as possible. this solution seem to be stable in terms of pushes send & received, but processing seems kinda slow. and its crazy hard to get settings and processing queues correct.

you can use current library for creating auth token and body, after that, send via spring webclient or reactor netty.
#834 (comment)
you check the code i shared on the comment and read the other issue for details.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi emindeniz99 thanks for the reply. i tried it before but run in dependencies nightmare with pushy.

our issue is the delay. it used to be 1-2s till everything was out, now its like 1s - 30s on large loads.
firebase seems to have added some anti burst behavior. with warmups etc. which is fine for normal stuff, but people want to know when a goal was shot asap.

like can we get out 80k~ in a second?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SmikeSix2 Have you checked the GCP quota? Also, you're correct—the FCM endpoints are unreliable, and their response times fluctuate. We're using Pushy as well. FCM, WebClient, Reactor Netty, and Pushy work well together in our setup. We can reach 150k rpm on single pod.(I do not remember specs).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SmikeSix2 80k rps is very high, 5million rpm :) fcm does not like it. Did not you reach the quota? Do you have 5m+ limit for fcm? Did you check on gcp console?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe its up to 350k rps... in bad scenarios. for example when the national team plays. old solution did fine with it, I was always complimenting firebase. but yeah probably need to switch to topics for the large teams.

my colleague has the access ill ask around, thanks for your input!

@diegomarquezp
Copy link

This transport in itself is a bandaid solution until async clients required for HTTP/2 are supported in google http client.

cc: @burkedavison @ldetmer

@jonathanedey jonathanedey merged commit ab46aaa into master Oct 4, 2024
6 checks passed
@jonathanedey jonathanedey deleted the je-http2-default branch October 4, 2024 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release:stage Stage a release candidate release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants