THRIFT-5240: Do connectivity check in Go server #2190
1f2cbb0 to 8b4e17a
Not so sure about this one:
Did you test this on a production workload by any chance? I'd be especially interested to see its effects on latency. |
@dcelasun Our prod is currently still running pre-bcae3bb (we plan to deploy bcae3bb to prod next week). I've been running the prod version (old) vs. bcae3bb + this PR (new) side by side in our staging environment for a while now. We have 2 Go services, A and B; A is also a client to B (and to another Java service). Here are some graphs for the last 12hr (in all the graphs, cyan is new and green is old):
A's client side latency p99 for A->B requests:
A's client side latency p50 for A->B requests:
A's goroutines (this is a ticker of 10s getting
(in p50 charts, the unit of the y-axis is ms)
I'll run bcae3bb vs. this PR now and post results in a few hours. |
Thanks for the graphs! Eyeballing the p50 latency, seems like a ~15% increase? That's a bit worse than I expected. If it's not too much to ask, can you also test the effects of different Also, what's your Go version? I've found the ticker optimization I was talking about and that should be available with 1.14+ |
We are running Go 1.14.4. I think it's more like a constant ~0.5ms overhead than a 15% overhead :) If someone is running a really low-latency Go Thrift server, they can also disable this feature by having |
Also, the charts' y-axes do not start from 0 (I know that's bad, but that's just the default behavior of the monitoring service we use) |
Ha, that's a better interpretation :)
I know. I just picked a few random points from p50 charts and saw a ~15% increase but if it's actually a constant 0.5ms then that's no problem at all. |
Also the |
bcae3bb (old) vs. this PR (new) have now been running for 2hr+. Here are the 2hr charts (again, cyan is new and green is old).
Service A server side latency p50:
Service A server side latency p99:
Service B server side latency p50:
Service B server side latency p99 (this one is weird: the old one actually hits the timeout (40ms) a lot, probably just bad luck?):
A->B client side latency p99 (note that it's not always old A hitting old B and new A hitting new B; services A and B are both in their pools, so any A->B request can hit any B instance with a roughly equal chance):
The most notable gap is in B's server side p50 latency, which can be as large as 1ms (it could be calculated as a ~50% increase in latency, depending on how you calculate that). I think we can just document that (probably in |
OK this looks reasonable I think.
Yeah, just a note about how this feature will slightly increase latency and goroutine count, but can be disabled with |
8b4e17a to 980c7d7
README updated. |
Client: go
In compiler generated TProcessorFunction implementations, add a goroutine after reading the request to do a connectivity check on the input transport. If the transport is no longer open, cancel the context object passed into the handler implementation.
Also define the ErrAbandonRequest error, to help TSimpleServer close client connections that are already closed on the other end.
980c7d7 to 07dc2fe
Although this is a much-needed feature in my view, I think there are some problems in this PR
I will make a new PR for this in a few days |
@zerosnake0 The default interval has been changed from 1ms to 5ms in #2256. We also discussed disabling it by default vs. using 5ms as the default in that PR. Did you try that version? Is it still too much CPU in your case? |