fix: Fixed issue with context cancelled error leading to connection spikes on Primary instances #3190
base: master
Can I get a review on this PR please? Thank you.
@EXPEbdodla looking into it.
Looks related to #3282.
@@ -38,6 +38,15 @@ type Error interface {

var _ Error = proto.RedisError("")

func isContextError(err error) bool {
	switch err {
	case context.Canceled, context.DeadlineExceeded:
I'm not familiar enough with the code to know how the errors get passed here, but this approach only matches the exact context.Canceled and context.DeadlineExceeded values; it won't match errors that wrap them.
In general (https://pkg.go.dev/errors#section-documentation) we should use errors.Is(err, context.Canceled) over err == context.Canceled, unless there's a reason we only want to match the exact errors.
(I'm not sure whether this is why this PR does not fix the issue we saw in #3282 (comment); I verified that the change doesn't fix that issue either.)
Generally, errors are not wrapped in the go-redis package, and this error should occur within the package. I do agree that with the new version there are plenty of things that can be improved, error handling included. Currently we are focused on improving stability, testing infrastructure, bug fixes, and Redis 8 support. Once those are addressed, we will move on to other improvements that are nice to have. I will continue looking into your issue and continue the discussion in its thread.
@EXPEbdodla let's add a test for this change and we are good to go with the PR.
Thanks @ndyakov. I'll look into adding tests for this.
@ndyakov Added tests to this PR.
Issue: After upgrading from 9.5.1 to 9.7.0, we noticed spikes in connections to Master/Primary nodes, and reads were also being served from Master nodes. We also noticed an increase in the pool_conn_total_current metric.
Environment: AWS Cloud, ElastiCache Redis with Cluster Mode
Cause: After debugging, we noticed that nodes are marked as failed when the context.Canceled error is raised (go-redis/osscluster.go, line 1324 in 930d904).
We tested this by deploying the change from my fork and noticed improvements.
[Image: ElastiCache cluster current connections screenshot]
FYI: My first PR to go-redis. Happy to fix if any concerns.