Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fixed issue with context cancelled error leading to connection spikes on Primary instances #3190

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

EXPEbdodla
Copy link

Issue: After upgrading from 9.5.1 to 9.7.0, we noticed the spikes in connections to Master/ Primary nodes and also reads are happening from Master nodes. Also noticed increase in pool_conn_total_current metrics.

Environment: AWS Cloud, Elasticache Redis with Cluster Mode

Cause: After debugging, we noticed that nodes are marked as failed when the context.Cancelled error is raised

node.MarkAsFailing()
. This is introduced in v9.5.2.

We tested this by deploying the change from my Fork and noticed improvements. Elasticache Cluster Current Connections Screenshot:
image

FYI: My first PR to go-redis. Happy to fix if any concerns.

@EXPEbdodla
Copy link
Author

can I get a review on this PR please? Thank you.

@ndyakov
Copy link
Collaborator

ndyakov commented Feb 20, 2025

@EXPEbdodla looking into it

@ndyakov
Copy link
Collaborator

ndyakov commented Feb 20, 2025

Looks related to #3282

@@ -38,6 +38,15 @@ type Error interface {

var _ Error = proto.RedisError("")

func isContextError(err error) bool {
switch err {
case context.Canceled, context.DeadlineExceeded:
Copy link
Contributor

@fishy fishy Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the code to know how the errors get passed here, but this approach would only match directly/exact context.Canceled and context.DeadlineExceeded errors, it won't match errors wrapping those errors.

In general (https://pkg.go.dev/errors#section-documentation) we should use errors.Is(err, context.Canceled) over err == context.Canceled, unless there's a reason we only want to match direct exact errors.

(not sure whether this is the reason this PR does not fix the issue we saw in #3282 (comment) verified that the change doesn't fix that issue either)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally errors are not wrapped in the go-redis package and this error should occur within the package. I do agree that with the new version there are tons of things that can be improved, error handling included. Currently we are focused on improving the stability, testing infrastructure , bug fixes and redis 8 support. Once those are addressed we will move to other improvements that are nice to have... I will continue looking into your issue and continue the discussion in its thread.

@ndyakov
Copy link
Collaborator

ndyakov commented Feb 21, 2025

@EXPEbdodla let's add a test for this change and we are good to go with the pr.

@EXPEbdodla
Copy link
Author

@EXPEbdodla let's add a test for this change and we are good to go with the pr.

Thanks @ndyakov . I'll look into adding tests for this.

@EXPEbdodla
Copy link
Author

@EXPEbdodla let's add a test for this change and we are good to go with the pr.

Thanks @ndyakov . I'll look into adding tests for this.

@ndyakov Added tests to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants