Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid gRPC retries for non-idempotent KV operations on gRPC UNAVAILABLE as default #1444

Closed
jcferretti opened this issue Feb 16, 2025 · 0 comments

Comments

@jcferretti
Copy link
Contributor

jcferretti commented Feb 16, 2025

Versions

  • etcd: N/A
  • jetcd: 0.8.3; also impacts earlier versions since 0.6.0
  • java: N/A

Describe the bug
Jepsen report https://jepsen.io/analyses/jetcd-0.8.2 explains issues that can occur due to automated retries of non-idempotent operations. The issue has been discussed at length in several etcd github issues (listed below), and some of the conclusions of those discussions captured for etcd clients on etcd issue 18424. 18424 describes problems in detail.

To Reproduce
Refer to previous paragraph

Expected behavior
Non-idempotent operations like writes and transactions should not retry by default when it is not known at the client if the server may have performed the operation. Eg, retrying an arbitrary etcd Put RPC on an auth token expired error is safe, retrying it on a general gRPC UNAVAILABLE error is not

Additional context

  • See the description for UNAVAILABLE in the gRPC documentation https://grpc.github.io/grpc/core/md_doc_statuscodes.html stating "Note that it is not always safe to retry non-idempotent operations"
  • Jepsen originally created jetcd issue 1072 but the understanding of the root casue of the problem was less clear at that point; the discussions on etcd issues and further jepsen research since have shed more light, clearly identifying the originally reported behavior as being a direct result of automated retries of non-idempotent operations (TXN in the original jetcd issue report 1072).
  • The etcd issues filed by jepsen whose discussion led to the conclusions in 18424 are mentioned in the jepsen report, section 4. "Discussion", and are listed below for convenience, note the last one 14890 is the one that generated engagement from the etcd team and led eventually to the creation of 18424:
    14092
    14890

Proposed change
I propose we separate the current implementation of isRetryable

return Status.UNAVAILABLE.getCode().equals(status.getCode()) || isInvalidTokenError(status)

and create two distinct functions, one for idempotent operations (eg, Get RPC), and another one for non-idempotent operations (eg, Put, Txn RPCs). The current implementation of isRetryable is good for retries of idempotent ops. The non-idempotent version would remove Status.UNAVAILABLE from the or (||) expression to prevent retries on that case.
Locations that reference isRetryable today can be modified to call the appropriate new version, eg:

There is value in providing users with the ability to ask for automated retry of operations that due to their use case they know will be safe, eg, ensuring a particular Put RPC goes through may be preferable in some particular case even if it ends up resulting in two writes in the server; we can give users the ability to request automated retries of non-idempotent calls but we shouldn't make it the default.

jcferretti added a commit to jcferretti/jetcd that referenced this issue Feb 16, 2025
@jcferretti jcferretti changed the title jetcd should not do gRPC retries for non-idempotent operations on gRPC UNAVAILABLE Avoid gRPC retries for non-idempotent KV operations on gRPC UNAVAILABLE as default Feb 16, 2025
jcferretti added a commit to jcferretti/jetcd that referenced this issue Feb 17, 2025
Also use the chance to improve javadoc on some *Options objects

Signed-off-by: Cristian Ferretti <jcferretti2020@gmail.com>
jcferretti added a commit to jcferretti/jetcd that referenced this issue Feb 18, 2025
Also use the chance to improve javadoc on some *Options objects

Signed-off-by: Cristian Ferretti <jcferretti2020@gmail.com>
jcferretti added a commit to jcferretti/jetcd that referenced this issue Feb 18, 2025
Also use the chance to improve javadoc on some *Options objects

Signed-off-by: Cristian Ferretti <jcferretti2020@gmail.com>
jcferretti added a commit to jcferretti/jetcd that referenced this issue Feb 18, 2025
Also use the chance to improve javadoc on some *Options objects

Signed-off-by: Cristian Ferretti <jcferretti2020@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant