Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix token bucket not being set for both standard and adaptive retry modes #3964

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

aajtodd
Copy link
Contributor

@aajtodd aajtodd commented Jan 8, 2025

Motivation and Context

awslabs/aws-sdk-rust#1234

Description

PR adds a new interceptor registered as part of the default retry plugin components that ensures a token bucket is always present and available to the retry strategy. The buckets are partitioned off the retry partition (which defaults to the service name and is already set by the default plugin). We use a static variable in the runtime for this which means that token buckets can and will apply to every single client that uses the same retry partition. The implementation tries to avoid contention on this new global lock by only consulting it if the retry partition is overridden after client creation.

For AWS SDK clients I've updated the default retry partition clients are created with to include the region when set. Now the default partition for a client will be {service}-{region} (e.g. sts-us-west-2) rather than just the service name (e.g. sts). This partitioning is a little more granular and closer to what we want/expect as failures in one region should not cause throttling to another (and vice versa for success in one should not increase available quota in another).

I also updated the implementation to follow the SEP a little more literally/closely as far as structure which fixes some subtle bugs. State is updated in one place and we ensure that the token bucket is always consulted (before the token bucket could be skipped in the case of adaptive retries returning a delay and the adaptive rate limit was updated in multiple branches).

Testing

Checklist

  • [x ] For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "client," "server," or both in the applies_to key.
  • [ x] For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "aws-sdk-rust" in the applies_to key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link

github-actions bot commented Jan 8, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

github-actions bot commented Jan 9, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

github-actions bot commented Jan 9, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

github-actions bot commented Jan 9, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

@aajtodd aajtodd marked this pull request as ready for review January 10, 2025 17:27
@aajtodd aajtodd requested review from a team as code owners January 10, 2025 17:27
Comment on lines +182 to +185
let default_retry_partition = match config.region() {
Some(region) => #{Cow}::from(format!("{default_retry_partition}-{}", region)),
None => #{Cow}::from(default_retry_partition),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let default_retry_partition = match config.region() {
Some(region) => #{Cow}::from(format!("{default_retry_partition}-{}", region)),
None => #{Cow}::from(default_retry_partition),
};
let default_retry_partition = match config.region() {
Some(region) => #{Cow}::from(format!("{default_retry_partition}-{region}")),
None => #{Cow}::from(default_retry_partition),
};

Comment on lines +226 to +229
fun usesRegion(codegenContext: ClientCodegenContext) =
codegenContext.getBuiltIn(AwsBuiltIns.REGION) != null ||
ServiceIndex.of(codegenContext.model)
.getEffectiveAuthSchemes(codegenContext.serviceShape).containsKey(SigV4Trait.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever have to worry about models supporting sigv4a but not sigv4? How do retry partitions interact with Sigv4a regions (if at all)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this same thought when reading/extracting this function. I think the possibility is certainly coming but wasn't sure.

.expect("success");

let log_contents = logs_rx.contents();
assert!(log_contents.contains("token bucket for RetryPartition { name: \"dontcare-us-west-2\" } added to config bag"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider updating the orchestrator to note the retry partition used for each retry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, is there any advantage over logging it here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants