Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry after HTTP2 GOAWAY errors #738

Closed
ozgrakkurt opened this issue Feb 13, 2023 · 15 comments
Closed

Retry after HTTP2 GOAWAY errors #738

ozgrakkurt opened this issue Feb 13, 2023 · 15 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.

Comments

@ozgrakkurt
Copy link

Describe the bug

[2023-02-13T11:10:36Z WARN aws_smithy_client::hyper_ext] unrecognized error from Hyper. If this error should be retried, please file an issue. err=http2 error: connection error received: not a result of an error: connection error received: not a result of an error (hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))

Expected Behavior

Should retry

Current Behavior

Error out

Reproduction Steps

none

Possible Solution

No response

Additional Information/Context

No response

Version

aws-smithy-client = "0.54.2"

Environment details (OS name and version, etc.)

docker ubuntu:latest

Logs

[2023-02-13T11:10:36Z WARN aws_smithy_client::hyper_ext] unrecognized error from Hyper. If this error should be retried, please file an issue. err=http2 error: connection error received: not a result of an error: connection error received: not a result of an error (hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))

@ozgrakkurt ozgrakkurt added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 13, 2023
@ozgrakkurt
Copy link
Author

ozgrakkurt commented Feb 13, 2023

@Velfi Velfi added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Feb 13, 2023
@Velfi
Copy link
Contributor

Velfi commented Feb 13, 2023

@ozgrakkurt Thanks for submitting this. Are you actually encountering this error in production right now?

@ozgrakkurt
Copy link
Author

Hey, yes. We get this regularly

@vac-adb
Copy link

vac-adb commented May 5, 2023

I can reproduce it easily with the following code.
Note that 60s sleep duration between calls is important (there was no issue with 10s sleep).

Version
aws-config = "0.55.1"
aws-sdk-cognitoidentityprovider = "0.26.0"

use aws_sdk_cognitoidentityprovider as cognitoidentityprovider;
use tokio;
use std::thread;
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let config = aws_config::load_from_env().await;
    let client = cognitoidentityprovider::Client::new(&config);
    let username = "fake_user";
    let password ="fake_password";
    let client_id = "fake_client";
    for i in 1..4 {
        let cognito_result = client.initiate_auth().set_client_id(Some(client_id.to_string())).auth_flow(cognitoidentityprovider::types::AuthFlowType::UserPasswordAuth).auth_parameters("USERNAME", username).auth_parameters("PASSWORD", password).send().await;
        println!("Result {:}: {:?}", i, cognito_result);
        thread::sleep(Duration::from_secs(60));
    }
}

Results:

  1. ✅ Err(ServiceError(ServiceError { ... { message: Some("User pool client fake_client does not exist.")
  2. Err(DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(None), source: hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }), connection: Unknown } }))
  3. ✅ Err(ServiceError(ServiceError { ... { message: Some("User pool client fake_client does not exist.")

@rcoh
Copy link
Contributor

rcoh commented May 5, 2023

ah perfect! thanks for the reproducer. it's a quick fix but I wasn't able to reproduce it in tests so I wasn't able to figure out if my fix actually worked. We'll get this prioritized and fixed—or we're happy to accept a PR, the relevant code is here:
https://github.com/awslabs/smithy-rs/blob/main/rust-runtime/aws-smithy-client/src/hyper_ext.rs#LL207C1-L224C2

If we correctly classify that error is kind: Io it should be fixed

rcoh added a commit to smithy-lang/smithy-rs that referenced this issue Sep 5, 2023
This PR adds two additional classes of retries tested:
1. GO_AWAY: awslabs/aws-sdk-rust#738
2. REFUSED_STREAM: awslabs/aws-sdk-rust#858

I tested 1 by using the example helpfully provided. The fix for 2 is untested and difficult to reproduce but since my fix for 1 worked, I'm confident that we're detecting the correct error class here.
rcoh added a commit to smithy-lang/smithy-rs that referenced this issue Sep 7, 2023
This PR adds two additional classes of retries tested:
1. GO_AWAY: awslabs/aws-sdk-rust#738
2. REFUSED_STREAM: awslabs/aws-sdk-rust#858

I tested 1 by using the example helpfully provided. The fix for 2 is untested and difficult to reproduce but since my fix for 1 worked, I'm confident that we're detecting the correct error class here.
@hxk1633
Copy link

hxk1633 commented Sep 12, 2023

I get a similar error when trying to invoke dynamodb in Rust AWS SDK:
DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(None), source: NoMatchingAuthScheme, connection: Unknown } })
Not sure what this error means...

@rcoh
Copy link
Contributor

rcoh commented Sep 12, 2023 via email

rcoh added a commit to smithy-lang/smithy-rs that referenced this issue Sep 12, 2023
This PR adds two additional classes of retries tested:
1. GO_AWAY: awslabs/aws-sdk-rust#738
2. REFUSED_STREAM: awslabs/aws-sdk-rust#858

I tested 1 by using the example helpfully provided. The fix for 2 is untested and difficult to reproduce but since my fix for 1 worked, I'm confident that we're detecting the correct error class here.
github-merge-queue bot pushed a commit to smithy-lang/smithy-rs that referenced this issue Sep 12, 2023
Draft pull request pending merge of #2970 
## Motivation and Context
- awslabs/aws-sdk-rust#738
- awslabs/aws-sdk-rust#858
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here -->

## Description
<!--- Describe your changes in detail -->
This PR adds two additional classes of retries tested:
1. GO_AWAY: awslabs/aws-sdk-rust#738
2. REFUSED_STREAM: awslabs/aws-sdk-rust#858

I tested 1 by using the example helpfully provided. The fix for 2 is
untested and difficult to reproduce but since my fix for 1 worked, I'm
confident that we're detecting the correct error class here.

## Testing
I used the example provided by the customer and validated the H2 GO_AWAY
fix.

## Checklist
<!--- If a checkbox below is not applicable, then please DELETE it
rather than leaving it unchecked -->
- [ ] I have updated `CHANGELOG.next.toml` if I made changes to the
smithy-rs codegen or runtime crates
- [ ] I have updated `CHANGELOG.next.toml` if I made changes to the AWS
SDK, generated SDK code, or SDK runtime crates

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
@ferologics
Copy link

ferologics commented Nov 8, 2023

we're seeing this issue daily in prod for aws-sdk-apigatewaymanagement::apigatewaymanagement_client.post_to_connection call (sending a websocket connection meesage):

Warning
14:33:15
API Gateway Management message send error with error: Other(ErrorString("connection_id: ConnectionId(\"OFT8NePuvHcCGqQ=\"), error: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(None), source: hyper::Error(Http2, Error { kind: GoAway(b\"\", NO_ERROR, Remote) }), connection: Unknown } }), request_id: 'OFVWzGGVPHcF98w='"))

Warning
14:33:15
aws_smithy_runtime::client::http::hyper_014
unrecognized error from Hyper. If this error should be retried, please file an issue.
{
err: http2 error: connection error received: not a result of an error: connection error received: not a result of an error (hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))
}
image

@rcoh
Copy link
Contributor

rcoh commented Nov 24, 2023

It looks like the fix for this was merged but then accidentally removed as part of a merge. We'll get a fix out soon

@jdisanti
Copy link
Contributor

jdisanti commented Dec 4, 2023

The fix has been merged and will go out in a future release: smithy-lang/smithy-rs#3250

@jdisanti jdisanti added the pending-release This issue will be fixed by an approved PR that hasn't been released yet. label Dec 4, 2023
@ferologics
Copy link

@jdisanti thx for the update. curiously, we stopped receiving this error, and are not sure what changed. one theory is that it was an AWS service bug that got fixed. we also wonder whether this error should be retried, after reading this particular post (we use ELB)

@rcoh
Copy link
Contributor

rcoh commented Dec 4, 2023 via email

@ferologics
Copy link

ferologics commented Dec 4, 2023

@rcoh we did run cargo update on two occasions, Nov 8th (bump from 0.56.1 to 0.57.1), and Nov 30th (bump from 0.57.1 to 1.0.1), so this is possible.

@jdisanti
Copy link
Contributor

jdisanti commented Dec 6, 2023

This fix for this went out in the December 5, 2023 release.

@jdisanti jdisanti closed this as completed Dec 6, 2023
Copy link

github-actions bot commented Dec 6, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.
Projects
None yet
Development

No branches or pull requests

7 participants