S3 get_object() call sometimes returns partial response #1079
Comments
Thank you for reporting this. Can you enable trace debug logging, reproduce the issue, and then post the logs here? The log will probably be our next best bet without a reproducer. |
Was finally able to capture some TRACE-level logs around the time we observe the partial response. |
Do you have a sense of what "sometimes" is here? 1%? 0.1%? 0.001%? Trying to figure out what we might need to try to do to reproduce this on our side. Are you connecting directly to S3? Is it possible there is an intermediate proxy somewhere that may be truncating the stream? It seems like at a minimum, the SDK should be doing the validation you're doing so we can return an error if this case is detected. I'm going to keep digging into this and I'll let you know what I find. |
Also, I'd suggest turning on request checksums in your get_object() call:
|
Are you doing anything notable when constructing your client? I've been unable to reproduce this failure mode—we actually have existing tests for this behavior.
From the logs, I can see you are using HTTP 1 which exactly matches the scenario we're testing. We can add an additional safeguard to error out in this case—I think that would be prudent anyway. |
We don't have good metrics on this, but I'd estimate it at somewhere below 0.1% of calls.
These processes are running in EC2 and hitting S3 directly. There's very little infrastructure in the way -- no ECS/EKS, no proxies, about as simple as you can get. Additionally, our VPCs and route tables are configured with the S3 Gateway Endpoints. Notably, I think we typically see these "partial responses" with cross-region S3 calls. E.g. the log I posted above was an EC2 instance in eu-central-1 making a GetObject call to a us-east-1 bucket. This might be significant.
Excellent callout. We'll add this.
Nothing specifically. Our code looks like this:

    let sdk_config = aws_config::defaults(BehaviorVersion::latest())
        .region(Region::new(region.clone()))
        .load()
        .await;
    let s3_client = Client::new(&sdk_config);

Perhaps moderately interesting, we do instantiate a fresh tokio runtime essentially per GetObject attempt. Something analogous to the following:

    fn some_synchronous_function_which_is_called_infrequently(&self, s3_file_reference: &S3FileReference) -> Result<Value> {
        tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()?
            .block_on(async move {
                self.fetch_object(&s3_file_reference).await
            })
    }
Yeah, our own retry logic is able to resolve the issue. That is, our code makes the same exact GetObject call, using the same S3Client, in the exact same tokio runtime, and the second call succeeds. Thus, I would guess that reproducing it is tricky. |
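For readers hitting the same symptom, the caller-side workaround described here (compare the received byte count against the declared length, then retry) might look roughly like the sketch below. It uses only the standard library; `Download`, `FetchError`, and `fetch_with_retry` are hypothetical names for illustration and are not part of the AWS SDK, and the `attempt` closure stands in for whatever code actually performs the GetObject call and collects the body.

```rust
/// Hypothetical result of one download attempt: the length the server
/// declared (e.g. from Content-Length) and the bytes actually received.
struct Download {
    declared_len: u64,
    body: Vec<u8>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum FetchError<E> {
    /// The underlying call itself failed.
    Transport(E),
    /// Every attempt returned fewer (or more) bytes than declared.
    Truncated { expected: u64, actual: u64 },
}

/// Calls `attempt` until the body length matches the declared length,
/// giving up after `max_attempts` tries (at least one try is always made).
fn fetch_with_retry<F, E>(mut attempt: F, max_attempts: u32) -> Result<Vec<u8>, FetchError<E>>
where
    F: FnMut() -> Result<Download, E>,
{
    let mut remaining = max_attempts.max(1);
    loop {
        // Transport errors are returned immediately; only length mismatches are retried.
        let download = attempt().map_err(FetchError::Transport)?;
        let actual = download.body.len() as u64;
        if actual == download.declared_len {
            return Ok(download.body);
        }
        remaining -= 1;
        if remaining == 0 {
            return Err(FetchError::Truncated {
                expected: download.declared_len,
                actual,
            });
        }
    }
}
```

This only masks the problem on the caller's side; it does not explain why the body was short in the first place.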
That is almost certainly the problem. Because the Rust client contains a connection pool, weird things happen when you drop the runtime that's running the futures. I'm working with Sean to fix this on Hyper's side, but I'm also adding a middleware to the Rust SDK to validate content length. In the meantime, share the runtime between clients :-) |
Fix here: smithy-lang/smithy-rs#3491 |
…3491)
## Motivation and Context
There is a rarely-triggered bug (awslabs/aws-sdk-rust#1079) that can occur when the runtime is dropped between requests. Although this is definitely the _wrong thing to do_(tm), we should still aim to at least protect users from bad data in this case. This adds an interceptor which validates that the body returned is the correct length.
## Description
- Adds an interceptor that computes the actual content length of the body and compares it to the expected length
## Testing
- Integration style test. Note that this is very hard to test using Hyper because Hyper will attempt to mitigate this issue.
## Checklist
- [x] I have updated `CHANGELOG.next.toml` if I made changes to the smithy-rs codegen or runtime crates
----
_By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice._
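As a rough illustration of the idea in this PR (count the bytes actually delivered and compare them with the declared length), here is a minimal standard-library sketch. It is not the smithy-rs interceptor and does not use any SDK types; `LengthCheckedReader` is a hypothetical name, and `expected` stands in for the value of the `Content-Length` header.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: passes reads through to `inner` while counting bytes,
/// and turns a short or overlong stream into an error instead of silent bad data.
struct LengthCheckedReader<R> {
    inner: R,
    expected: u64, // assumed to come from the Content-Length header
    seen: u64,
}

impl<R: Read> LengthCheckedReader<R> {
    fn new(inner: R, expected: u64) -> Self {
        Self { inner, expected, seen: 0 }
    }
}

impl<R: Read> Read for LengthCheckedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.seen += n as u64;

        // More bytes than declared: fail as soon as the overrun is visible.
        if self.seen > self.expected {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("body longer than declared: expected {}, got at least {}", self.expected, self.seen),
            ));
        }
        // A zero-byte read into a non-empty buffer means end of stream;
        // at that point the count must match the declared length exactly.
        if n == 0 && !buf.is_empty() && self.seen != self.expected {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("body truncated: expected {} bytes, got {}", self.expected, self.seen),
            ));
        }
        Ok(n)
    }
}
```

Wrapping a body as `LengthCheckedReader::new(body, content_length)` and calling `read_to_end` then returns an `UnexpectedEof` error for a truncated stream rather than a short buffer.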
Describe the bug
We are occasionally seeing S3 get_object() calls return incomplete data.
Example error:
Expected Behavior
get_object() should return the full object.
Current Behavior
get_object() sometimes returns incomplete data.
Reproduction Steps
I do not have a minimum repro script available right now.
Possible Solution
No response
Additional Information/Context
No response
Version
Environment details (OS name and version, etc.)
Ubuntu 22.04
Logs
No response