botocore no longer populates the Content-MD5 header, leading to MissingContentMD5 error #931
Comments
Thanks for bringing this to my attention. Is this against AWS, or another implementation of S3? If it is AWS, how are you expected to delete files now?
No, I'm using S3FS to interact with an internal Minio instance (and to be honest I don't know enough about AWS/S3 to answer the follow-up; it just appears to me to be a potentially very impactful change in behaviour). Just to follow up, I have tried to look through what I believe to be the offending commit (here) and perhaps
OK, so I gather AWS must have switched to CRC and Minio (maybe depending on deployment version) has not. The docs suggest that changing the value of client_config.request_checksum_calculation to "when_supported" in the config (or the AWS_REQUEST_CHECKSUM_CALCULATION env variable) will only affect whether the CRC is calculated, never MD5, since all the associated code is marked deprecated. Maybe still worth a try?
Upstream Minio issue: minio/minio#20845
We’re running into a similar issue, though it’s slightly different:
It looks like this is a breaking change in
Would this be something that can be fixed in
```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
# "pandas",
# "s3fs",
# "botocore==1.36",
# ]
# ///
import s3fs
import pandas as pd
s3 = s3fs.S3FileSystem(profile="my-profile")
df = pd.DataFrame({"my_col":[1, 2, 3]})
df.to_csv("/tmp/test_df.csv")
s3.put("/tmp/test_df.csv", "s3://my-bucket/my-prefix/test_df.csv")
# uploaded file contents when botocore<1.36:
# ,my_col
# 0,1
# 1,2
# 2,3
# uploaded file contents when botocore==1.36.0:
# 14
# ,my_col
# 0,1
# 1,2
```

Essentially there is some kind of data corruption: a random string (or number?) is being put at the top of my csv, in this case 14. (I ran the above as a PEP 722 script using
As far as I know, the only solution currently is to downgrade botocore. I don't know if there's any scope for s3fs to add extra headers, since the values are calculated on the finished HTTP request after control has passed to botocore. Unfortunately, it doesn't seem like botocore is interested in maintaining compatibility, since they explicitly target AWS. Having said that, I'm surprised to see PutObject implicated too - either with the client error (which seems to be the same issue) or data corruption (which may well be something else). In the case of PutObject, we do always know the length of the body beforehand, so we can pass it explicitly if we know the header key required.
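For reference, the value itself is easy to produce once the body bytes are known; a minimal sketch in plain Python (this is not an s3fs or botocore API, and the payload shown is made up):

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    # Content-MD5 is the base64-encoded 128-bit MD5 digest of the request body,
    # so it can only be computed once the full body is known.
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

# Illustrative payload only (roughly the shape of a bulk-delete request body).
print(content_md5(b"<Delete><Object><Key>my-key</Key></Object></Delete>"))
```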
Perhaps someone can do a trace to see how the calls differ between the new and old botocore? I have another emergency I need to deal with today...
@martindurant, I think changes made to checksum in PR boto/botocore#3271 are likely causing this issue. Setting environment variable AWS_REQUEST_CHECKSUM_CALCULATION to WHEN_REQUIRED might address the issue.
@boringbyte, I don't think so. In fact, "required" is the default; setting it to the more general "when_supported" doesn't help either, though, since it still produces a CRC rather than the previous behaviour with MD5.
Just to follow up, updating Minio to the latest version (RELEASE.2025-01-20T14-49-07Z) resolved the issue for me. I therefore think this can be closed, as this is an upstream boto/minio issue. Thank you
I'll leave it open for now as the ecosystem catches up - and maybe someone comes up with a way to inject those headers for older deployments.
It seems to me there is a way to disable this behaviour, according to the issue on botocore: boto/boto3#4398 (comment). Is it not possible for us to pass in some kind of extra config to enable this?
That config can be changed via environment variable (#931 (comment)), so please do try it!
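For anyone who wants to try it, a minimal sketch of setting that variable from Python rather than the shell; the profile and paths are placeholders, and the variable has to be in place before the client is created:

```python
import os
import s3fs

# Must be set before the S3 client is created, i.e. before the filesystem is
# first used, so the top of the script is the safest place.
os.environ["AWS_REQUEST_CHECKSUM_CALCULATION"] = "when_required"

fs = s3fs.S3FileSystem(profile="my-profile")  # placeholder profile
fs.put("/tmp/test_df.csv", "my-bucket/my-prefix/test_df.csv")  # placeholder paths
```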
Environment variables are fine and dandy, but it seems like a limited solution to need to know about and set an env var in every place this might be running. Plus, not all of us are here because we use s3fs directly - in my case it's because pyiceberg relies on s3fs. It would be much more effective imo for us and other libs using s3fs to be able to set a flag directly in our code that carries across to all environments.
The question is: does this workaround solve the problem? If yes, we can work out how to expose it programmatically.
@martindurant I can confirm adding the environment variable fixes the problem.
Thanks for testing. request_checksum_calculation appears in the botocore config (https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html), so I would try passing it using client_kwargs or config_kwargs to s3fs. Assuming one or both of those works, then I suppose we are done: we have a workaround. However, we might still try to make this more prominent, provide extra documentation, or try to catch that exact exception and provide remediation instructions.
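If config_kwargs does work, the call would presumably look something like this; a sketch only, reusing the placeholder names from the reproduction script above:

```python
import s3fs

# request_checksum_calculation is a botocore Config option (botocore >= 1.36);
# s3fs forwards config_kwargs to that Config object.
fs = s3fs.S3FileSystem(
    profile="my-profile",  # placeholder, as in the reproduction script
    config_kwargs={"request_checksum_calculation": "when_required"},
)
# Upload should now behave as it did before botocore 1.36 against the older endpoint.
fs.put("/tmp/test_df.csv", "my-bucket/my-prefix/test_df.csv")
```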
It works with
Side note: Unfortunately, using
OK, so we certainly can't make this the default. What is the opinion here: is this thread enough to get people working again? Do we need a documentation note somewhere?
Wanted to update that this fixes my data corruption issue posted about above: #931 (comment). I think it would be very beneficial if
Couldn't we programmatically add that to the config kwargs if we see that
Is it not the case that this config should not be set when the endpoint is real AWS?
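One possible shape of that check, purely as a sketch: the helper, the "amazonaws.com" test, and the endpoint URL below are all assumptions for illustration, not existing s3fs behaviour.

```python
import s3fs

def make_fs(endpoint_url=None, **kwargs):
    """Illustrative only: opt out of the new checksum default for non-AWS endpoints."""
    client_kwargs = dict(kwargs.pop("client_kwargs", {}))
    config_kwargs = dict(kwargs.pop("config_kwargs", {}))
    if endpoint_url:
        client_kwargs["endpoint_url"] = endpoint_url
        if "amazonaws.com" not in endpoint_url:
            # Custom endpoint (e.g. an older Minio release): fall back to the
            # pre-1.36 "only when required" checksum behaviour.
            config_kwargs.setdefault("request_checksum_calculation", "when_required")
    return s3fs.S3FileSystem(
        client_kwargs=client_kwargs, config_kwargs=config_kwargs, **kwargs
    )

fs = make_fs(endpoint_url="http://minio.internal:9000")  # hypothetical endpoint
```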
Hello,

As of version 1.36.0, botocore no longer populates the Content-MD5 header (see changelog entry here). This change was subsequently merged into aiobotocore as of version 2.18 (see commit here). Practically, this now seems to mean that when I try to perform a delete operation on an S3FS file system I receive the following error:

So far my only workaround is to pin aiobotocore < 2.18. I am using the latest S3FS (2024.12.0).

Thanks
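For context, a minimal sketch of the kind of call that fails here; the endpoint and paths are placeholders:

```python
import s3fs

# With botocore >= 1.36 / aiobotocore >= 2.18 and an older Minio release, the
# delete is rejected with a MissingContentMD5 client error because the
# Content-MD5 header is no longer sent.
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": "http://minio.internal:9000"},  # placeholder endpoint
)
fs.rm("my-bucket/my-prefix/old-file.csv")  # placeholder key
```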