md5 hash with upload_fileobj. #845
Comments
Yeah, so the MD5 is only equal to the ETag under certain circumstances (i.e. for a non-multipart upload). Boto3 does not integrity-check the MD5 of the entire file for multipart uploads, but it does send an MD5 header with each part it uploads, so if any part has an MD5 mismatch, the request is retried until the part is correct. That is probably the best we can do while still transferring efficiently, because computing the MD5 of the whole file and checking it would require streaming the entire file into memory up front. How large are the files you are uploading? If per-part MD5 checking is not sufficient and the files are small enough, you may be able to increase the multipart threshold so that multipart uploads are not used.
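For anyone who still wants an end-to-end check, here is a rough sketch of how the expected ETag could be reproduced locally. It relies on the commonly observed (not formally guaranteed) behaviour that a multipart ETag is the MD5 of the concatenated per-part MD5 digests followed by `-<part count>`, and it assumes the object is not encrypted with SSE-KMS or SSE-C, where the ETag is not an MD5 at all. The helper name `expected_etag` is made up for illustration, and the 8 MiB defaults are an assumption that must match the part size and multipart threshold the upload actually used.

```python
import hashlib
import io

# Assumption: 8 MiB mirrors the default chunk size and threshold of the
# transfer configuration; pass the real values if they were changed.
DEFAULT_SIZE = 8 * 1024 * 1024


def expected_etag(fileobj, part_size=DEFAULT_SIZE, threshold=DEFAULT_SIZE):
    """Hypothetical helper: guess the ETag S3 would report for fileobj.

    Reads the stream one part at a time, so the whole file is never
    held in memory at once.
    """
    whole = hashlib.md5()   # MD5 of the entire content (single-part case)
    part_digests = []       # raw MD5 digest of each part (multipart case)
    total = 0
    while True:
        chunk = fileobj.read(part_size)
        if not chunk:
            break
        whole.update(chunk)
        part_digests.append(hashlib.md5(chunk).digest())
        total += len(chunk)

    if total < threshold:
        # Below the threshold a single PUT is assumed: ETag is the plain MD5.
        return whole.hexdigest()

    # Multipart: MD5 over the concatenated part digests, plus "-<parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


# Usage with tiny sizes so the multipart branch is exercised.
sample = io.BytesIO(b"a" * 10)
print(expected_etag(sample, part_size=4, threshold=4))
```

The returned string can then be compared with the ETag from the object's metadata after stripping the surrounding double quotes. If the two differ, first check that the part size really matches before concluding the data is corrupt.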
Thanks for replying. I think we should be good if the library is matching the MD5s of the individual
You can turn on debug logs by adding:
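The snippet that originally followed this comment is missing from the thread text, so the following is only a stand-in, not necessarily what was posted. It is a minimal sketch using the standard `logging` module, assuming the SDK logs under the logger names `boto3`, `botocore` and `s3transfer`. The function name `enable_sdk_debug_logging` is invented for this example. Boto3 also ships a `boto3.set_stream_logger` helper that serves a similar purpose.

```python
import logging


def enable_sdk_debug_logging(names=("boto3", "botocore", "s3transfer")):
    """Hypothetical helper: send DEBUG records from the SDK loggers to stderr."""
    handler = logging.StreamHandler()  # defaults to sys.stderr
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s [%(levelname)s] %(message)s")
    )
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
    return handler


# Call once at start-up, before creating clients or starting transfers.
handler = enable_sdk_debug_logging()
```

Debug output is verbose and can include request details, so it is best left off outside of troubleshooting.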
Thanks for adding the logging support.
Hi,
Does upload_fileobj verify that the MD5 of the file being uploaded matches the MD5 of the object stored in S3 once the upload is complete?
Because upload_fileobj performs a multipart upload, the ETag has the form hash-2 and differs from the MD5 of the original file.
How can I efficiently make sure that the MD5 of the file being uploaded matches the MD5 of the object in S3?
If upload_fileobj already takes care of integrity, the application can safely assume that the object reached S3 intact and does not have to implement its own MD5 comparison.