S3: force get_object to return Vector{UInt8}? #224
Comments
This functions the same way as Line 166 in 6dcae89
Maybe we change the
IMO the "right" default is to return the raw file contents, but it's less disruptive to default to the existing method. But I think changing
Boto3 returns a dict with metadata and the readable stream under the 'Body' key (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object). What's the source of the difference here?
Found the difference: https://github.com/boto/botocore/blob/develop/botocore/data/s3/2006-03-01/service-2.json#L3860. Boto3 respects the input shape and output shape specifications, while AWS.jl applies a heuristic to determine how to return data. In the long term, we can also use the shape specifications. In the short term, I think AWSS3.jl should be able to provide the interface that you want, and AWS.jl should only need to be configurable to facilitate that.
Yeah, I don't think AWS.jl is the right place to have fancy heuristics for the return type. IMO a higher layer should deal with that.
Forgive my ignorance but what does the "shape" refer to? Is it the set of keys
To the extent that the MIME type, etc., is stored explicitly on the S3 side, interpreting that into a Julia type seems somewhat reasonable. But I think there are many problems in mapping serialized data to an in-memory representation, primarily because there's a many-to-many mapping between the serialized form and the Julia type system.
Actually I've been thinking very hard about this exact problem in DataSets.jl (https://github.com/JuliaComputing/DataSets.jl) and I think I'm closing in on some kind of solution there. (It involves recognizing that data on disk has a kind of ad-hoc structural type system, that this can be mapped into Julia types in many ways, and that the mapping involves many choices which could have defaults but must ultimately be configurable.)
Back to the issue at hand, I think AWS.jl can't solve the mapping problem systematically unless it wants to depend on the wide variety of libraries used for parsing all the various MIME types (e.g., I assume we'll never depend on PNGFiles.jl just so we can parse
So I think it'll always be limited to a few "special" types like JSON. But in that case, it's better to default to
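To make the "higher layer with configurable defaults" idea concrete, here is a minimal sketch in plain Julia. Everything in it is hypothetical: `decode_body` and the `decoders` table are illustrative names, not part of AWS.jl or AWSS3.jl, and the byte vector stands in for whatever the low-level client returned. The point is only that the low level hands back bytes, and any interpretation is an explicit, user-supplied choice.

```julia
# Hypothetical higher-layer helper; NOT part of AWS.jl or AWSS3.jl.
# Looks up a user-supplied decoder for the content type and falls back
# to returning the raw bytes unchanged when none is registered.
function decode_body(body::Vector{UInt8}, content_type::AbstractString;
                     decoders::AbstractDict=Dict{String,Function}())
    decoder = get(decoders, content_type, identity)
    return decoder(body)
end

# Stand-in for bytes returned by a low-level client call.
body = Vector{UInt8}("hello")

# No decoder registered: the bytes come back untouched.
raw = decode_body(body, "application/json")

# Opt in to a text decoder for a single MIME type.
# `copy` is needed because String(::Vector{UInt8}) empties its argument.
decoders = Dict{String,Function}("text/plain" => b -> String(copy(b)))
text = decode_body(body, "text/plain"; decoders=decoders)
```

With this shape, a package that wants JSON or PNG parsing can register its own decoder without the low-level client depending on any parsing library.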
The
Just wanted to pop in and acknowledge this has not been forgotten. It seems that the proposed change of At a future date I can make this change, unless someone else would like to take the lead. In which case I can advise and do the code review on it going forward. It should be a relatively simple change to make.
In the context of this issue this might help others: Specify
384: Use `AWS.Response` to handle streaming/raw/parsing r=omus a=omus
The idea is to use a struct in AWS.jl that can be used to handle the automatic parsing that is currently used to turn XML/JSON into a dict while also giving the option of accessing the raw output as a string or stream without all the keywords currently needed to be specified.
Depends on:
- #457
Related:
- #346
- #348
- #438
Closes:
- #224
- #433
- #468 (when using `use_response_type` the passed in I/O is not closed)
- #471
Update: The tests in this PR run using the deprecated behaviour. Mainly I did that here to prove this change is non-breaking. For the updated tests see #458
Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
As of AWS.jl 1.63.0 (#384) you can now do:
using AWS: @service
@service S3 use_response_type=true
response = S3.get_object("bucket", "file.json")
response.body::Vector{UInt8}
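Once the body is a `Vector{UInt8}`, the original formatting is preserved and any further interpretation is up to the caller. A minimal sketch in plain Julia, assuming the bytes have already been fetched (the literal below is a stand-in for `response.body`, and no AWS call is made):

```julia
# Stand-in for `response.body`; in practice these bytes would come from S3.
body = Vector{UInt8}("{\n  \"a\": 1\n}\n")

# Copy before converting: String(::Vector{UInt8}) takes ownership of the
# vector and truncates it to zero length.
text = String(copy(body))

# The text round-trips to exactly the same bytes, whitespace included.
roundtrip = Vector{UInt8}(text)
```

From here the caller can write the bytes to disk as-is or hand `text` to whichever JSON parser they prefer.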
The return type of `get_object` seems to depend on what type of file you're looking for. In the case of strings vs bytes it's not so bad, but IMO getting a Dict or Vector for a JSON file is not great, since you lose all formatting (and who says I want it parsed anyways?). Is this behaviour intended? Is there a way to always get a byte vector?