
First chunk of StorageStreamDownloader.chunks() is always 32MB big #15648

Closed
mth-cbc opened this issue Dec 4, 2020 · 5 comments
Labels
  • bug: This issue requires a change to an existing behavior in the product in order to be resolved.
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • Storage: Storage Service (Queues, Blobs, Files)

Comments

mth-cbc commented Dec 4, 2020

  • Package: azure-storage-blob
  • Package Version: 12.6.0
  • Operating System: Ubuntu 18.04
  • Python Version: 3.8.0

Describe the bug
When using the StorageStreamDownloader's chunk iterator to download/iterate over a blob, the first chunk is always 32*1024*1024 bytes big, regardless of the setting for max_chunk_get_size.

To Reproduce
Steps to reproduce the behavior:

  1. Instantiate a BlobClient for a blob bigger than 32 MB
  2. Get a StorageStreamDownloader for the blob: stream = blob_client.download_blob()
  3. Iterate over the blob using the chunks() method and print each chunk's size:
for chunk in stream.chunks():
    print(len(chunk))

Expected behavior
Each chunk, apart from the last one, should have the size given by the max_chunk_get_size parameter, which defaults to 4*1024*1024 bytes.
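To make the reported behavior concrete, here is a simplified pure-Python model of the chunk sizes chunks() yields. It assumes, based on the workaround discussed in this thread, that the first chunk is the initial download of up to max_single_get_size bytes (32*1024*1024 by default) and that the rest of the blob follows in max_chunk_get_size pieces. `observed_chunk_sizes` and `expected_chunk_sizes` are hypothetical helpers for illustration, not part of azure-storage-blob.

```python
MB = 1024 * 1024


def observed_chunk_sizes(blob_size, max_single_get_size=32 * MB, max_chunk_get_size=4 * MB):
    """Hypothetical model of what chunks() yielded in 12.6.0.

    Assumption: the first chunk is the initial download of up to
    max_single_get_size bytes; the remainder follows in max_chunk_get_size pieces.
    """
    sizes = []
    first = min(blob_size, max_single_get_size)
    if first:
        sizes.append(first)
    offset = first
    while offset < blob_size:
        size = min(max_chunk_get_size, blob_size - offset)
        sizes.append(size)
        offset += size
    return sizes


def expected_chunk_sizes(blob_size, max_chunk_get_size=4 * MB):
    """What the report expects: every chunk but the last is max_chunk_get_size bytes."""
    return [min(max_chunk_get_size, blob_size - offset)
            for offset in range(0, blob_size, max_chunk_get_size)]


blob_size = 40 * MB  # a blob bigger than 32 MB, as in the reproduction steps
print(observed_chunk_sizes(blob_size))  # [33554432, 4194304, 4194304]
print(expected_chunk_sizes(blob_size))  # [4194304] * 10
# Under this model, a first download of max_chunk_get_size bytes makes the two agree:
print(observed_chunk_sizes(blob_size, max_single_get_size=4 * MB) == expected_chunk_sizes(blob_size))  # True
```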

ghost added the needs-triage, customer-reported, and question labels Dec 4, 2020
xiangyan99 added the Client and Storage labels Dec 15, 2020
ghost removed the needs-triage label Dec 15, 2020
tasherif-msft self-assigned this Dec 16, 2020
xiafu-msft added the bug label Dec 17, 2020
xiafu-msft (Contributor) commented Dec 17, 2020

Hi @mth-cbc

Thanks so much for reporting this issue; it's an SDK bug.
For now, you can work around it by setting max_single_get_size to your expected chunk size; the first chunk will then have that size.

Let me know if you have any concerns! Sorry about the inconvenience.

xiafu-msft removed the question label Dec 17, 2020

mth-cbc (Author) commented Dec 17, 2020

Hi @xiafu-msft

Thanks for your answer, but I observe the same behaviour when specifying an explicit chunk size via max_chunk_get_size.

xiafu-msft (Contributor) commented Dec 17, 2020

Sorry, I had a typo there! I meant max_single_get_size; I've updated the previous comment. Setting max_chunk_get_size and max_single_get_size to the same value will give you chunks of equal size (apart from the last one) when you call chunks().
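A small helper can confirm whether this workaround produces equal-sized chunks. The sketch assumes only that chunks() yields bytes objects; `chunks_are_uniform` is a hypothetical helper, and the commented SDK usage uses placeholder connection string, container, and blob names. The offline check scales the sizes from the report down from MB to bytes so it runs without an Azure account.

```python
from typing import Iterable, List


def chunks_are_uniform(chunks: Iterable[bytes], chunk_size: int) -> bool:
    """Hypothetical helper: True if every chunk except the last is exactly
    chunk_size bytes and the last one is non-empty and at most chunk_size."""
    sizes: List[int] = [len(chunk) for chunk in chunks]
    if not sizes:
        return True
    *body, last = sizes
    return all(size == chunk_size for size in body) and 0 < last <= chunk_size


# Intended use with azure-storage-blob (not executed here; names are placeholders):
#   from azure.storage.blob import BlobClient
#   CHUNK = 4 * 1024 * 1024
#   client = BlobClient.from_connection_string(
#       conn_str, container_name="my-container", blob_name="my-blob",
#       max_single_get_size=CHUNK, max_chunk_get_size=CHUNK)
#   assert chunks_are_uniform(client.download_blob().chunks(), CHUNK)

# Offline check, sizes scaled down from MB to bytes:
buggy = [bytes(32), bytes(4), bytes(4)]  # oversized first chunk, as in the report
fixed = [bytes(4)] * 9 + [bytes(2)]      # equal chunks, shorter last one
print(chunks_are_uniform(buggy, 4))  # False
print(chunks_are_uniform(fixed, 4))  # True
```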

mth-cbc (Author) commented Dec 17, 2020

Thanks for the workaround, now it works as expected.

tasherif-msft (Contributor) commented

PR #17559 will resolve this issue in the future :)
I will go ahead and close this issue!


4 participants