Remove Artificially Low Chunk Size Limits from GCS + Azure Blob Stores #59279
Conversation
Removing these limits as they cause unnecessarily many objects in the blob stores. We do not have to worry about BwC for this change since we do not support any third-party implementations of Azure or GCS. Also, since there is no valid reason to set a chunk size other than the default maximum at this point, removing the documentation for the setting (which was incorrect in the case of Azure to begin with) from the docs. Closes elastic#56018
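For context, `chunk_size` is one of the client-side settings passed when registering a snapshot repository. A hypothetical registration (repository and container names are made up for illustration) showing where the setting lives:

```
PUT _snapshot/my_azure_repo
{
  "type": "azure",
  "settings": {
    "container": "backups",
    "chunk_size": "64mb"
  }
}
```

With this change, omitting `chunk_size` lets chunks grow up to the blob store's own maximum blob size instead of an artificially low cap.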
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
@@ -162,12 +162,6 @@ The Azure repository supports following settings:

 Specifies the path within container to repository data. Defaults to empty
 (root directory).

-`chunk_size`::
I think we should still keep the docs for chunk_size. Perhaps we can mention that lowering this value can result in much more files in the repo. Still, it's a valid option and should be documented.
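The file-count concern can be illustrated with a quick back-of-the-envelope sketch. This is not Elasticsearch code; the sizes are illustrative assumptions, but they show how a small `chunk_size` multiplies the number of blobs per snapshot file:

```python
import math

def blob_count(file_size: int, chunk_size: int) -> int:
    """Number of blobs one snapshot file is split into at a given chunk size."""
    return max(1, math.ceil(file_size / chunk_size))

GIB = 1024 ** 3

# A hypothetical 10 GiB shard file under an artificial 64 MiB cap,
# versus a chunk size at or above the blob store's own maximum blob size:
print(blob_count(10 * GIB, 64 * 1024 ** 2))  # 160 blobs
print(blob_count(10 * GIB, 5000 * GIB))      # 1 blob
```

Every blob carries per-object overhead (listing, deletion, request counts), so fewer, larger blobs are generally cheaper to manage.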
> Still, it's a valid option and should be documented.
Is it, though? Setting this option has no advantages whatsoever, does it? And we had a few cases, both internal and external, where people mistook this option for some sort of buffer size.
> Setting this option has no advantages whatsoever, does it?
In the searchable snapshot case, it allows parallel downloads of file chunks. I'm not saying that's a good use case, just that there might be some that we don't know about.
> And we had a few cases, both internal and external, where people mistook this option for some sort of buffer size.
This could be fixed by improving the documentation :)
I prefer full transparency on this setting (saying what its default is and why) and actively recommending against tuning it, but not hiding it from the docs (as folks might otherwise set it based on old docs / S3 docs and have completely wrong expectations).
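The parallel-download point above can be sketched generically. This is an illustration only, not the searchable-snapshots implementation; `fetch_chunk` and the chunk names are hypothetical stand-ins for blob-store GETs:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk store: one file stored as two chunk blobs.
CHUNKS = {"file.part0": b"hello ", "file.part1": b"world"}

def fetch_chunk(name: str) -> bytes:
    """Stand-in for a blob-store GET of a single chunk."""
    return CHUNKS[name]

def download_file(chunk_names: list) -> bytes:
    """Fetch chunks concurrently; pool.map preserves input order for reassembly."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return b"".join(pool.map(fetch_chunk, chunk_names))

print(download_file(["file.part0", "file.part1"]))  # b'hello world'
```

With a single giant blob there is only one object to fetch, so chunking is what makes this kind of parallelism possible in the first place.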
Makes sense, I pushed f45a4f7 :)
@@ -226,12 +226,6 @@ The following settings are supported:

 Specifies the path within bucket to repository data. Defaults to
 the root of the bucket.

-`chunk_size`::
same here
(tests are broken because they are broken in
 Specify the chunk size as a value and unit, for example:
-`10MB`, `5KB`, `500B`. Defaults to `64MB` (64MB max).
+`10MB`, `5KB`, `500B`. Defaults to the maximum size of a blob in the Azure blob store.
let's spell out what the value is (same for GCS)
Thanks Yannick!
#59279) (#59564) Removing these limits as they cause unnecessarily many objects in the blob stores. We do not have to worry about BwC for this change since we do not support any third-party implementations of Azure or GCS. Also, since there is no valid reason to set a chunk size other than the default maximum at this point, removing the documentation for the setting (which was incorrect in the case of Azure to begin with) from the docs. Closes #56018