Remove Artificially Low Chunk Size Limits from GCS + Azure Blob Stores #59279

Merged 6 commits into elastic:master from the 56018 branch on Jul 14, 2020

Conversation

original-brownbear
Member

Removing these limits as they cause unnecessarily many objects in the blob stores.
We do not have to worry about BwC for this change since we do not support any third-party
implementations of Azure or GCS.
Also, since there is no valid reason to set a chunk size other than the default maximum at this
point, the documentation for the setting (which was incorrect in the case of Azure to begin with)
is removed as well.

Closes #56018
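
For context, `chunk_size` is a repository setting that caps the size of a single blob: any file larger than the chunk size is split across multiple objects in the blob store, which is why a low limit inflates the object count. A minimal sketch of where the setting is applied when registering an Azure repository (the repository name, container name, and the `32mb` value are purely illustrative, not recommendations from this PR):

```
PUT _snapshot/my_azure_repo
{
  "type": "azure",
  "settings": {
    "container": "my-container",
    "chunk_size": "32mb"
  }
}
```

With the limits removed, leaving `chunk_size` unset keeps files in a single blob each, up to the store's maximum blob size.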

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

elasticmachine added the Team:Distributed (Obsolete) label on Jul 9, 2020
@original-brownbear
Member Author

@ywelsch @tlrx I only did a quick manual test for this and it seems everything works just fine for larger blobs still. I guess our benchmarking efforts will stress test this anyway with GB-scale files so we should be on the safe side here?

@@ -162,12 +162,6 @@ The Azure repository supports following settings:
Specifies the path within container to repository data. Defaults to empty
(root directory).

`chunk_size`::
Contributor


I think we should still keep the docs for chunk_size. Perhaps we can mention that lowering this value can result in many more files in the repo. Still, it's a valid option and should be documented.
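
To make that trade-off concrete (the repository name, bucket name, and value below are hypothetical, not taken from this PR): with a deliberately small `chunk_size`, every file larger than that value is stored as several ~100 MB blobs, so a multi-GB segment file turns into dozens of objects.

```
PUT _snapshot/my_gcs_repo
{
  "type": "gcs",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "chunk_size": "100mb"
  }
}
```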

Member Author


Still, it's a valid option and should be documented.

Is it though? Setting this option has no advantages whatsoever, does it? And we had a few cases, both internal and external, where people mistook this option for some sort of buffer size.

Contributor


Setting this option has no advantages whatsoever, does it?

In the searchable snapshot case, it allows parallel downloads of file chunks. Not saying that that's a good use case, just that there might be some that we don't know about.

And we had a few cases, both internal and external, where people mistook this option for some sort of buffer size.

This could be fixed by improving the documentation :)

I prefer full transparency on this setting (saying what its default is and why), actively recommending against tuning it, but not hiding it from the docs (as folks might otherwise set it based on old docs / S3 docs and have completely wrong expectations).

Member Author


Makes sense, I pushed f45a4f7 :)

@@ -226,12 +226,6 @@ The following settings are supported:
Specifies the path within bucket to repository data. Defaults to
the root of the bucket.

`chunk_size`::
Contributor


same here

@original-brownbear
Member Author

(tests are broken because they are broken in master currently ...)

Specify the chunk size as a value and unit, for example:
`10MB`, `5KB`, `500B`. Defaults to `64MB` (64MB max).
`10MB`, `5KB`, `500B`. Defaults to the maximum size of a blob in the Azure blob store.
Contributor


let's spell out what the value is (same for GCS)

@original-brownbear
Member Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit 60e0b46 into elastic:master Jul 14, 2020
@original-brownbear original-brownbear deleted the 56018 branch July 14, 2020 19:34
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 14, 2020
Remove Artificially Low Chunk Size Limits from GCS + Azure Blob Stores (elastic#59279)

original-brownbear added a commit that referenced this pull request Jul 14, 2020
Remove Artificially Low Chunk Size Limits from GCS + Azure Blob Stores (#59279) (#59564)

Labels
:Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
>non-issue
Team:Distributed (Obsolete) (Meta label for distributed team (obsolete); replaced by Distributed Indexing/Coordination)
v7.9.0
v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Azure and GCS Repositories Needlessly Limit Chunk Size?
4 participants