Add bucket index support to store gateway (cortexproject#3625)

* Integrated bucket index in store-gateway
* Unit tested BucketIndexMetadataFetcher
* Fixed bucketindex unit test
* Improved store-gateway unit tests
* Updated doc
* Updated CHANGELOG
* Fixed doc and comments
* Log even the case where the bucket index does not exist
* Do not track failure if bucket index does not exist when reading it
* Added cortex_bucket_blocks_partials_count metric exported by compactor
* Improved error handling
* Added missing doc image
* Updated code comment

Signed-off-by: Marco Pracucci <marco@pracucci.com>
roystchiang · Jan 5, 2021 · bb28fb5
1 parent 739d3f0
commit bb28fb5
Showing 35 changed files with 1,145 additions and 281 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,15 +6,16 @@
 * [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
 * [CHANGE] Blocks storage: block deletion marks are now stored in a per-tenant global markers/ location too, other than within the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via `-compactor.block-deletion-marks-migration-enabled=false` once new compactor has successfully started once in your cluster. #3583
 * [FEATURE] Querier: Queries can be federated across multiple tenants. The tenants IDs involved need to be specified separated by a `|` character in the `X-Scope-OrgID` request header. This is an experimental feature, which can be enabled by setting `-tenant-federation.enabled=true` on all Cortex services. #3250
-* [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers and store-gateways. The bucket index is updated by the compactor during blocks cleanup, on every `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583
-* [ENHANCEMENT] Blocks storage: introduced an option `-blocks-storage.bucket-store.bucket-index.enabled` to enable the usage of the bucket index in the querier. When enabled, the querier will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics have been added: #3614
+* [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers and store-gateways. The bucket index is updated by the compactor during blocks cleanup, on every `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583 #3625
+* [ENHANCEMENT] Blocks storage: introduced an option `-blocks-storage.bucket-store.bucket-index.enabled` to enable the usage of the bucket index in the querier and store-gateway. When enabled, the querier and store-gateway will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier: #3614 #3625
   * `cortex_bucket_index_loads_total`
   * `cortex_bucket_index_load_failures_total`
   * `cortex_bucket_index_load_duration_seconds`
   * `cortex_bucket_index_loaded`
-* [ENHANCEMENT] Compactor: exported the following metrics. #3583
-  * `cortex_bucket_blocks_count`: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion.
+* [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
+  * `cortex_bucket_blocks_count`: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
   * `cortex_bucket_blocks_marked_for_deletion_count`: Total number of blocks per tenant marked for deletion in the bucket.
+  * `cortex_bucket_blocks_partials_count`: Total number of partial blocks.
   * `cortex_bucket_index_last_successful_update_timestamp_seconds`: Timestamp of the last successful update of a tenant's bucket index.
 * [ENHANCEMENT] Ruler: Add `cortex_prometheus_last_evaluation_samples` to expose the number of samples generated by a rule group per tenant. #3582
 * [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575

diff --git a/docs/blocks-storage/bucket-index.md b/docs/blocks-storage/bucket-index.md
@@ -5,18 +5,18 @@ weight: 5
 slug: bucket-index
 ---
 
-The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor and used by queriers to discover blocks in the storage.
+The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor, and used by queriers and store-gateways to discover blocks in the storage.
 
 The bucket index usage is **optional** and can be enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true` (or its respective YAML config option).
 
 ## Benefits
 
-The [querier](./querier.md) needs to have an almost up-to-date view over the entire storage bucket, in order to find the right blocks to lookup at query time. Because of this, querier needs to periodically scan the bucket to look for new blocks uploaded by ingester or compactor, and blocks deleted (or marked for deletion) by compactor.
+The [querier](./querier.md) and [store-gateway](./store-gateway.md) need to have an almost up-to-date view over the entire storage bucket, in order to find the right blocks to look up at query time (querier) and load each block's [index-header](./binary-index-header.md) (store-gateway). Because of this, they need to periodically scan the bucket to look for new blocks uploaded by ingester or compactor, and blocks deleted (or marked for deletion) by compactor.
 
-When this bucket index is enabled, the querier periodically look up the per-tenant bucket index instead of scanning the bucket via "list objects" operations. This brings few benefits:
+When the bucket index is enabled, the querier and store-gateway periodically look up the per-tenant bucket index instead of scanning the bucket via "list objects" operations. This brings a few benefits:
 
-1. Reduced number of API calls to the object storage by querier
-2. No "list objects" storage API calls done by querier
+1. Reduced number of API calls to the object storage by querier and store-gateway
+2. No "list objects" storage API calls done by querier and store-gateway
 3. The [querier](./querier.md) is up and running immediately after the startup (no need to run an initial bucket scan)
 
 ## Structure of the index
@@ -42,7 +42,7 @@ The [querier](./querier.md), at query time, checks whether the bucket index for
 
 _Given it's a small file, lazy downloading it doesn't significantly impact on first query performances, but allows to get a querier up and running without pre-downloading every tenant's bucket index. Moreover, if the [metadata cache](./querier.md#metadata-cache) is enabled, the bucket index will be cached for a short time in a shared cache, reducing the actual latency and number of API calls to the object storage in case multiple queriers will fetch the same tenant's bucket index in a short time._
 
-![Querier - Bucket index](/images/blocks-storage/bucket-index-querier-logic.png)
+![Querier - Bucket index](/images/blocks-storage/bucket-index-querier-workflow.png)
 <!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit -->
 
 While in-memory, a background process will keep it **updated at periodic intervals**, so that subsequent queries from the same tenant to the same querier instance will use the cached (and periodically updated) bucket index. There are two config options involved:
@@ -55,3 +55,7 @@ While in-memory, a background process will keep it **updated at periodic interva
 If a bucket index is unused for a long time (configurable via `-blocks-storage.bucket-store.bucket-index.idle-timeout`), e.g. because that querier instance is not receiving any query from the tenant, the querier will offload it, stopping to keep it updated at regular intervals. This is particularly for tenants which are resharded to different queriers when [shuffle sharding](../guides/shuffle-sharding.md) is enabled.
 
 Finally, the querier, at query time, checks how old is a bucket index (based on its `updated_at`) and fail a query if its age is older than `-blocks-storage.bucket-store.bucket-index.max-stale-period`. This circuit breaker is used to ensure queriers will not return any partial query results due to a stale view over the long-term storage.
+
+## How it's used by the store-gateway
+
+The [store-gateway](./store-gateway.md), at startup and periodically, fetches the bucket index for each tenant belonging to its shard and uses it as the source of truth for the blocks (and deletion marks) in the storage. This removes the need to periodically scan the bucket to discover blocks belonging to its shard.
diff --git a/docs/blocks-storage/compactor.md b/docs/blocks-storage/compactor.md
@@ -10,7 +10,7 @@ slug: compactor
 The **compactor** is an service which is responsible to:
 
 - Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
-- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) to discover new blocks in the storage.
+- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) and [store-gateways](./store-gateway.md) to discover new blocks in the storage.
 
 The compactor is **stateless**.
 

diff --git a/docs/blocks-storage/compactor.template b/docs/blocks-storage/compactor.template
@@ -10,7 +10,7 @@ slug: compactor
 The **compactor** is an service which is responsible to:
 
 - Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
-- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) to discover new blocks in the storage.
+- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) and [store-gateways](./store-gateway.md) to discover new blocks in the storage.
 
 The compactor is **stateless**.
 

diff --git a/docs/blocks-storage/querier.md b/docs/blocks-storage/querier.md
@@ -365,8 +365,9 @@ blocks_storage:
     # CLI flag: -blocks-storage.bucket-store.sync-dir
     [sync_dir: <string> | default = "tsdb-sync"]
 
-    # How frequently scan the bucket to look for changes (new blocks shipped by
-    # ingesters and blocks removed by retention or compaction). 0 disables it.
+    # How frequently scan the bucket - or fetch the bucket index (if enabled) -
+    # to look for changes (new blocks shipped by ingesters and blocks removed by
+    # retention or compaction). 0 disables it.
     # CLI flag: -blocks-storage.bucket-store.sync-interval
     [sync_interval: <duration> | default = 5m]
 
@@ -634,22 +635,24 @@ blocks_storage:
     [ignore_deletion_mark_delay: <duration> | default = 6h]
 
     bucket_index:
-      # True to enable querier to discover blocks in the storage via bucket
-      # index instead of bucket scanning.
+      # True to enable querier and store-gateway to discover blocks in the
+      # storage via bucket index instead of bucket scanning.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
       [enabled: <boolean> | default = false]
 
-      # How frequently a cached bucket index should be refreshed.
+      # How frequently a cached bucket index should be refreshed. This option is
+      # used only by querier.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
       [update_on_stale_interval: <duration> | default = 15m]
 
       # How frequently a bucket index, which previously failed to load, should
-      # be tried to load again.
+      # be tried to load again. This option is used only by querier.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
       [update_on_error_interval: <duration> | default = 1m]
 
       # How long a unused bucket index should be cached. Once this timeout
       # expires, the unused bucket index is removed from the in-memory cache.
+      # This option is used only by querier.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
       [idle_timeout: <duration> | default = 1h]
 

diff --git a/docs/blocks-storage/store-gateway.md b/docs/blocks-storage/store-gateway.md
@@ -13,6 +13,13 @@ The store-gateway is **semi-stateful**.
 
 ## How it works
 
+The store-gateway needs to have an almost up-to-date view over the storage bucket, in order to discover blocks belonging to its shard. The store-gateway can keep the bucket view updated in two different ways:
+
+1. Periodically scanning the bucket (default)
+2. Periodically downloading the [bucket index](./bucket-index.md)
+
+### Bucket index disabled (default)
+
 At startup **store-gateways** iterate over the entire storage bucket to discover blocks for all tenants and download the `meta.json` and index-header for each block. During this initial bucket synchronization phase, the store-gateway `/ready` readiness probe endpoint will fail.
 
 While running, store-gateways periodically rescan the storage bucket to discover new blocks (uploaded by the ingesters and [compactor](./compactor.md)) and blocks marked for deletion or fully deleted since the last scan (as a result of compaction). The frequency at which this occurs is configured via `-blocks-storage.bucket-store.sync-interval`.
@@ -21,6 +28,12 @@ The blocks chunks and the entire index are never fully downloaded by the store-g
 
 _For more information about the index-header, please refer to [Binary index-header documentation](./binary-index-header.md)._
 
+### Bucket index enabled
+
+When bucket index is enabled, the overall workflow is the same but, instead of iterating over the bucket objects, the store-gateway fetches the [bucket index](./bucket-index.md) for each tenant belonging to its shard in order to discover each tenant's blocks and block deletion marks.
+
+_For more information about the bucket index, please refer to [bucket index documentation](./bucket-index.md)._
+
 ## Blocks sharding and replication
 
 The store-gateway optionally supports blocks sharding. Sharding can be used to horizontally scale blocks in a large cluster without hitting any vertical scalability limit.
@@ -399,8 +412,9 @@ blocks_storage:
     # CLI flag: -blocks-storage.bucket-store.sync-dir
     [sync_dir: <string> | default = "tsdb-sync"]
 
-    # How frequently scan the bucket to look for changes (new blocks shipped by
-    # ingesters and blocks removed by retention or compaction). 0 disables it.
+    # How frequently scan the bucket - or fetch the bucket index (if enabled) -
+    # to look for changes (new blocks shipped by ingesters and blocks removed by
+    # retention or compaction). 0 disables it.
     # CLI flag: -blocks-storage.bucket-store.sync-interval
     [sync_interval: <duration> | default = 5m]
 
@@ -668,22 +682,24 @@ blocks_storage:
     [ignore_deletion_mark_delay: <duration> | default = 6h]
 
     bucket_index:
-      # True to enable querier to discover blocks in the storage via bucket
-      # index instead of bucket scanning.
+      # True to enable querier and store-gateway to discover blocks in the
+      # storage via bucket index instead of bucket scanning.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
       [enabled: <boolean> | default = false]
 
-      # How frequently a cached bucket index should be refreshed.
+      # How frequently a cached bucket index should be refreshed. This option is
+      # used only by querier.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
       [update_on_stale_interval: <duration> | default = 15m]
 
       # How frequently a bucket index, which previously failed to load, should
-      # be tried to load again.
+      # be tried to load again. This option is used only by querier.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
       [update_on_error_interval: <duration> | default = 1m]
 
       # How long a unused bucket index should be cached. Once this timeout
       # expires, the unused bucket index is removed from the in-memory cache.
+      # This option is used only by querier.
       # CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
       [idle_timeout: <duration> | default = 1h]
 

diff --git a/docs/blocks-storage/store-gateway.template b/docs/blocks-storage/store-gateway.template
@@ -13,6 +13,13 @@ The store-gateway is **semi-stateful**.
 
 ## How it works
 
+The store-gateway needs to have an almost up-to-date view over the storage bucket, in order to discover blocks belonging to its shard. The store-gateway can keep the bucket view updated in two different ways:
+
+1. Periodically scanning the bucket (default)
+2. Periodically downloading the [bucket index](./bucket-index.md)
+
+### Bucket index disabled (default)
+
 At startup **store-gateways** iterate over the entire storage bucket to discover blocks for all tenants and download the `meta.json` and index-header for each block. During this initial bucket synchronization phase, the store-gateway `/ready` readiness probe endpoint will fail.
 
 While running, store-gateways periodically rescan the storage bucket to discover new blocks (uploaded by the ingesters and [compactor](./compactor.md)) and blocks marked for deletion or fully deleted since the last scan (as a result of compaction). The frequency at which this occurs is configured via `-blocks-storage.bucket-store.sync-interval`.
@@ -21,6 +28,12 @@ The blocks chunks and the entire index are never fully downloaded by the store-g
 
 _For more information about the index-header, please refer to [Binary index-header documentation](./binary-index-header.md)._
 
+### Bucket index enabled
+
+When bucket index is enabled, the overall workflow is the same but, instead of iterating over the bucket objects, the store-gateway fetches the [bucket index](./bucket-index.md) for each tenant belonging to its shard in order to discover each tenant's blocks and block deletion marks.
+
+_For more information about the bucket index, please refer to [bucket index documentation](./bucket-index.md)._
+
 ## Blocks sharding and replication
 
 The store-gateway optionally supports blocks sharding. Sharding can be used to horizontally scale blocks in a large cluster without hitting any vertical scalability limit.

diff --git a/docs/configuration/config-file-reference.md b/docs/configuration/config-file-reference.md
@@ -3682,8 +3682,9 @@ bucket_store:
   # CLI flag: -blocks-storage.bucket-store.sync-dir
   [sync_dir: <string> | default = "tsdb-sync"]
 
-  # How frequently scan the bucket to look for changes (new blocks shipped by
-  # ingesters and blocks removed by retention or compaction). 0 disables it.
+  # How frequently scan the bucket - or fetch the bucket index (if enabled) - to
+  # look for changes (new blocks shipped by ingesters and blocks removed by
+  # retention or compaction). 0 disables it.
   # CLI flag: -blocks-storage.bucket-store.sync-interval
   [sync_interval: <duration> | default = 5m]
 
@@ -3950,22 +3951,24 @@ bucket_store:
   [ignore_deletion_mark_delay: <duration> | default = 6h]
 
   bucket_index:
-    # True to enable querier to discover blocks in the storage via bucket index
-    # instead of bucket scanning.
+    # True to enable querier and store-gateway to discover blocks in the storage
+    # via bucket index instead of bucket scanning.
     # CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
     [enabled: <boolean> | default = false]
 
-    # How frequently a cached bucket index should be refreshed.
+    # How frequently a cached bucket index should be refreshed. This option is
+    # used only by querier.
     # CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
     [update_on_stale_interval: <duration> | default = 15m]
 
     # How frequently a bucket index, which previously failed to load, should be
-    # tried to load again.
+    # tried to load again. This option is used only by querier.
     # CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
     [update_on_error_interval: <duration> | default = 1m]
 
     # How long a unused bucket index should be cached. Once this timeout
-    # expires, the unused bucket index is removed from the in-memory cache.
+    # expires, the unused bucket index is removed from the in-memory cache. This
+    # option is used only by querier.
     # CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
     [idle_timeout: <duration> | default = 1h]
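
For reference, the options touched by this change could be combined as in the minimal sketch below. It assumes the `bucket_store` block is nested under `blocks_storage`, as shown in the querier and store-gateway docs above, and it simply restates the documented default values apart from `enabled`. It is an illustration, not a recommended production configuration.

```yaml
blocks_storage:
  bucket_store:
    # With the bucket index enabled, this interval controls how often the
    # bucket index is fetched instead of how often the bucket is scanned.
    sync_interval: 5m

    bucket_index:
      # Used by both querier and store-gateway (default is false).
      enabled: true

      # The options below are used only by the querier.
      update_on_stale_interval: 15m
      update_on_error_interval: 1m
      idle_timeout: 1h
```

Since the compactor is the component that keeps the bucket index updated, this setting only makes sense in a cluster where the compactor is running.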