Add querier metric for block source and compaction level #7112

Merged on Jan 25, 2024 (8 commits)

@jhalterman (Member) commented Jan 13, 2024

What this PR does

Adds source and compaction level to the cortex_bucket_store_series_blocks_queried metric, which indicates the number of compacted blocks that were queried from store gateways.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@jhalterman jhalterman force-pushed the compacted-blocks-metric branch 2 times, most recently from e7fe57a to 75042c9 Compare January 13, 2024 03:20
@jhalterman jhalterman marked this pull request as ready for review January 13, 2024 03:27
@jhalterman jhalterman requested a review from a team as a code owner January 13, 2024 03:27
@jhalterman jhalterman marked this pull request as draft January 13, 2024 18:29
@dimitarvdimitrov (Contributor) left a comment

This is great. I never knew this was "knowable". It could serve as a reliable indicator that the compactor is falling behind.

pkg/querier/block_streaming.go (outdated, resolved)
pkg/querier/blocks_store_queryable.go (outdated, resolved)
pkg/storage/tsdb/block/meta.go (outdated, resolved)
pkg/storegateway/bucket.go (outdated, resolved)
pkg/storage/tsdb/block/meta.go (outdated, resolved)
pkg/querier/blocks_store_queryable.go (outdated, resolved)
@jhalterman jhalterman force-pushed the compacted-blocks-metric branch 2 times, most recently from 87de38c to 08630e1 Compare January 17, 2024 01:08
@jhalterman jhalterman changed the title Add querier metric for compacted store gateway blocks Add querier metric for non-compacted store gateway blocks Jan 17, 2024
@jhalterman (Member, Author) commented Jan 17, 2024

Updated the PR to move this metric to the store-gateway itself, and use a summary to be consistent with cortex_bucket_store_series_blocks_queried. There don't seem to be any related tests to update.

I'm on the fence about whether this metric should be about compacted or non-compacted blocks - input appreciated. My immediate use case is mostly interested in non-compacted blocks, but @dimitarvdimitrov pointed out that understanding compacted blocks could be useful as well. Either one can be derived from the other, along with cortex_bucket_store_series_blocks_queried.

@jhalterman jhalterman marked this pull request as ready for review January 17, 2024 01:44
@pracucci (Collaborator) left a comment

Thanks, Jonathan, for iterating on this again. I left a couple of final comments. As for tests to improve, I think you can look at the test that uses assertQueryStatsMetricsRecorded().

pkg/storage/tsdb/block/meta.go (outdated, resolved)
pkg/storage/tsdb/block/meta.go (outdated, resolved)
pkg/storage/tsdb/block/meta.go (outdated, resolved)
@dimitarvdimitrov (Contributor) commented

I'm on the fence about whether this metric should be about compacted or non-compacted blocks - input appreciated. My immediate use case is mostly interested in non-compacted blocks, but @dimitarvdimitrov pointed out that understanding compacted blocks could be useful as well.

The potential value that I see is being able to tell when store-gateways start querying non-compacted 2-hour blocks. If the compactor falls behind on compacting 12-hour blocks, it's not that big of a deal, so I don't think observability there is as critical.

So if the new metric you are adding only counts non-compacted blocks, then I think it will still serve this purpose. Covering all blocks via the (level, source) tuple Marco suggested might future-proof us, so if it's not too hard, maybe we should do that?
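
To make the (level, source) idea concrete, here is a minimal Go sketch of how counting by tuple still lets the non-compacted count be derived. It is not the code from this PR: the `blockKey` and `queriedBlock` types and both helper functions are hypothetical, and it assumes that compaction level 1 means a block has never been compacted.

```go
package main

import "fmt"

// blockKey is a hypothetical (source, level) tuple used as a counter key.
type blockKey struct {
	Source string
	Level  int
}

// queriedBlock is a hypothetical stand-in for the metadata of a queried block.
type queriedBlock struct {
	Source string
	Level  int
}

// countByKey counts queried blocks per (source, level) tuple.
func countByKey(blocks []queriedBlock) map[blockKey]int {
	counts := make(map[blockKey]int)
	for _, b := range blocks {
		counts[blockKey{Source: b.Source, Level: b.Level}]++
	}
	return counts
}

// nonCompacted sums the counts at level 1.
// Assumption: level 1 blocks have not been compacted yet.
func nonCompacted(counts map[blockKey]int) int {
	n := 0
	for k, c := range counts {
		if k.Level == 1 {
			n += c
		}
	}
	return n
}

func main() {
	blocks := []queriedBlock{
		{Source: "ingester", Level: 1},
		{Source: "ingester", Level: 1},
		{Source: "compactor", Level: 2},
		{Source: "compactor", Level: 3},
	}
	counts := countByKey(blocks)
	nc := nonCompacted(counts)
	fmt.Println("total:", len(blocks), "non-compacted:", nc, "compacted:", len(blocks)-nc)
}
```

With the tuple as the key, a dedicated non-compacted metric becomes a filter over one label value rather than a separate series.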

Adds a querier metric, cortex_querier_compacted_blocks_queried_total, which indicates the number of blocks fetched from store gateways that were compacted. This can be compared to cortex_querier_blocks_queried_total.
- move metric to storegateway
- rename metric to cortex_bucket_store_series_non_compacted_blocks_queried
This reverts commit 50d04932f40ba470e2f6e72a3e5ed9a6b82f8acd.
@jhalterman jhalterman force-pushed the compacted-blocks-metric branch from 08630e1 to 1b5a1fc Compare January 17, 2024 23:35
@jhalterman jhalterman requested a review from pracucci January 18, 2024 00:11
@jhalterman jhalterman force-pushed the compacted-blocks-metric branch from 1b5a1fc to 5da75da Compare January 18, 2024 02:59
@jhalterman jhalterman removed the request for review from pracucci January 18, 2024 02:59
@jhalterman (Member, Author) commented

Moving to draft since testing in dev shows that some block meta isn't populated before creating the metrics. Feel free to comment/review otherwise.

@jhalterman jhalterman marked this pull request as draft January 18, 2024 04:31
pkg/storegateway/stats.go (outdated, resolved)
pkg/storegateway/stats.go (resolved)
pkg/storegateway/bucket_store_metrics.go (outdated, resolved)
pkg/storegateway/bucket.go (outdated, resolved)
@jhalterman (Member, Author) commented Jan 19, 2024

@dimitarvdimitrov I noticed that some block meta I was using wasn't being populated since it was being fetched via the BucketIndexMetadataFetcher rather than the MetadataFetcher, which I see was changed in #6808. Since compaction level and source aren't available in the bucket index, I reverted to using the MetadataFetcher (integration tests still use this). Let me know what you think.

@jhalterman jhalterman marked this pull request as ready for review January 19, 2024 00:11
@jhalterman jhalterman force-pushed the compacted-blocks-metric branch from 56026bf to e0c5cd2 Compare January 19, 2024 00:13
@dimitarvdimitrov (Contributor) commented

reverted to using the MetadataFetcher

I'm afraid this will make the store-gateway scan the bucket every 15 minutes instead of relying on the bucket index. This makes scanning more brittle and slower, and adds costs for list operations (we have docs on the bucket index that contain more details). Because of this, I think it's best to keep using the bucket index scanner.

One option to resolve this is to update the bucket index to start including these two items.

@pracucci (Collaborator) commented

I reverted to using the MetadataFetcher (integration tests still use this)

I agree with what @dimitarvdimitrov said. It's not an option. Actually we want to get rid of MetadataFetcher.

This reverts commit fdd002a6f49f45f43a1919d77822aa17eccfb575.
@jhalterman jhalterman changed the title Add querier metric for non-compacted store gateway blocks Add querier metric for block source and compaction level Jan 19, 2024
@jhalterman (Member, Author) commented Jan 20, 2024

@dimitarvdimitrov I took your suggestion, thanks. The bucket index JSON field name is compaction_level, but the label is just level. Let me know if something different is better.
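
For readers following along, the naming split can be sketched roughly as below. This is an illustration only, not the code in this PR: the `indexBlock` struct, the `block_id` and `source` JSON names, the helper functions, and the "unknown" fallback values are all assumptions. Only the `compaction_level` field name and the `level` label name come from the comment above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// indexBlock is a hypothetical, trimmed-down bucket index entry.
// Only the compaction_level JSON name is taken from the PR discussion;
// block_id and source are assumed names for illustration.
type indexBlock struct {
	ID              string `json:"block_id"`
	Source          string `json:"source,omitempty"`
	CompactionLevel int    `json:"compaction_level,omitempty"`
}

// parseIndexBlock decodes a single entry from JSON.
func parseIndexBlock(raw []byte) (indexBlock, error) {
	var b indexBlock
	err := json.Unmarshal(raw, &b)
	return b, err
}

// metricLabels maps an entry to the metric label values.
// The JSON field compaction_level becomes the shorter label "level".
// Assumption: entries written before the fields existed map to "unknown".
func metricLabels(b indexBlock) map[string]string {
	source := b.Source
	if source == "" {
		source = "unknown"
	}
	level := "unknown"
	if b.CompactionLevel > 0 {
		level = strconv.Itoa(b.CompactionLevel)
	}
	return map[string]string{"source": source, "level": level}
}

func main() {
	raw := []byte(`{"block_id":"example","source":"compactor","compaction_level":2}`)
	b, err := parseIndexBlock(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(metricLabels(b))
}
```

Marking both fields `omitempty` keeps older index entries decodable, at the cost of needing a fallback label value for blocks that lack them.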

@dimitarvdimitrov (Contributor) left a comment

LGTM, thank you for your patience with this PR :)

@dimitarvdimitrov (Contributor) commented

perhaps cortex_bucket_store_blocks_loaded_by_duration could be removed in a separate PR if it's not needed or redundant with this?

once this PR is merged, should I open a follow-up PR to do this or have you already started on it?

@jhalterman (Member, Author) commented

once this PR is merged, should I open a follow-up PR to do this or have you already started on it?

I have not started on that yet. If you'd like to do a follow-up, that would be great.

@jhalterman jhalterman merged commit dcc2d9b into grafana:main Jan 25, 2024
28 checks passed
@jhalterman jhalterman deleted the compacted-blocks-metric branch January 25, 2024 18:19