query: Add replica priorities for deduplication #716
Closed
Implements #707.
This turned out to be reasonably straightforward once I had some time to look at it. This is my first pass at an implementation, and I'm happy to get feedback or improvement suggestions.
The tl;dr is that the priorities adjust the series set sort order, which lets the existing dedup iterators work with minimal modification. Omitting priorities falls back to the existing behaviour: all priorities are set equal, so the sort is a no-op.
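To make the sort-order idea concrete, here's a rough sketch of a priority-aware series sort. The `series` type and function names are illustrative only, not the actual Thanos types:

```go
package main

import (
	"fmt"
	"sort"
)

// series is a simplified stand-in for a labelled series in a series set.
type series struct {
	replica  string
	priority int
}

// sortByPriority orders higher-priority replicas first, falling back to
// lexical replica order when priorities are equal. With all priorities
// equal (the default), the stable sort changes nothing.
func sortByPriority(ss []series) {
	sort.SliceStable(ss, func(i, j int) bool {
		if ss[i].priority != ss[j].priority {
			return ss[i].priority > ss[j].priority
		}
		return ss[i].replica < ss[j].replica
	})
}

func main() {
	ss := []series{
		{replica: "long-term-retention", priority: 50},
		{replica: "short-term-retention", priority: 100},
	}
	sortByPriority(ss)
	fmt.Println(ss[0].replica) // prints "short-term-retention"
}
```

Because the sort is stable and defaults to the existing alphabetical ordering, omitting priorities leaves the current dedup behaviour untouched.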
Changes
Added support for replica priorities for query deduplication. Priorities are provided as CLI args in
<pattern>,<priority>
pairs. These are applied in alphabetical order (aside: I'm unsure whether this is the best way to prioritise these args. Should we apply them in the order they are provided on the CLI instead?). For any given query with replica results, points from the highest-priority replica are preferred. If the highest-priority replica has no data points, the next-highest-priority replica is used, continuing until no replicas remain.

The main use case for this is clusters running replicas at different granularities (e.g. short-term retention with high granularity and long-term retention with low granularity), where the querying user would prefer to see data from the high-granularity replica where possible, falling back to the low-granularity data only when the high-granularity replicas have no data.
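As a sketch of how the pairs might be parsed and applied, something along these lines (the parsing, matching semantics, and function names here are assumptions for illustration; the real implementation may differ):

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parsePriorities turns repeated "<pattern>,<priority>" flag values into a
// pattern -> priority map. Hypothetical helper, not the actual implementation.
func parsePriorities(args []string) (map[string]int, error) {
	out := make(map[string]int, len(args))
	for _, a := range args {
		parts := strings.SplitN(a, ",", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("expected <pattern>,<priority>, got %q", a)
		}
		p, err := strconv.Atoi(parts[1])
		if err != nil {
			return nil, fmt.Errorf("invalid priority in %q: %v", a, err)
		}
		out[parts[0]] = p
	}
	return out, nil
}

// priorityFor applies patterns in alphabetical order, so the last
// alphabetical match wins regardless of CLI argument order. Replicas
// matching no pattern keep the default priority of 0 (all equal).
func priorityFor(replica string, prios map[string]int) int {
	patterns := make([]string, 0, len(prios))
	for p := range prios {
		patterns = append(patterns, p)
	}
	sort.Strings(patterns)
	prio := 0
	for _, p := range patterns {
		if strings.Contains(replica, p) {
			prio = prios[p]
		}
	}
	return prio
}

func main() {
	prios, _ := parsePriorities([]string{"short-term,100", "long-term,50"})
	fmt.Println(priorityFor("short-term-retention-0", prios)) // prints 100
}
```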
Verification
I added tests covering both the base case and the new cases with configured replica priorities, for both the sorting func and the deduplication iterator. On top of this, I also did some ad-hoc testing by running the binary in front of Prometheus and performing the queries that prompted the desire for this change.
Old binary
New binary with no priorities configured
New binary with short-term-retention node priority 100 and long-term-retention node priority 50
Known problems
The resolution of series to use isn't perfect. As you can see in the image above (new binary with priorities configured), there's a gap in the deduplicated graph before it falls forward to the short-term-retention node's data again once data returns there. I think this is a result of the long-term node accruing a higher penalty (because it has a much longer scrape interval), causing the priority-based fallback to drop more data than desired. I couldn't figure out a good solution to this and am open to suggestions (though I also think a small gap in that case is probably OK as long as it's documented).
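For a rough intuition of why the longer scrape interval widens the gap, assume (purely for illustration; this is not the actual Thanos penalty formula) that the dedup iterator waits on the order of twice the last observed sample gap before trusting a replica's data again:

```go
package main

import "fmt"

// switchPenalty is an illustrative heuristic: the wait before switching
// replicas scales with the last observed gap between samples. A replica
// scraped every 5m therefore delays the switch far longer than one
// scraped every 15s, producing the gap seen in the graph.
func switchPenalty(lastGapSeconds int) int {
	return 2 * lastGapSeconds
}

func main() {
	fmt.Println(switchPenalty(15))  // short-term replica (15s interval): prints 30
	fmt.Println(switchPenalty(300)) // long-term replica (5m interval): prints 600
}
```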