Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query: Add replica priorities for deduplication #716

Closed

Conversation

roganartu
Copy link

Implements #707.

This turned out to be reasonably straight forward once I had some time to look at it. This is my first pass at an implementation and I'm happy to get feedback or improvement suggestions.

The tl;dr is that the priorities adjust the series set sort order, which causes the existing dedup iterators to work with minimal modification. Omitting priorities falls back to the existing behaviour by setting all priorities to be equal so the sort does nothing.

Changes

Added support for replica priorities for query deduplication. Priorities are provided as cli args in <pattern>,<priority> pairs. These are applied in alphabetical order (aside: unsure if this is the best way to prioritise these args. Should we apply them in the order they are provided on the cli instead?) and for any given query with replica results, points from the highest priority replica are preferred. If the highest priority replica has no data points the next highest priority replica is preferred, continuing until no replicas remain.

The main use case for this is clusters running different granularity replicas (eg: short-term retention with high granularity, and long-term retention with low granularity), where the user querying would prefer to see data from the high granularity replica where possible, falling back to the low granularity data only when the high granularity replicas have no data.

Verification

I added tests to cover both the base case and the new cases with configured replica priorities for both the sorting func and the deduplication iterator. On top of this, I also did some adhoc testing by running the binary in front of prom and performing the queries that prompted the desire for this change.

Old binary

screen shot 2019-01-03 at 10 20 41 am

screen shot 2019-01-03 at 10 20 58 am

New binary with no priorities configured

screen shot 2019-01-08 at 11 39 05 am

screen shot 2019-01-08 at 11 39 36 am

New binary with short-term-retention node priority 100 and long-term-retention node priority 50

screen shot 2019-01-08 at 11 40 42 am

screen shot 2019-01-08 at 11 41 03 am

Known problems

The resolution of series to use isn't perfect. As you can see in the above new binary with priorities configured image, there's a gap in the deduplicated graph before it falls-forward to the short-term-retention node data again when data returns there. I think this is a result of the long-term node having a higher penalty (because it has a much longer scrape interval) causing the priority-based fallback to drop more data than desired. I couldn't figure out a good solution to this, open to suggestions (but I also think a small gap in that case is probably ok as long as it's documented).

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
@bwplotka bwplotka requested review from bwplotka and domgreen January 8, 2019 17:58
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, but as per discussion on ticket, can we figure out more automatic algorithm?

@stale
Copy link

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@stale stale bot closed this Jan 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants