query: Add replica priorities for deduplication #716
Closed
Implements #707.
This turned out to be reasonably straightforward once I had some time to look at it. This is my first pass at an implementation, and I'm happy to get feedback or improvement suggestions.
The tl;dr is that the priorities adjust the series set sort order, which lets the existing dedup iterators work with minimal modification. Omitting priorities falls back to the existing behaviour: all priorities are set equal, so the sort is a no-op.
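To make the sort-order idea concrete, here's a rough sketch of a priority-aware series sort. The `series` type and function names are illustrative only, not the actual Thanos types:

```go
package main

import (
	"fmt"
	"sort"
)

// series is a simplified stand-in for a labelled series in a series set.
type series struct {
	replica  string
	priority int
}

// sortByPriority orders higher-priority replicas first, falling back to
// lexical replica order when priorities are equal. With all priorities
// equal (the default), the stable sort changes nothing.
func sortByPriority(ss []series) {
	sort.SliceStable(ss, func(i, j int) bool {
		if ss[i].priority != ss[j].priority {
			return ss[i].priority > ss[j].priority
		}
		return ss[i].replica < ss[j].replica
	})
}

func main() {
	ss := []series{
		{replica: "long-term-retention", priority: 50},
		{replica: "short-term-retention", priority: 100},
	}
	sortByPriority(ss)
	fmt.Println(ss[0].replica) // prints "short-term-retention"
}
```

Because the sort is stable and defaults to the existing alphabetical ordering, omitting priorities leaves the current dedup behaviour untouched.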
Changes
Added support for replica priorities for query deduplication. Priorities are provided as CLI args in
<pattern>,<priority>
pairs. These are applied in alphabetical order (aside: I'm unsure whether this is the best way to prioritise these args. Should we apply them in the order they are provided on the CLI instead?). For any given query with replica results, points from the highest-priority replica are preferred. If the highest-priority replica has no data points, the next-highest-priority replica is used, continuing until no replicas remain.

The main use case for this is clusters running replicas at different granularities (e.g. short-term retention with high granularity and long-term retention with low granularity), where the querying user would prefer to see data from the high-granularity replica where possible, falling back to the low-granularity data only when the high-granularity replicas have no data.
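As a sketch of how the pairs might be parsed and applied, something along these lines (the parsing, matching semantics, and function names here are assumptions for illustration; the real implementation may differ):

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parsePriorities turns repeated "<pattern>,<priority>" flag values into a
// pattern -> priority map. Hypothetical helper, not the actual implementation.
func parsePriorities(args []string) (map[string]int, error) {
	out := make(map[string]int, len(args))
	for _, a := range args {
		parts := strings.SplitN(a, ",", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("expected <pattern>,<priority>, got %q", a)
		}
		p, err := strconv.Atoi(parts[1])
		if err != nil {
			return nil, fmt.Errorf("invalid priority in %q: %v", a, err)
		}
		out[parts[0]] = p
	}
	return out, nil
}

// priorityFor applies patterns in alphabetical order, so the last
// alphabetical match wins regardless of CLI argument order. Replicas
// matching no pattern keep the default priority of 0 (all equal).
func priorityFor(replica string, prios map[string]int) int {
	patterns := make([]string, 0, len(prios))
	for p := range prios {
		patterns = append(patterns, p)
	}
	sort.Strings(patterns)
	prio := 0
	for _, p := range patterns {
		if strings.Contains(replica, p) {
			prio = prios[p]
		}
	}
	return prio
}

func main() {
	prios, _ := parsePriorities([]string{"short-term,100", "long-term,50"})
	fmt.Println(priorityFor("short-term-retention-0", prios)) // prints 100
}
```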
Verification
I added tests covering both the base case and the new cases with configured replica priorities, for both the sorting func and the deduplication iterator. On top of this, I also did some ad-hoc testing by running the binary in front of Prometheus and performing the queries that prompted the desire for this change.
Old binary
New binary with no priorities configured
New binary with short-term-retention node priority 100 and long-term-retention node priority 50
Known problems
The resolution of series to use isn't perfect. As you can see in the image above (new binary with priorities configured), there's a gap in the deduplicated graph before it falls forward to the short-term-retention node's data again once data returns there. I think this is a result of the long-term node accruing a higher penalty (because it has a much longer scrape interval), causing the priority-based fallback to drop more data than desired. I couldn't figure out a good solution to this and am open to suggestions (though I also think a small gap in that case is probably OK as long as it's documented).
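For a rough intuition of why the longer scrape interval widens the gap, assume (purely for illustration; this is not the actual Thanos penalty formula) that the dedup iterator waits on the order of twice the last observed sample gap before trusting a replica's data again:

```go
package main

import "fmt"

// switchPenalty is an illustrative heuristic: the wait before switching
// replicas scales with the last observed gap between samples. A replica
// scraped every 5m therefore delays the switch far longer than one
// scraped every 15s, producing the gap seen in the graph.
func switchPenalty(lastGapSeconds int) int {
	return 2 * lastGapSeconds
}

func main() {
	fmt.Println(switchPenalty(15))  // short-term replica (15s interval): prints 30
	fmt.Println(switchPenalty(300)) // long-term replica (5m interval): prints 600
}
```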