-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store: Add min resolution to be specific what downsampling levels to use on query #1104
Conversation
Let me know what you think of this approach @bwplotka |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I like it 🥇
Could you add changelog entry and tests.
I think we are not covering the case where we explictly request some resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Can you share the Make proto issue you have?
So max resolution was simple. We could put 2h and essentially we take every downsamplng level that is not higher than 2h (so 1h in our case). If we don't have 1h we take 5m etc.
Allowed resolution method is tricky as it assumes the resolution apriori. Someday maybe we will give more flexibility on downsampling? Then the migration would be painful etc
I have just yolo idea. What if we would add just min_resolution
to the proto? This gives us full flexibility on debugging but also other use cases for choosing exactly one resolution. Thoughts? We can make this enum on Querier side (just one side) to chose from given list, but the implementation would work on min and max resolution.
It also gives a way for other use cases like use any downsampling but not raw data, because it's expensive. What do you think?
Just curious have you found any bug in test cases or getFor
? Something is not right there overall, wonder if you found anything.
Curious what others think of it: @povilasv @GiedriusS @domgreen @brancz ? It is essentially super useful for debugging and more explicit than current max resolution. With ONLY max resolution it's not immidiately clear that it fallbacks to lower resolution if not having the requested ones. If we have both max and min I think it would be more clearly and powerful.
I completely agree with @bwplotka but I just had one idea while working through getFor's code: we should probably prioritize the higher resolution data in general. For example, the user specifies that they want max 1h resolution and raw resolution as the minimum. In such a case, we should probably opt to first fill the requested time range with the raw resolution where possible and only then fill it up with the downsampled one. It would probably make more sense from the user's perspective if they are using a small range vector in their query. Such a change would give more results to the users compared to selecting the data with the worst resolution first. Thoughts? |
ff44fd2
to
4b4e3ef
Compare
Changed to use the min resolution suggestion, which is much nicer :) I also rebased on top of #1132 for building the protos, but the main changes for this PR are still the ones to the store API. @GiedriusS that's an interesting question whether we should prefer raw data or downsampled data. Preferring the downsampled like we do now does mean we're sending less data overall which is good, but on the other hand having the raw data is more useful for the user. I think what we're doing now is reasonable but would have to think about it more, there's definitely a trade-off here. Edit - just saw #1170 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIce, but we need a rebase as we fixed this part. I think it should not change much of you algorithm.
Thanks for this and sorry for review delay.
pkg/store/bucket.go
Outdated
} | ||
|
||
// The covering is greedy, preferring blocks of the lowest allowed resolutions. | ||
func (s *bucketBlockSet) getFor(minTime, maxTime, minResolution, maxResolution int64) (bs []*bucketBlock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We change a big for this part, so wonder how it will look like after rebase, but generally it looks like something we want 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 My changes are very small now actually. Previously I had refactored this function to not be recursive, but I don't think that it helpful any more (the recursion does a nice job of filling in the gaps).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when we should take another look (:
b71c5d2
to
84ba758
Compare
Sorry for the delay, PTAL @bwplotka |
Closing as the only current use case for adding this was debugging #922, but there is a plan to add a UI for Discussed whether there might be any other use cases. Potentially, but we don't have any FRs yet it's easy enough to add if we get one :) |
Thanks for this @mjd95 We agreed offline that the debug use case that this is solving is solved in better way by this: #1246 (cc @jaseemabid), so postponing this until more use cases appears. (: |
Changes
Refactor the series interface to take a list of allowed resolutions instead of a max resolution. This is to allowed viewing e.g. only 5m downsampled data, which is a useful debugging tool. (c.f. #922)
WIP as I'm currently having issues generating the protos via the Makefile (probably something about my setup). Also wondering whether we should have stronger typing on the new fields (e.g., a
repeated enum
rather than arepeated int64
).Verification
Only updated the unit tests to pass so far, happy to add more.