fix: Return correct data in get_*_for_project methods #29037
Type checking fails because I'm not sure what the return type of |
overall this looks great, just left a few comments, most of which are nitpicky. thanks for catching this!
Arpad's comments about the ttls prompted me to think about the expiry logic a bit more and I realized that it's currently unsound. Example: Let This isn't especially hard to fix, but we need to decide what the correct behavior should be. In the example above, what should be the first bucket we get back? You can make a case for |
In my most recent commit, I did some renaming of options/fields/function parameters:
I also realized that it's totally fine for |
generally really like this!
counts = realtime_metrics.get_counts_for_project(project_id, cutoff)
durations = realtime_metrics.get_durations_for_project(project_id, cutoff)
I have rather mixed feelings about plugging the `cutoff` through instead of implicitly getting the current time in the methods as it was before this PR. I think this is entirely for testing, you could have them `Optional`, but also you could mock `time.time()` during testing and things would work nicely. anyway, i don't mind if you'd prefer to keep it this way.
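For illustration, the two alternatives mentioned in this comment (an `Optional` cutoff, or mocking `time.time()` in tests) could look roughly like the sketch below. The function body is a hypothetical stand-in that just returns the timestamp it would query up to; it is not the code from this PR.

```python
import time
from typing import Optional
from unittest import mock


def get_counts_for_project(project_id: int, cutoff: Optional[int] = None) -> int:
    """Hypothetical stand-in: returns the cutoff timestamp it would use."""
    if cutoff is None:
        # Fall back to the current time when the caller does not pin one.
        cutoff = int(time.time())
    return cutoff


# Option A: freeze the clock in a test instead of passing the cutoff.
with mock.patch("time.time", return_value=1000.0):
    frozen = get_counts_for_project(42)

# Option B: pass the cutoff explicitly, no mocking needed.
explicit = get_counts_for_project(42, cutoff=1234)
```

Patching works here only because the function looks up `time.time` through the `time` module at call time; code that does `from time import time` would need a different patch target.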
i'll reply to this since i was the author of this code - i wanted to snapshot a single point in time defined by the scanning task, and pass that down to all of the tasks it triggers. this is reflected by the fact that this PR passes `cutoff` from `scan_for_suspect_projects` down to `update_lpq_eligibility`, further down to `get_x_for_project` as you've noted here.
my understanding was that we had two options:
- let the innermost invocations determine the cutoff themselves, meaning that a single `scan_...` may trigger multiple `update_...`s with drifting timestamps. an `update_...` for project 9 might grab metrics from a time period that's slightly different from project 110's `update_...`
- pin timestamps for all `update_...`s to some time determined by their parent `scan_...` so that every `update_...` with a common parent `scan_...` is making a decision based on metrics from the same period of time. an `update_...` for project 9 will grab metrics from timestamp 42, and an `update_...` for project 110 will also grab metrics from timestamp 42 if the same `scan_...` task triggered them.
i went for the latter option in this case since i find it's a little easier to reason about timestamps and when tasks execute this way. tests are also easier to write without needing to resort to freezing time. i could be overlooking something here though. thoughts?
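A minimal sketch of the second option (pinning the timestamp in the parent task), assuming simplified signatures. The function names come from this thread, but the parameters, the `now` argument, and the return values are hypothetical and only serve to show the cutoff being shared.

```python
import time
from typing import Dict, Iterable, Optional


def update_lpq_eligibility(project_id: int, cutoff: int) -> int:
    # Placeholder body: a real task would fetch metrics up to `cutoff`
    # and decide on eligibility. Here we only echo the cutoff it saw.
    return cutoff


def scan_for_suspect_projects(
    project_ids: Iterable[int], now: Optional[int] = None
) -> Dict[int, int]:
    # Snapshot the time once in the parent task ...
    cutoff = int(time.time()) if now is None else now
    # ... and hand the same value to every child invocation.
    return {pid: update_lpq_eligibility(pid, cutoff) for pid in project_ids}


seen = scan_for_suspect_projects([9, 110], now=42)
```

Every `update_...` triggered by the same scan sees the same cutoff, regardless of when it actually runs.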
arguably freezing the time for all projects is not that desired. If somehow the workers of these tasks get severely backlogged they'll start computations on the wrong time and make wrong decisions. Since the decision they make is applied now they should also decide it on the most recent data, not some date from the past.
This changes the methods `get_counts_for_project` and `get_durations_for_project` on `RedisRealtimeMetricsStore` to also return information for time intervals that aren't stored in redis. Intervals go missing when no events are recorded in them; data only gets written to redis when something happens. To fix this, these methods now compute the keys they expect to be there ahead of time, then get what they can from redis and fill up the rest with default values.
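The approach described above can be sketched as follows. This is a sketch under stated assumptions, not the implementation in this PR: a plain dict stands in for redis, and the key format, the `window` and `bucket_size` parameters, and the choice of which bucket comes first are all hypothetical.

```python
from typing import List, Mapping, Tuple


def bucket_timestamps(cutoff: int, window: int, bucket_size: int) -> List[int]:
    # Align both ends of the window down to a bucket boundary.
    # (Assumption: the bucket containing `cutoff - window` is included.)
    first = (cutoff - window) - (cutoff - window) % bucket_size
    last = cutoff - cutoff % bucket_size
    return list(range(first, last + 1, bucket_size))


def get_counts_for_project(
    store: Mapping[str, str],
    project_id: int,
    cutoff: int,
    window: int = 30,
    bucket_size: int = 10,
) -> List[Tuple[int, int]]:
    counts = []
    for ts in bucket_timestamps(cutoff, window, bucket_size):
        key = f"counts:{project_id}:{ts}"  # hypothetical key layout
        raw = store.get(key)
        # Buckets with no recorded events are absent; default them to 0.
        counts.append((ts, int(raw) if raw is not None else 0))
    return counts


# Only two of the four expected buckets were ever written.
fake_store = {"counts:42:20": "3", "counts:42:40": "1"}
result = get_counts_for_project(fake_store, 42, cutoff=45)
```

Here `result` has one entry per expected bucket (10, 20, 30, 40), with zeros for the buckets that were never written.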