-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat(metrics): Crash free breakdown on metrics backend [INGEST-249] #28945
Conversation
src/sentry/release_health/metrics.py
Outdated
start: datetime, | ||
environments: Optional[Sequence[EnvironmentName]] = None, | ||
) -> Callable[[datetime], CrashFreeBreakdown]: | ||
def dummy_fn(end: datetime) -> CrashFreeBreakdown: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this something else like generate_defaults
or something?
src/sentry/release_health/metrics.py
Outdated
conditions.append(Condition(Column(environment_key), Op.IN, environment_values)) | ||
|
||
def query_stats(end: datetime) -> CrashFreeBreakdown: | ||
def _get_data(entity: EntityKey, metric_name: str) -> Tuple[int, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to entity_key
timedelta(days=1), | ||
timedelta(days=2), | ||
timedelta(days=7), | ||
timedelta(days=14), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What is the reason behind looping over a tuple here?
- Not a big fan of looping over the timedelta, so much repitition, why don't you just loop over the number of days and thencall timedelta once outside of the loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tuples are cheaper to construct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but how expensive really is it to have a list of 4 items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the argument that lists are supposed to be the homogenous container and tuples the heterogenous but I don't buy it. Not a lot of code works this way and it seems like a style guideline that was invented long after tuples and lists were. For all practical purposes tuples are immutable lists as long as mypy doesn't get in your way about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I deliberately did not touch this code but simply copied it over from sessions.py
.
src/sentry/release_health/metrics.py
Outdated
try: | ||
item_start = start + offset | ||
if item_start > now: | ||
if last is None or (item_start - last).days > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
item_start - last > timedelta(days=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately did not touch this code but simply copied it over from sessions.py.
Can you please a description to the PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall nothing really problematic, this increases queries by a factor of 2 which isn't too bad but I think we can do better than the original code here
timedelta(days=1), | ||
timedelta(days=2), | ||
timedelta(days=7), | ||
timedelta(days=14), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tuples are cheaper to construct
break | ||
rv.append(query_fn(item_start)) | ||
last = item_start | ||
except QueryOutsideRetentionError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having dummy_fn, what about letting the tag indexer exceptions bubble up until here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to do the whole "does tag exist" dance once instead of in every loop iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, makes sense. I think we may want an in-process cache around indexer anyway, because we have some endpoints that just call one method on the releasehealth service after the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the in-memory cache should be part of the indexer service IMO. I'm writing all my code under the assumption that lookups will be optimized there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much cleaner! Thanks :)
return { | ||
"date": end, | ||
"total_users": users_total, | ||
"crash_free_users": 100 - users_crashed / float(users_total) * 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would unify how this is calculated everywhere because if i recall correctly @untitaker might have been doing this differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I did exactly this... we're just porting the formulas from sessions.py where it was also duplicated but also consistent.
not opposed to deduping in any case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's dedup when we do https://getsentry.atlassian.net/browse/INGEST-456.
Port get_crash_free_breakdown from sentry.snuba.sessions to metrics.
https://getsentry.atlassian.net/browse/INGEST-249