Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(metrics): Crash free breakdown on metrics backend [INGEST-249] #28945

Merged
merged 5 commits into from
Oct 4, 2021

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Sep 29, 2021

Port get_crash_free_breakdown from sentry.snuba.sessions to metrics.

https://getsentry.atlassian.net/browse/INGEST-249

start: datetime,
environments: Optional[Sequence[EnvironmentName]] = None,
) -> Callable[[datetime], CrashFreeBreakdown]:
def dummy_fn(end: datetime) -> CrashFreeBreakdown:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this something else like generate_defaults or something?

conditions.append(Condition(Column(environment_key), Op.IN, environment_values))

def query_stats(end: datetime) -> CrashFreeBreakdown:
def _get_data(entity: EntityKey, metric_name: str) -> Tuple[int, int]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to entity_key

timedelta(days=1),
timedelta(days=2),
timedelta(days=7),
timedelta(days=14),
Copy link
Contributor

@ahmedetefy ahmedetefy Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What is the reason behind looping over a tuple here?
  • Not a big fan of looping over the timedelta, so much repitition, why don't you just loop over the number of days and thencall timedelta once outside of the loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tuples are cheaper to construct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but how expensive really is it to have a list of 4 items?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the argument that lists are supposed to be the homogenous container and tuples the heterogenous but I don't buy it. Not a lot of code works this way and it seems like a style guideline that was invented long after tuples and lists were. For all practical purposes tuples are immutable lists as long as mypy doesn't get in your way about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I deliberately did not touch this code but simply copied it over from sessions.py.

try:
item_start = start + offset
if item_start > now:
if last is None or (item_start - last).days > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item_start - last > timedelta(days=1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately did not touch this code but simply copied it over from sessions.py.

@ahmedetefy
Copy link
Contributor

Can you please a description to the PR?

Copy link
Member

@untitaker untitaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall nothing really problematic, this increases queries by a factor of 2 which isn't too bad but I think we can do better than the original code here

timedelta(days=1),
timedelta(days=2),
timedelta(days=7),
timedelta(days=14),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tuples are cheaper to construct

break
rv.append(query_fn(item_start))
last = item_start
except QueryOutsideRetentionError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having dummy_fn, what about letting the tag indexer exceptions bubble up until here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do the whole "does tag exist" dance once instead of in every loop iteration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense. I think we may want an in-process cache around indexer anyway, because we have some endpoints that just call one method on the releasehealth service after the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the in-memory cache should be part of the indexer service IMO. I'm writing all my code under the assumption that lookups will be optimized there.

Copy link
Contributor

@ahmedetefy ahmedetefy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much cleaner! Thanks :)

return {
"date": end,
"total_users": users_total,
"crash_free_users": 100 - users_crashed / float(users_total) * 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would unify how this is calculated everywhere because if i recall correctly @untitaker might have been doing this differently?

Copy link
Member

@untitaker untitaker Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I did exactly this... we're just porting the formulas from sessions.py where it was also duplicated but also consistent.

not opposed to deduping in any case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer jjbayer merged commit 4f53cd2 into master Oct 4, 2021
@jjbayer jjbayer deleted the feat/metrics-crash-free-breakdown branch October 4, 2021 12:32
@github-actions github-actions bot locked and limited conversation to collaborators Oct 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants