Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(releasehealth): Implement check_has_health_data on metrics backend [INGEST-249] #28729

Merged
merged 13 commits into from
Sep 24, 2021

Conversation

RaduW
Copy link
Contributor

@RaduW RaduW commented Sep 21, 2021

Like in #28598, port check_has_health_data from sentry.snuba.sessions to metrics.

INGEST-249

@@ -100,3 +108,16 @@ def get_release_adoption(
"""

raise NotImplementedError()

def check_has_health_data(
self, projects_list: Sequence[Union[ProjectId, ProjectRelease]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two problems with this type:

  • There's no relation between input and output.. type signature implies that one can pass in releases, and get out just projects. Or that one passes in projects and gets out releases (sometimes?)
  • Union at this position implies that you can freely mix ProjectId and ProjectRelease within the same sequence but the implementation doesn't handle that

You can make this generic over a T that is bounded by Union[ProjectId, ProjectRelease]. Then you have the guarantee that the same union variant is returned as is put into it.

T = TypeVar("ProjectOrRelease", bounds=Union[...])

def check_has_health_data(self, projects_list: Sequence[T]) -> Set[T]: ...

This will make the function generic over T: Union[...]. You should probably pick a better name for T though.

Copy link
Contributor Author

@RaduW RaduW Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input, will do.

From what I understand making T=TypeVar("T", bounds=Union[ProjectId,ProjectRelease]) would not improve the semantics in any way, but making the definition T=TypeVar("T", ProjectId, ProjectRelease) would do exactly what you suggest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that may be better. However in any case it will be better, because when you call check_has_health_data you can declare T to be specifically ProjectId either way

if len(projects_list) == 0:
return set()

includes_releases = type(projects_list[0]) == tuple
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
includes_releases = type(projects_list[0]) == tuple
includes_releases = isinstance(projects_list[0], tuple)

I believe this may make some type: ignore obsolete

def extract_raw_info(
row: Mapping[str, Union[int, str]]
) -> Union[ProjectId, ProjectRelease]:
return row.get("project_id") # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return row.get("project_id") # type: ignore
return row["project_id"] # type: ignore

either we're sure of our types or we should use Optional... there's no point in type-safety if everything is strictly typed and nulls still roam around freely IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

if includes_releases:
project_ids = [x[0] for x in projects_list] # type: ignore
else:
project_ids = [x for x in projects_list]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
project_ids = [x for x in projects_list]
project_ids = projects_list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course :-), thanks

from sentry.snuba.sessions import _get_release_adoption, get_current_and_previous_crash_free_rates
from sentry.snuba.sessions import (
_get_release_adoption,
check_has_health_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming this function to lead with underscore, you will find some more places where it's being called. One is inside of sentry.snuba.sessions itself.

You could argue it doesn't matter as sentry.snuba.sessions already implies we're using teh sessions backend, but I think we should call through the service as much as possible in case we really want to implement dual-read-and-compare.

@untitaker untitaker removed the request for review from a team September 23, 2021 10:18
@untitaker untitaker changed the title ref(releasehealth): Implement check_has_health_data on metrics backend ref(releasehealth): Implement check_has_health_data on metrics backend [INGEST-249] Sep 23, 2021
@RaduW RaduW disabled auto-merge September 24, 2021 10:01
where_clause.append(Condition(Column(release_column_name), Op.IN, releases_ids))
column_names = ["project_id", release_column_name]

# def extract_raw_info(row: Mapping[str, Union[int, str]]) -> ProjectOrRelease:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there commented out code in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next PR: #28823

@ahmedetefy
Copy link
Contributor

tbh peppering the PR with # type: ignore seems like it defeats the purpose of using types in the first place

extract_raw_info = extract_raw_info_func(includes_releases)

query_cols = [Column(column_name) for column_name in column_names]
group_by_clause = query_cols
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of defining group_by_clause here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some people do it to signal intent (Query(select=query_cols, groupby=query_cols) looks like a copypaste error) but I wouldn't have done it either

@ahmedetefy
Copy link
Contributor

ahmedetefy commented Sep 24, 2021

Tbh, it feels strange to me seeing releasehealth as one word in a python code base, why is the module name not release_health
And I feel like its a misleading name too .. the abstraction layer is for sessions, no?
I would have just called it sessions_abstraction or sessions_v3

@untitaker
Copy link
Member

@ahmedetefy let's resolve those questions outside of this PR

vuluongj20 pushed a commit that referenced this pull request Sep 30, 2021
…d [INGEST-249] (#28729)

 check_has_health_data implemented for metrics
@github-actions github-actions bot locked and limited conversation to collaborators Oct 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants