
ref(releasehealth): Implement get_release_sessions_time_bounds in metrics backend [INGEST-249] #28824

Merged
merged 14 commits into master on Sep 28, 2021

Conversation

untitaker (Member)
Implement another function in terms of the metrics backend.

@untitaker untitaker requested review from jjbayer and RaduW September 24, 2021 13:37
    )
    rows.extend(
        raw_snql_query(
            terminal_sessions_query,
Contributor

I feel like this function is too long; can we maybe break it down into smaller components? It feels like there are at least a couple of independent logical blocks here that could be split out.

Member Author

I just tried to do this locally and ended up with basically this, which I dislike:

def _get_time_bounds_for_initialized_sessions(where: List[Union[BooleanCondition, ...]], org_id: int) -> List[Dict[str, str]]:
def _get_time_bounds_for_terminal_sessions(where: List[Union[BooleanCondition, ...]], org_id: int) -> List[Dict[str, str]]:

I feel like splitting out those separate logical blocks only makes sense if you're going to reuse them, but those queries make no sense outside of this context. I really hope we don't reuse them and instead manage to do one query that is as simple as your original one, sometime.

Member Author

@ahmedetefy is there anything specific that's maybe hard to understand that we should split out and document as a separate function?

Contributor

Yes. To me, the initial-sessions query and the terminal-sessions query look close enough that I would extract the logic that generates the query into its own function and call it twice here: once for the initial query and once for the terminal one. It could be simple enough to take a flag parameter (initial or terminal) and return the appropriate query. That would make the function much easier to read and follow.
Or, going one level deeper, make the function generate the query purely from inputs like EntityKey, column, etc.

But the implementation is up to you; just throwing out ideas.
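The refactor being suggested could be sketched roughly like this. This is a hypothetical illustration, not the PR's actual code: the `Query` dataclass, the entity names, and the condition tuples are simplified stand-ins for the real snuba_sdk objects.

```python
# Hypothetical sketch of the suggested refactor: one query builder,
# called twice, parameterized by whether we want "initialized" or
# "terminal" sessions. All names here are illustrative stand-ins.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Query:
    entity: str
    select: List[str]
    where: List[Tuple[str, str, object]]


def build_time_bounds_query(
    kind: str, where: List[Tuple[str, str, object]], org_id: int
) -> Query:
    """Build the min/max-timestamp query for one class of sessions."""
    if kind == "initialized":
        entity = "metrics_counters"       # sessions counter, "init" tag
        extra = [("session.status", "=", "init")]
    elif kind == "terminal":
        entity = "metrics_distributions"  # session.duration distribution
        extra = [("session.status", "IN", ["exited", "crashed", "abnormal"])]
    else:
        raise ValueError(kind)
    return Query(
        entity=entity,
        select=["min(timestamp)", "max(timestamp)"],
        where=[("org_id", "=", org_id)] + where + extra,
    )


# The shared `where` is built once and reused by both calls.
where = [("release", "=", "1.0.0")]
init_q = build_time_bounds_query("initialized", where, org_id=1)
term_q = build_time_bounds_query("terminal", where, org_id=1)
```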

Member Author

I'm already deduplicating most of the logic by pre-constructing a where variable that I use in both queries, so I'll skip this for now. With this kind of refactor you just have to try it out and see if you like the end result, and I couldn't come up with a version I liked.

@@ -23,13 +23,27 @@ class CurrentAndPreviousCrashFreeRate(TypedDict):
CurrentAndPreviousCrashFreeRates = Mapping[ProjectId, CurrentAndPreviousCrashFreeRate]


class _TimeBounds(TypedDict):
    sessions_lower_bound: FormattedIsoTime
Contributor

Why doesn't this class look like this instead:

class _TimeBounds(TypedDict):
    sessions_lower_bound: Optional[FormattedIsoTime]
    sessions_upper_bound: Optional[FormattedIsoTime]

rather than having the separate class _NoTimeBounds?

Member Author

Those are two different types. In my type, either all fields are non-None or all fields are None. In your type, lower_bound can be None while upper_bound is non-None.

Contributor

Hmm, I see. IMO the usefulness of this is debatable, but it's up to you.

Member Author (@untitaker, Sep 27, 2021)

Of course it's a pain to define, but it's the correct type to have, in that it simply describes more accurately what can be returned. In practice this means you can write code like:

if data['sessions_lower_bound'] is not None:
    # can assume upper bound is not None in here

instead of having to assert both fields for None/not-None all the time.
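The "all fields None or all fields non-None" type being described can be sketched like this. Field types are simplified to str in place of FormattedIsoTime; the real class definitions live in the PR.

```python
# Sketch of the tagged-union TypedDict pattern discussed above.
# At runtime TypedDicts are plain dicts; the payoff is for mypy,
# which can narrow the union by checking a single field.
from typing import TypedDict, Union


class _TimeBounds(TypedDict):
    sessions_lower_bound: str
    sessions_upper_bound: str


class _NoTimeBounds(TypedDict):
    sessions_lower_bound: None
    sessions_upper_bound: None


ReleaseSessionsTimeBounds = Union[_TimeBounds, _NoTimeBounds]


def describe(data: ReleaseSessionsTimeBounds) -> str:
    # Checking one field is enough: the union guarantees both fields
    # are None together or non-None together.
    if data["sessions_lower_bound"] is not None:
        return f"{data['sessions_lower_bound']} .. {data['sessions_upper_bound']}"
    return "no sessions"
```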

Contributor

Tbh I still wouldn't do the check like this, because it might be confusing for whoever reads the code after me.

I would probably just check like:

if set(data.values()) == {None}:
    # do whatever

Condition(Column("timestamp"), Op.GTE, datetime.min),
Condition(Column("timestamp"), Op.LT, datetime.now(pytz.utc)),
]

Contributor

The try/except feels like it's wrapping much more logic than it should.

I feel like lines 382 to 391 should be at the top of the function, and if no env_filter exists we should just return {"sessions_lower_bound": None, "sessions_upper_bound": None} early on, without raising an error and then catching it, unless you want the error to be captured by Sentry. If that is the goal, I would suggest using logging.exception or metrics.incr instead.

Member Author

The point of the exception management is that I can use tag_key and tag_value multiple times without having to check each return value for None. In previous PRs I had to do that explicitly.

Instead of raising MetricIndexNotFound() on line 391 I could also return the same value, but I feel that would simply be code duplication.
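A rough sketch of the pattern being described, with the indexer faked by a dict so the control flow is visible. tag_key and MetricIndexNotFound here are illustrative stand-ins for the real helpers in the PR, not their actual implementations.

```python
# Illustrative sketch: raising from the lookup helper lets one
# try/except replace a None-check at every call site.
from typing import Dict

_FAKE_INDEX: Dict[str, int] = {"release": 1, "environment": 2}


class MetricIndexNotFound(Exception):
    """Raised when a tag name has no entry in the metrics indexer."""


def tag_key(name: str) -> int:
    # Returns the resolved index directly; callers never see Optional.
    try:
        return _FAKE_INDEX[name]
    except KeyError:
        raise MetricIndexNotFound(name)


def get_time_bounds(release_tag: str) -> dict:
    try:
        key = tag_key(release_tag)
        # ... build and run the queries using `key` ...
        return {
            "sessions_lower_bound": f"<bounds for tag {key}>",
            "sessions_upper_bound": f"<bounds for tag {key}>",
        }
    except MetricIndexNotFound:
        # A single catch stands in for per-call None checks.
        return {"sessions_lower_bound": None, "sessions_upper_bound": None}
```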

    org_id: OrganizationId,
    environments: Optional[Sequence[EnvironmentName]] = None,
) -> ReleaseSessionsTimeBounds:
    return _get_release_sessions_time_bounds(  # type: ignore
Contributor (@ahmedetefy, Sep 26, 2021)

I feel like typing _get_release_sessions_time_bounds would eliminate the need for us to use # type: ignore

Member Author

Yeah but I really don't want to get into the business of refactoring adjacent modules

Contributor (@ahmedetefy, Sep 27, 2021)

But it's not an adjacent module: you are refactoring the very module you are calling into. If it were something big I'd understand, but if it's a low-hanging fruit like adding something along the lines of

def _get_release_sessions_time_bounds(project_id: int, release: str, org_id: int, environments: Optional[List[str]] = None) -> _TimeBounds:

it shouldn't be that big of a deal, no?

Member Author

You can't just put the type signature on there; you need to convert the entire file and add it to mypy.ini for mypy to even consider it for typing.

Contributor (@RaduW left a comment)

Looks fine.
If you could, please double-check the comment about the return value; thinking about it a bit more, what you are doing there makes sense.

On second thought, I don't think I would have bothered with the end time at all and would just have considered start_time. If a release is still being used, start_time would probably be good enough; if it is not used anymore, a long-running session might change the end time a bit.

Comment on lines 473 to 479
for row in rows:
    if set(row.values()) == {formatted_unix_start_time}:
        continue
    if lower_bound is None or row["min"] < lower_bound:
        lower_bound = row["min"]
    if upper_bound is None or row["max"] > upper_bound:
        upper_bound = row["max"]
Contributor

I am a bit confused by this.
I am not sure what the function really returns, since we appear to mix the session start time and the session end time. So (if I get it) we return something like the min and max times at which a session for a particular release was seen?
That would most likely be min(session_start), max(session_end), except for sessions that are still running: for those we don't have a session end, so we would also get a max(session_start).

If that's the intention, then everything is fine.

Member Author

We are supposed to do min(session_start), max(session_start). Where do you see us using session_end?

Contributor

My impression is that the session counter with the "init" tag is pushed at the beginning of the session, and the session.duration distribution is pushed at the end of the session, so the timestamps for the two queries would represent a combination of session starts and session ends. I might be missing something.

Member Author

Clarified in Slack: all metric timestamps are derived from session.started.
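The bound-accumulation loop from the excerpt above (lines 473 to 479) can be exercised with fake data. Treating formatted_unix_start_time as an epoch sentinel for rows with no usable timestamp is my reading of the code, not something confirmed in the thread.

```python
# Sketch of the min/max bound accumulation with fake rows.
# Rows whose values are all the epoch sentinel are skipped;
# ISO-8601 strings with a fixed format compare correctly as strings.
formatted_unix_start_time = "1970-01-01T00:00:00+00:00"

rows = [
    # All-sentinel row: skipped entirely.
    {"min": formatted_unix_start_time, "max": formatted_unix_start_time},
    {"min": "2021-09-24T10:00:00+00:00", "max": "2021-09-24T12:00:00+00:00"},
    {"min": "2021-09-24T09:00:00+00:00", "max": "2021-09-24T11:00:00+00:00"},
]

lower_bound = None
upper_bound = None
for row in rows:
    if set(row.values()) == {formatted_unix_start_time}:
        continue
    if lower_bound is None or row["min"] < lower_bound:
        lower_bound = row["min"]
    if upper_bound is None or row["max"] > upper_bound:
        upper_bound = row["max"]
```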

# equivalent to the sessions-table based backend. Example:
#
# 1. Session sid=x is started with timestamp started=n
# 2. Same sid=x is updated with new payload with timestamp started=n - 1
Member

But this would be a bug, right?

Member Author

I think so, yeah

@untitaker untitaker enabled auto-merge (squash) September 28, 2021 09:03
@untitaker untitaker merged commit 538b229 into master Sep 28, 2021
@untitaker untitaker deleted the feat/release-health-time-bounds-metrics branch September 28, 2021 09:27
vuluongj20 pushed a commit that referenced this pull request Sep 30, 2021
…rics backend [INGEST-249] (#28824)

Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
@github-actions github-actions bot locked and limited conversation to collaborators Oct 15, 2021