Resolve errors with time to convert bins #5283

Merged
merged 4 commits into from
Jul 22, 2021

Conversation

neilkakkar
Contributor

@neilkakkar neilkakkar commented Jul 22, 2021

Changes

Fixes #5116 (patch) and no. 6: #5249 (comment)

For now, we discard all negative and NULL values in the time-to-convert analysis.

We aren't doing this at a lower level (in, say, the original funnel class) because everything else works fine there, and the existing safeguards in those places (event times in increasing order) ensure we don't run into this issue.

It's hard to write a deterministic test for this, since the issue is intermittent. If I un-skip the new test now, it will be terribly flaky. I don't want to delete the test either, because coming back to it much later would mean reloading all the context and trying to write a proper test from scratch.
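As a rough illustration of keeping a flaky test in the codebase while it stays skipped, here is a minimal sketch using the standard library's `unittest.skip`. The class name `TestTimeToConvertBins` and the helper `run_suite` are hypothetical, and the real test body from this PR is not reproduced; the project's own test setup may well use a different skip mechanism.

```python
import unittest


class TestTimeToConvertBins(unittest.TestCase):
    # Hypothetical placeholder: the real test body is not reproduced here.
    # The skip reason records why the test is parked and when to revisit it.
    @unittest.skip("Flaky until https://github.com/ClickHouse/ClickHouse/issues/26580 is resolved")
    def test_auto_bin_count_single_step_duplicate_events(self):
        self.fail("un-skip once the upstream issue is fixed")


def run_suite() -> unittest.TestResult:
    """Run the test case above and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTimeToConvertBins)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
# The skipped test is reported together with its reason instead of failing.
for test, reason in result.skipped:
    print(f"skipped {test.id()}: {reason}")
```

The test stays visible in test reports with its reason attached, so the context is not lost when someone returns to it.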

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests
  • Migrations are safe to run at scale (e.g. PostHog Cloud) – present proof if not obvious
  • New/changed UI is decent on smartphones (viewport width around 360px)

@timgl timgl temporarily deployed to posthog-pr-5283 July 22, 2021 13:12 Inactive
@neilkakkar neilkakkar requested review from EDsCODE and macobo July 22, 2021 13:12
query = f"""
WITH
step_runs AS (
{steps_per_person_query}
SELECT * FROM (
{steps_per_person_query}
Contributor

Q: Why not push the predicates in here?

Contributor Author

We aren't doing this at a lower level (in, say, the original funnel class) because the other things work fine, and the following safeguards in those places (increasing ordering of event times) ensure we don't run into this issue.

Good question! I see good arguments for either, but opted for "where the problem shows up", as it also keeps things in one place. Doing it in steps_per_person_query would also mean doing it in all types of funnel orderings.
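To make the "where the problem shows up" choice concrete, here is a minimal sketch of wrapping the inner query in an outer `SELECT` that carries the predicates, rather than editing the inner query itself. The function name `wrap_with_validity_filter`, the `AND` join, and the sample table and column names are assumptions for illustration; only the `{identifier} >= 0` predicate shape and the `SELECT * FROM ({steps_per_person_query}` wrapper come from the diff.

```python
from typing import List


def wrap_with_validity_filter(steps_per_person_query: str, identifiers: List[str]) -> str:
    """Wrap an inner query so rows with negative or NULL conversion times are dropped."""
    # Same predicate shape as in the diff: one "<identifier> >= 0" per step.
    conditions = [f"{identifier} >= 0" for identifier in identifiers]
    # Assumption: the conditions are combined with AND.
    where_clause = " AND ".join(conditions)
    return f"""
    SELECT * FROM (
        {steps_per_person_query}
    )
    WHERE {where_clause}
    """


# Hypothetical inner query and column name, for illustration only.
inner = "SELECT person_id, step_1_average_conversion_time FROM funnel_steps"
query = wrap_with_validity_filter(inner, ["step_1_average_conversion_time"])
print(query)
```

Because the inner query string is passed through untouched, every funnel ordering that produces it is left unchanged, and the filter lives in a single place.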

@timgl timgl temporarily deployed to posthog-pr-5283 July 22, 2021 13:25 Inactive
@@ -60,10 +60,19 @@ def get_query(self) -> str:
]
steps_average_conversion_time_expression_sum = " + ".join(steps_average_conversion_time_identifiers)

steps_average_conditional_for_invalid_values = [
f"{identifier} >= 0" for identifier in steps_average_conversion_time_identifiers
Contributor

Are NULLs an issue? Should we add a NOT NULL check before the >= 0 comparison?

Contributor Author

NULL values ought to be removed as well, and this check already ensures that: a comparison against NULL is never true, so NULL values won't make it through either!

This is more or less covered by an existing test, test_auto_bin_count_total: there the step_1 time is > 0 while step_2 is NULL and, as expected, that entry is removed from consideration.
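The NULL behaviour can be checked with a small stand-in experiment. This sketch uses Python's built-in `sqlite3` module rather than ClickHouse, so it only demonstrates the general SQL rule that `NULL >= 0` is not true in a `WHERE` clause; the table name, column names, and `AND` combination are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[float], Optional[float]]


def surviving_rows(rows: List[Row]) -> List[str]:
    """Return the person ids that pass the '>= 0' checks on both step times."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE conversions (person_id TEXT, step_1_time REAL, step_2_time REAL)"
    )
    conn.executemany("INSERT INTO conversions VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT person_id FROM conversions "
        "WHERE step_1_time >= 0 AND step_2_time >= 0 "
        "ORDER BY person_id"
    )
    result = [row[0] for row in cursor.fetchall()]
    conn.close()
    return result


rows: List[Row] = [
    ("a", 10.0, 20.0),  # both times valid
    ("b", 10.0, None),  # step 2 never reached, stored as NULL
    ("c", -5.0, 30.0),  # negative time, the case this PR guards against
]
kept = surviving_rows(rows)
print(kept)
```

Only person `a` survives: the NULL row and the negative row are both dropped by the same predicate, without an explicit NOT NULL check.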

steps_average_conditional_for_invalid_values = [
f"{identifier} >= 0" for identifier in steps_average_conversion_time_identifiers
]
# this is protection against the CH bug: https://github.com/ClickHouse/ClickHouse/issues/26580
Contributor

Nit: Suggestion on comment style:

# :HACK: Protect against CH bug https://github.com/ClickHouse/ClickHouse/issues/26580
#   once the issue is resolved, stop skipping the test: test_auto_bin_count_single_step_duplicate_events
#   and remove this comment

I use meta-comments such as :HACK:, :TRICKY:, and :TODO:. They are easier to grep for, and they signal more clearly than free-form text that the reader should pay attention.
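As a small illustration of why such markers are easy to search for, here is a sketch of a scanner that finds them in source text. The function name `find_meta_comments` and the sample snippet are hypothetical; in practice a plain `grep ":HACK:"` over the repository does the same job.

```python
import re
from typing import List, Tuple

# Matches comment markers of the form "# :HACK:", "# :TRICKY:" or "# :TODO:".
META_COMMENT = re.compile(r"#\s*:(HACK|TRICKY|TODO):")


def find_meta_comments(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, marker, stripped line) for each meta-comment found."""
    found = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = META_COMMENT.search(line)
        if match:
            found.append((line_number, match.group(1), line.strip()))
    return found


# Hypothetical source snippet for illustration.
sample = '''conditions = [f"{identifier} >= 0" for identifier in identifiers]
# :HACK: Protect against CH bug https://github.com/ClickHouse/ClickHouse/issues/26580
#   once the issue is resolved, stop skipping the test
# plain free-form note that a marker search would miss
'''
hits = find_meta_comments(sample)
for line_number, marker, text in hits:
    print(line_number, marker, text)
```

The free-form comment on the last line is not reported, which is the point of the convention: only deliberately tagged comments show up in a search.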

@timgl timgl temporarily deployed to posthog-pr-5283 July 22, 2021 13:44 Inactive
@neilkakkar neilkakkar enabled auto-merge (squash) July 22, 2021 13:44
@neilkakkar neilkakkar merged commit ca724e1 into master Jul 22, 2021
@neilkakkar neilkakkar deleted the timeconverch branch July 22, 2021 14:04
Development

Successfully merging this pull request may close these issues.

Negative funnel time conversion bin times
3 participants