-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve errors with time to convert bins #5283
Conversation
query = f""" | ||
WITH | ||
step_runs AS ( | ||
{steps_per_person_query} | ||
SELECT * FROM ( | ||
{steps_per_person_query} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Why not push the predicates in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't doing this at a lower level (in, say, the original funnel class) because the other things work fine, and the following safeguards in those places (increasing ordering of event times) ensure we don't run into this issue.
Good question! I see good arguments for either, but opted for "where the problem shows up", as it also keeps things in one place. Doing it in steps_per_person_query
would also mean doing it in all types of funnel orderings.
@@ -60,10 +60,19 @@ def get_query(self) -> str: | |||
] | |||
steps_average_conversion_time_expression_sum = " + ".join(steps_average_conversion_time_identifiers) | |||
|
|||
steps_average_conditional_for_invalid_values = [ | |||
f"{identifier} >= 0" for identifier in steps_average_conversion_time_identifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are NULLs an issue, should we add a NOT NULL check before >= 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL values ought to be removed, and this check implies NULL values won't make it through, either!
This is kinda tested by existing tests: test_auto_bin_count_total
- step_1 time is > 0, while step_2 is NULL, and as expected, it's removed from consideration.
steps_average_conditional_for_invalid_values = [ | ||
f"{identifier} >= 0" for identifier in steps_average_conversion_time_identifiers | ||
] | ||
# this is protection against the CH bug: https://github.com/ClickHouse/ClickHouse/issues/26580 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Suggestion on comment style:
# :HACK: Protect against CH bug https://github.com/ClickHouse/ClickHouse/issues/26580
# once the issue is resolved, stop skipping the test: test_auto_bin_count_single_step_duplicate_events
# and remove this comment
I use the following meta-comments: :TRICKY:
:TODO:
. They are easier to grep for and convey reader should look out clearer than free-form text.
Changes
Fixes #5116 (patch) and no. 6: #5249 (comment)
For now, we discard all negative and null values in time to convert analysis.
We aren't doing this at a lower level (in, say, the original funnel class) because the other things work fine, and the original safeguards in those places (increasing ordering of event times) ensure we don't run into this issue.
It's hard to write a deterministic test for this, since the issue is intermittent. And if I un-skip the new test now, it will be terribly flakey. I don't want to delete this test, because coming back to it after ages involves re-loading all the context and trying to write a proper test for it.
Checklist