
fix(experiments): apply new count method and fix continuous #27639

Merged

andehen merged 13 commits into master from experiment-fix-count-stats-method on Jan 23, 2025

Conversation

@andehen (Contributor) commented Jan 17, 2025

Problem

  • We are currently not applying the new count method as intended because of the check `if self.query.count_query.series[0].math:`. The `math` field is also set for count metrics; for example, when selecting "Total count", its value is `total`.
  • The input to the credible-interval and significance calculations for the continuous method is currently the total sum, but those functions expect the mean. This is the same problem as the one addressed in a previous PR, but these two call sites were missed then.

Changes

  • Apply the new count stats method for count metrics by modifying the condition.
  • Adjust the continuous methodology to work with the total sum as input.

Note: I introduced a new `ExperimentMetricType`. It is only used in the `posthog.hogql_queries.experiments` module at the moment, so it lives there for now, but it can easily be pulled out into e.g. `posthog.schema` if/when we want to use it more broadly, e.g. in the frontend. A rough sketch of the idea follows below.
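For illustration, here is a minimal, self-contained sketch of the idea. The enum values, the `CONTINUOUS_MATH_TYPES` set, and the standalone `classify_series` helper are assumptions made for this example, not the actual implementation:

```python
from enum import Enum


class ExperimentMetricType(Enum):
    COUNT = "count"
    CONTINUOUS = "continuous"


# Hypothetical set of "math" values that describe a continuous (sum/mean-style) metric.
CONTINUOUS_MATH_TYPES = {"sum", "avg", "min", "max"}


def classify_series(math: str | None) -> ExperimentMetricType:
    # The old condition only checked that "math" was set, but count metrics
    # also set it (e.g. "total" for Total count), so they were misclassified.
    if math in CONTINUOUS_MATH_TYPES:
        return ExperimentMetricType.CONTINUOUS
    # Default to count; this also covers math == "total" and math is None.
    return ExperimentMetricType.COUNT


print(classify_series("total"))  # ExperimentMetricType.COUNT
print(classify_series("sum"))    # ExperimentMetricType.CONTINUOUS
```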

How did you test this code?

  • tested locally by simulating many experiments
  • added more tests
  • tests pass

github-actions bot commented Jan 17, 2025

Size Change: 0 B

Total Size: 1.16 MB

| Filename                 | Size    |
| ------------------------ | ------- |
| frontend/dist/toolbar.js | 1.16 MB |

compressed-size-action

@andehen andehen changed the title fix(experiments): use correct stats method for count metrics fix(experiments): apply new count methodology for count metrics Jan 20, 2025
@andehen andehen force-pushed the experiment-fix-count-stats-method branch from 45355e5 to ddfb8cf Compare January 20, 2025 17:12
@posthog-bot (Contributor) commented
📸 UI snapshots have been updated

1 snapshot change in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.


@andehen andehen marked this pull request as ready for review January 21, 2025 07:58
@andehen andehen force-pushed the experiment-fix-count-stats-method branch from e32516c to 1000c41 Compare January 21, 2025 07:58
@andehen andehen changed the title fix(experiments): apply new count methodology for count metrics fix(experiments): apply new count method and fix continous Jan 21, 2025
@andehen andehen changed the title fix(experiments): apply new count method and fix continous fix(experiments): apply new count method and fix continuous Jan 21, 2025
@andehen andehen requested a review from a team January 21, 2025 08:20
```python
            )
            credible_intervals = calculate_credible_intervals_v2_count([control_variant, *test_variants])
        case _:
            raise ValueError(f"Unsupported metric type: {self._get_metric_type()}")
```
Contributor

I agree that we shouldn't return results for unsupported metric types. However, there are likely some experiments with unsupported metric types that are currently returning results but will start throwing errors after this PR is merged. Have you thought about how to handle any complaints from these users?

Contributor Author

_get_metric_type() defaults to count, so we won't throw errors. This is a safeguard for future work, to make sure all metric types are handled. Does that make sense?
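To make that concrete, here is a minimal sketch of the dispatch-plus-safeguard pattern being described. Only `ExperimentMetricType` and the `case _` guard come from the PR; the function name, return values, and the trivial bodies are placeholders for this example:

```python
from enum import Enum


class ExperimentMetricType(Enum):
    COUNT = "count"
    CONTINUOUS = "continuous"


def run_stats(metric_type: ExperimentMetricType) -> str:
    match metric_type:
        case ExperimentMetricType.COUNT:
            return "count methodology"
        case ExperimentMetricType.CONTINUOUS:
            return "continuous methodology"
        case _:
            # Unreachable today because the metric type defaults to COUNT,
            # but it guards against future metric types that are not handled yet.
            raise ValueError(f"Unsupported metric type: {metric_type}")


print(run_stats(ExperimentMetricType.COUNT))  # count methodology
```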

@jurajmajerik (Contributor) commented

Nice work 🙏


Meta feedback on this PR: I think it would benefit from a clearer problem description.

Now:

Apply the new count stats method for count metrics

Better:

Problem

Currently, we're applying the continuous calculation for any Trend metric that contains the "math" field. This is incorrect; we should only apply it to the valid continuous math types and throw an error for the rest.

Changes

...

```python
@@ -2363,3 +2366,45 @@ def test_validate_event_variants_no_exposure(self):
            }
        )
        self.assertEqual(cast(list, context.exception.detail)[0], expected_errors)

    def test_get_metric_type(self):
```
Contributor

Thanks for adding this


```python
        # Test: ~$105 mean with narrow interval due to old implementation
        self.assertAlmostEqual(intervals["test"][0], 103, delta=3)
        self.assertAlmostEqual(intervals["test"][1], 107, delta=3)
```
Contributor

Oh, I intentionally didn't change the v1 values previously.

Contributor Author

Yeah, I was a little confused here too. But since the old implementation also gets the total as input (that is what it receives from the query runner), not the mean, the test cases should be updated to reflect that, and hence the assertions had to be updated.

Does that make sense? I think the values in the assertions make more sense now as well.

Contributor

Yep, the explanation makes sense. Just flagging that I intentionally didn't change the behavior / return values for v1. I'm not strongly opposed to doing so, but the original intent was to keep v1 exactly how it was.

Contributor Author

I see. To be clear, the implementation for v1 has not changed here, only the input for the test cases that were added in 1979d74. The reason is to reflect what the behavior is and has been in production: the queries do not return the mean, but the total.
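As a hedged illustration of that point (the dataclass, field names, and numbers below are assumptions for this example, not the actual query-runner output): if the query reports the total sum per variant, the mean that the credible-interval and significance functions expect can be derived from the exposure count before doing the calculation.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    key: str
    total_sum: float        # sum of the metric over exposed users (what the query returns)
    absolute_exposure: int  # number of exposed users


def mean_from_total(variant: VariantResult) -> float:
    # The credible-interval / significance functions expect the per-user mean,
    # so convert the total sum reported by the query into a mean first.
    return variant.total_sum / variant.absolute_exposure


control = VariantResult(key="control", total_sum=10_500.0, absolute_exposure=100)
print(mean_from_total(control))  # 105.0, in the same ballpark as the ~$105 mean in the test above
```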

@andehen andehen force-pushed the experiment-fix-count-stats-method branch from 1000c41 to 4df1c30 Compare January 22, 2025 05:41
@andehen andehen force-pushed the experiment-fix-count-stats-method branch from 4df1c30 to 0e4ce77 Compare January 22, 2025 16:46
@andehen andehen merged commit 8e3b930 into master Jan 23, 2025
99 checks passed
@andehen andehen deleted the experiment-fix-count-stats-method branch January 23, 2025 07:50
timgl pushed a commit that referenced this pull request Jan 28, 2025
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>