Feature Flag rollout %s are not accurate #8001
Comments
Related to #1610. Previously, my workaround was to normalize experiments by taking the ratio of the metric I care about among active users with vs. without the feature flag, but that's not ideal.
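For context, that workaround amounts to something like the sketch below. The row shape and the `normalized_lift` name are hypothetical, purely for illustration; this is not a PostHog API:

```python
# Hypothetical rows: one (distinct_id, saw_flag, did_metric) tuple per active user.
def normalized_lift(rows):
    """Compare the metric rate among active users with vs. without the flag."""
    def rate(group):
        return sum(r[2] for r in group) / len(group) if group else 0.0

    with_flag = [r for r in rows if r[1]]
    without_flag = [r for r in rows if not r[1]]
    return rate(with_flag), rate(without_flag)
```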
This might just be the result of the hashing algorithm not giving a perfect distribution over smaller numbers. See code. When running over <10k identifiers it starts off with quite a big gap and then closes, which I suspect is what's happening here as well.
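A minimal sketch of that distribution check, assuming the flag hash resembles PostHog's SHA1-based approach (flag key and distinct_id concatenated, first 15 hex digits scaled into [0, 1)); the flag key, the random identifiers, and the `<=` comparison here are assumptions for illustration:

```python
import hashlib
import uuid

# 15 hex digits' worth of scale, matching the 15-character hexdigest slice below.
LONG_SCALE = float(0xFFFFFFFFFFFFFFF)

def get_hash(key: str, distinct_id: str, salt: str = "") -> float:
    """Map (flag key, distinct_id) to a deterministic float in [0, 1)."""
    hash_key = f"{key}.{distinct_id}{salt}"
    hash_val = int(hashlib.sha1(hash_key.encode("utf-8")).hexdigest()[:15], 16)
    return hash_val / LONG_SCALE

def observed_rollout(n_users: int, rollout: float = 0.5) -> float:
    """Fraction of n random identifiers whose hash falls under the rollout threshold."""
    matched = sum(get_hash("my-flag", str(uuid.uuid4())) <= rollout for _ in range(n_users))
    return matched / n_users

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} users: {observed_rollout(n):.4f}")
```

Running this shows how the observed split drifts toward the configured rollout as n grows, which is the "gap closes" behavior described above.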
I don't think this is the case. (As one data point, there are ~10k events in the examples above.) Take the multivariate example above: all 3 variants are very close to each other (same hashing algorithm), and then there's this pile of leftover Nones.
To test the above hypothesis: this is the multivariate breakdown on an event that happens pretty late in the session (so hopefully not influenced by flags not having loaded yet). As expected, the number of Nones goes down, but a few still exist, which means there's at least one more cause of missing flag values that I'm not aware of. And, kind of interesting:
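For reference, variant assignment in posthog-python's local evaluation walks cumulative rollout buckets over a separately salted hash; the sketch below (reusing `get_hash` from the sketch above) is an approximation of that, not the exact server code. It shows the structural ways a multivariate lookup can return None, independent of client timing:

```python
def get_matching_variant(key, distinct_id, variants):
    """variants: list of (name, rollout_percentage) pairs that should sum to 100."""
    h = get_hash(key, distinct_id, salt="variant")  # separate salt: gate hash != variant hash
    cumulative = 0.0
    for name, percentage in variants:
        cumulative += percentage / 100.0
        if h < cumulative:
            return name
    # Only reachable if the percentages sum below 100, or via float rounding at the top edge.
    return None

print(get_matching_variant("my-flag", "user-42",
                           [("control", 50), ("test-a", 25), ("test-b", 25)]))
```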
Wherever precision is required, we ought to use multivariate flags, not simple flags. Chalking up the Nones remaining in multivariates to ...
Bug description
(I don't know if there's a way to solve this well.) When setting a rollout % on a FeatureFlag to 50%, I'd expect half the users to see it.
This isn't the case: the ratio is more like 45/55, and it gets worse over time.
Here's an example.
Here's a multivariate example.
The cause has been explored well in #6043. This isn't a problem with just multivariate testing.
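A quick sanity check that a split like this isn't just sampling noise (the ~10k and 45/55 figures are taken from this report): a fair 50% rollout over n users has a binomial standard error of sqrt(0.25 / n), so at 10k users a 45/55 split sits roughly ten standard errors from target, far outside what random assignment would produce.

```python
from math import sqrt

n = 10_000        # roughly the number of users in the examples above
observed = 0.45   # observed share on the flag
expected = 0.50   # configured rollout

se = sqrt(expected * (1 - expected) / n)  # binomial standard error, ~0.005 here
z = (observed - expected) / se            # ~ -10 standard errors from the target
print(f"standard error: {se:.4f}, z-score: {z:.1f}")
```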
Now, this is fine for rolling out new things (mostly), since the numbers don't have to be precise. But for running manual experiments this is terrible (it's also why our Experimentation product is built on multivariates only), and it gives false confidence to users who want to try running experiments. Even internally. For example: our experiment for product-cues
Environment
Additional context
I don't know if we can solve this at all. But we probably need some way to tell users not to try running experiments with "normal" FFs.