Correlation analysis for cohorts #7875

Closed
joethreepwood opened this issue Jan 4, 2022 · 5 comments
Labels
enhancement New feature or request feature/cohorts Feature Tag: Cohorts feature/correlation-analysis Feature Tag: Correlation analysis

Comments

@joethreepwood
Contributor

Is your feature request related to a problem?

This came up as part of a hackathon project I was working on, which was a tool for suggesting query ideas in PostHog. One of the things it suggested was basically 'Can you see what users in X have in common?'

This seems like a worthy idea, but it's hard to get an answer in PostHog right now.

Describe the solution you'd like

We have cohorts. We have correlation analysis. It would be great to bring these tools together.

For example, we may set up a cohort based on churned customers and want to find out if churning correlates with something else so we can make improvements to our funnel or support process. Or, we may set up a cohort for users with time-out errors and use correlation analysis to debug the problem. We could also set up a cohort based on feature use, or first-users in an org to go deeper on behaviour. Or specific types of customers.

One specific thing I'd love to look at from a growth perspective would be creating cohorts for first-users in orgs that are successful at scaling vs those that are not. How do they differ, in terms of behaviours?

The possibilities are, if not limitless, probably unquantifiable at this point!

Describe alternatives you've considered

Not doing this. Letting the Engineering and Product team do other things.

Additional context

Just going to mention @neilkakkar (per a Slack discussion) and @marcushyett-ph from a product perspective.

Thank you for your feature request – we love each and every one!

You're welcome.

@joethreepwood joethreepwood added the enhancement New feature or request label Jan 4, 2022
@neilkakkar neilkakkar added feature/cohorts Feature Tag: Cohorts feature/correlation-analysis Feature Tag: Correlation analysis labels Jan 4, 2022
@marcushyett-ph
Contributor

Thanks for this idea @joethreepwood. It feels like it might also be related to automated clustering / segmentation of users.

One other related point discussed before was using correlation analysis for retention as well as "funnel conversion", e.g. what do users that retain best have in common.

@neilkakkar
Contributor

Trying to understand this better. The kinds of questions an analysis like this can answer are:

Given a cohort of 100 people, here are the top 5 person properties common to all these people.

and similarly, given a cohort of 100 people, here are the top 5 events these people commonly do.

Does this make sense?

(I was a bit confused because this isn't quite correlation, but just an analysis along one dimension)
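As a rough illustration of the "top 5 events these people commonly do" question, here is a minimal Python sketch. The input shape (a dict of person id to event names) and the function name `top_common_events` are assumptions made for the example, not PostHog's data model or API.

```python
from collections import Counter


def top_common_events(cohort_events, n=5):
    """Return the n events done by the largest share of the cohort.

    cohort_events: dict mapping person id -> iterable of event names
    (a hypothetical shape chosen for this sketch).
    Result: list of (event, share_of_cohort), most common first.
    """
    cohort_size = len(cohort_events)
    if cohort_size == 0:
        return []
    counts = Counter()
    for events in cohort_events.values():
        # Count each person at most once per event, so the result is
        # "share of people who did it", not "number of occurrences".
        counts.update(set(events))
    # Sort by count descending, then by name for a stable order on ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(event, count / cohort_size) for event, count in ranked[:n]]


cohort = {
    "p1": ["pageview", "rageclick"],
    "p2": ["pageview", "timeout", "pageview"],
    "p3": ["pageview"],
    "p4": ["signup", "rageclick"],
}
top = top_common_events(cohort, n=2)
```

The same counting would work for person properties by feeding in `property=value` strings instead of event names.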

> How do they differ, in terms of behaviours?

What information are you looking for to answer this question? Does the above suffice, or are you looking for something else?

@joethreepwood
Contributor Author

@neilkakkar 100% that's what I had in mind. Apologies for the confusion - I saw this as working similarly to the current correlation analysis:

People in this cohort were also 10% more likely to complete the 'Rageclick' event
People in this cohort were also 5% more likely to complete the 'Timeout' event

etc. Though perhaps "10% of people in this cohort also completed the 'Rageclick' event" is a simpler way to convey such information.
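The two phrasings measure different things, which a small Python sketch can make concrete. "10% of people in this cohort did X" needs only the cohort, while "10% more likely" needs a comparison group. Comparing against everyone outside the cohort is an assumption here, and the function names are hypothetical.

```python
def prevalence(people_with_event, group):
    """Share of `group` (a set of person ids) who did the event."""
    if not group:
        return 0.0
    return len(people_with_event & group) / len(group)


def relative_lift(people_with_event, cohort, everyone):
    """How much more likely cohort members are to have done the event
    than people outside the cohort (0.10 means "10% more likely").

    Returns None when nobody outside the cohort did the event,
    because the ratio is undefined in that case.
    """
    rest = everyone - cohort
    base = prevalence(people_with_event, rest)
    if base == 0:
        return None
    return prevalence(people_with_event, cohort) / base - 1


everyone = set(range(10))          # ten people in the project
cohort = {0, 1, 2, 3}              # four of them are in the cohort
rageclick = {0, 1, 2, 4, 5, 6}     # people who did 'Rageclick'

share = prevalence(rageclick, cohort)             # 3 of 4 cohort members
lift = relative_lift(rageclick, cohort, everyone) # versus 3 of 6 others
```

Here 75% of the cohort did 'Rageclick', and they were 50% more likely to do so than everyone else.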

> What information are you looking for to answer this question? Does the above suffice, or are you looking for something else?

The current model of correlated events and correlated properties (or common events, common properties, if you prefer) is basically what I had in mind.

@neilkakkar
Contributor

Perfect, that makes a lot of sense! In fact, I'd say this is a separate problem to consider: it's one-dimensional analysis.

Given a group of users / events / a cohort / a person modal -> tell me what are the most common events these people did / what are the most common properties among these people.

Actually, I like your phrasing of this as well: "10% of people in this cohort also did RageClick / are from the States"

Hmm, I like it. This is a lot easier / more straightforward / much faster to compute than correlation analysis, which means it can be done in a lot more places.

@davidfurlong

My use case:

What actions do users who retain from week0 to week1 take that users who don't retain from week0 to week1 don't?

Correlation analysis of this should give me a good idea of which user actions to experiment with, as the correlation could potentially be the cause of retention.
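A retained-vs-not-retained comparison like this could be sketched in Python as follows. An odds ratio is one common way to rank events between two groups. Using it here, adding 1 to every cell to avoid division by zero, and the names below are all assumptions for illustration, not a description of how PostHog computes correlations.

```python
def odds_ratio(event_doers, retained, not_retained, prior=1):
    """Odds ratio of doing the event for retained vs not-retained users.

    All arguments except `prior` are sets of person ids. `prior` is added
    to each cell of the 2x2 table so empty cells do not divide by zero
    (a smoothing choice assumed for this sketch).
    """
    a = len(event_doers & retained) + prior        # retained, did event
    b = len(retained - event_doers) + prior        # retained, did not
    c = len(event_doers & not_retained) + prior    # not retained, did event
    d = len(not_retained - event_doers) + prior    # not retained, did not
    return (a * d) / (b * c)


def rank_events(doers_by_event, retained, not_retained):
    """Rank events by odds ratio, strongest positive association first."""
    scored = [
        (event, odds_ratio(doers, retained, not_retained))
        for event, doers in doers_by_event.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


retained = {1, 2, 3, 4}        # active in week0 and week1
not_retained = {5, 6, 7, 8}    # active in week0 only
doers_by_event = {
    "invite_teammate": {1, 2, 3, 5},
    "pageview": {1, 2, 5, 6},
}
ranked = rank_events(doers_by_event, retained, not_retained)
```

A ratio well above 1 marks an action as a candidate for an experiment. It does not by itself show that the action causes retention.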
