Correlation analysis for cohorts #7875

joethreepwood · 2022-01-04T11:54:38Z

Is your feature request related to a problem?

This came up as part of a hackathon project I was working on, which was a tool for suggesting query ideas in PostHog. One of the things it suggested was basically 'Can you see what users in X have in common?'

This seems like a worthy idea, but it's hard to get an answer in PostHog right now.

Describe the solution you'd like

We have cohorts. We have correlation analysis. It would be great to bring these tools together.

For example, we may set up a cohort based on churned customers and want to find out if this action correlates to something else so we can make improvements to our funnel or support process. Or, we may set up a cohort for users with time-out errors and use correlation analysis to debug the problem. We could also set up a cohort based on feature use, or first-users in an org to go deeper on behaviour. Or specific types of customers.

One specific thing I'd love to look at from a growth perspective would be creating cohorts for first-users in orgs that are successful at scaling vs those that are not. How do they differ, in terms of behaviours?

The possibilities are, if not limitless, probably unquantifiable at this point!

Describe alternatives you've considered

Not doing this. Letting the Engineering and Product team do other things.

Additional context

Just going to mention @neilkakkar (per a Slack discussion) and @marcushyett-ph from a product perspective.

Thank you for your feature request – we love each and every one!

You're welcome.

marcushyett-ph · 2022-01-04T13:28:20Z

Thanks for this idea @joethreepwood. It feels like this might also be related to automated clustering / segmentation of users.

One other related point discussed before was using correlation analysis for retention as well as "funnel conversion", e.g. what do users that retain best have in common.

neilkakkar · 2022-01-04T13:36:07Z

Trying to understand this better. The kinds of questions an analysis like this can answer are:

Given a cohort of 100 people, here are the top 5 person properties common to all these people.

and similarly, given a cohort of 100 people, here are the top 5 events these people commonly do.

Does this make sense?

(I was a bit confused because this isn't quite correlation, but just an analysis along one dimension)
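A minimal sketch of the "top 5 common properties / events" question above, assuming each person in the cohort is a plain dict with `events` and `properties` lists. That data shape and the `top_common` helper are hypothetical and are not PostHog's data model or API.

```python
from collections import Counter


def top_common(cohort, key, n=5):
    """Return up to n (value, share) pairs: the values under `key` held by
    the most people in the cohort, with the fraction of people holding each."""
    if not cohort:
        return []
    counts = Counter()
    for person in cohort:
        # Use a set so a person who fired an event many times counts once.
        counts.update(set(person[key]))
    size = len(cohort)
    return [(value, count / size) for value, count in counts.most_common(n)]


# Hypothetical cohort of four people (illustrative data only).
cohort = [
    {"events": ["pageview", "rageclick"], "properties": ["country=US", "plan=free"]},
    {"events": ["pageview", "timeout", "pageview"], "properties": ["country=US", "plan=paid"]},
    {"events": ["pageview"], "properties": ["country=DE", "plan=free"]},
    {"events": ["pageview", "rageclick"], "properties": ["country=US", "plan=free"]},
]

top_events = top_common(cohort, "events")
top_props = top_common(cohort, "properties")
```

This counts along one dimension only: it says how common something is inside the cohort, not whether it is more common there than elsewhere.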

"How do they differ, in terms of behaviours?"

What information are you looking for to answer this question? Does the above suffice, or are you looking for something else?

joethreepwood · 2022-01-04T14:29:41Z

@neilkakkar 100% that's what I had in mind. Apologies for the confusion - I saw this as working similarly to the current correlation analysis:

People in this cohort were also 10% more likely to complete the 'Rageclick' event
People in this cohort were also 5% more likely to complete the 'Timeout' event

etc. Though perhaps '10% of people in this cohort also completed the 'Rageclick' event' is a simpler way to convey such information.
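The two phrasings above measure different things, which a rough sketch can make concrete. It assumes "more likely" means a relative lift against all users as the baseline (one possible reading, not confirmed in this thread), and it uses a hypothetical mapping from person ID to the set of events that person did.

```python
def event_share(people, event):
    """Fraction of people who did `event` at least once."""
    if not people:
        return 0.0
    return sum(1 for events in people.values() if event in events) / len(people)


def relative_lift(cohort, everyone, event):
    """How much more likely cohort members are to have done `event` than the
    baseline population, as a fraction (0.5 means 50% more likely).
    Returns None when nobody in the baseline did the event."""
    base = event_share(everyone, event)
    if base == 0:
        return None
    return event_share(cohort, event) / base - 1


# Hypothetical data: eight users, four of whom belong to the cohort.
everyone = {
    "u1": {"Rageclick", "Pageview"},
    "u2": {"Rageclick"},
    "u3": {"Rageclick", "Timeout"},
    "u4": {"Pageview"},
    "u5": {"Rageclick"},
    "u6": {"Pageview"},
    "u7": {"Pageview", "Timeout"},
    "u8": {"Pageview"},
}
cohort = {uid: everyone[uid] for uid in ("u1", "u2", "u3", "u4")}

share = event_share(cohort, "Rageclick")             # "X% of people in this cohort did it"
lift = relative_lift(cohort, everyone, "Rageclick")  # "X% more likely than the baseline"
```

The share needs only the cohort itself, while the lift also needs a baseline to compare against.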

"What information are you looking for to answer this question? Does the above suffice, or are you looking for something else?"

The current model of correlated events and correlated properties (or common events, common properties, if you prefer) is basically what I had in mind.

neilkakkar · 2022-01-04T14:47:06Z

Perfect, that makes a lot of sense! In fact, I'd say this is a separate problem to consider: it's one-dimensional analysis.

Given a group of users / events / a cohort / a person modal -> tell me what are the most common events these people did / what are the most common properties among these people.

Actually, I like your phrasing of this as well: "10% of people in this cohort also did RageClick / are from the States"

Hmm, I like it. This is a lot easier / more straightforward / much faster to compute than correlation analysis, which means it can be done in a lot more places.

davidfurlong · 2023-01-17T13:11:14Z

My use case:

What actions do the cohort of users who retain from week0 to week1 take that users who don't retain from week0 to week1 don't take?

A correlation analysis of this should give me a good idea of which user actions to experiment with, as the correlation could potentially be the cause of retention.
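One way to sketch this two-cohort comparison is an odds ratio per event, computed from a 2x2 table of retained vs. not-retained users. This is an illustration under assumptions, not necessarily how PostHog's correlation analysis computes its results: the function names and data are hypothetical, and 0.5 is added to every cell (the Haldane-Anscombe correction) to avoid division by zero.

```python
def odds_ratio(retained, churned, event):
    """Odds ratio for `event` between two cohorts, each a list of per-user
    event sets. Values above 1 mean the event is more associated with the
    retained cohort; values below 1, with the churned cohort."""
    did_retained = sum(1 for events in retained if event in events)
    did_churned = sum(1 for events in churned if event in events)
    # 2x2 table cells, each with a 0.5 continuity correction.
    a = did_retained + 0.5                  # retained, did the event
    b = len(retained) - did_retained + 0.5  # retained, did not
    c = did_churned + 0.5                   # churned, did the event
    d = len(churned) - did_churned + 0.5    # churned, did not
    return (a * d) / (b * c)


def rank_events(retained, churned):
    """All events seen in either cohort, sorted by odds ratio, highest first."""
    seen = set().union(*retained, *churned)
    scored = [(event, odds_ratio(retained, churned, event)) for event in seen]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical week0 -> week1 cohorts (illustrative data only).
retained = [{"invite_teammate", "pageview"}, {"invite_teammate"}, {"pageview"}]
churned = [{"pageview"}, {"pageview"}, {"invite_teammate"}]

ranked = rank_events(retained, churned)
```

Events at the top of `ranked` are candidates for experiments. The ratio only shows association, so an experiment is still needed to test whether the action drives retention.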

joethreepwood added the enhancement label Jan 4, 2022

neilkakkar added the feature/cohorts and feature/correlation-analysis labels Jan 4, 2022

mariusandra mentioned this issue Aug 17, 2022

Team east Sprint 1.39-2 bug bash #11344

Closed

91 tasks

neilkakkar mentioned this issue Dec 7, 2022

2023 Q1 Team Experiments OKRs PostHog/posthog.com#4828

Merged

4 tasks

joethreepwood closed this as completed Apr 29, 2024
