Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orphaned users for persons on events #11488

Closed
2 tasks
timgl opened this issue Aug 24, 2022 · 10 comments
Closed
2 tasks

Orphaned users for persons on events #11488

timgl opened this issue Aug 24, 2022 · 10 comments
Assignees
Labels
bug Something isn't working right

Comments

@timgl
Copy link
Collaborator

timgl commented Aug 24, 2022

Bug description

If you alias multiple distinct ids together, you create an orphan person_id.

Example:

  • distinct_id DB0FD0F1-90EE-4F0E-B0AE-3115DB2ED254 existed first, with person_id 0182afac-4b8c-0000-d745-e85707d9208b
  • distinct_id 8EDA981A-32DA-4B7C-9558-293334D6FA92, used to have person_id 0182c2b9-1a27-0000-090b-69a06b56097b
  • DB0FD0F1-90EE-4F0E-B0AE-3115DB2ED254 was aliased to 96129, continued using person_id ...08b
  • 8EDA981A-32DA-4B7C-9558-293334D6FA92 was aliased to 96129, now also using person_id ...08b

The problem is if you do a query which has results for ...FA92, you get results with no distinct ids. Clicking on a data point in trends means way fewer people appear.

image

image

If you click on a retention person modal this appears, and clicking on one of these goes to app.posthog.com/persons/undefined
image

How to reproduce

Environment

  • PostHog Cloud
  • self-hosted PostHog, version/commit: please provide

Additional context

Thank you for your bug report – we love squashing them!

@timgl timgl added the bug Something isn't working right label Aug 24, 2022
@neilkakkar
Copy link
Contributor

cc: @macobo @tiina303 - how does the underlying data look like when this happens? Need to understand this better to figure out what to do query side.

When we alias, we move distinctIDs, and delete the old person. I guess this change wouldn't reflect in events, where upto a point, ...FA92 is linked to ..6097b, but later on ..FA92 is linked to ...08b. So unique persons would be off by one, since ..6097b isn't a valid person anymore. Am I missing something?

@neilkakkar
Copy link
Contributor

neilkakkar commented Aug 25, 2022

Let me create an easier to follow example:

distinctID: first, linked to personID: 1. Events with distinctID first have person_id=1

distinctID: second, linked to personID: 2. Events with distinctID second have person_id=2

Now, first is aliased to second.

Now, second and first are both linked to personID 2. Events with either distinctID have person_id=2.

The entire events table looks something like:

event1: distinctID first, person_id=1 (came in pre-alias)
event2: distinctID first, person_id=2 (came in post-alias)
event3: distinctID second, person_id=2

Now without person on events, total count = 3, and unique persons = 1 (since they're joined to person_distinct_id table which has the right new mappings).

But, with person on events, total count = 3, and unique persons = 2 (two different person IDs), and the person modal will have only 1 person ( I think?, since the other would be lost when doing the postgres filtering)

@neilkakkar
Copy link
Contributor

@timgl is this a fair representation^ ?

@macobo
Copy link
Contributor

macobo commented Aug 25, 2022

If that's the scenario, from data side this is expected behavior under person-on-events and is one of the two breaking changes introduced by it.

See https://github.com/PostHog/meta/blob/main/requests-for-comments/2022-01-27-future-of-person-model.md#1-after-the-first-session-we-will-no-longer-merge-events-prior-to-login-with-the-main-persons-events

The conversion buffer now exists to avoid creating extra person_ids on their first landing but subsequent logins/logouts would indeed end up with separate person_ids. This is something we've yet to document and need to do a good job on!

Note not sure how the person modal now works - we probably be should not showing "invalid" links here.

@neilkakkar
Copy link
Contributor

D'oh, of course.

I think this is more a UX issue: Not only should we have docs about the change, but also show it in practice. When there's a count mismatch, we shouldn't show these invalid users (that seems specific to retention, something borked, because I'm pretty sure the trend graph just doesn't show those people).

So when lower than expected people, point them to a link explaining why it's so? (@clarkus for better ideas). It can be because of person on events (expected), CH person mismatch (this is on us) - including person deletions (deleted from postgres but not CH yet).

@neilkakkar
Copy link
Contributor

Oh, and the other (new) issue with person-on-events will be that our current pagination logic wouldn't work, since persons can be fewer than the limit. (cc: @EDsCODE )

@tiina303
Copy link
Contributor

cc @pjhul for documentation part - we would see lower counts in insights not higher when persons were merged based on the discussion ^

btw I highlighted this problem in https://docs.google.com/document/d/19KN-WKFH19bpTdKw8nApf8qT9rMPNb9KDUr8fxP2CLY/edit?disco=AAAAdHt6c0Q

@clarkus
Copy link
Contributor

clarkus commented Aug 25, 2022

So when lower than expected people, point them to a link explaining why it's so? (@clarkus for better ideas). It can be because of person on events (expected), CH person mismatch (this is on us) - including person deletions (deleted from postgres but not CH yet).

Documentation with a prominent link in product seems like a good place to start.

@timgl
Copy link
Collaborator Author

timgl commented Aug 25, 2022

When there's a count mismatch, we shouldn't show these invalid users (that seems specific to retention, something borked, because I'm pretty sure the trend graph just doesn't show those people).

Yeah, even in trends it'd be good to somehow acknowledge "40 users in this graph, of which 10 were orphaned"

@tiina303
Copy link
Contributor

tiina303 commented Feb 6, 2023

Since PoE embrace joins we don't run into this via merges + the UI now reflects this (for person deletions without data deletions). I'm closing this.

@tiina303 tiina303 closed this as completed Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right
Projects
None yet
Development

No branches or pull requests

5 participants