Taxonomy #3228

Closed
jamesefhawkins opened this issue Feb 6, 2021 · 3 comments
Labels
concept (ideas that need some shaping up still), enhancement (new feature or request), stale

Comments

@jamesefhawkins
Collaborator

Is your feature request related to a problem?

In line with our growth strategy, an idea for a potential team ("collaborate") feature is data taxonomy.

For someone who didn't implement the events, it is currently hard to understand the long list of events in PostHog.

A few examples:

Screenshot 2021-02-06 at 14 00 04

Things that I can't tell from this:

  • Does this mean they signed up for a demo?
  • Does it mean they signed up just for cloud?
  • Does it mean their account creation was successful, or could it mean they clicked the sign up button but, e.g., the password wasn't validated?

Screenshot 2021-02-06 at 14 02 53

  • What does the $ sign mean?
  • Left which page? Any? In app or on posthog.com?

We already have some basic plugins for this which allow a developer to edit their taxonomy a bit themselves. However, I think we shouldn't go so far as to provide a full UX for this through our frontend without it being a team feature (the need for a real solution here comes from a less te

Describe the solution you'd like

A few ideas:

Screenshot 2021-02-06 at 14 12 23

Screenshot 2021-02-06 at 14 12 37

Screenshot 2021-02-06 at 14 15 19

Some of the things above that seem cool:

  • Seeing which file in the code has the tracking
  • Ability to add a description to the event
  • IDE integration looks cool. I genuinely wonder how much this helps get it into the natural flow for engineers vs. being a gimmick.
  • I could also see some way of raising questions about an event when a user has one ("hey Paolo, does this mean X or Y?") and capturing the thread in the product, so others could also see it.

Describe alternatives you've considered

Integrate with tools that do this.

Additional context

I'm suggesting this as it came from some calls with larger enterprises who are in the specific situation of moving from log-based to event-based analytics. The risk is we lose clients because they think they have to go outside PostHog to another technology that feels like it'll enable higher quality of event capture, and then that project goes on forever.

This would partly help with this fear. We'd also have to think about validation: surfacing failed and successful events to users, and therefore giving a clear sense of data quality, but that feels trivial. I'm then wondering what else, if anything, we'd actually be missing to reach best in class from a data-in perspective.

@paolodamico
Contributor

Additional context

I believe the original issue post includes a bunch of context so just adding incremental stuff.

  • In a general sense I think this feature encompasses two main use cases. One is defining a data taxonomy to ensure data consistency and reliability and to detect anomalies (event ingestion). Two is increasing team collaboration from this (e.g. understanding events that you didn't implement). This second use case addresses what Dashboard collaboration / sharing metrics #3344 does for dashboards.
  • As reported/confirmed by some customer calls, even the most organized teams have an issue with this, because your spreadsheet/Jira/doc is never up-to-date. There's an uncanny number of things that cause this: events not implemented yet, properties changing, bug in implementation, changes to the product that affected events, timing/priorities.
  • Goals:
    1. Increase value of PostHog by increasing quality of data coming in.
    2. Increase team collaboration and engagement of PostHog.
  • Assumptions:
    • Team members not involved in the planning or implementation of events cannot derive value from PostHog (or at least have a very hard time doing so) due to a lack of context.
    • These companies spend time and resources deciding what events to track, thinking about analytics needs in advance (opposite argument to auto-capture).
    • There is at least one person in the team concerned with keeping event records up-to-date.

Prioritization notes

  • Target roles: Technical (facilitate event definition, detect anomalies), Non-technical (obtain value out of PH from what others implemented), Analysts (data consistency), Enterprise Procurement (compliance, privacy, ...)
  • Target customers: Scaleup (3.5) & Enterprise (3)
  • Revenue consideration: Could bring immediate new revenue stream (can be a premium feature, it's commonly billed separately, maybe increase the PPE if you use it?). Worth noting that while this would mean higher long-term value for customers, by design this feature would probably reduce event usage (eliminating redundancies, malformed events, up to turning off autocapture).

Ideas

(Ideas from the main issue included in this list for simplicity)

  • Enable you to register events, add a description and define properties (with property types and descriptions). Properties could also be set as required.
    • We should pay special attention to the UX here: doing this needs to be very simple and very fast. Otherwise it will be dropped after one use.
  • Prefill event and properties information and update it based on the events we're actually receiving (e.g. register new events when they come in). This can also be useful for documenting when properties change on an event (e.g. you add a new property) to keep a version history and raise alerts.
  • Extend event stats based on taxonomy (e.g. % of events not in line with policy).
  • Discussion functionality. Ask questions, raise issues, generally comment on events.
  • Tag events (same as with Dashboard collaboration / sharing metrics #3344). Tags can help identify teams/owners, the product section an event belongs to, client vs. server, deprecated events, ...
  • IDE integration of taxonomy definitions.
  • Anomaly detection & alerting. Examples: too many events coming in malformed, new properties introduced, stopped receiving the usual number of events, etc.
  • Identify PII and allow obfuscation/removal. Particularly relevant to identify more sensitive stuff like SSNs, credit card numbers, ...
  • We should probably complement this with some best practices on naming events, conventions, properties to track, ... It seems like most companies have this in some shape or form. We've also been asked for this on some occasions by our own users.
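
To make the first few ideas a bit more concrete, here is a minimal Python sketch of registering events with typed, optionally required properties, validating incoming events against them, and computing the share of events not in line with the definitions. Everything in it is hypothetical (`PropertyDefinition`, `EventDefinition`, `validate`, `share_not_in_line`, and the sample events); it is not an existing PostHog API, just one way this could look.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; none of this is an existing PostHog API.


@dataclass
class PropertyDefinition:
    name: str
    value_type: type          # expected Python type of the value, e.g. str
    description: str = ""
    required: bool = False


@dataclass
class EventDefinition:
    name: str
    description: str = ""
    properties: dict[str, PropertyDefinition] = field(default_factory=dict)


def validate(event: dict, registry: dict[str, EventDefinition]) -> list[str]:
    """Return human-readable violations (empty list if the event conforms)."""
    definition = registry.get(event["event"])
    if definition is None:
        return [f"unregistered event: {event['event']}"]

    props = event.get("properties", {})
    violations = []
    for prop in definition.properties.values():
        if prop.name not in props:
            if prop.required:
                violations.append(f"missing required property: {prop.name}")
        elif not isinstance(props[prop.name], prop.value_type):
            violations.append(f"wrong type for property: {prop.name}")
    return violations


def share_not_in_line(events: list[dict], registry: dict[str, EventDefinition]) -> float:
    """Fraction of events with at least one violation (0.0 for no events)."""
    if not events:
        return 0.0
    failing = sum(1 for event in events if validate(event, registry))
    return failing / len(events)


# Made-up sample data for illustration only.
registry = {
    "user signed up": EventDefinition(
        name="user signed up",
        description="Account creation succeeded (not just a click on the button).",
        properties={
            "plan": PropertyDefinition("plan", str, "cloud or demo", required=True),
        },
    )
}

events = [
    {"event": "user signed up", "properties": {"plan": "cloud"}},  # conforms
    {"event": "user signed up", "properties": {}},                 # missing property
    {"event": "usr signed up", "properties": {"plan": "cloud"}},   # unregistered name
    {"event": "user signed up", "properties": {"plan": 3}},        # wrong type
]

print(share_not_in_line(events, registry))
```

The description field is where the "does this mean X or Y?" answers would live, and the violation strings are the kind of thing that could feed the stats and alerting ideas above.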

What others are doing

  • Segment Protocols offers a very complete product that encompasses most of the topics discussed here. Their main selling points: keep everyone in the team aligned on events, validate/enforce data for consistency, and even transform data (basically our plugins functionality).
  • Segment also offers a Privacy feature that, among other things, detects PII, masks it in any user-facing interfaces and allows tracking the level of sensitivity on tracked properties.
  • Interesting to note that, as an earlier-stage approach, Segment has also published a spreadsheet template to track this stuff.
  • Tangentially related, Mixpanel introduced intelligent anomaly detection which, contrary to what we're discussing here, does not make use of a defined schema.
  • mParticle has similar functionality built around the concepts of a "catalog" (what currently exists) and "plans" (what you want it to be), which you activate on events progressively to enforce a schema. When a plan is enforced, you can monitor violations. You have the option of blocking events in violation of policies.
  • Amplitude lets you define a similar schema with Govern and choose to reject unplanned properties and even event types. An interesting feature that Amplitude offers is bulk editing schemas with CSV.
  • Heap offers functionality in which event definitions follow an approval process. They offer a remedial solution for stale events. They keep a history of definitions.
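
The catalog vs. plan split could be sketched roughly like this in Python: infer a catalog from the events actually received, then diff it against a plan. This is only an illustration of the concept as described above, not any vendor's actual API; `build_catalog`, `diff_against_plan`, and the sample event names are all made up.

```python
from collections import defaultdict

# Hypothetical sketch; "catalog" and "plan" follow the terms used above,
# not any vendor's actual API.


def build_catalog(events: list[dict]) -> dict[str, set[str]]:
    """Infer the catalog: each event name seen, with the property names observed on it."""
    catalog: dict[str, set[str]] = defaultdict(set)
    for event in events:
        # Updating a set from a dict adds the dict's keys (the property names).
        catalog[event["event"]].update(event.get("properties", {}))
    return dict(catalog)


def diff_against_plan(catalog: dict[str, set[str]], plan: dict[str, set[str]]) -> dict:
    """Compare what is being received (catalog) with what is intended (plan)."""
    shared = sorted(set(catalog) & set(plan))
    return {
        "unplanned_events": sorted(set(catalog) - set(plan)),
        "missing_events": sorted(set(plan) - set(catalog)),
        "unplanned_properties": {
            name: sorted(catalog[name] - plan[name])
            for name in shared
            if catalog[name] - plan[name]
        },
    }


# Made-up sample data for illustration only.
plan = {
    "user signed up": {"plan"},
    "demo requested": {"company"},
}

received = [
    {"event": "user signed up", "properties": {"plan": "cloud", "referrer": "ad"}},
    {"event": "$pageleave", "properties": {}},
]

report = diff_against_plan(build_catalog(received), plan)
print(report)
```

A report like this could back either approach: only monitoring violations, or going further and blocking unplanned events and properties.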

@posthog-bot
Contributor

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

@posthog-bot
Contributor

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.

@posthog-bot closed this as not planned on Apr 16, 2024