Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trends date buckets may be incorrect for non-utc timezones #11492

Closed
1 of 5 tasks
benjackwhite opened this issue Aug 25, 2022 · 1 comment
Closed
1 of 5 tasks

Trends date buckets may be incorrect for non-utc timezones #11492

benjackwhite opened this issue Aug 25, 2022 · 1 comment
Labels
bug Something isn't working right

Comments

@benjackwhite
Copy link
Contributor

benjackwhite commented Aug 25, 2022

Bug description

The issue is as follows:

  1. For non-utc timezones we do a request to get a time range bucketed by days e.g. (-3d) on 2022-01-14
  2. This results in a timezone shifted query as we want to keep all datetimes UTC for clickhouse
    1. e.g. UTC+2 Berlin will end up with a query from 2022-01-10T22:00:00Z-2022-01-14T21:59:59Z representing 2022-01-11 to 2022-01-14
  3. The results are bucketed by date which means we end up with a response like
    1. 2022-01-10: X
    2. 2022-01-11: X
    3. 2022-01-12: X
    4. 2022-01-13: X
    5. 2022-01-14: X
  4. When it should be returning buckets like
    1. 2022-01-11: X
    2. 2022-01-12: X
    3. 2022-01-13: X
    4. 2022-01-14: X
  5. The buckets may actually look wrong to the non-utc user as each bucket is offset by the number of hours that the zone is offset

Investigation

There are several compound problems here, none of which have a super-straightforward solution...

  1. Filter DateMixin modifies the incoming time (including relatives like -7d) so that date_from is at the start of the interval and date_from is at the end.
  2. For the clickhouse queries the windowing modifies the given date to UTC of the projects timezone which (depending on the direction) results in extra data loaded either one day in the future or in the past
  3. This is compounded again when we perform the clickhouse queries where we sometimes round the date_from down to the start of the interval.
  4. So e.g. for timezone UTC+2 this results in:
    • date_from=2022-01-15 is first converted to start of day => 2022-01-15 00:00:00
    • date is converted to a UTC version => 2022-01-14 22:00:00
    • sql query rounds it to the start of the day => 2022-01-14 00:00:00

I think we need to either:

  1. Convert the datetime fully in Python at the Filter class (would make team a required arg). Then remove the timezone conversion work in the query building
  2. Leave the dates in Filter alone (what to do about -7d that use "timezone.now()" 🤔?) and ensure the conversion happens only in the sql queries
  • Remove date modification from filter
  • Allow the original date value to be gotten (relative date parsing including timezone)
  • Let the conversion happen as before

Eric and or Neil should probably get involved

Environment

  • PostHog Cloud
  • self-hosted PostHog, version/commit: please provide
@EDsCODE
Copy link
Member

EDsCODE commented Sep 22, 2022

Thanks for the write up here! I think 1 is the way to go. I've been staring at this problem today because I was running into bucketing issues here too. I'm thinking we do option 1. This means that we:

  1. Interpret all incoming date arguments as timezone aware (based on whatever the timezone is set on team).
  2. Generate the date_from and date_to according to the timezone
  3. Pass in these timestamps to clickhouse queries using toDateTime(%(timestramp_string)s, %(timezone)s) to let clickhouse know it's working with a timezone aware timestamp.
  4. Make sure the returned timestamp is also timestamp aware

We'll be operating fully in the specified timezone and leave it to clickhouse comparison to handle comparing across UTC <> specified timezone which should be fine. The main challenge here is just refactoring everything that touches filter so team is a required arg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right
Projects
None yet
Development

No branches or pull requests

2 participants