You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For non-utc timezones we do a request to get a time range bucketed by days e.g. (-3d) on 2022-01-14
This results in a timezone shifted query as we want to keep all datetimes UTC for clickhouse
e.g. UTC+2 Berlin will end up with a query from 2022-01-10T22:00:00Z-2022-01-14T21:59:59Z representing 2022-01-11 to 2022-01-14
The results are bucketed by date which means we end up with a response like
2022-01-10: X
2022-01-11: X
2022-01-12: X
2022-01-13: X
2022-01-14: X
When it should be returning buckets like
2022-01-11: X
2022-01-12: X
2022-01-13: X
2022-01-14: X
The buckets may actually look wrong to the non-utc user as each bucket is offset by the number of hours that the zone is offset
Investigation
There are several compound problems here, none of which have a super-straightforward solution...
Filter DateMixin modifies the incoming time (including relatives like -7d) so that date_from is at the start of the interval and date_from is at the end.
For the clickhouse queries the windowing modifies the given date to UTC of the projects timezone which (depending on the direction) results in extra data loaded either one day in the future or in the past
date_from=2022-01-15 is first converted to start of day => 2022-01-15 00:00:00
date is converted to a UTC version => 2022-01-14 22:00:00
sql query rounds it to the start of the day => 2022-01-14 00:00:00
I think we need to either:
Convert the datetime fully in Python at the Filter class (would make team a required arg). Then remove the timezone conversion work in the query building
Leave the dates in Filter alone (what to do about -7d that use "timezone.now()" 🤔?) and ensure the conversion happens only in the sql queries
Remove date modification from filter
Allow the original date value to be gotten (relative date parsing including timezone)
Let the conversion happen as before
Eric and or Neil should probably get involved
Environment
PostHog Cloud
self-hosted PostHog, version/commit: please provide
The text was updated successfully, but these errors were encountered:
Thanks for the write up here! I think 1 is the way to go. I've been staring at this problem today because I was running into bucketing issues here too. I'm thinking we do option 1. This means that we:
Interpret all incoming date arguments as timezone aware (based on whatever the timezone is set on team).
Generate the date_from and date_to according to the timezone
Pass in these timestamps to clickhouse queries using toDateTime(%(timestramp_string)s, %(timezone)s) to let clickhouse know it's working with a timezone aware timestamp.
Make sure the returned timestamp is also timestamp aware
We'll be operating fully in the specified timezone and leave it to clickhouse comparison to handle comparing across UTC <> specified timezone which should be fine. The main challenge here is just refactoring everything that touches filter so team is a required arg
Bug description
The issue is as follows:
2022-01-10T22:00:00Z-2022-01-14T21:59:59Z
representing2022-01-11 to 2022-01-14
Investigation
There are several compound problems here, none of which have a super-straightforward solution...
-7d
) so thatdate_from
is at the start of the interval anddate_from
is at the end.date_from
down to the start of the interval.date_from=2022-01-15
is first converted to start of day =>2022-01-15 00:00:00
2022-01-14 22:00:00
2022-01-14 00:00:00
I think we need to either:
team
a required arg). Then remove the timezone conversion work in the query buildingEric and or Neil should probably get involved
Environment
The text was updated successfully, but these errors were encountered: