Can't store pandas.DataFrame with column of datetime that has timezone #926

Closed
iliatimofeev opened this issue Jun 9, 2018 · 4 comments · Fixed by #928

@iliatimofeev
Contributor

iliatimofeev commented Jun 9, 2018

sanitize_dataframe does not understand data type datetime64[ns, UTC]

import altair as alt
import pandas as pd

data = pd.DataFrame(pd.date_range('2000-01-01', periods=5).tz_localize('UTC').rename('tz_datetime'))
print(data.info())
alt.Chart(data).mark_rule().encode(x='tz_datetime:T')

Out:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
tz_datetime    5 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1)
memory usage: 120.0 bytes
None
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/altair/vegalite/v2/api.py in to_dict(self, *args, **kwargs)
    312         copy = self.copy()
    313         original_data = getattr(copy, 'data', Undefined)
--> 314         copy.data = _prepare_data(original_data)
    315 
    316         # We make use of two context markers:

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/altair/vegalite/v2/api.py in _prepare_data(data)
     24         return data
     25     elif isinstance(data, pd.DataFrame):
---> 26         return pipe(data, data_transformers.get())
     27     elif isinstance(data, six.string_types):
     28         return core.UrlData(data)

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/toolz/functoolz.py in pipe(data, *funcs)
    550     """
    551     for func in funcs:
--> 552         data = func(data)
    553     return data
    554 

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    281     def __call__(self, *args, **kwargs):
    282         try:
--> 283             return self._partial(*args, **kwargs)
    284         except TypeError as exc:
    285             if self._should_curry(args, kwargs, exc):

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/altair/vegalite/data.py in default_data_transformer(data, max_rows)
     10 @curry
     11 def default_data_transformer(data, max_rows=5000):
---> 12     return pipe(data, limit_rows(max_rows=max_rows), to_values)
     13 
     14 

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/toolz/functoolz.py in pipe(data, *funcs)
    550     """
    551     for func in funcs:
--> 552         data = func(data)
    553     return data
    554 

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    281     def __call__(self, *args, **kwargs):
    282         try:
--> 283             return self._partial(*args, **kwargs)
    284         except TypeError as exc:
    285             if self._should_curry(args, kwargs, exc):

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/altair/utils/data.py in to_values(data)
    123     check_data_type(data)
    124     if isinstance(data, pd.DataFrame):
--> 125         data = sanitize_dataframe(data)
    126         return {'values': data.to_dict(orient='records')}
    127     elif isinstance(data, dict):

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/altair/utils/core.py in sanitize_dataframe(df)
     89             # convert numpy bools to objects; np.bool is not JSON serializable
     90             df[col_name] = df[col_name].astype(object)
---> 91         elif np.issubdtype(dtype, np.integer):
     92             # convert integers to objects; np.int is not JSON serializable
     93             df[col_name] = df[col_name].astype(object)

~/anaconda/anaconda/envs/rr_dev/lib/python3.5/site-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
    724     """
    725     if not issubclass_(arg1, generic):
--> 726         arg1 = dtype(arg1).type
    727     if not issubclass_(arg2, generic):
    728         arg2_orig = arg2

TypeError: data type not understood

Chart({
  data:                 tz_datetime
  0 2000-01-01 00:00:00+00:00
  1 2000-01-02 00:00:00+00:00
  2 2000-01-03 00:00:00+00:00
  3 2000-01-04 00:00:00+00:00
  4 2000-01-05 00:00:00+00:00
})

In:

print('altair',alt.__version__)
print('pandas',pd.__version__)
print('numpy',np.__version__)

Out:

altair 2.0.1
pandas 0.22.0
numpy 1.14.2

Same for

altair 2.0.1
pandas 0.23.0
numpy 1.14.3

iliatimofeev changed the title from "sanitize_dataframe does not understand data type datetime64[ns, UTC]" to "Can't store pandas.DataFrame with column of datetime that has timezone" on Jun 9, 2018

@iliatimofeev
Contributor Author

I think the problem is that, according to pandas dtypes.py L383:

DatetimeTZDtype(PandasExtensionDtype)
THIS IS NOT A REAL NUMPY DTYPE, but essentially a sub-class of np.datetime64[ns]

So np.issubdtype(data.dtypes['tz_datetime'], np.integer) raises a TypeError.
But code based on altair/utils/core.py#L126 works as expected:

if str(data.dtypes['tz_datetime']).startswith('datetime'):
    data['tz_datetime'] = data['tz_datetime'].astype(str).replace('NaT', '')
alt.Chart(data).mark_rule().encode(x='tz_datetime:T')

[Image: visualization 12]
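
The same idea can be applied to every datetime column at once. The following is a minimal sketch, not altair's actual sanitize_dataframe: the helper name stringify_datetime_columns is hypothetical, and it assumes that pandas.api.types.is_datetime64_any_dtype is available in your pandas version. That check accepts both naive and timezone-aware datetime dtypes, so it can run before any np.issubdtype call.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def stringify_datetime_columns(df):
    """Return a copy of df with every datetime column converted to strings.

    Hypothetical helper for illustration only; not part of altair.
    """
    out = df.copy()
    for col_name in out.columns:
        # Handles datetime64[ns] as well as tz-aware dtypes such as
        # datetime64[ns, UTC], which np.issubdtype cannot interpret.
        if is_datetime64_any_dtype(out[col_name].dtype):
            out[col_name] = out[col_name].astype(str).replace('NaT', '')
    return out


data = pd.DataFrame(
    {'tz_datetime': pd.date_range('2000-01-01', periods=5, tz='UTC')}
)
clean = stringify_datetime_columns(data)
```

The original frame is left untouched, so clean can be passed to alt.Chart while data keeps its timezone-aware column for other uses.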

@tweakimp

df.index = df.index.tz_localize("utc").tz_convert("Europe/Berlin")

still crashes for me :/

  File "C:\Program Files\Python36\lib\site-packages\altair\vegalite\v2\api.py", line 414, in save
    result = save(**kwds)
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\save.py", line 60, in save
    spec = chart.to_dict()
  File "C:\Program Files\Python36\lib\site-packages\altair\vegalite\v2\api.py", line 331, in to_dict
    dct = super(TopLevelMixin, copy).to_dict(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\schemapi.py", line 245, in to_dict
    result = _todict({k: v for k, v in self._kwds.items()
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\schemapi.py", line 237, in _todict
    return {k: _todict(v) for k, v in val.items()
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\schemapi.py", line 238, in <dictcomp>
    if v is not Undefined}
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\schemapi.py", line 235, in _todict
    return [_todict(v) for v in val]
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\schemapi.py", line 235, in <listcomp>
    return [_todict(v) for v in val]
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\schemapi.py", line 233, in _todict
    return val.to_dict(validate=sub_validate, context=context)
  File "C:\Program Files\Python36\lib\site-packages\altair\vegalite\v2\api.py", line 314, in to_dict
    copy.data = _prepare_data(original_data)
  File "C:\Program Files\Python36\lib\site-packages\altair\vegalite\v2\api.py", line 26, in _prepare_data
    return pipe(data, data_transformers.get())
  File "C:\Program Files\Python36\lib\site-packages\toolz\functoolz.py", line 552, in pipe
    data = func(data)
  File "C:\Program Files\Python36\lib\site-packages\toolz\functoolz.py", line 283, in __call__
    return self._partial(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\altair\vegalite\data.py", line 12, in default_data_transformer
    return pipe(data, limit_rows(max_rows=max_rows), to_values)
  File "C:\Program Files\Python36\lib\site-packages\toolz\functoolz.py", line 552, in pipe
    data = func(data)
  File "C:\Program Files\Python36\lib\site-packages\toolz\functoolz.py", line 283, in __call__
    return self._partial(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\data.py", line 121, in to_values
    data = sanitize_dataframe(data)
  File "C:\Program Files\Python36\lib\site-packages\altair\utils\core.py", line 117, in sanitize_dataframe
    elif np.issubdtype(dtype, np.integer):
  File "C:\Program Files\Python36\lib\site-packages\numpy\core\numerictypes.py", line 814, in issubdtype
    arg1 = dtype(arg1).type
TypeError: data type not understood

@iliatimofeev
Contributor Author

It works in master, but the fix has not been released yet.
In altair 2.0.1 you can use this workaround:

df.index = df.index.tz_localize("utc").tz_convert("Europe/Berlin").astype(str)
alt.Chart(df.reset_index())  # your code...

PS: altair does not use the index of a DataFrame, so you'll need to reset it or just copy it into a column:

df['index'] = df.index.tz_localize("utc").tz_convert("Europe/Berlin").astype(str)
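
For reference, here is a self-contained sketch of that workaround on a small made-up frame. The column names value and time and the sample dates are assumptions chosen for illustration; only the tz_localize / tz_convert / astype(str) chain comes from the comment above. The altair call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Made-up sample data with a naive (timezone-less) DatetimeIndex.
df = pd.DataFrame(
    {'value': [1, 2, 3]},
    index=pd.date_range('2018-06-09', periods=3, freq='D'),
)

# Interpret the index as UTC, convert it to Berlin time, then stringify it
# so that no tz-aware dtype reaches altair's sanitize_dataframe.
local_index = df.index.tz_localize('UTC').tz_convert('Europe/Berlin')

plot_df = df.reset_index(drop=True)
plot_df['time'] = local_index.astype(str).tolist()

# With altair installed, the frame could then be charted, e.g.:
# alt.Chart(plot_df).mark_line().encode(x='time:T', y='value:Q')
```

Keeping the strings in an ordinary column rather than in the index avoids the extra reset_index() step at chart time.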

@tweakimp

ok, thanks!
