Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cftime.DatetimeNoLeap incorrectly decoded from netCDF file #8298

Open
5 tasks done
TomNicholas opened this issue Oct 12, 2023 · 14 comments · May be fixed by #8942
Open
5 tasks done

cftime.DatetimeNoLeap incorrectly decoded from netCDF file #8298

TomNicholas opened this issue Oct 12, 2023 · 14 comments · May be fixed by #8942

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Oct 12, 2023

What happened?

I have been given a netCDF file (I think it's netCDF3) which when I open it does not decode the time variable in the way I expected it to. The time coordinate created is a numpy object array

Screenshot from 2023-10-12 14-10-29

What did you expect to happen?

I expected it to automatically create a coordinate backed by a CFTimeIndex object, not a CFTimeIndex object wrapped inside another array type.

Minimal Complete Verifiable Example

The original problematic file is 455MB (I can share it if necessary), but I can create a small netCDF file that displays the same issue.

import cftime

time_values = [cftime.DatetimeNoLeap(347, 2, 1, 0, 0, 0, 0, has_year_zero=True)]
time_ds = xr.Dataset(coords={'time': (['time'], time_values)})
print(time_ds)
time_ds.to_netcdf('time_mwe.nc')
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) object 0347-02-01 00:00:00
Data variables:
    *empty*
ds = xr.open_dataset('time_mwe.nc', engine='netcdf4', decode_times=True, use_cftime=True)
print(ds)
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) object 0347-02-01 00:00:00
Data variables:
    *empty*

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

cftime 1.6.2
netcdf4 1.6.4
xarray 2023.8.0
@TomNicholas
Copy link
Member Author

This bug is causing me pain, especially in the index. For the array I can just do ds['time'].astype('datetime64[ns]') but I can't actually work out how to coerce the type of the array backing the CFTimeIndex object after it's been created.

I would prefer to fix the bug, but I actually don't know where in xarray to look. Perhaps @kmuehlbauer you might know where to point me? :)

@spencerkclark
Copy link
Member

@TomNicholas sorry for not commenting on this earlier. As far as I can tell this is just an issue with the HTML repr, i.e. it should not affect indexing or otherwise interacting with the time coordinate (ds.indexes["time"] returns a CFTimeIndex). It's definitely something we should fix, but it should not require any type coercion to work around.

@TomNicholas
Copy link
Member Author

Thanks @spencerkclark

this is just an issue with the HTML repr,

Oh really?? So I'm trying to coerce unecessarily? But is the dtype of the array backing the CFTIndex supposed to be object?

@TomNicholas
Copy link
Member Author

Like this looks like the wrong type of array

array([cftime.DatetimeNoLeap(347, 2, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(347, 3, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(362, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
       dtype=object)

@dcherian
Copy link
Contributor

dcherian commented Dec 6, 2023

That's correct :)

@spencerkclark
Copy link
Member

spencerkclark commented Dec 6, 2023

Yes, this is expected. cftime.DatetimeNoLeap objects are not a NumPy dtype, so they have to be stored in an object dtype array.

As an aside, if you need to coerce to datetime64[ns] (though it obviously will not work in this example) I would recommend using ds.indexes["time"].to_datetimeindex(), as ds["time"].astype("datetime64[ns]") is vulnerable to overflow. See also #5107.

@keewis
Copy link
Collaborator

keewis commented Dec 6, 2023

(as an aside, numpy has support for writing external dtypes so cftime could make use of that, though it might be worth waiting until the string dtypes in NEP 55 are polished and accepted into numpy's main)

@spencerkclark
Copy link
Member

spencerkclark commented Dec 7, 2023

The repr looks similar for other types of pandas indexes too, so I'm guessing this was at some level intentional (or some flexible index internals are leaking through the repr):

Screenshot 2023-12-06 at 7 36 29 PM

Either way it could make sense to revisit that given the confusion.

@kmuehlbauer
Copy link
Contributor

@TomNicholas Sorry Tom, was already AFK. Would also have referred to @spencerkclark's expertise, though.

@TomNicholas
Copy link
Member Author

Thank you everyone for your help! ❤️

I am still confused about what the difference between

cftime.datetime(1999, 1, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True)

and

cftime.DatetimeNoLeap(347, 2, 1, 0, 0, 0, 0, has_year_zero=True)

is, but I was now able to do the operation I needed to do.


Should we leave this open to track the repr not being super clear, or just close?

@dcherian
Copy link
Contributor

I think we can close.

Though this could be a good opportunity to update our docs to be more clear about CFTime handling :)

@dcherian dcherian removed the bug label Dec 15, 2023
@spencerkclark
Copy link
Member

For sure! From the perspective of cftime they are the same, but for historical reasons xarray continues to use the calendar-specific subclasses. To adopt the use of calendar-aware cftime.datetime objects internally will require a few changes in xarray, but maybe it's time we make that switch. For context it was not always possible for cftime.datetime instances to be calendar-aware (it used to be a base class with limited functionality).

I am fine with closing. Maybe if this kind of confusion around PandasIndex comes up again we can revisit the repr?

@TomNicholas
Copy link
Member Author

TomNicholas commented Jan 7, 2024

It seems I can't concatenate this index either:

dates = [
    cftime.DatetimeNoLeap.strptime('1999-01', '%Y-%m', calendar='noleap', has_year_zero=True),
    cftime.DatetimeNoLeap.strptime('1999-01', '%Y-%m', calendar='noleap', has_year_zero=True),
]
da = xr.DataArray(data=[0, 1], coords={'time': ('time', dates)}, dims=['time'])
print(da)
<xarray.DataArray (time: 2)>
array([0, 1])
Coordinates:
  * time     (time) object 1999-01-01 00:00:00 1999-01-01 00:00:00
xr.concat([da, da], dim='time')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/common.py:172, in AbstractArray._repr_html_(self)
    170 if OPTIONS["display_style"] == "text":
    171     return f"<pre>{escape(repr(self))}</pre>"
--> 172 return formatting_html.array_repr(self)

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/formatting_html.py:322, in array_repr(arr)
    320 if hasattr(arr, "xindexes"):
    321     indexes = _get_indexes_dict(arr.xindexes)
--> 322     sections.append(index_section(indexes))
    324 sections.append(attr_section(arr.attrs))
    326 return _obj_repr(arr, header_components, sections)

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/formatting_html.py:199, in _mapping_section(mapping, name, details_func, max_items_collapse, expand_option_name, enabled)
    192 expanded = _get_boolean_with_default(
    193     expand_option_name, n_items < max_items_collapse
    194 )
    195 collapsed = not expanded
    197 return collapsible_section(
    198     name,
--> 199     details=details_func(mapping),
    200     n_items=n_items,
    201     enabled=enabled,
    202     collapsed=collapsed,
    203 )

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/formatting_html.py:159, in summarize_indexes(indexes)
    158 def summarize_indexes(indexes):
--> 159     indexes_li = "".join(
    160         f"<li class='xr-var-item'>{summarize_index(v, i)}</li>"
    161         for v, i in indexes.items()
    162     )
    163     return f"<ul class='xr-var-list'>{indexes_li}</ul>"

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/formatting_html.py:160, in <genexpr>(.0)
    158 def summarize_indexes(indexes):
    159     indexes_li = "".join(
--> 160         f"<li class='xr-var-item'>{summarize_index(v, i)}</li>"
    161         for v, i in indexes.items()
    162     )
    163     return f"<ul class='xr-var-list'>{indexes_li}</ul>"

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/formatting_html.py:144, in summarize_index(coord_names, index)
    142 index_id = f"index-{uuid.uuid4()}"
    143 preview = escape(inline_index_repr(index))
--> 144 details = short_index_repr_html(index)
    146 data_icon = _icon("icon-database")
    148 return (
    149     f"<div class='xr-index-name'><div>{name}</div></div>"
    150     f"<div class='xr-index-preview'>{preview}</div>"
   (...)
    154     f"<div class='xr-index-data'>{details}</div>"
    155 )

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/formatting_html.py:136, in short_index_repr_html(index)
    133 if hasattr(index, "_repr_html_"):
    134     return index._repr_html_()
--> 136 return f"<pre>{escape(repr(index))}</pre>"

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/indexes.py:851, in PandasIndex.__repr__(self)
    850 def __repr__(self):
--> 851     return f"PandasIndex({repr(self.index)})"

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/cftimeindex.py:356, in CFTimeIndex.__repr__(self)
    348     end_str = format_times(
    349         self.values[-REPR_ELLIPSIS_SHOW_ITEMS_FRONT_END:],
    350         display_width,
    351         offset=offset,
    352         first_row_offset=offset,
    353     )
    354     datastr = "\n".join([front_str, f"{' '*offset}...", end_str])
--> 356 attrs_str = format_attrs(self)
    357 # oneliner only if smaller than display_width
    358 full_repr_str = f"{klass_name}([{datastr}], {attrs_str})"

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/cftimeindex.py:276, in format_attrs(index, separator)
    270 def format_attrs(index, separator=", "):
    271     """Format attributes of CFTimeIndex for __repr__."""
    272     attrs = {
    273         "dtype": f"'{index.dtype}'",
    274         "length": f"{len(index)}",
    275         "calendar": f"'{index.calendar}'",
--> 276         "freq": f"'{index.freq}'" if len(index) >= 3 else None,
    277     }
    279     attrs_str = [f"{k}={v}" for k, v in attrs.items()]
    280     attrs_str = f"{separator}".join(attrs_str)

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/cftimeindex.py:704, in CFTimeIndex.freq(self)
    701 """The frequency used by the dates in the index."""
    702 from xarray.coding.frequencies import infer_freq
--> 704 return infer_freq(self)

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/frequencies.py:98, in infer_freq(index)
     95         index = CFTimeIndex(index.values)
     97 if isinstance(index, CFTimeIndex):
---> 98     inferer = _CFTimeFrequencyInferer(index)
     99     return inferer.get_freq()
    101 return pd.infer_freq(index)

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/frequencies.py:107, in _CFTimeFrequencyInferer.__init__(self, index)
    105 def __init__(self, index):
    106     self.index = index
--> 107     self.values = index.asi8
    109     if len(index) < 3:
    110         raise ValueError("Need at least 3 dates to infer frequency")

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/cftimeindex.py:685, in CFTimeIndex.asi8(self)
    681 from xarray.core.resample_cftime import exact_cftime_datetime_difference
    683 epoch = self.date_type(1970, 1, 1)
    684 return np.array(
--> 685     [
    686         _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    687         for date in self.values
    688     ],
    689     dtype=np.int64,
    690 )

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/coding/cftimeindex.py:686, in <listcomp>(.0)
    681 from xarray.core.resample_cftime import exact_cftime_datetime_difference
    683 epoch = self.date_type(1970, 1, 1)
    684 return np.array(
    685     [
--> 686         _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    687         for date in self.values
    688     ],
    689     dtype=np.int64,
    690 )

File /glade/work/tnicholas/conda-envs/kerchunk3.9/lib/python3.9/site-packages/xarray/core/resample_cftime.py:495, in exact_cftime_datetime_difference(a, b)
    461 def exact_cftime_datetime_difference(a: CFTimeDatetime, b: CFTimeDatetime):
    462     """Exact computation of b - a
    463 
    464     Assumes:
   (...)
    493     datetime.timedelta
    494     """
--> 495     seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    496     seconds = int(round(seconds.total_seconds()))
    497     microseconds = b.microsecond - a.microsecond

File src/cftime/_cftime.pyx:1574, in cftime._cftime.datetime.__sub__()

TypeError: cannot compute the time difference between dates with different calendars

EDIT: And if I try to drop the index first I get a dataarray object which can be printed as a string but can't be displayed using the HTML repr:

concatted = xr.concat([da.drop_indexes('time'), da.drop_indexes('time')], dim='time', 

@spencerkclark
Copy link
Member

spencerkclark commented Jan 8, 2024

@TomNicholas from the xarray point of view this essentially is the same issue since cftime.DatetimeNoLeap.strptime (admittedly awkwardly) returns a cftime.datetime instance:

>>> import cftime
>>> cftime.DatetimeNoLeap.strptime('1999-01', '%Y-%m', calendar='noleap', has_year_zero=True)
cftime.datetime(1999, 1, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True)

Concatenation should work OK with indexes composed of cftime.DatetimeNoLeap instances.

@spencerkclark spencerkclark linked a pull request Apr 14, 2024 that will close this issue
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants