DEPR: 'epoch' date format in to_json #57987

Merged
merged 12 commits into from
Apr 19, 2024
11 changes: 2 additions & 9 deletions doc/source/user_guide/io.rst
@@ -1949,13 +1949,6 @@ Writing in ISO date format, with microseconds:
json = dfd.to_json(date_format="iso", date_unit="us")
json

Epoch timestamps, in seconds:

.. ipython:: python

json = dfd.to_json(date_format="epoch", date_unit="s")
json

Writing to a file, with a date index and a date column:

.. ipython:: python
@@ -1965,7 +1958,7 @@ Writing to a file, with a date index and a date column:
dfj2["ints"] = list(range(5))
dfj2["bools"] = True
dfj2.index = pd.date_range("20130101", periods=5)
dfj2.to_json("test.json")
dfj2.to_json("test.json", date_format="iso")

with open("test.json") as fh:
print(fh.read())
@@ -2140,7 +2133,7 @@ Dates written in nanoseconds need to be read back in nanoseconds:
.. ipython:: python

from io import StringIO
json = dfj2.to_json(date_unit="ns")
json = dfj2.to_json(date_format="iso", date_unit="ns")

# Try to parse timestamps as milliseconds -> Won't Work
dfju = pd.read_json(StringIO(json), date_unit="ms")
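The docs example above depends on the fact that an epoch integer carries no unit of its own. As a minimal sketch in plain Python (standard library only; it does not call pandas, and `to_epoch`/`from_epoch` are hypothetical helpers), the same nanosecond count decodes correctly when read as nanoseconds but overflows when read as milliseconds, whereas an ISO 8601 string round-trips without any unit hint:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Epoch ticks per second for each unit name accepted by date_unit.
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def to_epoch(ts: datetime, unit: str) -> int:
    """Encode an aware datetime as an integer count of `unit` since the epoch."""
    micros = (ts - EPOCH) // timedelta(microseconds=1)
    return micros * TICKS_PER_SECOND[unit] // 1_000_000


def from_epoch(value: int, unit: str) -> datetime:
    """Decode an epoch integer, trusting the caller to supply the right unit."""
    seconds, remainder = divmod(value, TICKS_PER_SECOND[unit])
    micros = remainder * 1_000_000 // TICKS_PER_SECOND[unit]
    # timedelta raises OverflowError if the value is far outside its range.
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


ts = datetime(2013, 1, 1, tzinfo=timezone.utc)
as_ns = to_epoch(ts, "ns")  # 1356998400000000000

print(from_epoch(as_ns, "ns"))  # 2013-01-01 00:00:00+00:00
try:
    from_epoch(as_ns, "ms")  # wrong unit: roughly 43 million years after 1970
except OverflowError as exc:
    print("read as ms failed:", exc)

# An ISO 8601 string is self-describing, so no unit is needed to read it back.
iso = ts.isoformat(timespec="milliseconds")
print(datetime.fromisoformat(iso) == ts)  # True
```

This is the same failure mode the docs describe for `read_json(..., date_unit="ms")` on nanosecond output, reduced to integer arithmetic.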
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -182,6 +182,7 @@ Other Deprecations
- Deprecated :meth:`Timestamp.utcnow`, use ``Timestamp.now("UTC")`` instead (:issue:`56680`)
- Deprecated allowing non-keyword arguments in :meth:`Series.to_markdown` except ``buf``. (:issue:`57280`)
- Deprecated allowing non-keyword arguments in :meth:`Series.to_string` except ``buf``. (:issue:`57280`)
- Deprecated using ``epoch`` date format in :meth:`DataFrame.to_json` and :meth:`Series.to_json`, use ``iso`` instead. (:issue:`57063`)
-

.. ---------------------------------------------------------------------------
25 changes: 25 additions & 0 deletions pandas/core/generic.py
@@ -2328,6 +2328,11 @@ def to_json(
'iso' = ISO8601. The default depends on the `orient`. For
``orient='table'``, the default is 'iso'. For all other orients,
the default is 'epoch'.

.. deprecated:: 3.0.0
'epoch' date format is deprecated and will be removed in a future
version, please use 'iso' instead.

double_precision : int, default 10
The number of decimal places to use when encoding
floating point values. The possible maximal value is 15.
@@ -2530,6 +2535,26 @@ def to_json(
date_format = "iso"
elif date_format is None:
Member: I think we need to warn for anything that is not currently iso, including when date_format is None (although the message will be different).

Member: Can you also add a warning for the date_format=None case that previously defaulted to "epoch"? In the future this should default to "iso".

Member Author: Most to_json use cases don't involve dates and wouldn't be affected by the date_format value, so throwing a warning in those cases might be unnecessary: users would essentially need to pass date_format='iso' for no reason just to silence it. Are you sure we should do this?

Member: Sorry, to be more specific: we need to warn when date_format=None and we actually serialize timestamp types. I agree there is no point in warning if a DataFrame has no timestamp type, but users relying on the default epoch behavior need to be warned of the change.

Member: @WillAyd, curious: how would users get the old behavior? It would be good to include that in the warning message.

Member: The old behavior as in just an integer? I think the problem with that is that it was an implementation detail of pandas spilling out into the JSON serializer. Historically our timestamps were exclusively nanoseconds since the Unix epoch, but with all the work @jbrockmendel has been doing that is no longer true (and usually isn't).

Member:
> The old behavior as in just an integer?

Yeah. Just checking whether we can still offer a suggestion for a migration path if users want to keep the old behavior.

Member: I don't think so. Especially with our auto-inferencing of resolutions, I don't see how it would be usable at all when round-tripping through JSON.

Member: OK, sounds good.

date_format = "epoch"
dtypes = (
self.dtypes.values
if self.ndim == 2
else np.array([self.dtype], dtype=object)
)
if any(lib.is_np_dtype(dtype, "mM") for dtype in dtypes):
warnings.warn(
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
elif date_format == "epoch":
# GH#57063
warnings.warn(
"'epoch' date format is deprecated and will be removed in a future "
"version, please use 'iso' date format instead.",
FutureWarning,
stacklevel=find_stack_level(),
)

config.is_nonnegative_int(indent)
indent = indent or 0
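To summarize the date_format branches in the `to_json` hunk above, here is a hypothetical plain-Python sketch; `resolve_date_format` and its `dtype_kinds` argument are illustrative names, not pandas API. It assumes the condition just above the hunk (not shown in the diff) assigns 'iso' when `date_format is None` and `orient='table'`, as the docstring's default rule suggests, and it stands in NumPy `dtype.kind` codes for the `lib.is_np_dtype(dtype, "mM")` check ('M' is datetime64, 'm' is timedelta64).

```python
import warnings

DEFAULT_MSG = (
    "The default 'epoch' date format is deprecated and will be removed "
    "in a future version, please use 'iso' date format instead."
)
EXPLICIT_MSG = (
    "'epoch' date format is deprecated and will be removed in a future "
    "version, please use 'iso' date format instead."
)


def resolve_date_format(date_format, orient, dtype_kinds):
    """Hypothetical mirror of the date_format branches in to_json.

    dtype_kinds holds one NumPy dtype.kind code per column
    ('M' = datetime64, 'm' = timedelta64, 'i' = int, 'O' = object, ...).
    """
    if date_format is None and orient == "table":
        # Assumed condition: Table Schema output defaults to ISO 8601.
        return "iso"
    if date_format is None:
        # Only warn when the default would actually encode datetimelike data.
        if any(kind in ("m", "M") for kind in dtype_kinds):
            warnings.warn(DEFAULT_MSG, FutureWarning, stacklevel=2)
        return "epoch"
    if date_format == "epoch":
        # An explicit request for epoch always warns, dates or not.
        warnings.warn(EXPLICIT_MSG, FutureWarning, stacklevel=2)
    return date_format
```

The dtype gate on the default path is what keeps frames holding only strings, ints, or bools quiet, which matches the cases `test_default_epoch_date_format_deprecated` parametrizes over.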
11 changes: 8 additions & 3 deletions pandas/tests/io/json/test_json_table_schema.py
@@ -451,12 +451,17 @@ def test_to_json_categorical_index(self):
assert result == expected

def test_date_format_raises(self, df_table):
msg = (
error_msg = (
"Trying to write with `orient='table'` and `date_format='epoch'`. Table "
"Schema requires dates to be formatted with `date_format='iso'`"
)
with pytest.raises(ValueError, match=msg):
df_table.to_json(orient="table", date_format="epoch")
warning_msg = (
"'epoch' date format is deprecated and will be removed in a future "
"version, please use 'iso' date format instead."
)
with pytest.raises(ValueError, match=error_msg):
with tm.assert_produces_warning(FutureWarning, match=warning_msg):
df_table.to_json(orient="table", date_format="epoch")

# others work
df_table.to_json(orient="table", date_format="iso")
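The rewritten `test_date_format_raises` expects a single call to produce both a FutureWarning and a ValueError, which is consistent with the deprecation check in `to_json` running before the table writer rejects 'epoch'. A hypothetical standard-library sketch of that ordering (`to_json_table` is an illustrative stand-in, not the pandas implementation):

```python
import warnings

WARNING_MSG = (
    "'epoch' date format is deprecated and will be removed in a future "
    "version, please use 'iso' date format instead."
)
ERROR_MSG = (
    "Trying to write with `orient='table'` and `date_format='epoch'`. Table "
    "Schema requires dates to be formatted with `date_format='iso'`"
)


def to_json_table(date_format: str) -> str:
    """Hypothetical stand-in for to_json(orient="table", ...)."""
    if date_format == "epoch":
        # Step 1: the deprecation check warns first...
        warnings.warn(WARNING_MSG, FutureWarning, stacklevel=2)
        # Step 2: ...then the table writer rejects the format.
        raise ValueError(ERROR_MSG)
    return '{"schema":{},"data":[]}'


def warn_then_raise(date_format: str):
    """Return (exception or None, categories of the warnings recorded)."""
    error = None
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            to_json_table(date_format)
        except ValueError as exc:
            error = exc
    return error, [w.category for w in caught]
```

Because the warning is recorded before the exception unwinds the call, both outcomes can be observed from the one call.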
147 changes: 127 additions & 20 deletions pandas/tests/io/json/test_pandas.py
@@ -123,6 +123,7 @@ def test_frame_non_unique_index_raises(self, orient):
with pytest.raises(ValueError, match=msg):
df.to_json(orient=orient)

@pytest.mark.filterwarnings("ignore::FutureWarning")
@pytest.mark.parametrize("orient", ["split", "values"])
@pytest.mark.parametrize(
"data",
@@ -761,7 +762,12 @@ def test_series_with_dtype(self):
)
def test_series_with_dtype_datetime(self, dtype, expected):
s = Series(["2000-01-01"], dtype="datetime64[ns]")
data = StringIO(s.to_json())
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
data = StringIO(s.to_json())
result = read_json(data, typ="series", dtype=dtype)
tm.assert_series_equal(result, expected)

@@ -807,12 +813,18 @@ def test_convert_dates(self, datetime_series, datetime_frame):
df = datetime_frame
df["date"] = Timestamp("20130101").as_unit("ns")

json = StringIO(df.to_json())
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
json = StringIO(df.to_json())
result = read_json(json)
tm.assert_frame_equal(result, df)

df["foo"] = 1.0
json = StringIO(df.to_json(date_unit="ns"))
with tm.assert_produces_warning(FutureWarning, match=msg):
json = StringIO(df.to_json(date_unit="ns"))

result = read_json(json, convert_dates=False)
expected = df.copy()
@@ -822,7 +834,8 @@ def test_convert_dates(self, datetime_series, datetime_frame):

# series
ts = Series(Timestamp("20130101").as_unit("ns"), index=datetime_series.index)
json = StringIO(ts.to_json())
with tm.assert_produces_warning(FutureWarning, match=msg):
json = StringIO(ts.to_json())
result = read_json(json, typ="series")
tm.assert_series_equal(result, ts)

@@ -835,15 +848,23 @@ def test_date_index_and_values(self, date_format, as_object, date_typ):
data.append("a")

ser = Series(data, index=data)
result = ser.to_json(date_format=date_format)

expected_warning = None
if date_format == "epoch":
expected = '{"1577836800000":1577836800000,"null":null}'
expected_warning = FutureWarning
else:
expected = (
'{"2020-01-01T00:00:00.000":"2020-01-01T00:00:00.000","null":null}'
)

msg = (
"'epoch' date format is deprecated and will be removed in a future "
"version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(expected_warning, match=msg):
result = ser.to_json(date_format=date_format)

if as_object:
expected = expected.replace("}", ',"a":"a"}')

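The expected strings in `test_date_index_and_values` can be reproduced by hand. A standard-library sketch, assuming the 2020-01-01 value is treated as UTC and encoded at the default millisecond `date_unit`, that rebuilds both the 'epoch' and 'iso' variants:

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
value = datetime(2020, 1, 1, tzinfo=timezone.utc)


def dump(mapping):
    # Compact separators match the unspaced JSON in the test's expectations.
    return json.dumps(mapping, separators=(",", ":"))


# 'epoch' format: whole milliseconds since 1970-01-01 UTC.
epoch_ms = (value - EPOCH) // timedelta(milliseconds=1)

# 'iso' format at millisecond resolution, without a UTC offset suffix.
iso = value.replace(tzinfo=None).isoformat(timespec="milliseconds")

# The missing value serializes as a "null" key with a null value.
epoch_json = dump({str(epoch_ms): epoch_ms, "null": None})
iso_json = dump({iso: iso, "null": None})
print(epoch_json)  # {"1577836800000":1577836800000,"null":null}
print(iso_json)  # {"2020-01-01T00:00:00.000":"2020-01-01T00:00:00.000","null":null}
```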
@@ -940,7 +961,12 @@ def test_date_unit(self, unit, datetime_frame):
df.iloc[2, dl] = Timestamp("21460101 20:43:42")
df.iloc[4, dl] = pd.NaT

json = df.to_json(date_format="epoch", date_unit=unit)
msg = (
"'epoch' date format is deprecated and will be removed in a future "
"version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
json = df.to_json(date_format="epoch", date_unit=unit)

# force date unit
result = read_json(StringIO(json), date_unit=unit)
@@ -950,6 +976,34 @@ def test_date_unit(self, unit, datetime_frame):
result = read_json(StringIO(json), date_unit=None)
tm.assert_frame_equal(result, df)

@pytest.mark.parametrize(
"df, warn",
[
(DataFrame({"A": ["a", "b", "c"], "B": np.arange(3)}), None),
(DataFrame({"A": [True, False, False]}), None),
(
DataFrame(
{"A": ["a", "b", "c"], "B": pd.to_timedelta(np.arange(3), unit="D")}
),
FutureWarning,
),
(
DataFrame(
{"A": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"])}
),
FutureWarning,
),
],
)
def test_default_epoch_date_format_deprecated(self, df, warn):
# GH 57063
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(warn, match=msg):
df.to_json()

@pytest.mark.parametrize("unit", ["s", "ms", "us"])
def test_iso_non_nano_datetimes(self, unit):
# Test that numpy datetimes
@@ -1019,7 +1073,12 @@ def test_doc_example(self):
dfj2["bools"] = True
dfj2.index = date_range("20130101", periods=5)

json = StringIO(dfj2.to_json())
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
json = StringIO(dfj2.to_json())
result = read_json(json, dtype={"ints": np.int64, "bools": np.bool_})
tm.assert_frame_equal(result, result)

@@ -1056,19 +1115,26 @@ def test_timedelta(self):
ser = Series([timedelta(23), timedelta(seconds=5)])
assert ser.dtype == "timedelta64[ns]"

result = read_json(StringIO(ser.to_json()), typ="series").apply(converter)
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
result = read_json(StringIO(ser.to_json()), typ="series").apply(converter)
tm.assert_series_equal(result, ser)

ser = Series([timedelta(23), timedelta(seconds=5)], index=Index([0, 1]))
assert ser.dtype == "timedelta64[ns]"
result = read_json(StringIO(ser.to_json()), typ="series").apply(converter)
with tm.assert_produces_warning(FutureWarning, match=msg):
result = read_json(StringIO(ser.to_json()), typ="series").apply(converter)
tm.assert_series_equal(result, ser)

frame = DataFrame([timedelta(23), timedelta(seconds=5)])
assert frame[0].dtype == "timedelta64[ns]"
tm.assert_frame_equal(
frame, read_json(StringIO(frame.to_json())).apply(converter)
)

with tm.assert_produces_warning(FutureWarning, match=msg):
json = frame.to_json()
tm.assert_frame_equal(frame, read_json(StringIO(json)).apply(converter))

def test_timedelta2(self):
frame = DataFrame(
@@ -1078,7 +1144,12 @@ def test_timedelta2(self):
"c": date_range(start="20130101", periods=2),
}
)
data = StringIO(frame.to_json(date_unit="ns"))
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
data = StringIO(frame.to_json(date_unit="ns"))
result = read_json(data)
result["a"] = pd.to_timedelta(result.a, unit="ns")
result["c"] = pd.to_datetime(result.c)
@@ -1106,28 +1177,42 @@ def test_timedelta_to_json(self, as_object, date_format, timedelta_typ):
data.append("a")

ser = Series(data, index=data)
expected_warning = None
if date_format == "iso":
expected = (
'{"P1DT0H0M0S":"P1DT0H0M0S","P2DT0H0M0S":"P2DT0H0M0S","null":null}'
)
else:
expected_warning = FutureWarning
expected = '{"86400000":86400000,"172800000":172800000,"null":null}'

if as_object:
expected = expected.replace("}", ',"a":"a"}')

result = ser.to_json(date_format=date_format)
msg = (
"'epoch' date format is deprecated and will be removed in a future "
"version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(expected_warning, match=msg):
result = ser.to_json(date_format=date_format)
assert result == expected

@pytest.mark.parametrize("as_object", [True, False])
@pytest.mark.parametrize("timedelta_typ", [pd.Timedelta, timedelta])
def test_timedelta_to_json_fractional_precision(self, as_object, timedelta_typ):
data = [timedelta_typ(milliseconds=42)]
ser = Series(data, index=data)
warn = FutureWarning
if as_object:
ser = ser.astype(object)
warn = None

result = ser.to_json()
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(warn, match=msg):
result = ser.to_json()
expected = '{"42":42}'
assert result == expected

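The timedelta expectations in `test_timedelta_to_json` and `test_timedelta_to_json_fractional_precision` follow from simple arithmetic: 'epoch' writes whole milliseconds, and 'iso' writes an ISO 8601 duration. The sketch below uses only the standard library; `iso_duration` is a hypothetical helper that reproduces the `P{days}DT{h}H{m}M{s}S` shape seen in the test for whole-second values and deliberately rejects anything finer, since the test does not show how sub-second durations are formatted.

```python
import json
from datetime import timedelta


def epoch_ms(td: timedelta) -> int:
    # 'epoch' encoding at the default millisecond unit.
    return td // timedelta(milliseconds=1)


def iso_duration(td: timedelta) -> str:
    # Hypothetical helper: whole, non-negative seconds only.
    if td < timedelta(0) or td % timedelta(seconds=1):
        raise ValueError("sketch only handles whole, non-negative seconds")
    total = td // timedelta(seconds=1)
    days, rest = divmod(total, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    return f"P{days}DT{hours}H{minutes}M{seconds}S"


def dump(mapping):
    # Compact separators match the unspaced JSON in the test's expectations.
    return json.dumps(mapping, separators=(",", ":"))


values = [timedelta(days=1), timedelta(days=2)]
epoch_json = dump({**{str(epoch_ms(v)): epoch_ms(v) for v in values}, "null": None})
iso_json = dump({**{iso_duration(v): iso_duration(v) for v in values}, "null": None})

# Fractional precision: 42 ms is exactly 42 in millisecond epoch units.
ms = epoch_ms(timedelta(milliseconds=42))
fractional_json = dump({str(ms): ms})
```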
@@ -1209,12 +1294,18 @@ def test_datetime_tz(self):

df_naive = df.copy()
df_naive["A"] = tz_naive
expected = df_naive.to_json()
assert expected == df.to_json()
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
expected = df_naive.to_json()
assert expected == df.to_json()

stz = Series(tz_range)
s_naive = Series(tz_naive)
assert stz.to_json() == s_naive.to_json()
with tm.assert_produces_warning(FutureWarning, match=msg):
assert stz.to_json() == s_naive.to_json()

def test_sparse(self):
# GH4377 df.to_json segfaults with non-ndarray blocks
@@ -1479,7 +1570,12 @@ def test_to_json_from_json_columns_dtypes(self, orient):
),
}
)
dfjson = expected.to_json(orient=orient)
msg = (
"The default 'epoch' date format is deprecated and will be removed "
"in a future version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
dfjson = expected.to_json(orient=orient)

result = read_json(
StringIO(dfjson),
@@ -1669,7 +1765,17 @@ def test_read_json_with_very_long_file_path(self, compression):
def test_timedelta_as_label(self, date_format, key):
df = DataFrame([[1]], columns=[pd.Timedelta("1D")])
expected = f'{{"{key}":{{"0":1}}}}'
result = df.to_json(date_format=date_format)

expected_warning = None
if date_format == "epoch":
expected_warning = FutureWarning

msg = (
"'epoch' date format is deprecated and will be removed in a future "
"version, please use 'iso' date format instead."
)
with tm.assert_produces_warning(expected_warning, match=msg):
result = df.to_json(date_format=date_format)

assert result == expected

@@ -1895,6 +2001,7 @@ def test_to_s3(self, s3_public_bucket, s3so):
timeout -= 0.1
assert timeout > 0, "Timed out waiting for file to appear on moto"

@pytest.mark.filterwarnings("ignore::FutureWarning")
def test_json_pandas_nulls(self, nulls_fixture, request):
# GH 31615
if isinstance(nulls_fixture, Decimal):