Unexpected exception on column with NaT #17559

lebigot · 2017-09-17T09:07:54Z

Code Sample

import pandas as pd
import datetime

# 3 examples:

df = pd.DataFrame({0: [1, None]})  # Works
df = pd.DataFrame({0: [None, 1]})  # Works

df = pd.DataFrame({0: [None, datetime.datetime.now()]})  # Exception

# Problem demonstration:
df != df.iloc[0]  # Works with numeric column, fails with NaT

Problem description & expected output.

In the above code, the final test raises an exception with the datetime example, but works with the two numeric examples.

I would expect the NaT case to behave like the numeric example.

Note: a column with datetimes but no NaT makes df != df.iloc[0] work as expected.

Expected Output

I expect the result to be, like for numeric values, a dataframe that answers the question "is the value identical to that in the first row?" (as a dataframe with the same shape).
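For reference, a minimal sketch of the numeric case that already behaves this way, reusing the first DataFrame from the code sample (the variable name `result` is only for illustration):

```python
import pandas as pd

# Numeric column with a missing value, as in the first example of the report.
df = pd.DataFrame({0: [1, None]})

# df.iloc[0] is a Series indexed by the column labels; the comparison is
# broadcast across rows, so every row is compared with the first row.
result = df != df.iloc[0]

# The result has the same shape as df and holds one boolean per cell.
print(result.shape == df.shape)
print(result)
```

The second row is reported as different from the first because NaN compares unequal to 1.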

Output of `pd.show_versions()`

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 36.3.0
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.5.0a3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
</details>

jreback · 2017-09-17T14:05:22Z

duplicate of #15697

In general, comparing directly against a NaN value will defy intuition; if you want to compare, use .isnull/.notnull. See the docs reference in the other issue.
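A minimal sketch of the `.isnull`/`.notnull` approach, assuming the datetime DataFrame from the original report (the names `missing` and `present` are illustrative only):

```python
import datetime

import pandas as pd

# Column with one missing value and one timestamp, as in the report.
df = pd.DataFrame({0: [None, datetime.datetime.now()]})

# Boolean masks that locate missing values without comparing against NaT.
missing = df.isnull()
present = df.notnull()

print(missing)
print(present)
```

This detects missing values only; it does not answer whether a row equals the first row.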

lebigot · 2017-09-17T14:27:33Z

This issue should definitely be reopened: it is not a duplicate of the issue referenced above, which is about the NaT/NaN semantics. Instead, as described in the original post above, I am talking about the exception being raised when the code should instead just run.

jorisvandenbossche · 2017-09-18T17:01:03Z

This is indeed not a duplicate, as far as I can see.

jorisvandenbossche · 2017-09-18T17:08:57Z

So what seems to make the difference is that here you are comparing to a Series, and not a scalar (the scalar case works 'fine', apart from the behavioural bug from #15697 that it should be False instead of True):

In [133]: df
Out[133]: 
                           0
0                        NaT
1 2017-09-18 19:00:27.589018

In [134]: df == pd.NaT
Out[134]: 
      0
0  True
1  True

In [135]: df == df.iloc[0, 0]
Out[135]: 
      0
0  True
1  True

In [136]: df == df.iloc[0]
...
TypeError: Could not operate array([-9223372036854775808]) with block values boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 1

jbrockmendel · 2018-10-23T19:04:08Z

This now works. Most likely closed by #22163.

jorisvandenbossche · 2018-10-23T20:08:21Z

We should still add a test for it?

jbrockmendel · 2018-10-24T00:32:31Z

Yes.

eoveson · 2018-11-19T04:30:00Z

Looks like the following test was added in /pandas/tests/arithmetic/test_datetime64.py, as part of GH22163. Would it cover this issue sufficiently already, or should anything else be tested?

def test_dt64_nat_comparison(self):
    # GH#22242, GH#22163 DataFrame considered NaT == ts incorrectly
    ts = pd.Timestamp.now()
    df = pd.DataFrame([ts, pd.NaT])
    expected = pd.DataFrame([True, False])
    result = df == ts
    tm.assert_frame_equal(result, expected)
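The test above compares against a scalar, whereas this issue is about comparing against a Series. A possible sketch of a test for the Series case follows. The test name is hypothetical, `pd.testing.assert_frame_equal` is used instead of the internal `tm` helper so the snippet is self-contained, and the expected values assume that NaT compares unequal to every value, including itself:

```python
import pandas as pd


def test_dt64_nat_frame_vs_row():
    # Hypothetical test name; mirrors the failing case from this issue:
    # comparing a DataFrame against one of its own rows (a Series).
    ts = pd.Timestamp.now()
    df = pd.DataFrame({0: [pd.NaT, ts]})
    row = df.iloc[0]  # Series holding NaT, indexed by the column labels

    # Assumption: NaT is unequal to everything, so != is True in every cell.
    result = df != row
    expected = pd.DataFrame({0: [True, True]})
    pd.testing.assert_frame_equal(result, expected)

    # Under the same assumption, == is False in every cell.
    result = df == row
    expected = pd.DataFrame({0: [False, False]})
    pd.testing.assert_frame_equal(result, expected)
```

On a pandas version where the bug is fixed, this should run without raising the TypeError shown earlier in the thread.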

jreback closed this as completed Sep 17, 2017

jreback added the Duplicate Report, Missing-data, and Datetime labels Sep 17, 2017

jreback added this to the No action milestone Sep 17, 2017

jorisvandenbossche reopened this Sep 18, 2017

jorisvandenbossche removed this from the No action milestone Sep 18, 2017

jorisvandenbossche removed the Duplicate Report label Sep 18, 2017

lebigot changed the title ~~Unexpected exception on column with NaT~~ Unexpected exceptions on column with NaT Sep 18, 2017

lebigot changed the title ~~Unexpected exceptions on column with NaT~~ Unexpected exception on column with NaT Sep 18, 2017

jbrockmendel mentioned this issue Dec 19, 2017

DataFrame vs Series vs Index arithmetic Roundup #18824

Closed

59 tasks

nmusolino mentioned this issue Mar 13, 2018

DataFrame[timedelta64] / timedelta64 or pydatetime has wrong dtype and wrong values #20088

Closed

jorisvandenbossche added the good first issue and Needs Tests labels Oct 26, 2018

jorisvandenbossche added this to the Contributions Welcome milestone Oct 26, 2018

jreback closed this as completed Nov 19, 2018
