Unexpected exception on column with NaT #17559

Closed
Tracked by #18824
lebigot opened this issue Sep 17, 2017 · 8 comments
Labels
Datetime (Datetime data dtype), good first issue, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Needs Tests (Unit test(s) needed to prevent regressions)

Comments

@lebigot
Contributor

lebigot commented Sep 17, 2017

Code Sample

import pandas as pd
import datetime

# 3 examples (each assignment replaces the previous df):

df = pd.DataFrame({0: [1, None]})  # Works
df = pd.DataFrame({0: [None, 1]})  # Works

df = pd.DataFrame({0: [None, datetime.datetime.now()]})  # Exception

# Problem demonstration:
df != df.iloc[0]  # Works with numeric column, fails with NaT

Problem description & expected output.

In the above code, the final test raises an exception with the datetime example, but works with the two numeric examples.

I would expect the NaT case to behave like the numeric example.

Note: a column with datetimes but no NaT makes df != df.iloc[0] work as expected.

Expected Output

I expect the result to be, like for numeric values, a dataframe that answers the question "is the value identical to that in the first row?" (as a dataframe with the same shape).
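
For reference, here is a minimal sketch of the numeric case that already works, showing the shape of result being asked for. The variable names `first_row` and `result` are illustrative only and do not come from the report.

```python
import pandas as pd

# Numeric column with a missing value, as in the first example of the report.
df = pd.DataFrame({0: [1, None]})

first_row = df.iloc[0]     # a Series indexed by the column labels
result = df != first_row   # each row is compared with the first row, column by column

print(result)
```

The result has the same shape as `df`; the request is for the NaT column to produce a frame of the same kind instead of raising.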

Output of pd.show_versions()

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 36.3.0
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.5.0a3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
</details>
@jreback
Contributor

jreback commented Sep 17, 2017

duplicate of #15697

In general, comparing directly against a NaN value will defy intuition. If you want to compare, use .isnull/.notnull; see the docs reference in the other issue.
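
A short sketch of the `.isnull`/`.notnull` approach mentioned above, applied to a datetime column that contains NaT. The timestamp value is arbitrary and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({0: [pd.NaT, pd.Timestamp("2017-09-18 19:00:27")]})

# NaT, like NaN, does not compare equal to anything, including itself.
nat_equals_itself = pd.NaT == pd.NaT

# Missing-value checks that do not rely on equality comparison.
missing = df.isnull()
present = df.notnull()

print(nat_equals_itself)
print(missing)
```

This addresses how to detect missing values; it does not by itself explain the exception reported in this issue.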

@jreback jreback closed this as completed Sep 17, 2017
@jreback jreback added Duplicate Report Duplicate issue or pull request Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Datetime Datetime data dtype labels Sep 17, 2017
@jreback jreback added this to the No action milestone Sep 17, 2017
@lebigot
Contributor Author

lebigot commented Sep 17, 2017

This issue should definitely be reopened: it is not a duplicate of the issue referenced above, which is about the NaT/NaN semantics. Instead, as described in the original post above, I am talking about the exception being raised when the code should instead just run.

@jorisvandenbossche
Member

This is indeed not a duplicate, as far as I can see.

@jorisvandenbossche jorisvandenbossche removed this from the No action milestone Sep 18, 2017
@jorisvandenbossche jorisvandenbossche removed the Duplicate Report Duplicate issue or pull request label Sep 18, 2017
@jorisvandenbossche
Member

So what seems to make the difference is that here you are comparing to a Series, and not to a scalar (the scalar case works 'fine', apart from the behavioural bug from #15697, where the result should be False instead of True):

In [133]: df
Out[133]: 
                           0
0                        NaT
1 2017-09-18 19:00:27.589018

In [134]: df == pd.NaT
Out[134]: 
      0
0  True
1  True

In [135]: df == df.iloc[0, 0]
Out[135]: 
      0
0  True
1  True

In [136]: df == df.iloc[0]
...
TypeError: Could not operate array([-9223372036854775808]) with block values boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 1
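
To make the scalar-versus-Series distinction concrete, here is a hedged sketch. It assumes a pandas version in which the Series comparison no longer raises; the timestamp is arbitrary and the names `scalar`, `row` and `explicit` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({0: [pd.NaT, pd.Timestamp("2017-09-18 19:00:27")]})

scalar = df.iloc[0, 0]   # a single cell: a scalar (NaT here)
row = df.iloc[0]         # a whole row: a Series indexed by the column labels

# `df != row` aligns the Series index with the DataFrame columns.
# DataFrame.ne with axis="columns" spells out the same alignment explicitly.
explicit = df.ne(row, axis="columns")

print(type(scalar), type(row))
print(explicit)
```

The scalar comparison broadcasts one value over the whole frame, whereas the Series comparison first aligns on column labels, which is the code path that raised in the traceback above.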

@lebigot lebigot changed the title Unexpected exception on column with NaT Unexpected exceptions on column with NaT Sep 18, 2017
@lebigot lebigot changed the title Unexpected exceptions on column with NaT Unexpected exception on column with NaT Sep 18, 2017
@jbrockmendel
Member

This now works. Most likely closed by #22163.

@jorisvandenbossche
Member

We should still add a test for it?

@jbrockmendel
Member

Yes.

@jorisvandenbossche jorisvandenbossche added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Oct 26, 2018
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Oct 26, 2018
@eoveson
Contributor

eoveson commented Nov 19, 2018

Looks like the following test was added in /pandas/tests/arithmetic/test_datetime64.py, as part of GH22163. Would it cover this issue sufficiently already, or should anything else be tested?

def test_dt64_nat_comparison(self):
    # GH#22242, GH#22163 DataFrame considered NaT == ts incorrectly
    ts = pd.Timestamp.now()
    df = pd.DataFrame([ts, pd.NaT])
    expected = pd.DataFrame([True, False])
    result = df == ts
    tm.assert_frame_equal(result, expected)
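
The test quoted above compares against a scalar, while this issue is about comparing against a row (a Series). A possible regression test for that case might look like the sketch below. The test name is hypothetical, the public `pandas.testing` module stands in for the internal `tm` helper, and the expected values assume the post-fix behaviour in which NaT never compares equal.

```python
import pandas as pd
import pandas.testing as tm  # public helpers; stands in for pandas' internal `tm`


def test_dt64_nat_comparison_with_row():
    # Hypothetical test name; mirrors the `df != df.iloc[0]` case from this issue.
    ts = pd.Timestamp("2017-09-18 19:00:27")
    df = pd.DataFrame({0: [pd.NaT, ts]})

    # The first row holds NaT, so every element differs from it.
    result = df != df.iloc[0]
    expected = pd.DataFrame({0: [True, True]})
    tm.assert_frame_equal(result, expected)

    # Equality against NaT is False everywhere, including NaT vs. NaT.
    result = df == df.iloc[0]
    expected = pd.DataFrame({0: [False, False]})
    tm.assert_frame_equal(result, expected)
```

Whether this belongs in test_datetime64.py or elsewhere is a question for the maintainers.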

@jreback jreback closed this as completed Nov 19, 2018