Equality between DataFrames misbehaves if columns contain NaN #18455
Comments
By the way: it works with
|
I spoke too soon:
|
hmm this should work, we already use
|
More general question: is it a desired feature or a limitation that equality works only on objects with similarly ordered |
see long discussion here: #1134 |
Interesting, but my understanding is that it does not consider the specific issue of having the same labels but in a different order. I understand the reason not to support comparison between different indexes is to avoid |
how could different orderings be considered equal? |
My idea would be something like
|
I am asking why you think it is a good idea to ignore ordering in an ordered array |
Just because aligning is what pandas always does (with arithmetic and logic operations), and hence what users expect. Or in other words: if our
|
I don't think this is a good idea. Most pandas operations already either (1) align arguments or (2) require identical labels. This would add a third type: (3) require same labels, in any order. |
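The categories named in this comment can be sketched on two small Series. This is only an illustration, not code from the thread: the data are made up, and `eq_any_order` is a hypothetical helper (not a pandas API) showing what a category (3) comparison might look like if emulated with `reindex`.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([3, 2, 1], index=["z", "y", "x"])  # same labels, different order

# (1) Arithmetic aligns on labels before operating.
total = a + b

# (2) Comparison requires identically labeled operands and raises otherwise.
try:
    a == b
    raised = False
except ValueError:
    raised = True


def eq_any_order(left, right):
    """Hypothetical category (3): same unique labels, in any order."""
    if not (left.index.is_unique and right.index.is_unique):
        raise ValueError("indexes must be unique")
    if not left.index.sort_values().equals(right.index.sort_values()):
        raise ValueError("indexes must contain the same labels")
    # Reorder the right operand to match the left, then compare element-wise.
    return left == right.reindex(left.index)


result = eq_any_order(a, b)
```

Here `a == b` raises even though both Series hold the same label/value pairs, while `eq_any_order(a, b)` compares them label by label.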
What kind of operations would be left in category |
|
Sure... but these are at the object level. In the same way,
By the way, I'm not at all against having equality in your category |
Reminder: when this is fixed, remove workaround in test |
I am not sure this was the reason. If comparison operations aligned, you would 1) align, introducing NaNs in the values, and 2) compare, and where there are NaNs you just get
I think one of the reasons not to let comparisons align was 1) to make Series behaviour consistent with DataFrame (though of course we could also have changed the DataFrame behaviour to align as well) and 2) people liked the error as a sanity check (often, when doing a comparison, you want to use the result for boolean indexing, and alignment might then give unexpected results). One example use case that Wes gave: |
Good point: comparison of NaNs is well defined.
Exactly
True. My idea of introducing NaNs would have provided this sanity check... but it's just too inconsistent. And while I would rather not have this sanity check, changing it now would be too disruptive. I still think we could simply let the order of the indexes not matter when both are unique and contain the same elements. |
Code Sample, a copy-pastable example if possible
Problem description
While it is true that np.nan != np.nan, pandas disregards this in indexes (indeed, s.loc[:, np.nan] works), so it should be coherent.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: b45325e
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.22.0.dev0+201.gb45325e28.dirty
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1