ERR: Remove bitwise operations on dtype=object? #16873

Open
eyurtsev opened this issue Jul 10, 2017 · 3 comments
Labels
Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Error Reporting (Incorrect or improved errors from pandas); Numeric Operations (Arithmetic, Comparison, and Logical operations)

eyurtsev commented Jul 10, 2017

Code Sample, a copy-pastable example if possible

1) This is okay:

import pandas as pd

~pd.Series([False, False, True, False], dtype=bool)

Out[76]:
0     True
1     True
2    False
3     True
dtype: bool

2) This looks like a problem:

~pd.Series([False, False, True, False], dtype=bool).shift(1).dropna()
1    -1
2    -1
3    -2
dtype: object

Problem description

.shift and .dropna are common pandas operations.

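.shift(1) introduces a NaN in the first position, which bool cannot represent, so the dtype changes from bool to object. The bitwise operation is then applied to each Python object individually (~False evaluates to -1 and ~True to -2) instead of acting as a logical negation.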
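The mechanism described above can be reproduced without pandas. This is only a sketch: the list `object_values` is a stand-in (an assumption, not pandas internals) for the values held by the object-dtype Series after `.shift(1).dropna()`. Since `bool` is a subclass of `int`, `~` on a Python `bool` is integer bitwise inversion. Recent Python versions (3.12 and later, as far as I know) also emit a DeprecationWarning for `~` on `bool`, so the sketch silences it.

```python
import warnings

# Stand-in for the object-dtype values left after .shift(1).dropna():
# plain Python bools rather than a packed boolean array.
object_values = [False, False, True]

with warnings.catch_warnings():
    # Newer Python versions warn about ~ on bool; silence that for the demo.
    warnings.simplefilter("ignore", DeprecationWarning)
    # bool is a subclass of int, so ~ is integer inversion: ~x == -x - 1.
    elementwise = [~value for value in object_values]

# Logical negation, which is what ~ computes on a bool-dtype Series.
logical = [not value for value in object_values]

print(elementwise)  # [-1, -1, -2]
print(logical)      # [True, True, False]
```

The first list matches the surprising output in case 2; the second is what a user most likely expected.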

The output in case 2 is extremely surprising. It might be safer to raise an Exception rather than allow bitwise operations on dtype=object data.

~pd.Series([1.0], dtype=object)

Expected Output

Exception!

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.1
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 5.3.0
sphinx: 1.2.2
patsy: 0.4.0
dateutil: 2.6.0
pytz: 2015.6
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.4.6
matplotlib: 2.0.1
openpyxl: 2.3.3
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.4.2
sqlalchemy: 1.0.8
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.20.1
pandas_datareader: None

jreback commented Jul 10, 2017

Yeah, this is a tough one. We don't normally infer object dtypes before other ops. And of course this is object because we don't have first-class NA for bools :< Though these are bitwise ops, so we could infer and, if the values are not bool, raise a TypeError.

Do you want to have a go and see how much impact this would have? In other words, add some tests, make a change, and see what else breaks?
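The "infer, and raise a TypeError if not bool" idea could look roughly like the sketch below. It is plain Python, not pandas code: `strict_invert` is a hypothetical helper name invented for illustration, and the check uses `isinstance(v, bool)` as a simplifying assumption about what counts as "bool" (NumPy boolean scalars, for instance, would be rejected by it).

```python
def strict_invert(values):
    """Hypothetical helper: logically negate values only if all are bools."""
    values = list(values)
    # "Inference" step: look at the actual elements, not the container dtype.
    non_bool = [v for v in values if not isinstance(v, bool)]
    if non_bool:
        raise TypeError(
            "cannot invert object values that are not all bool; "
            "first offending value: {!r}".format(non_bool[0])
        )
    return [not v for v in values]


# All-bool object values are negated logically.
print(strict_invert([False, False, True]))  # [True, True, False]

# Anything else (here a NaN left over from a shift) is rejected.
try:
    strict_invert([float("nan"), False, True])
except TypeError as exc:
    print("TypeError:", exc)
```

Until something along these lines exists, casting back explicitly, e.g. `.dropna().astype(bool)` before applying `~`, gives a bool-dtype result.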

jreback added the Difficulty Intermediate, Dtype Conversions, and Error Reporting labels on Jul 10, 2017
jreback added this to the Next Major Release milestone on Jul 10, 2017
jreback changed the title from "Remove bitwise operations on dtype=object?" to "ERR: Remove bitwise operations on dtype=object?" on Jul 10, 2017
@david-zwicker

I just stumbled across a similar and likely connected error when dealing with boolean data that was for some reason stored with dtype=object. Here is a short sample demonstrating the problem:

>>> ~pd.DataFrame([True, True], dtype=object)
      0
0    -2
1    -2

This is of course rather unexpected, in particular since boolean indexing cannot be used with this result.

toobaz commented Aug 14, 2018

Might be worth mentioning that while Series and dtype-specific indexes typically support the same arithmetic operations, the standard Index class (so dtype=object) does not attempt any inference and simply fails:

In [2]: -pd.Index([2, 4, 6], dtype='object')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-7c929da771a5> in <module>()
----> 1 -pd.Index([2, 4, 6], dtype='object')

~/nobackup/repo/pandas/pandas/core/ops.py in invalid_op(self, other)
    183     def invalid_op(self, other=None):
    184         raise TypeError("cannot perform {name} with this index type: "
--> 185                         "{typ}".format(name=name, typ=type(self).__name__))
    186 
    187     invalid_op.__name__ = name

TypeError: cannot perform __neg__ with this index type: Index

The choice we make for Series(., dtype=object) and for Index(., dtype=object) should probably be consistent (although this is not, as of now, a direct concern for ~, which is entirely unsupported for indexes; see #22336).

jbrockmendel added the Numeric Operations label and removed the Effort Medium label on Oct 16, 2019
mroeschke added the Bug label on Jun 12, 2021
mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022
6 participants