Pandas dataFrame - selecting data - 'Interval' Recursion error #25283

Closed
cruzzoe opened this issue Feb 12, 2019 · 1 comment · Fixed by #25338
Labels
Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Interval (Interval data type); Regression (Functionality that used to work in a prior pandas version)

cruzzoe commented Feb 12, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'SPCName': ['IntervalXxxxxxxxxxxxxxxxxxxxxxxxx']})
df[df.SPCName == 'IntervalA']

Problem description

The code above raises a recursion error (RuntimeError: maximum recursion depth exceeded while calling a Python object).

This did not occur in version 0.23.4 but does occur after upgrading to 0.24.1.

I do not see why using 'IntervalA' in a Boolean lookup, or with the loc indexer, should result in a RuntimeError.

The error appears to occur regardless of what follows the string 'Interval'; for example, 'IntervalABC' also fails.

Full traceback

python2.7/site-packages/pandas/core/ops.pyc in wrapper(self, other, axis)
1726
1727 elif (is_extension_array_dtype(self) or
-> 1728 (is_extension_array_dtype(other) and not is_scalar(other))):
1729 # Note: the not is_scalar(other) condition rules out
1730 # e.g. other == "category"

python2.7/site-packages/pandas/core/dtypes/common.pyc in is_extension_array_dtype(arr_or_dtype)
1747 dtype = getattr(arr_or_dtype, 'dtype', arr_or_dtype)
1748 return (isinstance(dtype, ExtensionDtype) or
-> 1749 registry.find(dtype) is not None)
1750
1751

python2.7/site-packages/pandas/core/dtypes/dtypes.pyc in find(self, dtype)
87 for dtype_type in self.dtypes:
88 try:
---> 89 return dtype_type.construct_from_string(dtype)
90 except TypeError:
91 pass

python2.7/site-packages/pandas/core/dtypes/dtypes.pyc in construct_from_string(cls, string)
936 (string.startswith('interval') or
937 string.startswith('Interval'))):
--> 938 return cls(string)
939
940 msg = "a string needs to be passed, got type {typ}"

python2.7/site-packages/pandas/core/dtypes/dtypes.pyc in __new__(cls, subtype)
897
898 try:
--> 899 subtype = pandas_dtype(subtype)
900 except TypeError:
901 raise TypeError("could not construct IntervalDtype")

python2.7/site-packages/pandas/core/dtypes/common.pyc in pandas_dtype(dtype)
2002
2003 # registered extension types
-> 2004 result = registry.find(dtype)
2005 if result is not None:
2006 return result

... last 4 frames repeated, from the frame below ...

python2.7/site-packages/pandas/core/dtypes/dtypes.pyc in find(self, dtype)
87 for dtype_type in self.dtypes:
88 try:
---> 89 return dtype_type.construct_from_string(dtype)
90 except TypeError:
91 pass
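
The cycle in the traceback (find, construct_from_string, the dtype constructor, pandas_dtype, then find again) can be reproduced without pandas. The following is a simplified toy model written for Python 3, not the pandas source: the names ToyRegistry, toy_pandas_dtype, ToyIntervalDtype and lookup_recurses are hypothetical stand-ins that only mirror the frames shown above.

```python
# Toy model of the call cycle in the traceback. The names mirror the frames,
# but the bodies are simplified stand-ins and NOT the pandas implementation.


class ToyRegistry:
    def __init__(self):
        self.dtypes = []

    def find(self, dtype):
        # Try every registered dtype class; only TypeError is swallowed.
        for dtype_type in self.dtypes:
            try:
                return dtype_type.construct_from_string(dtype)
            except TypeError:
                pass
        return None


registry = ToyRegistry()


def toy_pandas_dtype(dtype):
    # Stand-in for the "registered extension types" step of pandas_dtype.
    result = registry.find(dtype)
    if result is not None:
        return result
    raise TypeError("data type not understood: {!r}".format(dtype))


class ToyIntervalDtype:
    def __init__(self, subtype):
        # The unchanged string is handed straight back to the dtype lookup,
        # which asks this class to construct itself again.
        self.subtype = toy_pandas_dtype(subtype)

    @classmethod
    def construct_from_string(cls, string):
        # Loose prefix check, as in frames 936-938 of the traceback.
        if isinstance(string, str) and string.lower().startswith("interval"):
            return cls(string)
        raise TypeError("cannot construct from {!r}".format(string))


registry.dtypes.append(ToyIntervalDtype)


def lookup_recurses(value):
    """Return True if looking up `value` exhausts the recursion limit."""
    try:
        registry.find(value)
    except RecursionError:
        return True
    return False
```

In this model, any string that merely starts with 'Interval' passes the prefix check and re-enters the lookup with the same argument, so the recursion never terminates. Because RecursionError is not a TypeError, the except clause in find does not stop it.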

Expected Output

An empty DataFrame, because the string 'IntervalA' does not exist in the DataFrame column SPCName.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.6.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: None.None

pandas: 0.24.1
pytest: 3.6.4
pip: 18.0
setuptools: 40.2.0
Cython: None
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.12
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


jschendel commented Feb 12, 2019

Thanks, the cause of the error looks to be in IntervalDtype when a close but invalid string is passed:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+86.g1d1b14c7e'

In [2]: pd.api.types.IntervalDtype('IntervalA')
---------------------------------------------------------------------------
RecursionError: maximum recursion depth exceeded in __instancecheck__

In [3]: pd.api.types.IntervalDtype.construct_from_string('IntervalA')
---------------------------------------------------------------------------
RecursionError: maximum recursion depth exceeded while calling a Python object

In [4]: pd.api.types.IntervalDtype.is_dtype('IntervalA')
---------------------------------------------------------------------------
RecursionError: maximum recursion depth exceeded

The correct behavior is: [2] and [3] should raise, and [4] should return False.

The fix is to enforce more strictly what IntervalDtype accepts when parsing strings. The valid string representations for an IntervalDtype are 'interval' or 'interval[<subtype>]' (e.g. 'interval[int64]'), with the leading 'i' optionally capitalized.
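
As a rough illustration of that stricter parsing, here is a minimal sketch using a regular expression. It is not the code from the fix in #25338: the pattern and the helper names parse_interval_string and is_interval_string are assumptions made for this example, and validating the subtype itself is left out.

```python
import re

# Hypothetical pattern (an assumption for this sketch): 'interval' or
# 'interval[<subtype>]', with the leading 'i' optionally capitalized.
_INTERVAL_PATTERN = re.compile(r"[Ii]nterval(\[(?P<subtype>.+)\])?")


def parse_interval_string(string):
    """Return the subtype name, or None for a bare 'interval'.

    Raise TypeError for anything that is not a valid representation,
    instead of passing the string on to another dtype lookup.
    """
    if not isinstance(string, str):
        raise TypeError("a string needs to be passed, got type {typ}".format(
            typ=type(string).__name__))
    match = _INTERVAL_PATTERN.fullmatch(string)
    if match is None:
        raise TypeError("cannot parse {!r} as an interval dtype".format(string))
    return match.group("subtype")


def is_interval_string(string):
    """Return True only for valid interval dtype strings; never raise."""
    try:
        parse_interval_string(string)
    except TypeError:
        return False
    return True
```

With a full-string match, 'IntervalA' is rejected up front with a TypeError, and the is_dtype-style check returns False rather than recursing.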

The RecursionError in IntervalDtype itself is a bug that was already present in 0.23.4; it looks like some changes in 0.24.0 exposed it in the code responsible for filtering.

jschendel added the Bug, Dtype Conversions, Regression, and Interval labels Feb 12, 2019
jschendel added this to the Contributions Welcome milestone Feb 12, 2019
zangell44 mentioned this issue Feb 15, 2019
jreback modified the milestones: Contributions Welcome, 0.24.2 Feb 19, 2019