Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Series.rmod called with scalar returns inconsistent results based on size of series #29602

Closed
Tracked by #3
kmacdough opened this issue Nov 13, 2019 · 8 comments · Fixed by #50665
Closed
Tracked by #3
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@kmacdough
Copy link

Code Sample, a copy-pastable example if possible

In

s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))

Out

   long_series  short_series
0           -1             1
1           -1             1
2           -1             1
3           -1             1
4           -1             1
5           -1             1
6           -1             1
7           -1             1
8           -1             1
9           -1             1

Problem description

Series.rmod called on a scalar behaves differently depending on the size of the series.
On my machine:

  • length <= 10,000 sign matches divisor (matches pythons % operator)
  • length > 10,000 sign matches dividend

Appears to behave properly when called on other Series.
Series.mod does not appear to have any such issues.

Expected Output

   long_series  short_series
0           1             1
1           1             1
2           1             1
3           1             1
4           1             1
5           1             1
6           1             1
7           1             1
8           1             1
9           1             1

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS

commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.1
setuptools : 41.0.1
Cython : 0.29.14
pytest : 5.0.1
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.3.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2

@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Nov 13, 2019
@TomAugspurger
Copy link
Contributor

Seems to be fixed on master.

s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))

## -- End pasted text --
   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

@kmacdough can you try with master?

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Nov 14, 2019
@simonjayhawkins
Copy link
Member

simonjayhawkins commented Apr 1, 2020

Seems to be fixed on master.

i'm getting the same issue on master (and also getting the incorrect behaviour on '0.26.0.dev0+910.g71b78685a' (master on 14/11/2019))

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> s1 = pd.Series([2] * 10001, name="long_series").rmod(-1)
>>> s2 = pd.Series([2] * 10000, name="short_series").rmod(-1)
>>> print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
   long_series  short_series
0           -1             1
1           -1             1
2           -1             1
3           -1             1
4           -1             1
5           -1             1
6           -1             1
7           -1             1
8           -1             1
9           -1             1

INSTALLED VERSIONS


commit : 49bc8d8
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None

pandas : 1.1.0.dev0+1068.g49bc8d8c9
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200311
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

@simonjayhawkins simonjayhawkins added Bug and removed Needs Info Clarification about behavior needed to assess issue labels Apr 1, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Apr 1, 2020
@simonjayhawkins
Copy link
Member

with uint64 casts to float, Int64 works as expected and UInt64 raises TypeError

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("int64").rmod(-1)
0       -1
1       -1
2       -1
3       -1
4       -1
        ..
9996    -1
9997    -1
9998    -1
9999    -1
10000   -1
Name: long_series, Length: 10001, dtype: int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("uint64").rmod(-1)
0        1.0
1        1.0
2        1.0
3        1.0
4        1.0
        ...
9996     1.0
9997     1.0
9998     1.0
9999     1.0
10000    1.0
Name: long_series, Length: 10001, dtype: float64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("Int64").rmod(-1)
0        1
1        1
2        1
3        1
4        1
        ..
9996     1
9997     1
9998     1
9999     1
10000    1
Name: long_series, Length: 10001, dtype: Int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("UInt64").rmod(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 518, in flex_wrapper
    return op(self, other)
  File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
    return right % left
  File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
    return method(self, other)
  File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 440, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  File "C:\Users\simon\pandas\pandas\core\ops\array_ops.py", line 205, in arithmetic_op
    res_values = op(lvalues, rvalues)
  File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
    return right % left
  File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
    return method(self, other)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 679, in integer_arithmetic_method
    return self._maybe_mask_result(result, mask, other, op_name)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 608, in _maybe_mask_result
    return type(self)(result, mask, copy=False)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 348, in __init__
    "values should be integer numpy array. Use "
TypeError: values should be integer numpy array. Use the 'integer_array' function instead
>>>

@jbrockmendel
Copy link
Member

@simonjayhawkins didnt you have a recent numexpr-related PR on a issue similar to this?

@mroeschke
Copy link
Member

I'm getting the correct behavior on master again. Maybe worth having a test to see if it's not platform or package related

In [3]: s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
   ...: s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
   ...: print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jul 23, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@codamuse
Copy link
Contributor

codamuse commented Dec 4, 2022

@phofl not sure if I should be pinging you on this but I have a test written that's pretty straightforward:

    result = pd.Series([2] * 10001).rmod(-1)
    expected = pd.Series([1] * 10001)

    tm.assert_series_equal(result, expected)

two questions: 1) does the test look okay, 2) I'm trying to test against 1.1.0 to confirm it fails (it passes on the latest main) but having some issues building.

To test against 1.1.0 I'm checking out d9fff2792bf16178d4e450fe7384244e50635733 and then when running python setup.py build_ext -j 4 I get the following:

pandas/_libs/writers.c:5092:5: error: ‘_PyUnicode_get_wstr_length’ is deprecated [-Werror=deprecated-declarations]

Any pointers? Thanks!

@phofl
Copy link
Member

phofl commented Dec 5, 2022

Test looks ok I think,

You don't have to build from source there. You can simply create another environment and install the released pandas 1.1.0

@jbrockmendel
Copy link
Member

in tests.arithmetic.switch_numexpr_min_elements we change the numexpr cutoff so that we shouldn't need a long Series to test this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants