Series.rmod called with scalar returns inconsistent results based on size of series #29602

kmacdough · 2019-11-13T21:33:00Z

Code Sample, a copy-pastable example if possible

In

s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))

Out

   long_series  short_series
0           -1             1
1           -1             1
2           -1             1
3           -1             1
4           -1             1
5           -1             1
6           -1             1
7           -1             1
8           -1             1
9           -1             1

Problem description

Series.rmod called on a scalar behaves differently depending on the size of the series.
On my machine:

length <= 10,000 sign matches divisor (matches pythons % operator)
length > 10,000 sign matches dividend

Appears to behave properly when called on other Series.
Series.mod does not appear to have any such issues.

Expected Output

   long_series  short_series
0           1             1
1           1             1
2           1             1
3           1             1
4           1             1
5           1             1
6           1             1
7           1             1
8           1             1
9           1             1

Output of `pd.show_versions()`

[paste the output of `pd.show_versions()` here below this line]
INSTALLED VERSIONS

commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.1
setuptools : 41.0.1
Cython : 0.29.14
pytest : 5.0.1
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.3.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2019-11-14T02:27:29Z

Seems to be fixed on master.

s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))

## -- End pasted text --
   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

@kmacdough can you try with master?

simonjayhawkins · 2020-04-01T12:42:49Z

Seems to be fixed on master.

i'm getting the same issue on master (and also getting the incorrect behaviour on '0.26.0.dev0+910.g71b78685a' (master on 14/11/2019))

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> s1 = pd.Series([2] * 10001, name="long_series").rmod(-1)
>>> s2 = pd.Series([2] * 10000, name="short_series").rmod(-1)
>>> print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
   long_series  short_series
0           -1             1
1           -1             1
2           -1             1
3           -1             1
4           -1             1
5           -1             1
6           -1             1
7           -1             1
8           -1             1
9           -1             1

INSTALLED VERSIONS

commit : 49bc8d8
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None

pandas : 1.1.0.dev0+1068.g49bc8d8c9
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200311
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

simonjayhawkins · 2020-04-01T13:24:50Z

with uint64 casts to float, Int64 works as expected and UInt64 raises TypeError

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("int64").rmod(-1)
0       -1
1       -1
2       -1
3       -1
4       -1
        ..
9996    -1
9997    -1
9998    -1
9999    -1
10000   -1
Name: long_series, Length: 10001, dtype: int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("uint64").rmod(-1)
0        1.0
1        1.0
2        1.0
3        1.0
4        1.0
        ...
9996     1.0
9997     1.0
9998     1.0
9999     1.0
10000    1.0
Name: long_series, Length: 10001, dtype: float64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("Int64").rmod(-1)
0        1
1        1
2        1
3        1
4        1
        ..
9996     1
9997     1
9998     1
9999     1
10000    1
Name: long_series, Length: 10001, dtype: Int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("UInt64").rmod(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 518, in flex_wrapper
    return op(self, other)
  File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
    return right % left
  File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
    return method(self, other)
  File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 440, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  File "C:\Users\simon\pandas\pandas\core\ops\array_ops.py", line 205, in arithmetic_op
    res_values = op(lvalues, rvalues)
  File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
    return right % left
  File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
    return method(self, other)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 679, in integer_arithmetic_method
    return self._maybe_mask_result(result, mask, other, op_name)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 608, in _maybe_mask_result
    return type(self)(result, mask, copy=False)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 348, in __init__
    "values should be integer numpy array. Use "
TypeError: values should be integer numpy array. Use the 'integer_array' function instead
>>>

jbrockmendel · 2020-10-11T15:40:35Z

@simonjayhawkins didnt you have a recent numexpr-related PR on a issue similar to this?

mroeschke · 2021-07-23T04:27:59Z

I'm getting the correct behavior on master again. Maybe worth having a test to see if it's not platform or package related

In [3]: s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
   ...: s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
   ...: print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

codamuse · 2022-12-04T22:12:36Z

@phofl not sure if I should be pinging you on this but I have a test written that's pretty straightforward:

    result = pd.Series([2] * 10001).rmod(-1)
    expected = pd.Series([1] * 10001)

    tm.assert_series_equal(result, expected)

two questions: 1) does the test look okay, 2) I'm trying to test against 1.1.0 to confirm it fails (it passes on the latest main) but having some issues building.

To test against 1.1.0 I'm checking out d9fff2792bf16178d4e450fe7384244e50635733 and then when running python setup.py build_ext -j 4 I get the following:

pandas/_libs/writers.c:5092:5: error: ‘_PyUnicode_get_wstr_length’ is deprecated [-Werror=deprecated-declarations]

Any pointers? Thanks!

phofl · 2022-12-05T15:52:20Z

Test looks ok I think,

You don't have to build from source there. You can simply create another environment and install the released pandas 1.1.0

jbrockmendel · 2022-12-05T19:57:42Z

in tests.arithmetic.switch_numexpr_min_elements we change the numexpr cutoff so that we shouldn't need a long Series to test this

jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Nov 13, 2019

TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Nov 14, 2019

simonjayhawkins added Bug and removed Needs Info Clarification about behavior needed to assess issue labels Apr 1, 2020

simonjayhawkins added this to the Contributions Welcome milestone Apr 1, 2020

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jul 23, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

phofl mentioned this issue Nov 16, 2022

TST: Fixed issues that need tests noatamir/pyladies-berlin-sprints#3

Open

17 tasks

codamuse mentioned this issue Dec 4, 2022

TST: test to confirm rmod works for large series #50061

Closed

5 tasks

tabeak mentioned this issue Jan 10, 2023

Add test for checking inconsistencies for large series using Series.rmod #50665

Merged

1 task

phofl closed this as completed in #50665 Jan 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Series.rmod called with scalar returns inconsistent results based on size of series #29602

Series.rmod called with scalar returns inconsistent results based on size of series #29602

kmacdough commented Nov 13, 2019

[paste the output of `pd.show_versions()` here below this line]
INSTALLED VERSIONS

TomAugspurger commented Nov 14, 2019

simonjayhawkins commented Apr 1, 2020 •

edited

Loading

simonjayhawkins commented Apr 1, 2020

jbrockmendel commented Oct 11, 2020

mroeschke commented Jul 23, 2021

codamuse commented Dec 4, 2022 •

edited

Loading

phofl commented Dec 5, 2022

jbrockmendel commented Dec 5, 2022

Series.rmod called with scalar returns inconsistent results based on size of series #29602

Series.rmod called with scalar returns inconsistent results based on size of series #29602

Comments

kmacdough commented Nov 13, 2019

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line] INSTALLED VERSIONS

TomAugspurger commented Nov 14, 2019

simonjayhawkins commented Apr 1, 2020 • edited Loading

simonjayhawkins commented Apr 1, 2020

jbrockmendel commented Oct 11, 2020

mroeschke commented Jul 23, 2021

codamuse commented Dec 4, 2022 • edited Loading

phofl commented Dec 5, 2022

jbrockmendel commented Dec 5, 2022

Output of `pd.show_versions()`

[paste the output of `pd.show_versions()` here below this line]
INSTALLED VERSIONS

simonjayhawkins commented Apr 1, 2020 •

edited

Loading

codamuse commented Dec 4, 2022 •

edited

Loading