BUG: fix raise of TypeError when subtracting timedelta array #22054

illegalnumbers · 2018-07-25T19:49:53Z

closes #21980

tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jbrockmendel · 2018-07-25T20:31:21Z

pandas/_libs/tslibs/timedeltas.pyx

                # raise rathering than letting numpy return wrong answer
                return NotImplemented
-            return op(self.to_timedelta64(), other)
+            try:
+                converted_other = other.astype('datetime64[ns]')


Converting is wrong here, since it could be mixed, e.g. [Timestamp, Timedelta].

The mixed test I added passes, but I'm guessing this case is why other tests are failing? Is there another approach I should take?

Ah, there was a bug in my test. Still looking around, but I'm definitely curious about another approach. I was thinking I could iterate over the entire array and convert piecemeal, but that also seems wrong.

jbrockmendel · 2018-07-25T20:32:08Z

pandas/_libs/tslibs/timedeltas.pyx

@@ -929,7 +933,7 @@ cdef class _Timedelta(timedelta):
    def nanoseconds(self):
        """
        Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
-       
+


Nice. Want to add a line to lint.sh that checks for trailing whitespace in Cython files?

illegalnumbers · 2018-07-25T20:51:55Z

Actually, I'm a little confused now, @jbrockmendel. One of my failing tests is test_ops_series_object (GH #13043), and based on the functionality I'm changing, it seems like its expectation should change. Should the dtype of an addition now actually be <M8[ns] rather than a O since we do the conversion properly now?

jbrockmendel · 2018-07-25T21:29:39Z

rather than a O since we do the conversion properly now?

The thing is that the line converted_other = other.astype('datetime64[ns]') doesn't belong at all. Among other things, the case to test is other = np.array([pd.Timestamp.now(), pd.Timedelta('1D')]). That has object dtype, __radd__ and __rsub__ should both be valid, and the astype call would fail.

illegalnumbers · 2018-07-25T21:46:59Z

Yeah, adding in the test for other = np.array([pd.Timestamp.now(), pd.Timedelta('1D')]) exposes the failure you mentioned. I can remove that, but moving forward I'm not sure how to implement that since the O type object fails to apply the lambda since Numpy just errors on the application of the lambda.

jbrockmendel · 2018-07-25T22:22:50Z

not sure how to implement that since the O type object fails to apply the lambda since Numpy just errors on the application of the lambda.

I'm not sure what lambda you're referring to, but I imagine this will end up looking something like:

if other.dtype.kind in ['m', 'M']:
    [do what it does now]
elif other.dtype.kind == 'O':
    return np.array([op(self, x) for x in other])
raise TypeError(...)
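A runnable sketch of that dispatch, assuming only the standard library and NumPy: datetime.timedelta stands in for pd.Timedelta, the helper name timedelta_array_op is hypothetical (it is not the Cython method in timedeltas.pyx), and reflected ops such as __rsub__ are assumed to be passed in as argument-swapping lambdas.

```python
import operator
from datetime import datetime, timedelta

import numpy as np


def timedelta_array_op(td, other, op):
    """Hypothetical helper mirroring the suggested dtype.kind dispatch.

    td: a datetime.timedelta standing in for pd.Timedelta
    other: a NumPy array
    op: a binary callable, e.g. operator.add, or lambda a, b: b - a for __rsub__
    """
    kind = other.dtype.kind
    if kind in ("m", "M"):
        # timedelta64/datetime64 arrays: NumPy can vectorize the op itself
        return op(np.timedelta64(td), other)
    elif kind == "O":
        # object arrays may mix timestamps and timedeltas, so go element-wise
        return np.array([op(td, x) for x in other])
    raise TypeError(f"unsupported dtype for timedelta op: {other.dtype}")


def rsub(a, b):
    # reflected subtraction: computes other - td
    return b - a


mixed = np.array([datetime(2013, 1, 1, 9, 1), timedelta(days=2)], dtype=object)
result = timedelta_array_op(timedelta(days=1), mixed, rsub)
```

The object branch never tries to coerce the whole array to one dtype, which is what lets a mixed [timestamp, timedelta] array work where astype('datetime64[ns]') would not.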

illegalnumbers · 2018-07-25T22:30:08Z

Oh dang nice! I was just working through something similar in a REPL. I'll push up a revision in the next few minutes. Thanks for the help! EDIT: The lambda I was referring to was the op function that gets passed in.

illegalnumbers · 2018-07-26T00:13:48Z

Ok, so I'm not sure why my build didn't output anything for the tests that failed on Travis CI. @jbrockmendel, is there a way I can retrigger Travis? Is it known to be flaky? Or should I look into something else? I appreciate the help.

jbrockmendel · 2018-07-26T02:18:18Z

Travis error looks unrelated. When this happens I usually find some typo somewhere to fix and make a dummy commit to force it to re-run.

jbrockmendel · 2018-07-26T04:52:28Z

ci/lint.sh

@@ -49,6 +49,11 @@ if [ "$LINT" ]; then
    if [ $? -ne "0" ]; then
        RET=1
    fi
+
+    if [[ -n $(find **/*.pyx -type f -exec egrep -l " +$" {} \;) ]]


Will this print the list of offending files? Can you also include .pxd, .pxi, and .pxi.in? Thanks for stepping up on this; linting these files is really tough.

Yeah, no worries! I don't think it will display them as output just from this test, but I can include the same find / grep dance under it or save it as a variable and echo it.

but I can include the same find / grep dance under it or save it as a variable and echo it.

Great. If something breaks, we definitely want to know where to look for it.
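For comparison, the same check can be sketched in Python: walk the tree, select the Cython suffixes requested above (.pyx, .pxd, .pxi, .pxi.in), and report every file with a line ending in spaces, mirroring egrep " +$". This is an illustrative alternative, not the lint.sh code from this PR; the function names are hypothetical.

```python
import pathlib
import re

# Suffixes requested in review; ".pxi.in" needs str.endswith, not Path.suffix
CYTHON_SUFFIXES = (".pyx", ".pxd", ".pxi", ".pxi.in")
# Same pattern as egrep " +$": one or more spaces at end of line
TRAILING_SPACES = re.compile(r" +$")


def files_with_trailing_whitespace(root="."):
    """Return Cython source files under root containing trailing spaces."""
    offenders = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if not (path.is_file() and path.name.endswith(CYTHON_SUFFIXES)):
            continue
        with path.open(encoding="utf-8") as fh:
            if any(TRAILING_SPACES.search(line.rstrip("\n")) for line in fh):
                offenders.append(path)
    return offenders


def main(root="."):
    """Print each offending file and return a shell-style exit code."""
    offenders = files_with_trailing_whitespace(root)
    for path in offenders:
        # Listing the files tells the CI log exactly where to look
        print(path)
    return 1 if offenders else 0
```

Printing the paths before returning non-zero covers the "we definitely want to know where to look" requirement.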

jbrockmendel · 2018-07-26T04:53:45Z

doc/source/whatsnew/v0.24.0.txt

@@ -441,7 +441,7 @@ Datetimelike
 Timedelta
 ^^^^^^^^^

-
+- Fixed bug where array of timestamp and deltas raised a TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'Timedelta' (:issue:`21980`)


Take a look at how quotation marks and backticks are used elsewhere in this file (the iOS keyboard doesn’t make it easy to give an example directly).

I'll take a look!

jbrockmendel · 2018-07-26T04:55:27Z

pandas/tests/series/test_timeseries.py

@@ -1012,3 +1012,22 @@ def test_get_level_values_box(self):
        index = MultiIndex(levels=levels, labels=labels)

        assert isinstance(index.get_level_values(0)[0], Timestamp)
+
+    def test_diff_sub_timedelta(self):


These tests go in tests/scalar/timedelta/test_arithmetic.py

jbrockmendel · 2018-07-26T04:56:59Z

pandas/tests/series/test_timeseries.py

+        res = arr - pd.Timedelta('1D')
+        tm.assert_numpy_array_equal(res, exp)
+
+    def test_diff_sub_timedelta_mixed(self):


This PR probably also fixes addition, right? If so, pls include a test. Possibly also for reversed ops?

It should, and I was actually thinking about reversed ops this morning, so I'll include both today.

illegalnumbers · 2018-07-26T17:02:01Z

@jbrockmendel hmm, upon further investigation this AM it seems that when doing the reverse subtract, i.e.

        arr = np.array([Timestamp('20130101 9:01'),
                        Timestamp('20121230 9:02')])
        exp = np.array([Timestamp('20121231 9:01'),
                        Timestamp('20121229 9:02')])
        res = pd.Timedelta('1D') - arr
        tm.assert_numpy_array_equal(res, exp)

I get TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'Timedelta' in pandas/_libs/tslibs/timestamps.pyx:330. Still investigating but it seems like I might have to do a bigger change than I thought.

jbrockmendel · 2018-07-26T17:04:31Z

I get TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'Timedelta' in pandas/_libs/tslibs/timestamps.pyx:330

The operation being run is Timedelta - Timestamp right? That should raise a TypeError, you're OK.
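The same asymmetry exists in Python's datetime module, which Timestamp and Timedelta subclass: a timestamp minus a duration is a timestamp, while a duration minus a timestamp is undefined. A minimal sketch with plain datetime/timedelta values (not pandas) shows that the expected values in the test above correspond to arr - Timedelta('1D'), not Timedelta('1D') - arr; the helper name is hypothetical.

```python
from datetime import datetime, timedelta

import numpy as np

one_day = timedelta(days=1)
arr = np.array([datetime(2013, 1, 1, 9, 1),
                datetime(2012, 12, 30, 9, 2)], dtype=object)
exp = np.array([datetime(2012, 12, 31, 9, 1),
                datetime(2012, 12, 29, 9, 2)], dtype=object)

# timestamp - duration is well defined, element by element
forward = np.array([x - one_day for x in arr], dtype=object)


def reverse_sub_raises(td, values):
    """Return True if td - x raises TypeError for every element x."""
    for x in values:
        try:
            td - x
        except TypeError:
            continue
        return False
    return True
```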

illegalnumbers · 2018-07-26T17:06:45Z

Hmm, actually it appears that rsub didn't work with Timedelta and Timestamp to begin with... maybe this is an existing issue?

>>> pd.Timedelta('1d') - Timestamp('20130101 9:01')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 332, in pandas._libs.tslibs.timestamps._Timestamp.__sub__
TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'Timedelta'

illegalnumbers · 2018-07-26T17:07:32Z

Ok so in that case I'll just check that these raise appropriately then.

Thanks for all the help!

pep8speaks · 2018-07-26T18:07:42Z

Hello @illegalnumbers! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/tests/scalar/timedelta/test_arithmetic.py !

Comment last updated on September 06, 2018 at 04:04 Hours UTC

codecov · 2018-07-26T18:08:36Z

Codecov Report

Merging #22054 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22054      +/-   ##
==========================================
+ Coverage   92.04%   92.05%   +<.01%     
==========================================
  Files         169      169              
  Lines       50782    50782              
==========================================
+ Hits        46744    46745       +1     
+ Misses       4038     4037       -1

Flag	Coverage Δ
#multiple	`90.46% <ø> (ø)`	⬆️
#single	`42.26% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/ops.py	`97.04% <0%> (+0.14%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 46abe18...20c93d2.

illegalnumbers · 2018-07-27T17:02:52Z

@jbrockmendel lemme know if there's anything else I need to do to get this guy merged! :)

jbrockmendel · 2018-07-27T18:30:12Z

doc/source/whatsnew/v0.24.0.txt

@@ -441,7 +441,7 @@ Datetimelike
 Timedelta
 ^^^^^^^^^

-
+- Fixed bug where array of timestamp and deltas raised a TypeError: unsupported operand type(s) for -: ``numpy.ndarray`` and ``Timedelta`` (:issue:`21980`)


timestamp --> :class:`Timestamp`
double backticks around TypeError
don't need the full error message

Can you clarify what "array of timestamp and deltas" means? e.g.

Bug where subtracting :class:`Timedelta` from an object-dtyped array would raise ``TypeError``

jbrockmendel · 2018-07-27T18:31:03Z

ci/lint.sh

+    then
+        RET=1
+        echo $trailing_space_pxd
+    fi


This looks great, thanks. Any other ideas you have for linting Cython files will be a big hit with the maintainers (separate PR(s)).

Sounds great! I am not sure if I'll have much time in the next little bit after this gets merged but I'll do my best.

Gonna move this out, see other comments.

jbrockmendel · 2018-07-27T18:32:54Z

pandas/tests/scalar/timedelta/test_arithmetic.py

@@ -616,3 +690,35 @@ def test_rdivmod_invalid(self):

        with pytest.raises(TypeError):
            divmod(np.array([22, 24]), td)
+
+    def test_td_div_timedelta_timedeltalike_array(self):


Would a valid case here be object-dtyped array containing all Timedelta objects (or some mix of timedelta, np.timedelta64)?

Probably a good call to do a mix.

Done. (see below this test)
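What such a valid case should compute can be sketched with NumPy alone: normalize each entry of the object array (datetime.timedelta or np.timedelta64) to np.timedelta64, then divide, giving one ratio per element. This is an illustration under those assumptions, not the pandas implementation or its exact result dtype; the helper name is hypothetical.

```python
from datetime import timedelta

import numpy as np


def divide_timedeltalike_array(values, divisor):
    """Hypothetical helper: divide each timedelta-like entry by divisor.

    Entries may be datetime.timedelta or np.timedelta64, so each one is
    normalized to np.timedelta64 before the division.
    """
    divisor64 = np.timedelta64(divisor)
    return np.array([np.timedelta64(x) / divisor64 for x in values])


mixed = np.array([timedelta(days=2), np.timedelta64(12, "h")], dtype=object)
ratios = divide_timedeltalike_array(mixed, timedelta(days=1))
```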

jbrockmendel · 2018-07-27T18:34:28Z

pandas/tests/scalar/timedelta/test_arithmetic.py

+        with pytest.raises(TypeError):
+            pd.Timedelta('1D') * arr
+
+    def test_td_rmult_timedelta_mixed_timedeltalike_array(self):


Some of these you can probably de-duplicate using pytest.mark.parametrize. Not a deal-breaker.

If there's a graceful way to work "object_dtype" into the test names that'd be ideal.

It might be a lot more effort to get this de-duped than I initially thought; would it be ok to submit as-is? I added object_dtype to the method names.

pls parameterize in this PR.

@illegalnumbers the idea here is to notice that all four of these tests are special cases of a single test function. In particular, you can write arr * pd.Timedelta('1D') as operator.mul(arr, pd.Timedelta('1D')), and similarly for the others with operator.truediv, ops.rdiv, ops.rmul. Then these can be parametrized as:

@pytest.mark.parametrize('op', [operator.mul, ops.rmul,
                                operator.truediv, ops.rdiv])
def test_td_mul_object_array(self, op):
    arr = np.array([pd.Timestamp.now(), pd.Timedelta('1D')])
    with pytest.raises(TypeError):
        op(arr, pd.Timedelta('1D'))
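The trick is that a reflected operator is just the forward operator with its arguments swapped, so one test body covers all four cases. A dependency-free sketch of that idea, assuming hypothetical rmul/rtruediv helpers written out in place of the pandas ops module and plain datetime/timedelta values standing in for Timestamp/Timedelta; under pytest, the OPS list is what would go into pytest.mark.parametrize.

```python
import operator
from datetime import datetime, timedelta

import numpy as np


# Hypothetical reflected helpers (not imported from pandas.core.ops):
# each swaps its arguments and applies the forward operator.
def rmul(left, right):
    return right * left


def rtruediv(left, right):
    return right / left


OPS = [operator.mul, rmul, operator.truediv, rtruediv]


def raises_type_error(op, left, right):
    """Return True if op(left, right) raises TypeError."""
    try:
        op(left, right)
    except TypeError:
        return True
    return False


# Object array mixing a datetime and a timedelta, like [Timestamp, Timedelta]
arr = np.array([datetime(2013, 1, 1, 9, 1), timedelta(days=1)], dtype=object)
results = {op.__name__: raises_type_error(op, arr, timedelta(days=1))
           for op in OPS}
```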

illegalnumbers · 2018-09-02T19:35:20Z

Gonna see if I can finish this up today!

illegalnumbers · 2018-09-05T00:57:45Z

Took a little while extra but it should be ready for review again. Barring any lint failures of course.

illegalnumbers · 2018-09-05T05:37:19Z

I'm not sure if these are related to my PR? Kind of hard to read the build output.

EDIT: Seems like most are in the excel writer?

jbrockmendel · 2018-09-05T15:21:49Z

EDIT: Seems like most are in the excel writer?

Yes. I think this is fixed now. If you rebase/push it should go through alright. I'll go over this one more time today, after which jreback will be called in for the final OK.

illegalnumbers · 2018-09-05T16:06:17Z

Done! As I said about parametrizing, I did the best I could without getting stuck. A few tests seemed really similar, but I kept getting exceptions when I tried to parametrize them, so given my experience it seemed like more work than it was worth.

pandas/tests/scalar/timedelta/test_arithmetic.py

illegalnumbers · 2018-09-06T23:03:22Z

@jbrockmendel I think this should be good again!

jorisvandenbossche · 2018-09-07T14:04:21Z

pandas/_libs/tslibs/timedeltas.pyx

+            if other.dtype.kind in ['m', 'M']:
+                return op(self.to_timedelta64(), other)
+            elif other.dtype.kind == 'O':
+                return np.array([op(self, x) for x in other])


@jbrockmendel I am a bit confused about why simply returning NotImplemented is not sufficient here?
(I tested it, and that doesn't seem to work. With a datetime.timedelta it does work, though, and that one does return NotImplemented.)

The pd.Timedelta version fails because both arr.__add__(td) and td.__radd__(arr) return NotImplemented. arr.__add__(td.to_pytimedelta()) works fine, so presumably it is something in the numpy implementation.

Was playing a bit with it, and I think Timedelta behaves differently from datetime.timedelta because of the __array_priority__ we add to the _Timedelta class (when I comment it out, the simple example works), which I suppose is needed to get other behaviors working.

@jorisvandenbossche ok otherwise this PR looks ok, are you suggesting that we remove that?

I'm also confused.

jbrockmendel · 2018-09-07T19:44:24Z

The tests could be parametrized a bit further, but at some point that just becomes equivalent to re-writing the method this PR is implementing.

@jreback LGTM.

illegalnumbers · 2018-09-08T00:34:37Z

I'm super happy to have contributed! This was fun. Sorry it took so long.

pandas/tests/scalar/timedelta/test_arithmetic.py

closes pandas-dev#21980

jreback · 2018-09-18T13:25:28Z

thanks!

…2054) closes pandas-dev#21980

The old pandas versions available for Py34 cannot subtract timedeltas from ndarrays. Subtracting them individually works and was used in the fix for later pandas versions: pandas-dev/pandas#22054

jbrockmendel reviewed Jul 25, 2018

gfyoung added the Bug and Timedelta labels Jul 25, 2018

illegalnumbers force-pushed the GH-21980 branch from 6385152 to fd7bad7 (July 25, 2018 23:21)

jbrockmendel reviewed Jul 26, 2018

illegalnumbers force-pushed the GH-21980 branch from fd7bad7 to 3e8e98e (July 26, 2018 18:07)

illegalnumbers force-pushed the GH-21980 branch 2 times, most recently from 0208713 to bbf1a06 (July 26, 2018 20:42)

jbrockmendel reviewed Jul 27, 2018

illegalnumbers force-pushed the GH-21980 branch 2 times, most recently from 9a1bd6f to db48fd7 (September 5, 2018 00:56)

illegalnumbers force-pushed the GH-21980 branch from db48fd7 to 479b064 (September 5, 2018 16:05)

jbrockmendel reviewed Sep 5, 2018