Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Ignore division by 0 when merging empty dataframes (#17776) #17846

Merged
merged 3 commits into from
Nov 27, 2017

Conversation

yeemey
Copy link
Contributor

@yeemey yeemey commented Oct 11, 2017

@codecov
Copy link

codecov bot commented Oct 11, 2017

Codecov Report

Merging #17846 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17846      +/-   ##
==========================================
- Coverage   91.25%   91.21%   -0.05%     
==========================================
  Files         163      163              
  Lines       50038    50040       +2     
==========================================
- Hits        45661    45642      -19     
- Misses       4377     4398      +21
Flag Coverage Δ
#multiple 89.01% <100%> (-0.03%) ⬇️
#single 40.26% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.28% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e159ae...ff8e1bb. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 11, 2017

Codecov Report

Merging #17846 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17846      +/-   ##
==========================================
- Coverage   91.25%   91.21%   -0.05%     
==========================================
  Files         163      163              
  Lines       50038    50040       +2     
==========================================
- Hits        45661    45642      -19     
- Misses       4377     4398      +21
Flag Coverage Δ
#multiple 89.01% <100%> (-0.03%) ⬇️
#single 40.26% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.28% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e159ae...ff8e1bb. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 11, 2017

Codecov Report

Merging #17846 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17846      +/-   ##
==========================================
- Coverage   91.25%   91.21%   -0.05%     
==========================================
  Files         163      163              
  Lines       50038    50040       +2     
==========================================
- Hits        45661    45642      -19     
- Misses       4377     4398      +21
Flag Coverage Δ
#multiple 89.01% <100%> (-0.03%) ⬇️
#single 40.26% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.28% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e159ae...ff8e1bb. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 11, 2017

Codecov Report

Merging #17846 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17846      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         163      163              
  Lines       49801    49802       +1     
==========================================
- Hits        45496    45487       -9     
- Misses       4305     4315      +10
Flag Coverage Δ
#multiple 89.13% <100%> (-0.01%) ⬇️
#single 40.79% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.29% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.61% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 674fb96...138d886. Read the comment docs.

@gfyoung gfyoung added Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 12, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add the test from the original issue

@@ -1514,6 +1514,7 @@ def _sort_labels(uniques, left, right):


def _get_join_keys(llab, rlab, shape, sort):
np.seterr(divide='ignore')

# how many levels can be done without overflow
pred = lambda i: not is_int64_overflow_possible(shape[:i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the context manager here (and only surround the actual divide statement)

Copy link
Contributor Author

@yeemey yeemey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jreback ! I made the changes here. I haven't done a review before, so I'm not entirely sure how this works.

@jreback
Copy link
Contributor

jreback commented Oct 14, 2017

need you to rebase on master

git rebase -i upstream/master

then fix any conflicts

git push yourupstream yourbranch -f

@yeemey yeemey force-pushed the dev branch 2 times, most recently from ccbf927 to 7aee0b2 Compare October 14, 2017 20:57
lkey += llab[i] * stride
rkey += rlab[i] * stride
np.seterr(divide='warn')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary

@@ -861,6 +861,14 @@ def test_validation(self):
result = merge(left, right, on=['a', 'b'], validate='1:1')
assert_frame_equal(result, expected_multi)

def test_merge_two_empty_df_no_division_error(self):
# GH17776, PR #17846
import warnings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test should mimic the one from the issue, except use a context manager (to avoid setting it globally)

@@ -1514,6 +1514,7 @@ def _sort_labels(uniques, left, right):


def _get_join_keys(llab, rlab, shape, sort):
np.seterr(divide='ignore')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

@yeemey
Copy link
Contributor Author

yeemey commented Oct 16, 2017

@jreback, thanks, I've made the changes. How can I address the failed AppVeyor build? This test passed when I ran pytest pandas on my computer:

================================== FAILURES ===================================
_______________________ TestDataFrameAnalytics.test_sum _______________________
[gw0] win32 -- Python 2.7.14 C:\Miniconda3_64\envs\pandas\python.exe
self = <pandas.tests.frame.test_analytics.TestDataFrameAnalytics object at 0x00000000089BB518>
    def test_sum(self):
        self._check_stat_op('sum', np.sum, has_numeric_only=True)
    
        # mixed types (with upcasting happening)
        self._check_stat_op('sum', np.sum,
                            frame=self.mixed_float.astype('float32'),
                            has_numeric_only=True, check_dtype=False,
>                           check_less_precise=True)
pandas\tests\frame\test_analytics.py:449: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\tests\frame\test_analytics.py:753: in _check_stat_op
    check_less_precise=check_less_precise)
pandas\util\testing.py:1272: in assert_series_equal
    obj='{obj}'.format(obj=obj))
pandas\_libs\testing.pyx:59: in pandas._libs.testing.assert_almost_equal
    cpdef assert_almost_equal(a, b,
pandas\_libs\testing.pyx:173: in pandas._libs.testing.assert_almost_equal
    raise_assert_detail(obj, msg, lobj, robj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = 'Series', message = 'Series values are different (25.0 %)'
left = '[-13.4449, 0.13886, 9.09849, -0.00019373]'
right = '[-13.4448623657, 0.13885974884, 9.098487854, -0.000193253159523]'
diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{obj} are different
    
    {message}
    [left]:  {left}
    [right]: {right}""".format(obj=obj, message=message, left=left, right=right)
    
        if diff is not None:
            msg += "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (25.0 %)
E       [left]:  [-13.4449, 0.13886, 9.09849, -0.00019373]
E       [right]: [-13.4448623657, 0.13885974884, 9.098487854, -0.000193253159523]
pandas\util\testing.py:1089: AssertionError
 1 failed, 15133 passed, 790 skipped, 13 xfailed, 1 xpassed in 517.30 seconds =
Command exited with code 1

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

can you rebase and add a note in 0.21.1

@jreback jreback added this to the 0.21.1 milestone Nov 26, 2017
@jreback jreback merged commit 262e8ff into pandas-dev:master Nov 27, 2017
@jreback
Copy link
Contributor

jreback commented Nov 27, 2017

thanks @yeemey

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
@yeemey yeemey deleted the dev branch December 18, 2017 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

merging two empty dataframes can incur a division by zero
4 participants