-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Allow non-tuples in pandas.merge #34810
Changes from 2 commits
a27b04b
a507714
a567dae
10b94ca
baf2ee4
2d67692
0437e8d
b96df6f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2072,9 +2072,13 @@ def _items_overlap_with_suffix(left: Index, right: Index, suffixes: Tuple[str, s | |
If corresponding suffix is empty, the entry is simply converted to string. | ||
|
||
""" | ||
if not isinstance(suffixes, tuple): | ||
raise TypeError( | ||
f"suffixes should be tuple of (str, str). But got {type(suffixes).__name__}" | ||
if isinstance(suffixes, set): | ||
warnings.warn( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why not just raise? also what about a dict or non-list-like for example? do we check is_list_like anywhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we'll need a warning according to our backwards compatibility poilcy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's easy to do it with a warning, so I would also just do that. @TomAugspurger if we don't want to have it set-specific, could also do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that should use what @jorisvandenbossche suggests here , e.g. dict would pass this test but is not acceptable There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. dicts are ordered though, so they shouldn't be a problem. I thought the original issue was non-deterministic outputs because sets aren't ordered? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually sets are not ordered, but I think we need an actual sequence here (sure .keys() satisfies), but i'd rather be explicit. we do not treat dicts as list-like anywhere else. |
||
"Passing 'suffixes' as a set, which is unordered, may result in " | ||
"unexpected results. Provide 'suffixes' as a tuple instead. In the " | ||
"future a 'TypeError' will be raised.", | ||
FutureWarning, | ||
stacklevel=4, | ||
) | ||
|
||
to_rename = left.intersection(right) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2069,18 +2069,12 @@ def test_merge_suffix_error(col1, col2, suffixes): | |
pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"col1, col2, suffixes", [("a", "a", {"a", "b"}), ("a", "a", None), (0, 0, None)], | ||
) | ||
def test_merge_suffix_type_error(col1, col2, suffixes): | ||
a = pd.DataFrame({col1: [1, 2, 3]}) | ||
b = pd.DataFrame({col2: [3, 4, 5]}) | ||
def test_merge_suffix_set(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. call this non_list_like parmaterize on dict as well |
||
a = pd.DataFrame({"a": [1, 2, 3]}) | ||
b = pd.DataFrame({"b": [3, 4, 5]}) | ||
|
||
msg = ( | ||
f"suffixes should be tuple of \\(str, str\\). But got {type(suffixes).__name__}" | ||
) | ||
with pytest.raises(TypeError, match=msg): | ||
pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) | ||
with tm.assert_produces_warning(FutureWarning): | ||
pd.merge(a, b, left_index=True, right_index=True, suffixes={"left", "right"}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are not actually passing the parameterized |
||
|
||
|
||
@pytest.mark.parametrize( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are going to do this (I guess @jorisvandenbossche really wants this), then let's update the doc-string of pd.merge to be correct (needs list-like) and the docs in reshape.rst as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
pd.merge
docs already state that we want a sequence.Updated the
DataFrame.merge
docs to be consistent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm we don't use sequence elsewhere do we, don't we use list-like everywhere?