-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Allow non-tuples in pandas.merge #34810
Conversation
Closes pandas-dev#34741, while retaining the spirit of the spirit of pandas-dev#34208.
@@ -746,6 +745,7 @@ Deprecations | |||
- :meth:`DataFrame.to_dict` has deprecated accepting short names for ``orient`` in future versions (:issue:`32515`) | |||
- :meth:`Categorical.to_dense` is deprecated and will be removed in a future version, use ``np.asarray(cat)`` instead (:issue:`32639`) | |||
- The ``fastpath`` keyword in the ``SingleBlockManager`` constructor is deprecated and will be removed in a future version (:issue:`33092`) | |||
- Providing ``suffixes`` as a ``set`` in :func:`pandas.merge` is deprecated. Provide a tuple instead (:issue:`33740`, :issue:`34741`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are going to do this (I guess @jorisvandenbossche really wants this), then let's update the doc-string of pd.merge to be correct (needs list-like) and the docs in reshape.rst as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pd.merge
docs already state that we want a sequence.
suffixes : Sequence, default is ("_x", "_y")
A length-2 sequence where
Updated the DataFrame.merge
docs to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm we don't use sequence elsewhere do we, don't we use list-like everywhere?
raise TypeError( | ||
f"suffixes should be tuple of (str, str). But got {type(suffixes).__name__}" | ||
if isinstance(suffixes, set): | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just raise? also what about a dict or non-list-like for example?
do we check is_list_like anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll need a warning according to our backwards compatibility poilcy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's easy to do it with a warning, so I would also just do that.
@TomAugspurger if we don't want to have it set-specific, could also do if not is_list_like(suffixes, allow_sets=False)
, although in practice that might give the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that should use what @jorisvandenbossche suggests here , e.g. dict would pass this test but is not acceptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dicts are ordered though, so they shouldn't be a problem. I thought the original issue was non-deterministic outputs because sets aren't ordered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually sets are not ordered, but I think we need an actual sequence here (sure .keys() satisfies), but i'd rather be explicit. we do not treat dicts as list-like anywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that passes the suffixes as a list? (to ensure we don't accidentally remove this again)
@@ -746,6 +745,7 @@ Deprecations | |||
- :meth:`DataFrame.to_dict` has deprecated accepting short names for ``orient`` in future versions (:issue:`32515`) | |||
- :meth:`Categorical.to_dense` is deprecated and will be removed in a future version, use ``np.asarray(cat)`` instead (:issue:`32639`) | |||
- The ``fastpath`` keyword in the ``SingleBlockManager`` constructor is deprecated and will be removed in a future version (:issue:`33092`) | |||
- Providing ``suffixes`` as a ``set`` in :func:`pandas.merge` is deprecated. Provide a tuple instead (:issue:`33740`, :issue:`34741`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm we don't use sequence elsewhere do we, don't we use list-like everywhere?
raise TypeError( | ||
f"suffixes should be tuple of (str, str). But got {type(suffixes).__name__}" | ||
if isinstance(suffixes, set): | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that should use what @jorisvandenbossche suggests here , e.g. dict would pass this test but is not acceptable
def test_merge_suffix_type_error(col1, col2, suffixes): | ||
a = pd.DataFrame({col1: [1, 2, 3]}) | ||
b = pd.DataFrame({col2: [3, 4, 5]}) | ||
def test_merge_suffix_set(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this non_list_like parmaterize on dict as well
thanks @TomAugspurger |
with pytest.raises(TypeError, match=msg): | ||
pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) | ||
with tm.assert_produces_warning(FutureWarning): | ||
pd.merge(a, b, left_index=True, right_index=True, suffixes={"left", "right"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not actually passing the parameterized suffixes
here, so I think that the dict case is not tested, and will also not raise a warning (as dicts are considered list-like)
Closes #34741, while
retaining the spirit of the spirit of
#34208.