Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WARN introduce FutureWarning for value_counts behaviour change #49640

Closed
wants to merge 20 commits into from

Conversation

MarcoGorelli
Copy link
Member

As discussed in the dev call:

  • the new behaviour would be more desirable than the current one;
  • warning about it if users don't pass name= would be very noisy
  • if we just warn for 1.5.2, then there's at least some warning and it won't be there for too long as 2.0.0 should be out in a few months' time

Note that no warning is introduced for

df.groupby(col, as_index=False).value_counts()

as that's the one that the other value_counts will conform to

@@ -604,7 +604,19 @@ def value_counts(
ascending: bool = False,
bins=None,
dropna: bool = True,
*,
name: Hashable | None = None,
Copy link
Member

@rhshadrach rhshadrach Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding name to the API, which would presumably be deprecated in 2.0. So users would be adding name to silence the warning, then removing it to silence the new deprecation warning.

My understanding of the proposal was to just add a warning - no arguments - that the user will see on every use. While that is very noisy (maybe we could do a DeprecationWarning instead?), I think it's better than having users change code only to change it back.

cc @jreback

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @rhshadrach for taking a look - yeah if it's OK to emit a warning each time then that'd be better, I'm also not keen on adding something to the API and then immediately deprecating it. I'll do that

@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 11, 2022 19:04
@MarcoGorelli MarcoGorelli added this to the 1.5.2 milestone Nov 11, 2022
@MarcoGorelli MarcoGorelli mentioned this pull request Nov 11, 2022
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this going to be extremely noisy as implemented

@@ -9,6 +9,7 @@ The above statement is simply passing a ``Series`` of ``True``/``False`` objects
returning all rows with ``True``.

.. ipython:: python
:okwarning:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix the docs rather that ignoring the warnings

@@ -289,6 +289,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet
(:issue:`2809`, :issue:`2810`)

.. ipython:: python
:okwarning:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

older release notes ok to ignore (or convert to code blocks )

@@ -33,7 +33,7 @@ Bug fixes

Other
~~~~~
-
- Introduced ``FutureWarning`` notifying about behaviour change in :meth:`DataFrame.value_counts`, :meth:`Series.value_counts`, :meth:`DataFrameGroupBy.value_counts`, :meth:`SeriesGroupBy.value_counts` - the resulting series will by default now be named ``'counts'`` (or ``'proportion'`` if ``normalize=True``), and the index (if present) will be taken from the original object's name (:issue:`49497`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this warrants a full on section and before / after on how to fix

with warnings.catch_warnings():
msg = "In pandas 2.0.0, the name of the resulting Series"
warnings.filterwarnings("ignore", msg, FutureWarning)
objcounts = data.value_counts()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fix it here? the fact that u have to catch is a problem

warnings.warn(
"In pandas 2.0.0, the name of the resulting Series will be "
"'count' (or 'proportion' if `normalize=True`).",
FutureWarning,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to be extremely noisy

is there a way to do so that it doesn't show the warnjng

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responding to #49640 (comment) as well.

I thought @jreback expressed the idea of having the warning (with no way to silence) on the call; it seems like I misunderstood. My preferred route would be no warning; another option is to just have a DeprecationWarning - anyone running an interactive session won't see the warning, but those testing via pytest would. While I dislike the idea of adding name, I'm not -1 on it.

From the linked issue on adding name, I'm taking some liberties here but it seems @jorisvandenbossche is +1, @jbrockmendel is -0.5, @Dr-Irv is +1, and @MarcoGorelli was -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume there will be a 1.5.3 release before 2.0.0? If so, perhaps let's put this off until then, don't want to rush a decision

I wouldn't be keen on adding name, but I'd still prefer that to not making the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we did a DeprecationWarnjng though then this just is really a breaking change and we should just do it

so either am ok with either

  • just change it in 2.0
  • engineer a nice way to warn in 1.5.x but cannot be so noisy it's everywhere

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Nov 12, 2022

there's conflicting reviews now 😄 #49640 (comment)

Having been through adjusting all the warnings in this PR, I am concerned about noise: df.groupby(col).apply(lambda df: df.value_counts()) in particular would emit multiple warnings

@rhshadrach would you be OK with adding a name parameter to ease the deprecation cycle? The only other alternative (other than not doing this) I can think of would be to make the change without warning

@MarcoGorelli
Copy link
Member Author

closing for now as it's not the right solution, will a comment to the issue with further thoughts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants