Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN/INT: remove usage of _internal_get_values #27165

Closed
jorisvandenbossche opened this issue Jul 1, 2019 · 6 comments
Closed

CLN/INT: remove usage of _internal_get_values #27165

jorisvandenbossche opened this issue Jul 1, 2019 · 6 comments
Labels
Clean Internals Related to non-user accessible pandas implementation

Comments

@jorisvandenbossche
Copy link
Member

PR #26409 introduced a _internal_get_values method for DataFrame, Series, Index, SparseArray and Categorical (in order to deprecate the public get_values).

There are some remaining cases where _internal_get_values is used internally. Eventually, we should try to remove all those cases. For each of those usages, it will need to be inspected what object is actually being handled to choose the appropriate alternative.

@jorisvandenbossche jorisvandenbossche added Internals Related to non-user accessible pandas implementation Clean labels Jul 1, 2019
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Jul 1, 2019
@jbrockmendel
Copy link
Member

I'm looking into this now, could use some help from @TomAugspurger in understanding the intent in a couple of cases:

  • in core.strings.str_repeat it looks like the arr argument should always be object-dtype. is that accurate?
  • In sparse.array.make_sparse the docstring says arr is an ndarray, but in the tests I'm seeing Series being passed to it from the constructor. can we just do extract_array at the top of the constructor?

@TomAugspurger
Copy link
Contributor

arr argument should always be object-dtype. is that accurate?

I don't think so. I think you can have a StringArray as well. The implementation for non-scalar repeats looks buggy though, with the np.asarray.

In [11]: s = pd.Series(['a', None], dtype="string")

In [12]: s.str.repeat([1, 2])
TypeError: descriptor '__mul__' requires a 'str' object but received a 'NAType'

will open a separate issue.

can we just do extract_array at the top of the constructor?

Seems reasonable, or we can require the caller to do that, whichever is easier.

@jbrockmendel
Copy link
Member

We're on the verge of having removed all usages of _internal_get_values with the exception of Categorical._internal_get_values, which is used in 7 places where it is actually needed. Luckily most of those places are preceeded by a is_categorical_dtype check.

This leaves open a few questions:

  • should we rename Categorical._internal_get_values to something more informative?
  • For dt64tz it returns a DatetimeIndex; should we return a DatetimeArray instead?
    • AFAICT the only user-facing effect would be in Series.values, which we could patch if we really wanted to continue returning DTI.
  • For integers-and-NAs it returns an ndarray[object]; should we return IntegerArray instead?
    • Doing this ATM would break one __repr__ test because a "NaN" becomes a "". Are we OK with this?
  • Should we keep the extant if is_categorical_dtype(obj.dtype): obj = obj._internal_get_values() or implement something like extract_array to make this a one-liner?

@jbrockmendel
Copy link
Member

@WillAyd it looks like objToJSON is calling _internal_get_values in get_values, isnt obvious to me which classes values are caught by that call. any ideas?

@WillAyd
Copy link
Member

WillAyd commented Mar 15, 2020

It will be used for any object that has it defined. I think could just remove that though; I can't think of a situation off the top of my head where it should matter

@jbrockmendel
Copy link
Member

We've gotten rid of all of the relevant usages; closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

No branches or pull requests

4 participants