Deprecate Series.strides, .base, .data, .itemsize, .flags (numpy) attributes ? #20419

Closed
jorisvandenbossche opened this issue Mar 20, 2018 · 10 comments · Fixed by #20721
Labels: API Design, Deprecate (Functionality to remove in pandas)

@jorisvandenbossche
Member

Related to #18262, but since this is a very specific class of attributes, I thought it deserved its own issue.

We have a bunch of attributes on the Series class that stem from the time it was a numpy array subclass, and that now just pass through to the corresponding attribute of the underlying numpy array. These are typically attributes describing the data layout specific to the numpy array, which I don't think necessarily make sense for a Series:

  • Series.base
  • Series.data
  • Series.strides
  • Series.itemsize
  • Series.flags

and potentially also:

  • Series.real and Series.imag

So deprecating those can potentially remove 7 entries from the Series namespace.

Are there good reasons to keep them? Is this somehow useful for "compatibility" (writing code that works for both Series and numpy arrays)?
(I personally can't think of a use case where you would want one of the above, unless you explicitly know you will be dealing with numpy arrays.)

One problem is that if we refer users to Series.values.<attribute>, whether that works will depend on the underlying array type (eg if .values starts returning an ExtensionArray, it will not have those attributes).
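A minimal sketch of that concern, assuming a recent pandas where a categorical Series is backed by a `Categorical` extension array. The helper `layout_attr` is hypothetical and only exists to avoid raising when an attribute is missing:

```python
import numpy as np
import pandas as pd

numeric = pd.Series([1.0, 2.0, 3.0])
categorical = pd.Series(["a", "b", "a"], dtype="category")


def layout_attr(obj, name):
    """Hypothetical helper: return a layout attribute, or None if obj lacks it."""
    return getattr(obj, name, None)


# For a plain float Series, .values is a numpy array and exposes strides.
numeric_strides = layout_attr(numeric.values, "strides")

# For a categorical Series, .values is a Categorical, which is not a numpy
# array, so numpy-specific layout attributes may be missing (None here).
categorical_strides = layout_attr(categorical.values, "strides")

# Converting explicitly always yields a numpy array with those attributes.
converted = np.asarray(categorical)
print(type(categorical.values).__name__, type(converted).__name__)
```

Going through `np.asarray` makes the numpy dependency explicit instead of relying on whatever type `.values` happens to return.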

cc @shoyer I don't think you kept those for "compatibility" in DataArray in xarray?
And in dask I think itemsize, real and imag are provided.

jorisvandenbossche added the API Design and Deprecate (Functionality to remove in pandas) labels on Mar 20, 2018

@shoyer
Member

shoyer commented Mar 20, 2018

In xarray we have real/imag but all these other low level NumPy attributes are dropped (.data means something else entirely). I agree with deprecating these for Series.

@jorisvandenbossche
Member Author

What are people's feelings about imag and real? They were added back explicitly (#4819) after they disappeared with the Series refactor (for backwards compatibility).

jreback added this to the 0.23.0 milestone on Apr 19, 2018

@TomAugspurger
Contributor

I'm fine with keeping real and imag.

@RubenAstudillo

I am hitting the deprecation warning when using the .data attribute, and it doesn't point me to the right way to operate on the underlying array (I needed to do boxcox on it, and the index was of no use). Maybe adding a hint towards Series.values.data? maybe a link on why shouldn't I use .data?

@jgerity

jgerity commented Apr 10, 2019

It also looks like this planned deprecation is not mentioned in the docs, and definitely should be if it's raising FutureWarning

@jreback
Contributor

jreback commented Apr 10, 2019

this was deprecated several versions ago

@jorisvandenbossche
Member Author

Quoting @RubenAstudillo: "Maybe adding a hint towards Series.values.data? maybe a link on why shouldn't I use .data?"

Series.values is also not guaranteed to return a numpy array (eg for categoricals), so np.asarray(Series).data might be more reliable (although in the cases where Series.values is not a numpy array, that will also point to the data of a copy).
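A small sketch of the view-versus-copy distinction, assuming current pandas behavior: two conversions of a float Series should refer to the same buffer, while each conversion of a categorical Series materializes a fresh array. The variable names are illustrative only:

```python
import numpy as np
import pandas as pd

numeric = pd.Series([1.0, 2.0, 3.0])
categorical = pd.Series([1, 2, 1], dtype="category")

# Two conversions of a numpy-backed Series view the same underlying buffer.
num_a, num_b = np.asarray(numeric), np.asarray(numeric)
numeric_shares_buffer = np.shares_memory(num_a, num_b)

# A categorical has to be materialized, so every conversion builds a new
# array; .data on that result points at the copy, not at the Series' storage.
cat_a, cat_b = np.asarray(categorical), np.asarray(categorical)
categorical_shares_buffer = np.shares_memory(cat_a, cat_b)

print(numeric_shares_buffer, categorical_shares_buffer)
```

In the copy case, writing through `.data` would not affect the Series at all, which is part of why the pass-through attribute is misleading.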

May I ask why you need .data to do boxcox on it? I would think you just need the numpy array to ignore the index?
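As a rough illustration of working on the array while ignoring the index, here is a hand-written Box-Cox transform with a fixed lambda. This is an assumption-laden stand-in, not `scipy.stats.boxcox`, and `boxcox_fixed_lambda` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def boxcox_fixed_lambda(x, lam):
    """Hypothetical helper: Box-Cox transform of positive data for a given lambda."""
    x = np.asarray(x, dtype="float64")
    if lam == 0:
        return np.log(x)
    return (x**lam - 1.0) / lam


s = pd.Series([1.0, 2.0, 4.0], index=["a", "b", "c"])

# to_numpy() drops the index and hands back a plain numpy array;
# no need to reach for the low-level .data buffer.
values = s.to_numpy()
transformed = boxcox_fixed_lambda(values, lam=0.5)

# Reattach the original index afterwards if it is still needed.
result = pd.Series(transformed, index=s.index)
print(result)
```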

Quoting @jgerity: "It also looks like this planned deprecation is not mentioned in the docs, and definitely should be if it's raising FutureWarning"

That's a good point! A PR to add this to the docstrings is very welcome.

@RubenAstudillo

Yeah, it was not the most straightforward way to do boxcox. It was still confusing.

@kaelzhang

But it seems there is no documentation about what the replacements for those attributes are?

Since pandas 1.0.0, how do you get the strides of a pandas Series?

@jorisvandenbossche
Member Author

There is no explicit replacement. A pandas Series is not guaranteed to have strides. You can convert the Series to a numpy array (np.asarray(s) or s.to_numpy()) to know the strides of this array (but it's not necessarily a zero copy conversion).
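For instance, a sketch of that conversion, where `series_strides` is a hypothetical helper rather than a pandas API, and the result describes the converted array, which may be a copy:

```python
import numpy as np
import pandas as pd


def series_strides(s):
    """Hypothetical helper: strides of the numpy array obtained from a Series."""
    # np.asarray may or may not be zero-copy, so these strides describe the
    # resulting array, not necessarily the Series' internal storage.
    return np.asarray(s).strides


s = pd.Series(np.arange(4, dtype="float64"))
strides = series_strides(s)
print(strides)
```

For a one-dimensional contiguous float64 array the single stride equals the item size in bytes.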

7 participants