SparseSeries.__array__ only returns non-fills #14167
Comments
|
I agree with @jnothman -- I think this is a bug with |
I think you're going to confuse users when they pass things to objects that |
I understand the current implementation exists to make applying ufuncs a little more efficient. I agree that |
but correctness/appropriateness depends on which ufunc anyway, I think. |
@sinhrks are you saying you did this intentionally for ufunc support? This is not the right way to do that (though the right way may not be possible until/if |
OK, I am going to reopen this as something we should discuss fixing for pandas 2.0. If it isn't possible to implement ufuncs on sparse objects with an incorrect implementation of |
Well if we can agree what exactly But surely should be fixed long before 2.0. Deferring things that can clearly be fixed beforehand is not useful. |
I see two viable options for what
|
I can't remember the original reasoning for having |
I was going to submit a PR changing __array__ to be dense... but the problem I see is this: if you are to call a numpy function like I think it'd probably be best to document that __array__ returns only the dense elements and leave it at that. |
@hexgnu this is a bit more complicated here, |
Yeah, I was digging into __array_wrap__, although I couldn't figure it out today while at the coffee shop... maybe I'll pick it up after I rejigger some of my other outstanding PRs. |
So I was digging into this a bit more. This causes some other pretty major issues with groupby. For instance:
    df = pd.DataFrame({'a': [0, 1, 0, 0], 'b': [0, 1, 0, 0]})
    sdf = df.to_sparse(fill_value=0)
    sdf.groupby('a').mean()   # returns only dense portions because the groupby functionality relies on np.asarray
    sdf.groupby('a').count()  # causes segfault
Cross ref: #5078 |
Regarding the last comment, note that
Exactly because of this "only returning dense", it is pointing to the wrong values. |
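To illustrate the misalignment described here, the following is a minimal pure-Python sketch, not pandas code. The `group_mean` helper and the `stored_b` list are hypothetical stand-ins: `stored_b` imitates an array conversion that drops the fill values, and `group_mean` pairs labels with values by position, which is assumed to be roughly what a groupby built on a plain array conversion would do.

```python
from collections import defaultdict


def group_mean(labels, values):
    """Average values per label, pairing labels and values by position."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(labels, values):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


labels = [0, 1, 0, 0]    # column 'a' from the groupby example in this thread
dense_b = [0, 1, 0, 0]   # column 'b' with the fill values included
# Hypothetical "non-fill only" view of column 'b' (fill_value=0): just [1]
stored_b = [v for v in dense_b if v != 0]

correct = group_mean(labels, dense_b)
misaligned = group_mean(labels, stored_b)

print(correct)      # {0: 0.0, 1: 1.0}
print(misaligned)   # {0: 1.0}
```

With only the stored value available, the single `1` is paired with the first label (`0`) instead of the label it belongs to (`1`), so the result is both incomplete and wrong.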
This was fixed by #22325 |
Code Sample, a copy-pastable example if possible
Expected Output
this should really be consistent with Series rather than just returning the non-fill values (i.e. rather than being equivalent to np.array(ps.SparseArray([np.nan, 1]))).
output of pd.show_versions()
Pandas 0.18.1 should alone be relevant.
Apologies, I've not checked if this is fixed in master. Just passing on issues from scikit-learn/scikit-learn#7352.
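To make the two candidate behaviours concrete, here is a small pure-Python sketch. `ToySparse` is a hypothetical stand-in, not the pandas class: its `stored_only()` method imitates the reported behaviour (only non-fill values), while `to_dense()` imitates what a conversion consistent with `Series` would be expected to return. The attribute names `sp_index` and `sp_values` are chosen for illustration only.

```python
class ToySparse:
    """Hypothetical 1-D sparse container (illustration only, not pandas)."""

    def __init__(self, data, fill_value=0):
        self.fill_value = fill_value
        self.length = len(data)
        # Positions and values of the entries that differ from fill_value.
        self.sp_index = [i for i, v in enumerate(data) if v != fill_value]
        self.sp_values = [data[i] for i in self.sp_index]

    def stored_only(self):
        """Reported behaviour: return just the non-fill values."""
        return list(self.sp_values)

    def to_dense(self):
        """Expected behaviour: one value per position, fills included."""
        dense = [self.fill_value] * self.length
        for i, v in zip(self.sp_index, self.sp_values):
            dense[i] = v
        return dense


s = ToySparse([0, 1, 0, 0], fill_value=0)
print(s.stored_only())  # [1]
print(s.to_dense())     # [0, 1, 0, 0]
```

The stored-only result has a different length from the container, so any consumer that assumes one value per position (as scikit-learn does with array-likes) would silently see the wrong data.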