-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Add sparse Acessor #23183
API: Add sparse Acessor #23183
Conversation
* Adds a Series.sparse accessor * Adds several methods to SparseArray to for use via the accessor
Hello @TomAugspurger! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23183 +/- ##
==========================================
- Coverage 92.23% 92.22% -0.01%
==========================================
Files 169 169
Lines 50924 50962 +38
==========================================
+ Hits 46968 47001 +33
- Misses 3956 3961 +5
Continue to review full report at Codecov.
|
Latest commit should fix the doc rendering for Series.sparse.to_coo and Series.sparse.from_coo (we just need to use the regular |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some minor comments.
doc/source/whatsnew/v0.24.0.txt
Outdated
Some new warnings are issued for operations that require or are likely to materialize a large dense array: | ||
|
||
- A :class:`errors.PerformanceWarning` is issued when using fillna with a ``method``, as a dense array is constructed to create the filled array. Filling with a ``value`` is the efficient way to fill a sparse array. | ||
- A :class:`errors.PerformanceWarning` is now issued when concatenating sparse Series with differing fill values. The fill value from the first sparse array continues to be used. | ||
|
||
In addition to these API breaking changes, many :ref:`performance improvements and bug fixes have been made <whatsnew_0240.bug_fixes.sparse>`. | ||
|
||
Finally, a ``Series.sparse`` accessor has added to provide sparse-specific methods and attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add link to docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't link to a generic Series.sparse
location, but I can do
to provide sparse-specific methods like :meth:`Series.sparse.from_coo`.
and people can navigate from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with the idea of a sparse accessor. I only have some hesitation about some of the attributes exposed there, as this enforces them as public API.
(as I think I already mentioned before, promoting this from SparseSeries to Series is like more officially supporting this as a stable part of pandas).
For example, do we want to expose the IntIndex and BlockIndex officially as API?
@@ -581,98 +580,13 @@ def combine_first(self, other): | |||
return dense_combined.to_sparse(fill_value=self.fill_value) | |||
|
|||
def to_coo(self, row_levels=(0, ), column_levels=(1, ), sort_labels=False): | |||
""" | |||
Create a scipy.sparse.coo_matrix from a SparseSeries with MultiIndex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep those docstrings here (or share it with the delegate method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope is to deprecate SparseSeries entirely for 0.24.0. But I suppose we'll want a docstring for the deprecation period, so I'll use shared docs.
I'm happy to remove those. |
Forgot to remove sp_index and kind from the api.rst This should be ready once it passes. |
thanks @TomAugspurger |
…_pr2 * repo_org/master: DOC: Add docstring validations for "See Also" section (pandas-dev#23143) TST: Fix test assertion (pandas-dev#23357) BUG: Handle Period in combine (pandas-dev#23350) REF: SparseArray imports (pandas-dev#23329) CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992) DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051) Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341) TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304) API: Add sparse Acessor (pandas-dev#23183) PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
…ndas * repo_org/master: (23 commits) DOC: Add docstring validations for "See Also" section (pandas-dev#23143) TST: Fix test assertion (pandas-dev#23357) BUG: Handle Period in combine (pandas-dev#23350) REF: SparseArray imports (pandas-dev#23329) CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992) DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051) Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341) TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304) API: Add sparse Acessor (pandas-dev#23183) PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235) fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302) BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317) TST: re-enable gbq tests (pandas-dev#23303) Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311) isort imports-io (pandas-dev#23332) DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279) REF: Make PeriodArray an ExtensionArray (pandas-dev#22862) DOC: Added Examples for Series max (pandas-dev#23298) API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644) BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291) ...
…xamples * repo_org/master: (83 commits) DOC: Add docstring validations for "See Also" section (pandas-dev#23143) TST: Fix test assertion (pandas-dev#23357) BUG: Handle Period in combine (pandas-dev#23350) REF: SparseArray imports (pandas-dev#23329) CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992) DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051) Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341) TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304) API: Add sparse Acessor (pandas-dev#23183) PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235) fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302) BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317) TST: re-enable gbq tests (pandas-dev#23303) Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311) isort imports-io (pandas-dev#23332) DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279) REF: Make PeriodArray an ExtensionArray (pandas-dev#22862) DOC: Added Examples for Series max (pandas-dev#23298) API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644) BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291) ...
Closes #23148.
This should provide all the methods / attributes that were available on SparseSeries, but not Series.
Right now the docs for
.sparse.from_coo
andto_coo
seem to be broken. It's a bit strange since they're implemented on the accessor (they don't make sense on the Array)