API: Add sparse Acessor #23183

Merged
jreback merged 10 commits into pandas-dev:master from the sparse-accessor branch on Oct 26, 2018

Conversation

TomAugspurger
Contributor

  • Adds a Series.sparse accessor
  • Adds several methods to SparseArray for use via the accessor

Closes #23148.

This should provide all the methods / attributes that were available on SparseSeries, but not Series.

Right now the docs for .sparse.from_coo and .sparse.to_coo seem to be broken. It's a bit strange since they're implemented on the accessor (they don't make sense on the Array).
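
As a rough illustration of what the accessor exposes, here is a minimal sketch using a few of the delegated attributes (`fill_value`, `npoints`, `density`, `sp_values`). It assumes a pandas version where the array class is spelled `pd.arrays.SparseArray` (pandas 1.0 and later); at the time of this PR the class was exposed as `pd.SparseArray`. The sample data is made up.

```python
import pandas as pd

# A Series backed by a sparse array; zeros are the fill value and are not stored.
# Assumption: pd.arrays.SparseArray is the pandas 1.0+ spelling of the class.
s = pd.Series(pd.arrays.SparseArray([0, 0, 1, 2], fill_value=0))

fill_value = s.sparse.fill_value  # the value that is not stored
npoints = s.sparse.npoints        # number of stored (non-fill) points
density = s.sparse.density        # npoints divided by len(s)
stored = s.sparse.sp_values       # NumPy array holding only the stored values
```

These attributes live under `Series.sparse` rather than on `Series` itself, so they are only reachable when the Series holds sparse data.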

TomAugspurger added the API Design and Sparse (Sparse Data Type) labels on Oct 16, 2018
TomAugspurger added this to the 0.24.0 milestone on Oct 16, 2018
@pep8speaks

Hello @TomAugspurger! Thanks for submitting the PR.

@codecov

codecov bot commented Oct 16, 2018

Codecov Report

Merging #23183 into master will decrease coverage by <.01%.
The diff coverage is 88.09%.

@@            Coverage Diff             @@
##           master   #23183      +/-   ##
==========================================
- Coverage   92.23%   92.22%   -0.01%     
==========================================
  Files         169      169              
  Lines       50924    50962      +38     
==========================================
+ Hits        46968    47001      +33     
- Misses       3956     3961       +5

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | 90.65% <88.09%> (-0.01%) | ⬇️ |
| #single | 42.28% <42.85%> (ø) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/indexes/accessors.py | 90.09% <ø> (ø) | ⬆️ |
| pandas/core/accessor.py | 98.79% <ø> (ø) | ⬆️ |
| pandas/core/series.py | 93.87% <100%> (+0.01%) | ⬆️ |
| pandas/core/sparse/series.py | 95.53% <100%> (+0.04%) | ⬆️ |
| pandas/core/arrays/sparse.py | 91.84% <85.29%> (-0.3%) | ⬇️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0a2d501...e26d48e.

@TomAugspurger
Contributor Author

Latest commit should fix the doc rendering for Series.sparse.to_coo and Series.sparse.from_coo (we just need to use the regular autosummary directive, instead of the autoaccessor, since the docstring is on the accessor, not the object delegated to).
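
The distinction in the comment above, between members delegated to the array and methods defined on the accessor itself, can be sketched in plain Python. This is a toy model, not pandas' actual implementation: `ToyArray`, `ToyAccessor`, `delegate`, and `to_pairs` are all hypothetical names invented for the illustration.

```python
class ToyArray:
    """Stand-in for an array type that owns the data (hypothetical)."""

    def __init__(self, values, fill_value=0):
        self.values = list(values)
        self.fill_value = fill_value

    @property
    def npoints(self):
        """Number of stored values that differ from the fill value."""
        return sum(v != self.fill_value for v in self.values)


def delegate(name):
    """Build a property that forwards to the wrapped array and reuses its docstring."""
    def getter(self):
        return getattr(self._array, name)
    getter.__doc__ = getattr(ToyArray, name).__doc__
    return property(getter)


class ToyAccessor:
    """Namespace of array-specific helpers (hypothetical)."""

    def __init__(self, array):
        self._array = array

    # Delegated: both the behavior and the docstring come from ToyArray.
    npoints = delegate("npoints")

    def to_pairs(self):
        """Return (position, value) pairs for the stored values."""
        # Defined only on the accessor: there is no ToyArray method to borrow docs from.
        return [(i, v) for i, v in enumerate(self._array.values)
                if v != self._array.fill_value]


acc = ToyAccessor(ToyArray([0, 3, 0, 5]))
```

In this sketch a documentation tool that looks up docstrings on the delegated-to object would find one for `npoints` but nothing for `to_pairs`, which is the kind of mismatch the comment describes for `from_coo` and `to_coo`.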

Contributor

@jreback jreback left a comment

lgtm. some minor comments.

doc/source/sparse.rst
Some new warnings are issued for operations that require or are likely to materialize a large dense array:

- A :class:`errors.PerformanceWarning` is issued when using fillna with a ``method``, as a dense array is constructed to create the filled array. Filling with a ``value`` is the efficient way to fill a sparse array.
- A :class:`errors.PerformanceWarning` is now issued when concatenating sparse Series with differing fill values. The fill value from the first sparse array continues to be used.
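
The first bullet's recommendation, filling with a ``value`` rather than a ``method``, might look like the sketch below. It assumes the pandas 1.0+ spelling `pd.arrays.SparseArray` and made-up data; it does not try to reproduce the warnings themselves, whose exact behavior may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sparse Series whose fill value is NaN; only 1.0 and 2.0 are stored.
# Assumption: pd.arrays.SparseArray is the pandas 1.0+ spelling of the class.
s = pd.Series(pd.arrays.SparseArray([1.0, np.nan, np.nan, 2.0]))

# Filling with a scalar value keeps the result sparse.
filled = s.fillna(0)

is_sparse = isinstance(filled.dtype, pd.SparseDtype)
dense_values = np.asarray(filled).tolist()
```

Here the missing entries are represented by a new fill value of 0, so the stored points stay the same.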

In addition to these API breaking changes, many :ref:`performance improvements and bug fixes have been made <whatsnew_0240.bug_fixes.sparse>`.

Finally, a ``Series.sparse`` accessor has been added to provide sparse-specific methods and attributes.
Contributor

can you add link to docs

Contributor Author

Can't link to a generic Series.sparse location, but I can do

to provide sparse-specific methods like :meth:`Series.sparse.from_coo`.

and people can navigate from there.
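
For readers unfamiliar with the method being linked, a minimal sketch of `Series.sparse.from_coo` and its counterpart `Series.sparse.to_coo` follows. It assumes SciPy is installed; the matrix contents are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# 3x4 COO matrix with three stored entries:
# (row 1, col 0) = 3.0, (row 0, col 2) = 1.0, (row 0, col 3) = 2.0
A = sparse.coo_matrix(
    ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
)

# Build a sparse Series indexed by a two-level (row, column) MultiIndex.
ss = pd.Series.sparse.from_coo(A)

# Convert back; to_coo also returns the row and column labels it used.
A2, rows, columns = ss.sparse.to_coo()
```

Both methods are called through the accessor: `from_coo` on the class (`pd.Series.sparse`) and `to_coo` on an instance (`ss.sparse`).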

doc/source/api.rst (outdated)
Member

@jorisvandenbossche jorisvandenbossche left a comment

I am fine with the idea of a sparse accessor. My only hesitation is about some of the attributes exposed there, since exposing them makes them public API
(as I think I mentioned before, promoting this from SparseSeries to Series amounts to more officially supporting it as a stable part of pandas).

For example, do we want to expose the IntIndex and BlockIndex officially as API?

@@ -581,98 +580,13 @@ def combine_first(self, other):
        return dense_combined.to_sparse(fill_value=self.fill_value)

    def to_coo(self, row_levels=(0, ), column_levels=(1, ), sort_labels=False):
        """
        Create a scipy.sparse.coo_matrix from a SparseSeries with MultiIndex.
Member

Can you keep those docstrings here (or share them with the delegate method)?

Contributor Author

My hope is to deprecate SparseSeries entirely for 0.24.0. But I suppose we'll want a docstring for the deprecation period, so I'll use shared docs.

doc/source/whatsnew/v0.24.0.txt (outdated)
@TomAugspurger
Contributor Author

> For example, do we want to expose the IntIndex and BlockIndex officially as API?

I'm happy to remove those.

@TomAugspurger
Contributor Author

Forgot to remove sp_index and kind from the api.rst

This should be ready once it passes.

@jreback jreback merged commit df4ffc7 into pandas-dev:master Oct 26, 2018
@jreback
Contributor

jreback commented Oct 26, 2018

thanks @TomAugspurger

@TomAugspurger TomAugspurger deleted the sparse-accessor branch October 26, 2018 01:31