make `Table.to_dataframe` create real sparse frames #809

cdiener · 2019-03-07T21:01:54Z

So this is a proposed fix to #808. It basically specifies the fill value for sparse data explicitly. I also added a test to check for that in the future.

Previously, the matrix data was first converted to individual pandas.SparseSeries and then passed to pandas.SparseDataFrame. This is now done directly, without the intermediate allocations, since SparseDataFrame accepts any scipy sparse matrix. If Table.matrix_data can be something other than a numpy matrix or scipy sparse matrix, this might not work, but for now all tests seem to pass. Happy to change it back to the old behavior if that was on purpose.

As a side effect, to_dataframe is now much faster on large sparse data sets. It takes about 20s on my machine for the American Gut biom (it was more than 30m before).
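For readers on current pandas, where SparseDataFrame has since been removed, the same idea (build the frame directly from a scipy sparse matrix with zero as the fill value) can be sketched roughly as below. This is not the code from this PR: it swaps in `pd.DataFrame.sparse.from_spmatrix`, which requires a newer pandas than the 0.20.0 minimum discussed here, and the toy matrix and the `O*`/`S*` labels are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy observation-by-sample matrix (hypothetical data, mostly zeros).
dense = np.array([[0.0, 5.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [1.0, 0.0, 0.0]])
matrix = sparse.csr_matrix(dense)

# Build the frame straight from the scipy matrix, with no per-column
# intermediate series. Unstored entries are treated as the fill value 0.
df = pd.DataFrame.sparse.from_spmatrix(
    matrix,
    index=["O1", "O2", "O3"],     # hypothetical observation ids
    columns=["S1", "S2", "S3"],   # hypothetical sample ids
)

print(df.dtypes)          # every column has a sparse dtype
print(df.sparse.density)  # fraction of entries that are actually stored
```

A density well below 1 is a quick way to check that the frame is really sparse rather than a dense frame in sparse clothing.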

wasade · 2019-03-07T21:05:59Z

This is great, thank you! In prior versions of pandas, we couldn't pass scipy sparse objects. Do you know by chance if this affects our present minimum version requirement?

wasade

Code seems reasonable. Are you able to add a mention to the ChangeLog by chance?

cdiener · 2019-03-07T21:13:35Z

Passing scipy sparse matrices to SparseDataFrame was introduced in pandas 0.20.0, so this should be fine with the current version requirement (>=0.20.0).

coveralls · 2019-03-07T21:14:40Z

Coverage decreased (-0.3%) to 86.132% when pulling bbf66b3 on cdiener:fix/sparse_to_dataframe into 2932695 on biocore:master.

coveralls · 2019-03-07T21:14:40Z

Coverage remained the same at 86.438% when pulling f9676d8 on cdiener:fix/sparse_to_dataframe into 2932695 on biocore:master.

ElDeveloper · 2019-03-07T23:07:19Z

Looks great!

wasade · 2019-03-08T16:25:05Z

Thanks @cdiener and @ElDeveloper!

specify fill value

fe40ba5

fix flake8

bbf66b3

wasade requested changes Mar 7, 2019

cdiener added 2 commits March 7, 2019 13:16

add to changelog

6c11d54

fix test for Python 2

f9676d8

ElDeveloper merged commit ac2f835 into biocore:master Mar 7, 2019
