[python] to_dataframe does not produce sparse data frames #808
Comments
Interesting. So, unlike Do you have a fix by chance?
In [1]: import biom
In [2]: print(biom.example_table)
# Constructed from biom file
#OTU ID S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [3]: biom.example_table.to_dataframe()
Out[3]:
S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [4]: biom.example_table.to_dataframe().info()
<class 'pandas.core.sparse.frame.SparseDataFrame'>
Index: 2 entries, O1 to O2
Data columns (total 3 columns):
S1 2 non-null float64
S2 2 non-null float64
S3 2 non-null float64
dtypes: float64(3)
memory usage: 64.0+ bytes
In [5]: biom.example_table.to_dataframe(dense=True)
Out[5]:
S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [6]: biom.example_table.to_dataframe(dense=True).to_sparse()
Out[6]:
S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0
In [7]: biom.example_table.to_dataframe(dense=True).to_sparse().info
Out[7]:
<bound method DataFrame.info of S1 S2 S3
O1 0.0 1.0 2.0
O2 3.0 4.0 5.0>
In [8]: biom.example_table.to_dataframe(dense=True).to_sparse().info()
<class 'pandas.core.sparse.frame.SparseDataFrame'>
Index: 2 entries, O1 to O2
Data columns (total 3 columns):
S1 2 non-null float64
S2 2 non-null float64
S3 2 non-null float64
dtypes: float64(3)
memory usage: 64.0+ bytes
In [9]: biom.example_table.to_dataframe(dense=True).to_sparse().density
Out[9]: 1.0
I think you would just have to set
That would be wonderful, thank you!
Hi,
I noticed that the pandas.SparseDataFrame returned by Table.to_dataframe is not really sparse. For instance, for the American Gut data:
This is basically the memory use of the full table including zeros. Also, the densities of the original table and the SparseDataFrame are very different (~0% vs 100%).
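To make the density gap concrete, here is a minimal sketch of how the fill value of a sparse frame changes what counts as "stored". It does not use biom at all. It assumes pandas 1.0 or later, where pandas.SparseDataFrame no longer exists and sparse columns are expressed through pd.SparseDtype, and the small count matrix is invented purely for illustration. It may or may not match what to_dataframe does internally.

```python
import numpy as np
import pandas as pd

# Toy count matrix (made up for illustration): 2 of 6 entries are nonzero.
values = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 5.0]])
dense = pd.DataFrame(values, index=["O1", "O2"], columns=["S1", "S2", "S3"])

# Sparse columns whose fill value is NaN: zeros are ordinary values here,
# so every entry is stored explicitly.
nan_filled = dense.astype(pd.SparseDtype("float64", np.nan))

# Sparse columns whose fill value is 0: only the nonzero entries are stored.
zero_filled = dense.astype(pd.SparseDtype("float64", 0.0))

# Density is the fraction of entries that are stored explicitly.
nan_density = nan_filled.sparse.density
zero_density = zero_filled.sparse.density

print(f"fill_value=NaN density: {nan_density:.2f}")
print(f"fill_value=0   density: {zero_density:.2f}")
```

With a NaN fill value the frame reports a density of 1.0 even though most entries are zero, which is the same symptom as the density of 1.0 shown in the session above. With a fill value of 0 only the two nonzero counts are stored.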