If filter returns an empty table should an exception be raised? #619

Closed
ElDeveloper opened this issue Mar 25, 2015 · 6 comments

@ElDeveloper
Member

Currently, filter can return an empty table. However, that table cannot be serialized as HDF5, and the attempt fails with a rather unreadable error (see biocore/qiime#1963 for more information).

@gregcaporaso
Contributor

Aggressive filtering strategies can sometimes lead to empty tables, and in those cases the empty table is legitimately the correct filtered table, so raising an exception doesn't seem right (e.g., think about popping all items from a dict or list).

Could this be better handled at the serialization step? Ideally you could write something out that would be read back in as an empty table; if that isn't possible, the writing step should fail, not the filtering itself.
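
The distinction drawn here can be sketched in plain Python. This is only an illustration under stated assumptions, not biom-format code: `filter_rows` and `write_rows` are hypothetical names, and JSON stands in for HDF5. Filtering returns an empty result without complaint, and the writer is the step that decides how an empty result is represented.

```python
import io
import json


def filter_rows(rows, keep):
    """Return only the rows for which keep(row_id) is true (hypothetical helper)."""
    return {row_id: values for row_id, values in rows.items() if keep(row_id)}


def write_rows(rows, fp):
    """Serialize rows as JSON; an empty dict is written as '{}' (stand-in for HDF5)."""
    json.dump(rows, fp)


rows = {"otu1": [1, 2, 3], "otu2": [4, 5, 6]}

# Aggressive filtering: nothing passes, and that is still a valid result.
empty = filter_rows(rows, lambda row_id: False)

# The serialization step is where an empty result has to be handled.
buffer = io.StringIO()
write_rows(empty, buffer)
round_tripped = json.loads(buffer.getvalue())
```

If a format genuinely could not represent an empty result, `write_rows` would be the place to raise a clear error, leaving `filter_rows` untouched.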

@antgonza
Contributor

antgonza commented Mar 25, 2015 via email

@wasade
Member

wasade commented Mar 25, 2015

Agree. An empty Table is valid. I'm not so sure if an empty file is valid?
On Mar 25, 2015 08:34, "Antonio Gonzalez" notifications@github.com wrote:

Agree, what about letting the application/user validate this? For
example, qiime should be raising a warning and not skbio.
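
One way an application could do that validation, sketched under assumptions: `check_not_empty` is a hypothetical helper (not part of qiime, skbio, or biom), and the shape is assumed to be an (observations, samples) tuple like the one `Table.shape` reports.

```python
import warnings


def check_not_empty(shape, name="table"):
    """Warn when either axis has length zero; return True if the table is non-empty."""
    n_observations, n_samples = shape
    if n_observations == 0 or n_samples == 0:
        warnings.warn(
            f"{name} is empty after filtering (shape {shape}); "
            "downstream steps may fail.",
            UserWarning,
        )
        return False
    return True


# Capture warnings so the behavior can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok_full = check_not_empty((2, 3))
    ok_empty = check_not_empty((0, 3))
```

The library keeps returning the empty table as a valid result, and only the calling application decides whether that deserves a warning.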


Reply to this email directly or view it on GitHub
#619 (comment).

@ElDeveloper
Member Author

Got it, thanks. I think the table should be able to be serialized
correctly.


@wasade
Member

wasade commented Apr 23, 2015

@ElDeveloper, I think the issue lies elsewhere. See here, and below:

In [1]: from biom import Table

In [2]: t = Table([], [], [])

In [3]: t
Out[3]: 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)

In [4]: import h5py

In [5]: with h5py.File('testing.hdf5.biom', 'w') as fp:
   ...:     t.to_hdf5(fp, 'testing')
   ...:     

In [6]: from biom import load_table

In [7]: t_rt = load_table('testing.hdf5.biom')

In [8]: t_rt
Out[8]: 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)

@ElDeveloper
Member Author

I think I was able to narrow down the problem a bit more; it seems to be related to the way filtering is done (the example is heavily based on @wdwvt1's). The problem seems to be that the shape ends up being 0 by X:

In [1]: %paste
from numpy import array
from biom.table import Table
from h5py import File
data = array([[1,2,3],[4,5,6]])
oids = ['otu1', 'otu2']
sids = ['s1', 's2', 's3']
md = [{'taxonomy': ['k__bacteria','p__firmicutes']}, {'taxonomy':['k__eukaryotes', 'p__something']}]
bt = Table(data, oids, sids, observation_metadata=md)

## -- End pasted text --

In [6]: broken = bt.filter({}, 'observation', inplace=False)

In [7]: from biom.util import biom_open

In [8]: with biom_open('not-happening.biom', 'w') as f:
   ...:     broken.to_hdf5(f, ':L')
   ...:     
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-f00f1993342a> in <module>()
      1 with biom_open('not-happening.biom', 'w') as f:
----> 2     broken.to_hdf5(f, ':L')
      3 

/Users/yoshikivazquezbaeza/.virtualenvs/qiime-dev/lib/python2.7/site-packages/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress, format_fs)
   3533                   self.ids(axis='observation'),
   3534                   self.metadata(axis='observation'),
-> 3535                   self.group_metadata(axis='observation'), 'csr', compression)
   3536         axis_dump(h5grp.create_group('sample'), self.ids(),
   3537                   self.metadata(), self.group_metadata(), 'csc', compression)

/Users/yoshikivazquezbaeza/.virtualenvs/qiime-dev/lib/python2.7/site-packages/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
   3502                 formatter.update(format_fs)
   3503                 # Loop through all the categories
-> 3504                 for category in md[0]:
   3505                     # Create the dataset for the current category,
   3506                     # putting values in id order

IndexError: tuple index out of range

In [9]: broken.shape
Out[9]: (0, 3)
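
The traceback suggests that `md` is an empty tuple once every observation has been filtered out, so `md[0]` has nothing to index. A minimal sketch of that failure and one possible guard, in plain Python; `metadata_categories` is a hypothetical helper and not necessarily how biom-format addressed this:

```python
def metadata_categories(md):
    """Return the category names from the first metadata entry, or [] if there is none."""
    if not md:  # covers both None and an empty tuple
        return []
    return list(md[0])


# Metadata for two observations, as in the example above.
md_full = (
    {"taxonomy": ["k__bacteria", "p__firmicutes"]},
    {"taxonomy": ["k__eukaryotes", "p__something"]},
)
# Assumed state of the metadata after filtering out every observation.
md_empty = ()

try:
    md_empty[0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)  # same kind of IndexError as in the traceback

categories_full = metadata_categories(md_full)
categories_empty = metadata_categories(md_empty)
```

With a guard like this, a 0 by X table would simply have no metadata categories to write, rather than failing partway through serialization.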

wasade added a commit to wasade/biom-format that referenced this issue Sep 11, 2018
ElDeveloper changed the title from "If filter returns an empty table should an exception should be raised?" to "If filter returns an empty table should an exception be raised?" on Sep 11, 2018