If filter returns an empty table should an exception be raised? #619
Comments
Aggressive filtering strategies can sometimes lead to empty tables, and in those cases the empty table is legitimately the correct filtered table, so raising an exception doesn't seem right (e.g., think about popping all items from a dict or list). Could this be better handled at the serialization step? Ideally you could write something out that would be read back in as an empty table, but if not the writing step should fail, not the filtering itself. |
Agreed. What about letting the application/user validate this? For
example, QIIME should be raising a warning, not skbio.
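A minimal sketch of what application-side validation could look like, assuming the application only needs the table's `shape` tuple. `EmptyTableWarning` and `warn_if_empty` are hypothetical names used for illustration; they are not part of biom, skbio, or QIIME.

```python
import warnings


class EmptyTableWarning(UserWarning):
    """Hypothetical warning category an application could define."""


def warn_if_empty(shape, context="filter"):
    """Warn (rather than raise) when a table has a zero-length axis.

    `shape` is an (n_observations, n_samples) tuple such as `table.shape`.
    Returns True when the table is empty along either axis.
    """
    n_obs, n_samp = shape
    is_empty = n_obs == 0 or n_samp == 0
    if is_empty:
        # The library call itself succeeded; only the caller decides
        # whether an empty result deserves attention.
        warnings.warn(
            "{} produced an empty table with shape {}".format(
                context, (n_obs, n_samp)
            ),
            EmptyTableWarning,
        )
    return is_empty
```

The caller would invoke something like `warn_if_empty(filtered.shape)` right after filtering, so the filter itself stays free of policy decisions.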
|
Agree. An empty Table is valid. I'm not so sure if an empty file is valid?
|
Got it, thanks. I think the table should be able to be serialized.
|
@ElDeveloper, I think the issue lies elsewhere. See here, and below:
In [1]: from biom import Table
In [2]: t = Table([], [], [])
In [3]: t
Out[3]: 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)
In [4]: import h5py
In [5]: with h5py.File('testing.hdf5.biom', 'w') as fp:
   ...:     t.to_hdf5(fp, 'testing')
   ...:
In [6]: from biom import load_table
In [7]: t_rt = load_table('testing.hdf5.biom')
In [8]: t_rt
Out[8]: 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense) |
I think I was able to narrow down the problem a bit more. It seems to be related to the way filtering is done (this is heavily based on @wdwvt1's example). The problem seems to be that the shape ends up being 0 by X:
In [1]: %paste
from numpy import array
from biom.table import Table
from h5py import File
data = array([[1,2,3],[4,5,6]])
oids = ['otu1', 'otu2']
sids = ['s1', 's2', 's3']
md = [{'taxonomy': ['k__bacteria','p__firmicutes']}, {'taxonomy':['k__eukaryotes', 'p__something']}]
bt = Table(data, oids, sids, observation_metadata=md)
## -- End pasted text --
In [6]: broken = bt.filter({}, 'observation', inplace=False)
In [7]: from biom.util import biom_open
In [8]: with biom_open('not-happening.biom', 'w') as f:
   ...:     broken.to_hdf5(f, ':L')
   ...:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-8-f00f1993342a> in <module>()
1 with biom_open('not-happening.biom', 'w') as f:
----> 2 broken.to_hdf5(f, ':L')
3
/Users/yoshikivazquezbaeza/.virtualenvs/qiime-dev/lib/python2.7/site-packages/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress, format_fs)
3533 self.ids(axis='observation'),
3534 self.metadata(axis='observation'),
-> 3535 self.group_metadata(axis='observation'), 'csr', compression)
3536 axis_dump(h5grp.create_group('sample'), self.ids(),
3537 self.metadata(), self.group_metadata(), 'csc', compression)
/Users/yoshikivazquezbaeza/.virtualenvs/qiime-dev/lib/python2.7/site-packages/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
3502 formatter.update(format_fs)
3503 # Loop through all the categories
-> 3504 for category in md[0]:
3505 # Create the dataset for the current category,
3506 # putting values in id order
IndexError: tuple index out of range
In [9]: broken.shape
Out[9]: (0, 3) |
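The traceback above points at `md[0]` being evaluated on an empty metadata tuple. The sketch below shows one way such a lookup could be guarded. It assumes metadata is either `None` or a sequence of dict-like entries (or `None` entries), and `metadata_categories` is a hypothetical helper, not biom's actual code or its eventual fix.

```python
def metadata_categories(md):
    """Return metadata category names, or [] when there is nothing to inspect.

    `md` is assumed to be None or a sequence of per-ID metadata dicts.
    """
    # None, () and [] all mean "no metadata entries", which is what a
    # table with a zero-length axis appears to produce.
    if not md:
        return []
    first = md[0]
    # An individual entry may itself be missing or empty.
    if not first:
        return []
    return sorted(first)


# Indexing an empty tuple fails in the same way the traceback reports:
try:
    ()[0]
except IndexError as exc:
    error_message = str(exc)  # "tuple index out of range"
```

With a guard like this, the loop over categories simply runs zero times for an empty axis instead of raising.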
With the current behavior, filter can return an empty table, but that table cannot be serialized as HDF5, and the attempt leads to a rather unreadable error (see biocore/qiime#1963 for more information).