failure to write biom table with latest version #530

Closed
wdwvt1 opened this issue Aug 18, 2014 · 8 comments

@wdwvt1
Contributor

wdwvt1 commented Aug 18, 2014

I don't want to publicly share the data so ask me for a copy if you want to check the bug out (creating a smaller subset of data that causes the bug has proven difficult).

from biom import load_table
from qiime.util import write_biom_table

# Load the table, then write it back out through QIIME's wrapper.
bt = load_table('/Users/wdwvt1/Desktop/test.biom')
write_biom_table(bt, '/Users/wdwvt1/Desktop/test2.biom')

raises

<ipython-input-6-5ae2949ff115> in <module>()
----> 1 write_biom_table(bt, '/Users/wdwvt1/Desktop/test2.biom')

/Users/wdwvt1/src/git_qiime/qiime/util.pyc in write_biom_table(biom_table, biom_table_fp, compress)
    511     with biom_open(biom_table_fp, 'w') as biom_file:
    512         biom_table.to_hdf5(biom_file, get_generated_by_for_biom_tables(),
--> 513                            compress)
    514 
    515 

/Users/wdwvt1/src/git_biom/build/lib.macosx-10.6-intel-2.7/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress)
   3336                   self.ids(axis='observation'),
   3337                   self.metadata(axis='observation'),
-> 3338                   self.group_metadata(axis='observation'), 'csr', compression)
   3339         axis_dump(h5grp.create_group('sample'), self.ids(),
   3340                   self.metadata(), self.group_metadata(), 'csc', compression)

/Users/wdwvt1/src/git_biom/build/lib.macosx-10.6-intel-2.7/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
   3308                     # Create the dataset for the current category,
   3309                     # putting values in id order
-> 3310                     formatter[category](grp, category, md, compression)
   3311 
   3312             # Create the group for the group metadata

/Users/wdwvt1/src/git_biom/build/lib.macosx-10.6-intel-2.7/biom/table.pyc in vlen_list_of_str_formatter(grp, header, md, compression)
   3292                             continue
   3293                         value = np.asarray(m[header])
-> 3294                         data[i, :len(value)] = value
   3295                     # Change the None entries on data to empty strings ""
   3296                     data = np.where(data == np.array(None), "", data)

TypeError: len() of unsized object

I have the latest version of biom installed and I have verified the issue with both @josenavas and @ElDeveloper.

@wasade
Member

wasade commented Aug 18, 2014

@wdwvt1, can you recreate the issue just using BIOM or does it only manifest when using QIIME's write_biom_table?

@wdwvt1
Contributor Author

wdwvt1 commented Aug 18, 2014

sorry - getting tired haha. i meant to say, i can recreate using just biom.

from h5py import File
with File('/Users/wdwvt1/Desktop/test3.biom', 'w') as f:
    bt.to_hdf5(f, 'will')

raises

<ipython-input-24-f85a809f050e> in <module>()
      1 with File('/Users/wdwvt1/Desktop/test3.biom', 'w') as f:
----> 2     bt.to_hdf5(f, 'will')
      3 

/Users/wdwvt1/src/git_biom/build/lib.macosx-10.6-intel-2.7/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress)
   3336                   self.ids(axis='observation'),
   3337                   self.metadata(axis='observation'),
-> 3338                   self.group_metadata(axis='observation'), 'csr', compression)
   3339         axis_dump(h5grp.create_group('sample'), self.ids(),
   3340                   self.metadata(), self.group_metadata(), 'csc', compression)

/Users/wdwvt1/src/git_biom/build/lib.macosx-10.6-intel-2.7/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
   3308                     # Create the dataset for the current category,
   3309                     # putting values in id order
-> 3310                     formatter[category](grp, category, md, compression)
   3311 
   3312             # Create the group for the group metadata

/Users/wdwvt1/src/git_biom/build/lib.macosx-10.6-intel-2.7/biom/table.pyc in vlen_list_of_str_formatter(grp, header, md, compression)
   3292                             continue
   3293                         value = np.asarray(m[header])
-> 3294                         data[i, :len(value)] = value
   3295                     # Change the None entries on data to empty strings ""
   3296                     data = np.where(data == np.array(None), "", data)

TypeError: len() of unsized object

@wasade
Member

wasade commented Aug 18, 2014

Thanks. @josenavas, any ideas?

@josenavas
Member

Yes, I tested this table on my end and it looks like the JSON parser is returning the taxonomy as a numpy array with a single element, while the in-memory representation of the taxonomy should be a list of strings. I would say that this is a bug in the JSON parser, but I'm not familiar with that code.
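
For readers following along, here is a minimal sketch of one way the `TypeError` in the traceback can arise. It assumes the taxonomy value reaches `np.asarray` as a single plain string rather than a list of strings; the sample lineage is made up, and whether this is exactly what the JSON parser produced for this table is not confirmed in this thread.

```python
import numpy as np

# Assumption: one taxonomy value is a single string, the other a list of strings.
as_string = np.asarray("k__Bacteria; p__Firmicutes")
as_list = np.asarray(["k__Bacteria", "p__Firmicutes"])

# A plain string becomes a zero-dimensional array, which has no length.
print(as_string.ndim, as_list.ndim)

try:
    len(as_string)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# np.atleast_1d promotes the zero-dimensional array to one dimension,
# so len() works on the result.
promoted = np.atleast_1d(as_string)
print(len(promoted))
```

A list of strings gives a one-dimensional array, so `len(value)` works in that case, which matches the observation that the formatter expects a list of strings.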

@gregcaporaso
Contributor

I'm also getting a traceback from vlen_list_of_str_formatter, in my case when running biom add-metadata on the QIIME 1.9.0 AMI (biom version 2.1.3). I have figured out a workaround for this case (see below).

$ biom add-metadata -i $HOME/data/short-read-tax-assignment/data/mock-community/B2/otu_table_mc2_no_pynast_failures.biom -o /home/ubuntu/data/2015.02.11-tax-parameter-sweep/mock-community/B2/gg_13_8_otus/sortmerna/0.76:0.9:5:0.8:1.0/table.biom --observation-metadata-fp /home/ubuntu/data/2015.02.11-tax-parameter-sweep/mock-community/B2/gg_13_8_otus/sortmerna/0.76:0.9:5:0.8:1.0/rep_set_tax_assignments.txt --observation-header otuid,taxonomy
Traceback (most recent call last):
  File "/usr/local/bin/pyqi", line 184, in <module>
    optparse_main(cmd_obj, argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pyqi/core/interfaces/optparse/__init__.py", line 275, in optparse_main
    result = optparse_cmd(local_argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pyqi/core/interface.py", line 41, in __call__
    return self._output_handler(cmd_result)
  File "/usr/local/lib/python2.7/dist-packages/pyqi/core/interfaces/optparse/__init__.py", line 250, in _output_handler
    opt_value)
  File "/usr/local/lib/python2.7/dist-packages/biom/interfaces/optparse/output_handler.py", line 80, in write_biom_table
    table.to_hdf5(f, generatedby())
  File "/usr/local/lib/python2.7/dist-packages/biom/table.py", line 3484, in to_hdf5
    self.group_metadata(axis='observation'), 'csr', compression)
  File "/usr/local/lib/python2.7/dist-packages/biom/table.py", line 3456, in axis_dump
    formatter[category](grp, category, md, compression)
  File "/usr/local/lib/python2.7/dist-packages/biom/table.py", line 3440, in vlen_list_of_str_formatter
    data[i, :len(value)] = value
TypeError: len() of unsized object

I had to pass --sc-separated taxonomy to get this to work (so it's working for me now, but wanted to point out the related issue that I had and the solution).
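
As a rough illustration of why the workaround helps: the option name suggests that the `taxonomy` column is treated as semicolon-separated, so each value is stored as a list of strings instead of one string. The helper below is hypothetical and is not biom's implementation; it only sketches that kind of split on a made-up lineage.

```python
def split_taxonomy(value, separator=";"):
    """Hypothetical helper: split a delimited taxonomy string into a list of strings."""
    # Strip whitespace around each level and drop empty pieces (e.g. a trailing ';').
    return [part.strip() for part in value.split(separator) if part.strip()]


lineage = split_taxonomy("k__Bacteria; p__Firmicutes; c__Bacilli")
print(lineage)

# A value without any separator still comes back as a one-element list.
unassigned = split_taxonomy("Unassigned")
print(unassigned)
```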

@gregcaporaso
Contributor

Note that this issue is still present - it only comes up when writing HDF5-formatted files.

@ekopylova

Thanks for the --sc-separated taxonomy solution! I also got the same error when running a biom add-metadata command similar to the one above without that option.

@wasade
Member

wasade commented Nov 5, 2016

The metadata formatting and parsing in from_hdf5 and to_hdf5 in general need an overhaul. What I propose is adding a catch in biom add-metadata for this scenario and failing gracefully. Is that amenable here?
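
One possible shape for such a catch, as a sketch only: the function name, the error message, and the idea of checking before the HDF5 write are assumptions for illustration, not the change that was actually committed. It assumes metadata is available as one dict per id.

```python
def find_non_list_metadata(ids, metadata, category):
    """Hypothetical check: return ids whose `category` value is not a list/tuple of str."""
    bad_ids = []
    for id_, md in zip(ids, metadata):
        value = md.get(category)
        if value is None:
            # Missing values are skipped in this sketch.
            continue
        is_sequence = isinstance(value, (list, tuple))
        if not is_sequence or not all(isinstance(v, str) for v in value):
            bad_ids.append(id_)
    return bad_ids


# Made-up example: the second observation carries a plain string.
ids = ["OTU1", "OTU2"]
metadata = [
    {"taxonomy": ["k__Bacteria", "p__Firmicutes"]},
    {"taxonomy": "k__Bacteria; p__Firmicutes"},
]

bad = find_non_list_metadata(ids, metadata, "taxonomy")
if bad:
    # Report a readable message instead of letting the writer raise a TypeError.
    message = "taxonomy must be a list of strings for: " + ", ".join(bad)
    print(message)
```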

@wasade wasade added this to the 2.1.6 milestone Nov 5, 2016
wasade added a commit to wasade/biom-format that referenced this issue Mar 29, 2017
wasade added a commit to wasade/biom-format that referenced this issue Mar 29, 2017