inconsistent metadata representation between JSON and HDF5 #594

Closed
jairideout opened this issue Jan 30, 2015 · 2 comments

@jairideout

Metadata is handled differently depending on the underlying file format (JSON or HDF5).

This is related to a previous issue (#585) and fix (#589). The original issue occurred in QIIME's rarefaction unit tests (biocore/qiime#1918).

Example:

Create an in-memory table with metadata as a list of empty dictionaries. Write this table as JSON and HDF5. Read the two tables back into memory and compare to the original table. The JSON table is equal to the in-memory table, but the HDF5 table is not because the metadata differ (None vs. a list of defaultdicts):

In [1]: from biom.table import Table

In [2]: import numpy as np

In [3]: t = Table(np.array([[2,1,0],[0,5,0],[0,3,0],[1,2,0]]), list('bacd'), list('YXZ'), observation_metadata=[{}, {}, {}, {}], sample_metadata=[{}, {}, {}])

In [4]: with open('json.biom', 'w') as f:
   ...:     t.to_json('me', f)
   ...:

In [5]: from biom.util import biom_open

In [6]: with biom_open('hdf5.biom', 'w') as f:
   ...:     t.to_hdf5(f, 'me', True)
   ...:

In [7]: from biom import load_table

In [8]: json_table = load_table('json.biom')

In [9]: hdf5_table = load_table('hdf5.biom')

In [10]: json_table.descriptive_equality(t)
Out[10]: 'Tables appear equal'

In [11]: hdf5_table.descriptive_equality(t)
Out[11]: 'Observation metadata are not the same'

cc @josenavas @gregcaporaso @Jorge-C

@wasade

wasade commented Mar 3, 2015

Ping on this. We're doing a release for #599 relatively soon, so it would be good to lump in other bug fixes if they're attainable in short order.

@wasade

wasade commented Nov 5, 2016

I'm really not sure what the best solution is here. I think the JSON formatter is actually incorrect: the dicts are empty, so there isn't any metadata to store. We don't have a way to represent this in HDF5 because the metadata are stored as datasets named by their keys, and empty dicts have no keys.
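
To make the point about missing keys concrete, here is a minimal pure-Python sketch. It assumes a layout with one dataset per metadata key; `metadata_keys` is a hypothetical helper for illustration and is not part of biom-format.

```python
def metadata_keys(metadata):
    """Collect the distinct keys found across per-ID metadata dicts.

    Hypothetical helper: under a one-dataset-per-key layout, these keys
    would be the dataset names. No keys means nothing to write.
    """
    if metadata is None:
        return set()
    keys = set()
    for entry in metadata:
        keys.update(entry)  # updating from a dict adds its keys
    return keys


# A list of empty dicts and None both yield no keys, so a key-named
# layout cannot tell them apart.
print(metadata_keys([{}, {}, {}, {}]))  # set()
print(metadata_keys(None))              # set()
```

Under that assumption, both inputs collapse to the same on-disk state, which would explain why the table read back from HDF5 comes out with None.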

I see two options here: either add a check to the JSON formatter for this edge case, or add a check to the constraints on table metadata so that, if all of the metadata on an axis are empty, we set that axis's metadata to None. The latter is kind of nice, but without immutability I don't know if we can actually enforce it.
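
A rough sketch of what the second option might look like, written as a standalone function. `normalize_axis_metadata` is a hypothetical name, not an existing biom-format API, and this only covers the check at construction time; it does nothing about later mutation.

```python
def normalize_axis_metadata(metadata):
    """Return None when every per-ID metadata entry is empty.

    Hypothetical helper: collapses "all entries empty" to None so that
    an axis has a single representation for "no metadata".
    """
    if metadata is None:
        return None
    entries = list(metadata)
    # An entry counts as empty if it is None or has no keys.
    if all(not entry for entry in entries):
        return None
    return entries


print(normalize_axis_metadata([{}, {}, {}]))            # None
print(normalize_axis_metadata([{"pH": 7.0}, {}, {}]))   # [{'pH': 7.0}, {}, {}]
```

If something like this ran when the table is built, the in-memory table from the example would carry None on both axes, so the JSON and HDF5 round trips should agree with it.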

wasade added a commit to wasade/biom-format that referenced this issue Sep 17, 2018
@wasade wasade mentioned this issue Sep 17, 2018
ElDeveloper pushed a commit that referenced this issue Sep 18, 2018
* TST: test case for table w/ empty metadata

* BUG: empty metadata is actually now set to None, fixes #594