TreeSequence.dump_text fails if metadata schema is present #1860

Closed
jeromekelleher opened this issue Oct 27, 2021 · 2 comments · Fixed by #2073
Labels: bug (Something isn't working), Python API (Issue is about the Python API)
Comments

@jeromekelleher
Member

import sys

import msprime

ts = msprime.sim_ancestry(4, sequence_length=45, random_seed=234)
ts.dump_text(populations=sys.stdout)

gives

id      metadata
Traceback (most recent call last):
  File "tmp.py", line 14, in <module>
    ts.dump_text(populations=sys.stdout)
  File "/home/jk/work/github/tskit/python/tskit/trees.py", line 3759, in dump_text
    metadata = base64.b64encode(metadata).decode(encoding)
  File "/usr/lib/python3.8/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'dict'
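
The same TypeError can be reproduced without msprime or tskit: when a schema is present the metadata has already been decoded to a dict, and base64.b64encode accepts only bytes-like objects. A minimal sketch, assuming a decoded metadata value shaped like the dict below (the keys are illustrative, not taken from the traceback):

```python
import base64

# Illustrative stand-in for metadata decoded via a schema; the keys are assumed.
decoded_metadata = {"name": "pop_0", "description": ""}

try:
    # b64encode needs bytes, so passing a dict raises TypeError.
    base64.b64encode(decoded_metadata)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```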

I guess the right behaviour here is to make the base64_metadata option default to None, so that by default we output decoded metadata as JSON if a schema is present and base64 encode it otherwise. If base64_metadata is True, then we always output the base64-encoded raw bytes.
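
The proposed default could be sketched roughly as follows. This is not tskit's implementation: format_metadata and its raw, decoded, and has_schema parameters are hypothetical names used only for illustration, and the handling of base64_metadata=False without a schema (decode the bytes as UTF-8 text) is an assumption.

```python
import base64
import json


def format_metadata(raw, decoded, has_schema, base64_metadata=None):
    """Hypothetical helper illustrating the proposed rule (not tskit's API).

    raw is the metadata as bytes, decoded is the schema-decoded value.
    """
    if base64_metadata is None:
        # Default: JSON when a schema is present, base64 otherwise.
        base64_metadata = not has_schema
    if base64_metadata:
        return base64.b64encode(raw).decode("utf8")
    if has_schema:
        return json.dumps(decoded, sort_keys=True)
    # Assumption: with no schema and base64 disabled, emit the bytes as text.
    return raw.decode("utf8")
```

With raw = b'{"name":"pop_0"}' and decoded = {"name": "pop_0"}, calling format_metadata(raw, decoded, has_schema=True) would give the JSON text, while has_schema=False would give the base64 encoding of raw.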

We should move the implementation of this function to text_formats.py while we're fixing it up too, to try and downsize trees.py a bit and keep all the text processing stuff in one place.

@jeromekelleher added the bug and Python API labels Oct 27, 2021
@benjeffery added this to the Python 0.4.1 milestone Nov 2, 2021
@benjeffery
Member

Been working on this - at the tree sequence level we don't have easy access to the raw bytes metadata if there is a schema in place. I'm tempted to remove the base64 option and print the metadata as JSON by default, unless there is no schema in which case we print base64 encoded bytes.

@jeromekelleher
Member Author

> I'm tempted to remove the base64 option and print the metadata as JSON by default, unless there is no schema in which case we print base64 encoded bytes.

What if we changed the definition slightly to "base64 encode the metadata if no schema is present"? That'll still be compatible with old code. We can either ignore base64_metadata if a schema is present or raise an error if it's True, depending on whether the default in the signature is None or True.
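
A rough sketch of that refined rule, taking the variant where the default is None and an explicit True raises an error when a schema is present. The function name format_metadata_compat and its parameters are hypothetical, not tskit's API, and the base64_metadata=False branch without a schema (emit the bytes as UTF-8 text) is an assumption.

```python
import base64
import json


def format_metadata_compat(metadata, has_schema, base64_metadata=None):
    """Hypothetical helper (not tskit's API) for the refined rule.

    metadata is a decoded value when a schema is present, else raw bytes.
    """
    if has_schema:
        # Raw bytes are not easily available here, so base64 cannot be honoured.
        if base64_metadata is True:
            raise ValueError("base64_metadata=True is not supported with a schema")
        return json.dumps(metadata, sort_keys=True)
    if base64_metadata is False:
        # Assumption: emit the raw bytes as text when base64 is switched off.
        return metadata.decode("utf8")
    # No schema: base64 encode, matching the old default behaviour.
    return base64.b64encode(metadata).decode("utf8")
```

The other variant mentioned above (default True, silently ignored when a schema is present) would simply drop the ValueError branch.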
