cli: tskit populations blah.ts fails #2050

Closed
grahamgower opened this issue Dec 9, 2021 · 14 comments
Labels
bug Something isn't working

@grahamgower
Member

I'm not sure what the tskit populations CLI subcommand is supposed to do. I expected it to show the populations or something. Instead I get:

For a SLiM (bleeding edge version) tree sequence

t490:tmp $ tskit populations slim.ts
id      metadata
Traceback (most recent call last):
  File "/home/grg/.local/bin/tskit", line 8, in <module>
    sys.exit(tskit_main())
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/cli.py", line 274, in tskit_main
    args.runner(args)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/cli.py", line 111, in run_populations
    tree_sequence.dump_text(populations=sys.stdout)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/trees.py", line 3955, in dump_text
    metadata = population.metadata
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/metadata.py", line 743, in __get__
    row, "_metadata", row._metadata_decoder(row._metadata)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/metadata.py", line 166, in decode
    result = json.loads(encoded.decode())
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

For an msprime tree sequence:

t490:tmp $ tskit populations msprime.ts
id      metadata
Traceback (most recent call last):
  File "/home/grg/.local/bin/tskit", line 8, in <module>
    sys.exit(tskit_main())
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/cli.py", line 274, in tskit_main
    args.runner(args)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/cli.py", line 111, in run_populations
    tree_sequence.dump_text(populations=sys.stdout)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/trees.py", line 3957, in dump_text
    metadata = base64.b64encode(metadata).decode(encoding)
  File "/usr/lib/python3.9/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'dict'

This was from:

t490:tmp $ tskit --version
python3 -m tskit 0.4.0b1

But the output is the same with tskit 0.3.7

@benjeffery
Member

Any chance you can attach the .trees?

@grahamgower
Member Author

> Any chance you can attach the .trees?

Sure. GitHub made me change the extensions so they could be uploaded.

slim.pptx
msprime.pptx

SLiM script.

initialize() {
	initializeMutationRate(1e-7);
	initializeMutationType("m1", 0.5, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 99999);
	initializeRecombinationRate(1e-8);
	initializeTreeSeq();
}

1 {
	sim.addSubpop("p1", 500);
	p1.name = "awesomepop";
}

2000 late() { sim.treeSeqOutput("/tmp/slim.ts"); }

Msprime command:

$ msp ancestry -r 1e-8 -N 500 500 > /tmp/msprime.ts

@grahamgower
Member Author

Also related: MesserLab/SLiM#254

$ tskit info /tmp/slim.ts
╔═════════════════════════╗
║TreeSequence             ║
╠═══════════════╤═════════╣
║Trees          │       18║
╟───────────────┼─────────╢
║Sequence Length│   100000║
╟───────────────┼─────────╢
║Time Units     │    ticks║
╟───────────────┼─────────╢
║Sample Nodes   │     1000║
╟───────────────┼─────────╢
║Total Size     │214.2 KiB║
╚═══════════════╧═════════╝
╔═══════════╤════╤════════╤════════════╗
║Table      │Rows│Size    │Has Metadata║
╠═══════════╪════╪════════╪════════════╣
║Edges      │1888│59.0 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Individuals│ 500│50.6 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Migrations │   0│ 8 Bytes│          No║
╟───────────┼────┼────────┼────────────╢
║Mutations  │ 169│10.7 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Nodes      │1843│69.1 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Populations│   2│ 2.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Provenances│   1│ 2.0 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Sites      │ 169│ 4.0 KiB│          No║
╚═══════════╧════╧════════╧════════════╝

But actually, all of the tskit subcommands fail if they correspond to tables with metadata.

$ tskit nodes /tmp/slim.ts
id      is_sample       time    population      individual      metadata
Traceback (most recent call last):
  File "/home/grg/.local/bin/tskit", line 8, in <module>
    sys.exit(tskit_main())
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/cli.py", line 274, in tskit_main
    args.runner(args)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/cli.py", line 91, in run_nodes
    tree_sequence.dump_text(nodes=sys.stdout, precision=args.precision)
  File "/home/grg/.local/lib/python3.9/site-packages/tskit/trees.py", line 3838, in dump_text
    metadata = base64.b64encode(metadata).decode(encoding)
  File "/usr/lib/python3.9/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'dict'

@jeromekelleher
Member

jeromekelleher commented Dec 10, 2021

This is a dupe of #1860 - those commands are doing a text dump of the tables, and the text dump is broken for tables with decodable metadata. Should be an easy fix, it's just a question of priorities - would you like to see this fixed for 0.4.0?
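
The TypeError in the msprime and nodes tracebacks can be reproduced without tskit: when metadata has already been decoded to a dict, base64.b64encode rejects it because it only accepts bytes-like input. A minimal sketch of the failure and one possible way around it follows. Note that metadata_to_text is a hypothetical helper written for illustration, not tskit's actual fix, and the sample dict is made up:

```python
import base64
import json


def metadata_to_text(metadata, encoding="utf8"):
    """Hypothetical helper: render one row's metadata as a text field.

    Raw bytes are base64-encoded (what the traceback shows dump_text doing);
    already-decoded metadata such as a dict is written out as JSON instead.
    """
    if isinstance(metadata, (bytes, bytearray)):
        return base64.b64encode(metadata).decode(encoding)
    return json.dumps(metadata, sort_keys=True)


# Reproduce the failure: b64encode rejects a decoded (dict) value.
try:
    base64.b64encode({"name": "pop_0"})  # illustrative metadata, an assumption
except TypeError as err:
    error_message = str(err)

raw_text = metadata_to_text(b"abc")
decoded_text = metadata_to_text({"name": "pop_0"})
```

Whether a text dump should emit JSON, base64, or something else for decoded metadata is a design choice for the maintainers; the sketch only shows that the two cases need to be told apart.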

@jeromekelleher jeromekelleher added this to the Python 0.4.1 milestone Dec 10, 2021
@jeromekelleher jeromekelleher added the bug Something isn't working label Dec 10, 2021
@grahamgower
Member Author

I wonder if there isn't a second bug here. The SLiM case in my opening comment indicates a different error to #1860. @bhaller says this metadata is cool and normal (MesserLab/SLiM#254 (comment)), so that suggests there's some additional problem in tskit (separate to the dump_text() problem).

> Should be an easy fix, it's just a question of priorities - would you like to see this fixed for 0.4.0?

It's not a high priority for me.

@bhaller

bhaller commented Dec 10, 2021

Actually fixing this may not be urgent, but let's be sure that there is nothing wrong with the .trees that SLiM is now emitting, in terms of its metadata etc., before SLiM 3.7 ships – i.e., pretty much now. :-> So it would be great to have a complete understanding of @grahamgower's "second bug" here, ASAP.

@jeromekelleher
Member

What's the second bug @grahamgower? Sorry I'm not following.

@grahamgower
Member Author

> What's the second bug @grahamgower? Sorry I'm not following.

The two exceptions in my first post are different. I don't have enough insight to decide if they're one or two bugs (but one is certainly the same as #1860).

@petrelharp
Contributor

I think that the SLiM message is different because SLiM puts in "empty" metadata for some populations. So - same problem, I think (although the tests should probably include situations like this, where NULL is allowed).

@benjeffery
Member

I can confirm that the msprime file's error is resolved. However, the SLiM file still errors. This is because its populations define a JSON metadata schema, but some of the entries are "", which is not valid JSON. I'm sure we talked about this on another issue but I can't find it now.
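
The JSONDecodeError from the SLiM traceback can be reproduced with the standard library alone. This is a minimal sketch, assuming an empty byte string stands in for one of the empty population metadata entries; it mirrors the json.loads(encoded.decode()) call shown in the traceback:

```python
import json

# Assumed stand-in for an empty population metadata entry.
empty_metadata = b""

try:
    # Same call shape as the decode step in the traceback.
    json.loads(empty_metadata.decode())
except json.JSONDecodeError as err:
    message = str(err)
```

Here message holds the same "Expecting value" text that appears at the end of the SLiM traceback.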

@petrelharp
Contributor

The schema for SLiM's populations has "type":["object","null"], which is why this is allowed.

@jeromekelleher
Member

The discussion about whether b'' should be considered a valid instance of this schema is in #2064

@benjeffery
Member

benjeffery commented Jan 11, 2022

> The schema for SLiM's populations has "type":["object","null"], which is why this is allowed.

Specifying "type":["object","null"] for the json codec means you are allowed the strings b"{}" or b"null". I think you're thinking of the struct codec, where b"" is the null representation and valid.
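
To make the distinction concrete, here is a rough sketch that classifies a few byte strings against "type":["object","null"] using only the standard json module. The check function and json_type_name are hypothetical names for illustration, and this is a hand-rolled type check rather than real JSON Schema validation or anything tskit does:

```python
import json

# Top-level types permitted by the schema quoted above.
allowed_types = ["object", "null"]


def json_type_name(value):
    """Map a parsed Python value to its JSON Schema type name (subset only)."""
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def check(encoded):
    """Return the JSON type if allowed, else a short reason for rejection."""
    try:
        value = json.loads(encoded.decode())
    except json.JSONDecodeError:
        return "not JSON"
    name = json_type_name(value)
    return name if name in allowed_types else "wrong type"


results = {encoded: check(encoded) for encoded in (b"{}", b"null", b"")}
```

The empty byte string never reaches the type check because it fails to parse at all, which is why the "null" entry in the schema does not cover it.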

@benjeffery
Member

Closing this, as #2064 is the discussion of if/how to accommodate SLiM's b''.
