Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementing Variant.__repr__ #2695

Merged
merged 1 commit into from
Jan 19, 2023
Merged

implementing Variant.__repr__ #2695

merged 1 commit into from
Jan 19, 2023

Conversation

chriscrsmith
Copy link
Contributor

Description

Returning a string representation of the raw data for a variant without spewing megabytes of text.

Fixes #2694

Notes:

  • If a site has experienced many mutations (as with one of the test cases), the repr() output can be quite large (~6600 characters for one test case). I left this as is for now.
  • added a test, checking that len(repr(variant)) < 10000
  • I didn't add anything to the docs yet

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@codecov
Copy link

codecov bot commented Jan 16, 2023

Codecov Report

Merging #2695 (5daf0af) into main (ba5d66a) will decrease coverage by 13.14%.
The diff coverage is 33.33%.

❗ Current head 5daf0af differs from pull request most recent head d7dd944. Consider uploading reports for the commit d7dd944 to get more accurate results

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2695       +/-   ##
===========================================
- Coverage   94.00%   80.86%   -13.14%     
===========================================
  Files          29       29               
  Lines       28070    27782      -288     
  Branches     1585     1312      -273     
===========================================
- Hits        26386    22465     -3921     
- Misses       1648     5221     +3573     
- Partials       36       96       +60     
Flag Coverage Δ
c-tests 92.26% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 71.17% <33.33%> (-0.01%) ⬇️
python-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/genotypes.py 31.63% <33.33%> (-67.42%) ⬇️
python/tskit/cli.py 0.00% <0.00%> (-96.95%) ⬇️
python/tskit/vcf.py 7.87% <0.00%> (-90.56%) ⬇️
python/tskit/text_formats.py 9.87% <0.00%> (-90.13%) ⬇️
python/tskit/drawing.py 10.92% <0.00%> (-88.31%) ⬇️
python/tskit/stats.py 12.99% <0.00%> (-85.58%) ⬇️
python/tskit/intervals.py 19.61% <0.00%> (-80.39%) ⬇️
python/tskit/combinatorics.py 41.49% <0.00%> (-57.88%) ⬇️
python/tskit/trees.py 46.20% <0.00%> (-52.57%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba5d66a...d7dd944. Read the comment docs.

@jeromekelleher
Copy link
Member

Looks great, thanks @chriscrsmith!

Can you paste in an example of what this looks like please?

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chriscrsmith!
Just a couple of minor comments, then it's good to merge.

python/tests/test_genotypes.py Show resolved Hide resolved
python/tskit/genotypes.py Outdated Show resolved Hide resolved
@chriscrsmith
Copy link
Contributor Author

Example output:

Variant({'site': Site(id=4, position=97801505.0, ancestral_state='T', mutations=[Mutation(id=4, site=4, node=30647, derived_state='A', parent=-1, metadata={'mutation_list': []}, time=30.135398854734376, edge=93786)], metadata=b''), 'samples': array([ 0, 1, 2, ..., 22855, 22856, 22857], dtype=int32), 'alleles': ('T', 'A'), 'genotypes': array([0, 0, 0, ..., 0, 0, 0], dtype=int32), 'has_missing_data': False, 'isolated_as_missing': 1})

And a longer example, from tests/test_genotypes.py:

Variant({'site': Site(id=0, position=0.0, ancestral_state='C', mutations=[Mutation(id=0, site=0, node=8, derived_state='G', parent=-1, metadata={'foo': 'n_mutations_0'}, time=2667.608851448006, edge=380), Mutation(id=1, site=0, node=217, derived_state='G', parent=-1, metadata={'foo': 'n_mutations_1'}, time=2425.261482015569, edge=371), Mutation(id=2, site=0, node=8, derived_state='T', parent=0, metadata={'foo': 'n_mutations_2'}, time=2244.539354591147, edge=380), Mutation(id=3, site=0, node=8, derived_state='G', parent=2, metadata={'foo': 'n_mutations_3'}, time=1757.3165010264859, edge=380), Mutation(id=4, site=0, node=187, derived_state='C', parent=1, metadata={'foo': 'n_mutations_4'}, time=1722.3181109326438, edge=354), Mutation(id=5, site=0, node=204, derived_state='A', parent=-1, metadata={'foo': 'n_mutations_5'}, time=1582.8304459003948, edge=370), Mutation(id=6, site=0, node=204, derived_state='T', parent=5, metadata={'foo': 'n_mutations_6'}, time=1528.2758355792398, edge=370), Mutation(id=7, site=0, node=62, derived_state='T', parent=4, metadata={'foo': 'n_mutations_7'}, time=1067.4406919914602, edge=236), Mutation(id=8, site=0, node=80, derived_state='C', parent=1, metadata={'foo': 'n_mutations_8'}, time=1067.3525903283573, edge=338), Mutation(id=9, site=0, node=8, derived_state='T', parent=3, metadata={'foo': 'n_mutations_9'}, time=1014.6272474926118, edge=380), Mutation(id=10, site=0, node=78, derived_state='G', parent=4, metadata={'foo': 'n_mutations_10'}, time=814.0050926266172, edge=179), Mutation(id=11, site=0, node=37, derived_state='A', parent=4, metadata={'foo': 'n_mutations_11'}, time=789.2733680668103, edge=294), Mutation(id=12, site=0, node=32, derived_state='T', parent=1, metadata={'foo': 'n_mutations_12'}, time=737.3299514705911, edge=196), Mutation(id=13, site=0, node=78, derived_state='T', parent=10, metadata={'foo': 'n_mutations_13'}, time=723.4207151671163, edge=179), Mutation(id=14, site=0, node=21, derived_state='G', parent=4, metadata={'foo': 'n_mutations_14'}, time=690.0505206401573, edge=208), Mutation(id=15, site=0, node=17, derived_state='C', parent=6, metadata={'foo': 'n_mutations_15'}, time=682.2665224720806, edge=195), Mutation(id=16, site=0, node=37, derived_state='C', parent=11, metadata={'foo': 'n_mutations_16'}, time=669.4367836154175, edge=294), Mutation(id=17, site=0, node=4, derived_state='G', parent=4, metadata={'foo': 'n_mutations_17'}, time=616.446641249868, edge=156), Mutation(id=18, site=0, node=37, derived_state='A', parent=16, metadata={'foo': 'n_mutations_18'}, time=562.246881209518, edge=294), Mutation(id=19, site=0, node=88, derived_state='C', parent=1, metadata={'foo': 'n_mutations_19'}, time=544.0367311716333, edge=103), Mutation(id=20, site=0, node=8, derived_state='G', parent=9, metadata={'foo': 'n_mutations_20'}, time=533.0561026628599, edge=380), Mutation(id=21, site=0, node=13, derived_state='G', parent=6, metadata={'foo': 'n_mutations_21'}, time=522.472812667567, edge=326), Mutation(id=22, site=0, node=7, derived_state='C', parent=1, metadata={'foo': 'n_mutations_22'}, time=498.5647240606464, edge=204), Mutation(id=23, site=0, node=78, derived_state='C', parent=13, metadata={'foo': 'n_mutations_23'}, time=477.9634290872342, edge=179), Mutation(id=24, site=0, node=13, derived_state='T', parent=21, metadata={'foo': 'n_mutations_24'}, time=399.12490967472513, edge=326), Mutation(id=25, site=0, node=70, derived_state='T', parent=4, metadata={'foo': 'n_mutations_25'}, time=393.88467311832983, edge=101), Mutation(id=26, site=0, node=21, derived_state='A', parent=14, metadata={'foo': 'n_mutations_26'}, time=368.75034368650824, edge=208), Mutation(id=27, site=0, node=7, derived_state='T', parent=22, metadata={'foo': 'n_mutations_27'}, time=314.0656767695335, edge=204), Mutation(id=28, site=0, node=32, derived_state='A', parent=12, metadata={'foo': 'n_mutations_28'}, time=286.55445412227243, edge=196), Mutation(id=29, site=0, node=7, derived_state='C', parent=27, metadata={'foo': 'n_mutations_29'}, time=275.8245274129919, edge=204), Mutation(id=30, site=0, node=21, derived_state='C', parent=26, metadata={'foo': 'n_mutations_30'}, time=214.80423590748185, edge=208), Mutation(id=31, site=0, node=7, derived_state='T', parent=29, metadata={'foo': 'n_mutations_31'}, time=196.1373450749133, edge=204), Mutation(id=32, site=0, node=39, derived_state='G', parent=19, metadata={'foo': 'n_mutations_32'}, time=191.19194691507988, edge=87), Mutation(id=33, site=0, node=28, derived_state='G', parent=19, metadata={'foo': 'n_mutations_33'}, time=170.12282642461471, edge=96), Mutation(id=34, site=0, node=28, derived_state='C', parent=33, metadata={'foo': 'n_mutations_34'}, time=159.88367311045664, edge=96), Mutation(id=35, site=0, node=20, derived_state='C', parent=6, metadata={'foo': 'n_mutations_35'}, time=143.67051844412873, edge=122), Mutation(id=36, site=0, node=30, derived_state='G', parent=23, metadata={'foo': 'n_mutations_36'}, time=35.657691380446494, edge=26), Mutation(id=37, site=0, node=28, derived_state='G', parent=34, metadata={'foo': 'n_mutations_37'}, time=32.67691409795943, edge=96), Mutation(id=38, site=0, node=38, derived_state='G', parent=19, metadata={'foo': 'n_mutations_38'}, time=7.570763846286966, edge=86)], metadata={'foo': 'n_sites_0'}), 'samples': array([  1,   3,   5,   7,   9,  11,  13,  15,  17,  19,  21,  23,  25,
        27,  29,  31,  33,  35,  37,  39,  41,  43,  45,  47,  49,  51,
        53,  55,  57,  59,  61,  63,  65,  67,  69,  71,  73,  75,  77,
        79,  81,  83,  85,  87,  89,  91,  93,  95,  97,  99, 101, 103,
       105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
       131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155,
       157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181,
       183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207,
       209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233],
      dtype=int32), 'alleles': ('C', 'G', 'T', 'A', None), 'genotypes': array([ 2,  2,  2,  2,  0,  2,  2,  2,  0,  0,  0,  1,  2,  2,  1,  0,  0,
        2,  3,  1, -1, -1, -1, -1, -1, -1,  0, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1,  0, -1, -1, -1,  1,  0,  1, -1, -1,  2,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  0, -1, -1, -1,
        2, -1, -1, -1, -1, -1, -1, -1,  0, -1, -1, -1, -1, -1, -1, -1,  0,
        0, -1, -1, -1, -1, -1, -1, -1,  0, -1, -1, -1, -1, -1, -1, -1,  2,
       -1, -1,  1, -1, -1,  1,  1, -1, -1, -1,  0, -1, -1, -1, -1],
      dtype=int32), 'has_missing_data': True, 'isolated_as_missing': 1})

@jeromekelleher
Copy link
Member

LGTM, thanks @chriscrsmith !

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the CHANGELOG with your contribution also please @chriscrsmith?

@chriscrsmith
Copy link
Contributor Author

Can you update the CHANGELOG with your contribution also please @chriscrsmith?

done!

@jeromekelleher
Copy link
Member

Great! Just needs commit squash and we're good to merge I think

@chriscrsmith
Copy link
Contributor Author

done

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks @chriscrsmith. Almost there, just need the changelog in the right place. As we only just released I haven't made a new section yet - sorry!

python/CHANGELOG.rst Outdated Show resolved Hide resolved
added tests for contents

edit changelog

fix changelog entry
@chriscrsmith
Copy link
Contributor Author

Fixed the changelog, and squashed

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chriscrsmith!

@benjeffery benjeffery added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Jan 19, 2023
@mergify mergify bot merged commit ad4dd4a into tskit-dev:main Jan 19, 2023
@mergify mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Variant.__repr__
3 participants