Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor tests to use example tree sequences and parametrize #1804

Open
jeromekelleher opened this issue Oct 17, 2021 · 5 comments
Open

Refactor tests to use example tree sequences and parametrize #1804

jeromekelleher opened this issue Oct 17, 2021 · 5 comments
Labels
Infrastructure and tools Development infrastructure and tools Python API Issue is about the Python API refactoring Issue is about refactoring existing code

Comments

@jeromekelleher
Copy link
Member

We currently have a mishmash of different ways of running tests on a range of different topologies. I think a good pattern will be to use pytest.mark.parametrize with a selection of tree sequences that we create once. The get_example_tree_sequences list in tests/test_highlevel.py is a good place to start for this (and currently used in existing PRs like #1704).

We don't have to do it all at once, but it would be good to refactor a few of the places we use different topologies and set the standard pattern for how we'd to see things done in the future. To close this PR, we should:

  1. Extract the get_example_tree_sequences() code from tests/test_highlevel.py into it's own module (maybe examples.py?) and set it up so that the list of examples gets built the first time its called, and then cached afterwards.
  2. Refactor the test_highlevel.py file to use this function instead of the local examples and fix any breakages (nice to do a bit more "parametrizing" there as well where possible instead of iterating over the examples within the function.
  3. Port the existing "fixture" methods to use the same pattern.

I think pytest fixtures are an anti-pattern because you're defining a global variable that the reader is supposed to understand the meaning of, and know where to find the definition. This approach is much more transparent.

So, code like this should be canonical in test files going forward:

from tests import examples

...

class TestSomeStuff:

     @pytest.mark.parametrize("ts", examples.standard_ts())
     def test_specific_thing_where_topology_doesnt_matter(self, ts):
            ...

     @pytest.mark.parametrize("ts", examples.sample_topologies())
     def test_specific_thing_where_topology_matters_and_we_want_a_variety(self, ts):
            ...
@jeromekelleher jeromekelleher added Python API Issue is about the Python API Infrastructure and tools Development infrastructure and tools refactoring Issue is about refactoring existing code labels Oct 17, 2021
@jeromekelleher
Copy link
Member Author

A nice side effect of this is that'll give us the option of tweaking how extensive the test suite is - we can add some pytest parameters that'll let us control how many different examples we go through in the various lists. So, when adding a new feature that you really want to throw the kitchen sink at, have lots of replicates but usually we keep the number of samples quite small and selective so the suite runs more quickly.

@benjeffery
Copy link
Member

Sounds great, I think it is worth keeping the fixture that we have which is a tree sequence where all fields have data (ts_fixture) as when the tests don't depend on topology this is all you need.

@jeromekelleher
Copy link
Member Author

I agree with should keep it, but I'd prefer to call it examples.complete_ts() or something. Mysterious global variables that you have to pull out of the pytest config just seems like an unnecessarily opaque way of doing things --- this is much more discoverable for a new person coming in to the project who doesn't know much about pytest.

@jeromekelleher
Copy link
Member Author

jeromekelleher commented Oct 27, 2021

We should add some arguments here to filter the list for things we're interested in:

  • discrete_genome (True, just discrete, False just continuous, None, both)
  • single_root

I guess we also want a few different types of sites on these.

@petrelharp
Copy link
Contributor

Some more things to make sure we hit (maybe with variables maybe not):

  • infinite_alleles
  • non-single-character alleles
  • has_mutations
  • missing_data
  • polytomies
  • ancestral_samples
  • metadata everywhere
  • discrete_times
  • sites with no mutations, individuals with no nodes, populations with no nodes
  • unsorted tables to the extent allowed by the requirements (eg shuffle up the nodes)
  • unary nodes

Most of these could be hit by a few WF simulations on a small genome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Infrastructure and tools Development infrastructure and tools Python API Issue is about the Python API refactoring Issue is about refactoring existing code
Projects
None yet
Development

No branches or pull requests

3 participants