Linear regression #116

Merged: 13 commits merged into master from linear-regression on Dec 8, 2021
Conversation

@znicholls (Collaborator) commented Nov 30, 2021

  • Tests added
  • Passes isort . && black . && flake8
  • Fully documented, including CHANGELOG.rst

@codecov-commenter commented Nov 30, 2021

Codecov Report

Merging #116 (23cece3) into master (f86a87a) will increase coverage by 0.13%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   78.42%   78.55%   +0.13%     
==========================================
  Files          26       27       +1     
  Lines        1293     1301       +8     
==========================================
+ Hits         1014     1022       +8     
  Misses        279      279              
Flag Coverage Δ
unittests 78.55% <100.00%> (+0.13%) ⬆️


Impacted Files Coverage Δ
mesmer/core/linear_regression.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@znicholls requested a review from mathause on November 30, 2021 at 02:04
@mathause (Member) left a comment

Thanks for getting this started! Looks good, but I don't entirely get why you split your tests into integration and unit (?) tests. In my opinion, your integration tests are also unit tests, and IMHO they could all live in test_linear_regression.py. But maybe I am missing your point. Should we quickly discuss this over zoom?

@znicholls (Collaborator, Author)

I don't entirely get why you split your tests into integration and unit (?) tests. In my opinion, your integration tests are also unit tests

The distinction I make (and it's pretty arbitrary, so I'm happy to go with a different convention for MESMER) is that unit tests should never fail because of an external library, i.e. they should only ever call things which have been mocked. Everything in integration actually calls the underlying sklearn package (so it would fail if sklearn ever changed its API), while everything in unit mocks out sklearn, so it would pass even if sklearn completely re-wrote its API.
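
To make that concrete, here is a minimal, self-contained sketch of the two styles (fit_coefficients is a made-up stand-in for illustration, not the actual mesmer function, and the tests are illustrative only):

import numpy as np
from unittest import mock

from sklearn.linear_model import LinearRegression


def fit_coefficients(predictors, target):
    # hypothetical wrapper standing in for the code under test
    reg = LinearRegression()
    reg.fit(predictors, target)
    return reg.coef_


def test_fit_coefficients_integration():
    # integration-style: really calls sklearn, so it breaks if sklearn
    # changes its behaviour or API
    predictors = np.array([[0.0], [1.0], [2.0]])
    target = np.array([1.0, 3.0, 5.0])
    np.testing.assert_allclose(fit_coefficients(predictors, target), [2.0])


@mock.patch(f"{__name__}.LinearRegression")
def test_fit_coefficients_unit(mock_lr):
    # unit-style: sklearn is mocked out, so this passes even if sklearn
    # completely re-wrote its API; we only check our own wiring
    predictors, target = object(), object()
    result = fit_coefficients(predictors, target)

    mock_lr.assert_called_once_with()
    mock_lr.return_value.fit.assert_called_once_with(predictors, target)
    assert result is mock_lr.return_value.coef_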

Should we quickly discuss this over zoom?

Happy to if you want. Also happy to just go with whatever your favourite convention is.

@mathause (Member) left a comment

My first thought is to not distinguish between integration and unit tests and try to avoid mocks if possible. My second thought was - maybe I should give it a go so I am actually entitled to an opinion 😄

Can you point me to a test or two where you think it added a lot of benefit? E.g. from netcdf-scm?

@znicholls (Collaborator, Author)

Can you point me to a test or two where you think it added a lot of benefit?

An obvious one is here, where you can test processing of responses without needing the internet: https://gitlab.com/netcdf-scm/netcdf-scm/-/blob/master/tests/unit/test_citing.py#L176

This https://gitlab.com/netcdf-scm/netcdf-scm/-/blob/master/tests/unit/test_stitching.py#L232 allowed me to do a bunch of recursive path checking without actually needing to set up paths.

The other big advantage of mocking is that it forces you to write small functions. If it's not easy to mock, there's probably something wrong with your code.

@mathause (Member) commented Dec 7, 2021

Thanks for the examples. I definitely agree with mocking web requests and can also see merit in mocking glob. Otherwise I am not so sure.

If it's not easy to mock, there's probably something wrong with your code.

I would rather say: if you have to use mock, there's probably something wrong with your code 😉 E.g. instead of mocking glob, I would try to write a pure function that takes a list of files as input, which you can then test to your heart's desire:

import glob

def get_parent_file_path(path):
    # thin wrapper: only responsible for the filesystem access
    files = glob.glob(path)
    return _get_parent_file_path(files)

def _get_parent_file_path(files):
    # pure function: all the logic lives here, testable with a plain list
    # do stuff
    return parent

@znicholls (Collaborator, Author) commented Dec 7, 2021

I would rather say: if you have to use mock, there's probably something wrong with your code 😉 E.g. instead of mocking glob, I would try to write a pure function that takes a list of files as input, which you can then test to your heart's desire.

That's a good example. In that case I would say that you test _get_parent_file_path however you want. Nonetheless, you still have the issue of how to test get_parent_file_path. In my experience there are two options:

  1. Test get_parent_file_path without mocking. In that case you are implicitly testing _get_parent_file_path at the same time.
  2. Test get_parent_file_path with mocking, i.e. make sure that it calls glob and then passes the output to _get_parent_file_path (a sketch of this is below). Test _get_parent_file_path separately to make sure it does the right thing with a list of file paths.
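
Roughly what I mean by option 2, reusing your (made-up) get_parent_file_path / _get_parent_file_path and assuming, purely for illustration, that they live in a module called mymodule:

from unittest import mock

from mymodule import get_parent_file_path  # hypothetical module from the snippet above


@mock.patch("mymodule._get_parent_file_path")
@mock.patch("mymodule.glob")
def test_get_parent_file_path(mock_glob, mock_helper):
    # only checks the wiring: glob is called with the input path and
    # whatever it returns is handed straight to the helper
    result = get_parent_file_path("some/dir/*.nc")

    mock_glob.glob.assert_called_once_with("some/dir/*.nc")
    mock_helper.assert_called_once_with(mock_glob.glob.return_value)
    assert result is mock_helper.return_value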

If there are only two layers, the first option is almost always fine. However, if you have a command-line interface that triggers a whole bunch of other processes (say 50), then it quickly becomes impossible to test all the paths through all the different layers. In my experience, getting mocking into the testing early makes it much simpler to build the tests and be confident that all paths are covered. If I know the two paths through my first function work correctly, the two paths through the second function work, and the two paths through my last function work, then I can conclude that all 8 combinations of paths will behave. However, if I test the first function without mocking, I have to explicitly test all 8 paths in order to be sure that this function is actually hooked up correctly. You can imagine that this quickly fails once it's 10 layers rather than 3.

As an example of where I made a complete mess of this and wish I had used mocking instead (and written my code better), see this horrendous test (https://github.com/openscm/scmdata/blob/292d71619872744227072c0b094b300e0c761c8b/tests/integration/test_processing.py#L1007), which is way too long and ends up being run ~500 times to cover all the combinations (the related issue is openscm/scmdata#178).
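
(To put rough numbers on it: with 3 layers of 2 paths each, mocked tests need 3 × 2 = 6 cases to cover everything, while exercising only the top layer end-to-end needs 2³ = 8; at 10 layers that becomes 20 versus 2¹⁰ = 1024.)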

My two cents, let's discuss later.

@znicholls (Collaborator, Author)

@mathause the latest commits hopefully clean things up a bit. I've made #118 for when I get to writing docs.

@mathause (Member) left a comment

Thanks for the PR and the discussion! Feel free to merge anytime.

@znicholls merged commit df6dba2 into master on Dec 8, 2021
@znicholls deleted the linear-regression branch on December 8, 2021 at 23:54
@mathause mentioned this pull request on Jun 10, 2022