Linear regression #116
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   78.42%   78.55%   +0.13%
==========================================
  Files          26       27       +1
  Lines        1293     1301       +8
==========================================
+ Hits         1014     1022       +8
  Misses        279      279
```
Thanks for getting this started! Looks good, but I don't entirely get why you split your tests into integration and unit (?) tests. In my opinion your integration tests are also unit tests, and they could IMHO all live in test_linear_regression.py. But maybe I am missing your point. Should we quickly discuss this over zoom?
The distinction I make (and it's pretty arbitrary, so I'm happy to go with a different convention for MESMER) is that unit tests should never fail because of an external library, i.e. they should only ever call things which have been mocked. Everything in integration actually calls the underlying sklearn package (so it would fail if sklearn ever changed its API), while everything in unit mocks out sklearn, so it would pass even if sklearn completely re-wrote its API.

Happy to if you want. Also happy to just go with whatever your favourite convention is.
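For what it's worth, a minimal sketch of that split might look like the following; fit_slope is a made-up wrapper used purely for illustration, not MESMER's actual API:

```python
from unittest import mock

import numpy as np
import sklearn.linear_model


def fit_slope(x, y):
    # hypothetical wrapper, standing in for whatever MESMER exposes
    reg = sklearn.linear_model.LinearRegression()
    reg.fit(x.reshape(-1, 1), y)
    return reg.coef_[0]


def test_fit_slope_integration():
    # integration: really calls sklearn, so it breaks if sklearn changes its API
    x = np.arange(5.0)
    np.testing.assert_allclose(fit_slope(x, 2 * x), 2.0)


@mock.patch("sklearn.linear_model.LinearRegression")
def test_fit_slope_unit(mock_lr):
    # unit: sklearn is mocked out, so this passes even if sklearn changes its API
    mock_lr.return_value.coef_ = np.array([2.0])
    assert fit_slope(np.arange(5.0), np.arange(5.0)) == 2.0
    mock_lr.return_value.fit.assert_called_once()
```

The integration test checks the numbers sklearn actually produces; the unit test only checks that the wrapper wires things up as intended.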
My first thought was to not distinguish between integration and unit tests and to try to avoid mocks if possible. My second thought was: maybe I should give it a go so I am actually entitled to an opinion 😄
Can you point me to a test or two where you think it added a lot of benefit? E.g. from netcdf-scm?
An obvious one is here, where you can test processing of responses without needing the internet: https://gitlab.com/netcdf-scm/netcdf-scm/-/blob/master/tests/unit/test_citing.py#L176. This one, https://gitlab.com/netcdf-scm/netcdf-scm/-/blob/master/tests/unit/test_stitching.py#L232, allowed me to do a bunch of recursive path checking without actually needing to set up paths.

The other big advantage of mocking is that it forces you to write small functions. If it's not easy to mock, there's probably something wrong with your code.
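As a concrete illustration of the first example, here is a minimal sketch of mocking a web request; get_citation is a hypothetical helper, not netcdf-scm's real function:

```python
from unittest import mock

import requests


def get_citation(doi):
    # hypothetical helper, standing in for netcdf-scm's citation lookup
    response = requests.get(
        "https://doi.org/" + doi, headers={"Accept": "application/x-bibtex"}
    )
    response.raise_for_status()
    return response.text


@mock.patch("requests.get")
def test_get_citation_offline(mock_get):
    # no internet needed: the "response" is whatever we tell the mock to return
    mock_get.return_value.text = "@misc{example2021, title={An example}}"
    assert "An example" in get_citation("10.1234/example")
    mock_get.assert_called_once()
```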
Thanks for the examples. I definitely agree with mocking web requests and can also see merit for mocking in other cases.
I would rather say: if you have to use mock, there's probably something wrong with your code 😉 E.g. instead of mocking glob I would try to write a pure function that takes a list of files as input, which you can then test to your heart's desire:

```python
import glob
import os


def get_parent_file_path(path):
    # thin wrapper: the only bit that touches the filesystem
    files = glob.glob(path)
    return _get_parent_file_path(files)


def _get_parent_file_path(files):
    # do stuff, e.g. (purely illustrative) find the common parent directory
    return os.path.commonpath(files)
```
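Assuming the illustrative commonpath body above, a test of the pure helper then needs neither the filesystem nor mock:

```python
def test_get_parent_file_path_pure():
    # no filesystem access and no mocking: just hand in a list of file names
    files = ["/data/run1/tas_a.nc", "/data/run1/tas_b.nc"]
    assert _get_parent_file_path(files) == "/data/run1"
```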
That's a good example. In that case I would say that you test _get_parent_file_path directly.
If there's only two layers, the first one is almost always fine. However, if you have a command-line interface which triggers a whole bunch of other processes (say 50), then it quickly becomes impossible to test all the paths through all the different layers.

In my experience, getting mocking into the testing early makes it much simpler to build the tests and be confident that all paths are covered. If I know the two paths through my first function work correctly, the two paths through the second function work, and the two paths through my last function work, then I can conclude that all 8 combinations of paths will behave. However, if I test the first function without mocking, I have to explicitly test that all 8 paths behave correctly in order to be sure that this function is actually hooked up correctly. You can imagine that this quickly fails once it's 10 layers rather than 3.

As an example of where I made a complete mess of this and wish I had used mocking instead (and written my code better), see this horrendous test (https://github.com/openscm/scmdata/blob/292d71619872744227072c0b094b300e0c761c8b/tests/integration/test_processing.py#L1007), which is way too long and ends up being run ~500 times to cover all the combinations (related issue is here: openscm/scmdata#178). My two cents, let's discuss later.
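A toy sketch of that path-counting argument (the functions below are invented purely for illustration): with the lower layer mocked, each layer only needs its own two paths covered, instead of every combination of paths through all layers:

```python
from unittest import mock


def classify(x):
    # layer 1: two paths
    return "big" if abs(x) > 10 else "small"


def describe(x):
    # layer 2: two paths, built on top of layer 1
    prefix = "negative " if x < 0 else ""
    return prefix + classify(x)


@mock.patch(f"{__name__}.classify", return_value="small")
def test_describe_negative(mock_classify):
    # unit-style: classify is mocked, so only describe's own two paths need
    # covering here (plus two tests for classify elsewhere), rather than all
    # 2 * 2 end-to-end combinations
    assert describe(-1) == "negative small"
    mock_classify.assert_called_once_with(-1)
```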
Thanks for the PR and the discussion! Feel free to merge anytime.