linear_regression: add fit_intercept argument #144

Merged

Conversation

mathause
Member

This PR adds `fit_intercept` to `mesmer.core.linear_regression`.

`train_gt_ic_OLSVOLC` uses `fit_intercept=False`, so this option needs to be added to `_fit_linear_regression_xr` before we can refactor this code path:

linreg_gv_volc = LinearRegression(fit_intercept=False).fit(...)


Open question: how should the intercept be saved when `fit_intercept=False`?

  1. Omit the intercept from the result.
  2. Save it as a scalar (this is what sklearn does).
  3. Save it with the same shape as the predictors (this is the current solution).

See below for examples.

(1) would look like this:

<xarray.Dataset>
Dimensions:        (cell: 3)
Data variables:
    tas            (cell) int64 0 1 2
    fit_intercept  bool False

(2) would look like this:

<xarray.Dataset>
Dimensions:        (cell: 3)
Data variables:
    intercept      int64 0
    tas            (cell) int64 0 1 2
    fit_intercept  bool False

(3) would look like this:

<xarray.Dataset>
Dimensions:        (cell: 3)
Data variables:
    intercept      (cell) int64 0 0 0
    tas            (cell) int64 0 1 2
    fit_intercept  bool False

I went for (3) because it's easiest. I would probably go for (2) if it did not involve changing a ton of tests.
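The difference between (2) and (3) is just a broadcast. A minimal numpy sketch of option (3), with hypothetical variable names (not mesmer's actual code):

```python
import numpy as np

# Option (3): with fit_intercept=False the intercept is stored as zeros
# with the same shape as the fitted coefficients, not as a scalar.
# (Illustrative sketch only; variable names are hypothetical.)
coefs = np.array([0, 1, 2])       # per-cell "tas" coefficients
intercept = np.zeros_like(coefs)  # one zero per cell, same shape as coefs

print(intercept)  # [0 0 0]
```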


Remark: I use `LinearRegression().fit(..., fit_intercept=False)` while sklearn uses `LinearRegression(fit_intercept=False).fit(...)`. I think this makes more sense (here) because I want to be able to do `res = LinearRegression(); res.params = params` (i.e., assign the params directly).
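A minimal OLS sketch of that design, assuming nothing about mesmer's actual implementation: `fit_intercept` travels with the call to `fit()`, so a bare instance can have its `params` assigned without any constructor arguments.

```python
import numpy as np

class LinearRegression:
    """Minimal OLS sketch (not mesmer's implementation): fit_intercept
    is an argument of fit(), not __init__, so a bare instance can have
    its params assigned directly."""

    def __init__(self):
        self.params = None

    def fit(self, X, y, fit_intercept=True):
        # ordinary least squares via lstsq; prepend a column of ones
        # only when an intercept is requested
        if fit_intercept:
            X = np.column_stack([np.ones(X.shape[0]), X])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        intercept = coef[0] if fit_intercept else 0.0
        slopes = coef[1:] if fit_intercept else coef
        self.params = {"intercept": intercept, "coefficients": slopes}
        return self

# fit_intercept is passed to fit ...
X = np.arange(5.0).reshape(-1, 1)
res = LinearRegression().fit(X, 2.0 * X[:, 0], fit_intercept=False)

# ... and params can be assigned to a fresh instance without fitting:
other = LinearRegression()
other.params = res.params
```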

cc @znicholls

@codecov-commenter

codecov-commenter commented Apr 28, 2022

Codecov Report

Merging #144 (cd5d69f) into master (cc766d4) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #144      +/-   ##
==========================================
+ Coverage   79.19%   79.25%   +0.05%     
==========================================
  Files          30       30              
  Lines        1389     1393       +4     
==========================================
+ Hits         1100     1104       +4     
  Misses        289      289              
| Flag | Coverage Δ |
|------|------------|
| unittests | 79.25% <100.00%> (+0.05%) ⬆️ |


| Impacted Files | Coverage Δ |
|----------------|------------|
| mesmer/core/linear_regression.py | 90.78% <100.00%> (+0.51%) ⬆️ |


@znicholls
Collaborator

> I went for (3) because it's easiest

+1 for me, also because then you have the same shape array as when fit_intercept is True which just makes things simpler downstream.
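The downstream simplification can be sketched like this (hypothetical names, not mesmer's API): with option (3) the prediction code has a single path, with no branching on `fit_intercept`.

```python
import numpy as np

def predict(intercept, coefs, predictor):
    # Single code path: because the intercept always has the same shape
    # as the coefficients (option 3), no fit_intercept branching is needed.
    return intercept + coefs * predictor

coefs = np.array([0.0, 1.0, 2.0])    # per-cell slopes
zeros = np.zeros_like(coefs)         # intercept when fit_intercept=False
fitted = np.array([1.0, 1.0, 1.0])   # intercept when fit_intercept=True

print(predict(zeros, coefs, 2.0))   # [0. 2. 4.]
print(predict(fitted, coefs, 2.0))  # [1. 3. 5.]
```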

> I think this makes more sense (here) because I want to be able to do `res = LinearRegression(); res.params = params` (i.e., assign the params).

Agree


@znicholls left a comment


Very nice (not sure if more to come but I took a quick look)

@mathause mathause merged commit 5944564 into MESMER-group:master May 18, 2022
@mathause mathause deleted the linear_regression_fit_intercept branch May 18, 2022 08:40
@mathause mathause mentioned this pull request Jun 10, 2022