
inplace refactor train_lv.py #167

Merged: 3 commits from refactor_train_lv into MESMER-group:main on Jul 15, 2022

Conversation

mathause (Member) commented:

This is a small refactoring of train_lv_AR1_sci and train_lv_find_localized_ecov before actually tackling #153. It helps me understand what happens & should make it slightly faster.

mathause marked this pull request as draft on June 22, 2022, 12:05
codecov-commenter commented Jun 22, 2022:

Codecov Report

Merging #167 (940e5b1) into master (0c5b265) will decrease coverage by 0.01%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   79.23%   79.22%   -0.02%     
==========================================
  Files          30       30              
  Lines        1387     1386       -1     
==========================================
- Hits         1099     1098       -1     
  Misses        288      288              
Flag        Coverage Δ
unittests   79.22% <94.11%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                        Coverage Δ
mesmer/calibrate_mesmer/train_lv.py   80.35% <94.11%> (-0.52%) ⬇️
mesmer/core/auto_regression.py        100.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c5b265...940e5b1.

mathause (Member, Author) left a comment:

@leabeusch do you want to take a look at this? I have some questions for you anyway.

Comment on lines 256 to 258
# ATTENTION: STILL NEED TO CHECK IF THIS IS TRUE.
# THIS FORMULA IS DIFFERENT IN THE ESD PAPER! (coef not squared)
# But I am pretty sure that code is correct and the error is in the paper)
mathause (Member, Author) commented:

@leabeusch questions:

  1. Will I find the answer to this in Cressie and Wikle 2011?
  2. Given that this is the 'analytic formula for an AR(1) process', does this mean it is not possible to use an AR(2) process for the local variability (unless the formula is adapted)?
  3. Do I read this correctly: loc_ecov_AR1_innovs is the 'localized empirical spatial covariance matrix for the innovations of the AR(1) process'?
  4. Optional: Do you have a better name for c below? Maybe I'll also find something in the book.

I'm also pinging @yquilcaille, who may be interested in the answer to (2).

yquilcaille (Collaborator) replied:

  1. You can also check Humphrey and Gudmundsson 2019, equation (13). As far as I know, the square was forgotten in the ESD paper for MESMER; the equation in the code is correct.
  2. I discussed this with Lukas, and it seems there is no straightforward way to generalize this equation.
  3. Yes, I think you got it right.
  4. The term c actually scales the correlations, but calling it scale could be misleading; why not multip_ecov?

leabeusch (Collaborator) replied:

Sorry for taking quite some time to answer @mathause & thanks for getting these answers started @yquilcaille! I elaborate some more on them below:

  1. Oh no, I completely forgot to follow up on that! Good thing you're going through this refactoring @mathause and uncovering things I should have done months, maybe even years (?!) ago. 🙈 😅 As @yquilcaille correctly states: the squares got forgotten in the ESD paper & I should still publish a corrigendum for it (dealing with that has re-entered my TODO list now). I have the formula from Vincent (he used it for the reconstruction work that @yquilcaille also pointed to); I just wrote it down a bit more like code / less mathematically than he did. As far as I know, he originally found it in the Cressie & Wikle 2011 book. I only glanced at it quickly once & couldn't find it right away, but Lukas reassured me that it must be in there, so I kept the reference nevertheless. Maybe you have more luck checking than I did? Or you could ask Vincent directly? He's back at the institute, no? (If you find the exact equation inside the book, please let me know where!)
  2. Exactly. I'm pretty sure Vincent found a (considerably) more complicated analytical formula for AR(2) somewhere as well & he may have even used it for some work? But I'm not aware of clean analytical solutions for higher-order AR processes. Originally (i.e., including at the preprint stage of the first MESMER method paper: https://doi.org/10.5194/esd-2019-34), I allowed for up to an AR(4) process in the local variability of MESMER. & since I had no analytical formula for it, I just computed the covariance matrix of the innovations empirically after extracting the innovation terms from Equation 8 of the preprint (i.e., after extracting the ν). However, when I delved more into AR-process-targeted verification for the review, I realized that the AR terms are quite noisy (e.g., Fig. 10 of the final paper: https://doi.org/10.5194/esd-11-139-2020) & that I gain more by allowing just a single lag & having an analytical solution for it, as this considerably increased the localization radii I obtained during cross validation & thus allowed me to reproduce further-reaching spatial cross-correlations between grid points.
  3. Yes, sounds & looks right to me.
  4. What's your rationale for calling it c here? Generally, I have extremely negative feelings towards single-letter variables without meaning. 😅 I think what it is, is the grid-point-specific reduction in variance when going from the full residual local variability (i.e., what is called η in the ESD paper) to the innovations (i.e., the spatially (but not temporally) correlated noise term that is called ν in the ESD paper). I.e., the innovations have a smaller variability than the full residual local variability, because the latter also contains the variability induced by the AR memory terms (see the sketch after this list). -> maybe call it something along the lines of ar_induced_reduction_factor? But please feel free to come up with something much better if you can. I'm aware my suggestion isn't great either. ^^
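
For reference, the standard stationary AR(1) identity behind point 1: for η_t = φ η_{t-1} + ν_t, Var(ν) = (1 - φ^2) Var(η), i.e. the coefficient enters squared. Below is a minimal sketch of a covariance adjustment along these lines; the names adjust_ecov_ar1, ecov, and ar1_coefs are illustrative, not necessarily those used in train_lv.py, and the off-diagonal scaling is one plausible form that should be checked against Humphrey and Gudmundsson 2019, equation (13), as referenced above:

    import numpy as np

    def adjust_ecov_ar1(ecov, ar1_coefs):
        """Scale the empirical covariance of the full residual local
        variability (eta) down to the covariance of the AR(1)
        innovations (nu): Var(nu_s) = (1 - phi_s**2) * Var(eta_s).
        """
        reduction_factor = np.sqrt(1 - np.asarray(ar1_coefs) ** 2)
        # the outer product applies the per-grid-point factor to both
        # the rows and the columns of the covariance matrix
        return np.outer(reduction_factor, reduction_factor) * ecov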

mathause (Member, Author) replied:

  1. Thanks - I will update the comment.
  2. Ok, this is good to know - will also add a comment.
  3. Thanks.
  4. No good reason to call it c but I needed some name.

yquilcaille marked this pull request as ready for review on July 1, 2022, 14:34
yquilcaille (Collaborator) left a comment:

It looks ready for merge.

leabeusch (Collaborator) left a comment:

I hope this is actually helpful & not just an overwhelming amount of text. Let me know if you have other questions or would rather discuss it in person. & I won't hit the approve button yet, because I hope you'll still improve the naming of c & rewrite my ugly (although useful) inline reminder comment about the need to check whether the formula is accurate (i.e., maybe just write that it is wrong in the original paper & that a corrigendum should be published at some point?). 😉

Btw: I didn't actually run the code, but in trying to think through the way you rewrote the nested loop, it seems to me that it should produce the same thing as the original code. I assume our tests are good enough that they would have caught it if you (or better: we) had made a thinking error there, correct? 😅


@@ -306,31 +303,30 @@ def train_lv_find_localized_ecov(y, wgt_scen_eq, aux, cfg):

"""

return _find_localized_empirical_covariance(
leabeusch (Collaborator) commented:

What’s the rationale for introducing this private function here? Just making it easier for later refactoring by knowing which parts of aux & cfg are actually needed by this function?

mathause (Member, Author) replied:

Yes, the refactored function should eventually take 'simpler' input parameters. No compelling reason to do it here.
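
The idea, as a sketch with hypothetical argument names (only train_lv_find_localized_ecov and _find_localized_empirical_covariance come from the diff; the unpacked parameters are illustrative):

    def train_lv_find_localized_ecov(y, wgt_scen_eq, aux, cfg):
        # public entry point keeps the established signature and only
        # unpacks the container arguments (keys/attributes assumed here)
        return _find_localized_empirical_covariance(
            y, wgt_scen_eq, aux["phi_gc"], cfg.max_iter_cv
        )

    def _find_localized_empirical_covariance(y, wgt, localizer, max_iter_cv):
        # the private worker sees only the plain inputs it needs, which
        # makes the used parts of aux & cfg explicit for later refactoring
        ...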

    if llh_cv_sum[L] > llh_max:
        L_sel = L
        llh_max = llh_cv_sum[L]
        print("Newly selected L =", L_sel)
    else:
        print("Final selected L =", L_sel)
        idx_break = True
        break
leabeusch (Collaborator) commented:

Haha good to know there’s actually python functionality that does exactly what I wanted to do here. Fingers crossed, I’ll remember it next time I need it. ^^
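
The thread doesn't name the feature, but this is presumably Python's for/else clause, which makes the idx_break flag unnecessary: the else block runs only when the loop completes without hitting break. A minimal, self-contained sketch with made-up numbers:

    # hypothetical cross-validated log-likelihood sums per localization radius
    llh_cv_sum = {250: -10.0, 500: -8.0, 750: -9.0}

    L_sel = None
    llh_max = float("-inf")
    for L in sorted(llh_cv_sum):
        if llh_cv_sum[L] > llh_max:
            L_sel, llh_max = L, llh_cv_sum[L]
            print("Newly selected L =", L_sel)
        else:
            print("Final selected L =", L_sel)
            break  # no idx_break flag needed
    else:
        # only reached if the likelihood kept improving up to the largest L
        print("Final selected L =", L_sel)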

mathause (Member, Author) left a comment:

Thanks for the comments - I will update accordingly.


mathause (Member, Author) commented:

I extracted the adjustment of the empirical covariance matrix into its own function and tried to add the discussed comments to the docstring. I am not yet 100 % happy, but I suggest leaving it as is for the moment in order to move forward. Any objections?

mathause merged commit 723eab5 into MESMER-group:main on Jul 15, 2022
mathause deleted the refactor_train_lv branch on July 15, 2022, 07:17