inplace refactor train_lv.py #167
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   79.23%   79.22%   -0.02%
==========================================
  Files          30       30
  Lines        1387     1386       -1
==========================================
- Hits         1099     1098       -1
  Misses        288      288
@leabeusch do you want to take a look at this? I have some questions for you anyway.
mesmer/calibrate_mesmer/train_lv.py
Outdated
# ATTENTION: STILL NEED TO CHECK IF THIS IS TRUE.
# THIS FORMULA IS DIFFERENT IN THE ESD PAPER! (coef not squared)
# But I am pretty sure that the code is correct and the error is in the paper.
@leabeusch questions:
- Will I find the answer to this in Cressie and Wikle 2011?
- Given this is the 'analytic formula for an AR(1) process' - does this mean it is not possible to use an AR(2) process for the local variability (unless the formula is adapted)?
- Do I read this correctly: loc_ecov_AR1_innovs is the 'localized empirical spatial covariance matrix for the innovations of the AR(1) process'?
- Optional: do you have a better name for c below? Maybe I also find something in the book.
I also ping @yquilcaille who may be interested in the answer to (2).
- You can also check Humphrey and Gudmundsson 2019, equation (13). As far as I know, the square was forgotten in the ESD paper for MESMER; the equation in the code is correct (see the short derivation below).
- I discussed this with Lukas, and it seems that there is no straightforward way to generalize this equation.
- Yes, I think that you got it right.
- The term c actually scales the correlations, but naming it scale could be misleading - why not multip_ecov?
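For reference, the univariate identity behind the squared coefficient follows directly from stationarity; a minimal derivation sketch, using γ for the AR(1) coefficient and the η (residual local variability) / ν (innovations) notation of the ESD paper:

```latex
% AR(1): eta_t = gamma * eta_{t-1} + nu_t, with nu_t independent of eta_{t-1}.
% Taking variances and using Var(eta_t) = Var(eta_{t-1}) (stationarity):
\operatorname{Var}(\eta_t)
  = \gamma^2 \operatorname{Var}(\eta_{t-1}) + \operatorname{Var}(\nu_t)
\quad\Longrightarrow\quad
\operatorname{Var}(\nu) = \left(1 - \gamma^2\right) \operatorname{Var}(\eta)
```

So the innovation variance carries the squared coefficient, consistent with the code rather than with the paper.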
Sorry for taking quite some time to answer @mathause & thanks for getting these answers started @yquilcaille! I elaborate some more on them below:
- Oh no, I completely forgot to follow up on that! Good thing you’re going through this refactoring @mathause and uncovering things I should have done months, maybe even years (?!) ago. 🙈 😅 As @yquilcaille correctly states: the squares got forgotten in the ESD paper & I should still publish a corrigendum for it (dealing with that has re-entered my TODO list now). I have the formula from Vincent (he used it for his reconstruction work that @yquilcaille also pointed to), I just wrote it down a bit more like code / less mathematically than he did. As far as I know, he originally found it in the Cressie & Wikle 2011 book. I just quickly glanced at it once & couldn’t find it right away, but Lukas reassured me that it must be inside, so I kept the reference nevertheless. Maybe you have more luck checking than I did? Or you directly ask Vincent? He’s back at the institute, no? (If you find the exact equation inside the book, please let me know where!)
- Exactly. I’m pretty sure Vincent found a (considerably) more complicated analytical solution for AR(2) somewhere as well & he may have even used it for some work? But I’m not aware of clean analytical solutions for higher-order AR processes. Originally (i.e., including the preprint stage of the first MESMER method paper: https://doi.org/10.5194/esd-2019-34), I allowed for an up to AR(4) process in the local variability of MESMER. & since I had no analytical formula for it, I just computed the covariance matrix of the innovations empirically after extracting the innovation terms from Equation 8 of the preprint (i.e., after extracting the ν). However, when I delved more into AR-process-targeted verification for the review, I realized that the AR terms are quite noisy (e.g., Fig. 10 of the final paper: https://doi.org/10.5194/esd-11-139-2020) & that I gain more by just allowing for a single lag & having an analytical solution for it, as this considerably increased the localization radii I obtained during cross-validation & thus allowed me to reproduce further-reaching spatial cross-correlations between grid points.
- Yes, sounds & looks right to me.
- What’s your rationale for calling it c here? Generally, I have extremely negative feelings towards single-letter variables without meaning. 😅 I think what it is, is the grid-point-specific reduction in variance when going from the full residual local variability (i.e., what is called η in the ESD paper) to the innovations (i.e., the spatially (but not temporally) correlated noise term that is called ν in the ESD paper). I.e., the innovations have a smaller variability than the full residual local variability because that one also contains the variability induced by the AR memory terms. -> maybe call it something along the lines of ar_induced_reduction_factor (see the sketch after this list)? But please feel free to come up with something much better if you can. I’m aware my suggestion isn’t great either. ^^
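To make the naming discussion concrete, here is a minimal sketch of how such a grid-point-wise reduction factor would turn the localized covariance of η into the covariance of the innovations ν. The function and argument names are illustrative assumptions, not the actual code in this PR:

```python
import numpy as np

def _adjust_ecov_ar1_innovations(loc_ecov, ar1_coefs):
    """Scale the localized empirical covariance of the residual local
    variability (eta) to the covariance of the AR(1) innovations (nu).

    Element [i, j] is scaled by sqrt(1 - g_i**2) * sqrt(1 - g_j**2); on
    the diagonal this reduces to the univariate identity
    Var(nu_i) = (1 - g_i**2) * Var(eta_i).
    """
    reduction_factor = np.sqrt(1 - ar1_coefs**2)  # one factor per grid point
    return loc_ecov * np.outer(reduction_factor, reduction_factor)
```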
- Thanks - I will update the comment.
- Ok, this is good to know - will also add a comment.
- Thanks.
- No good reason to call it c, but I needed some name.
It looks ready for merge.
I hope this is actually helpful & not just an overwhelming amount of text. Let me know if you have other questions or would rather discuss it in person. & I don’t hit the approve button yet, because I hope you’ll still improve the naming of c & rewrite my ugly (although useful as a reminder) inline comment about the need to check if the formula is accurate (i.e., maybe just write that it is wrong in the original paper & that a corrigendum should be published at some point?). 😉
Btw: I didn’t actually run the code, but by trying to think through the way you rewrote the nested loop, it seemed to me that it should produce the same thing as the original code. I assume our tests are good enough that they would have picked up on it in case you (or better: we) had a reasoning error there, correct? 😅
@@ -306,31 +303,30 @@ def train_lv_find_localized_ecov(y, wgt_scen_eq, aux, cfg):
    """
    return _find_localized_empirical_covariance(
What’s the rationale for introducing this private function here? Just making it easier for later refactoring by knowing which parts of aux & cfg are actually needed by this function?
Yes, the refactored function should eventually take 'simpler' input parameters (roughly the pattern sketched below). No compelling reason to do it here.
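For illustration, a sketch of that pattern; the unpacked pieces of aux and cfg shown here (aux["phi_gc"], cfg.max_iter_cv) are hypothetical stand-ins, not the actual attributes used in train_lv.py:

```python
def train_lv_find_localized_ecov(y, wgt_scen_eq, aux, cfg):
    # public entry point: keep the current grab-bag signature ...
    return _find_localized_empirical_covariance(
        y, wgt_scen_eq, aux["phi_gc"], cfg.max_iter_cv
    )

def _find_localized_empirical_covariance(y, weights, localizer, max_iter_cv):
    # ... while the private function's signature documents which parts of
    # aux and cfg are actually needed, easing the later refactor
    ...
```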
if llh_cv_sum[L] > llh_max:
    L_sel = L
    llh_max = llh_cv_sum[L]
    print("Newly selected L =", L_sel)
else:
    print("Final selected L =", L_sel)
    idx_break = True
    break
Haha good to know there’s actually python functionality that does exactly what I wanted to do here. Fingers crossed, I’ll remember it next time I need it. ^^
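For the record, the functionality referred to here is presumably Python's for/else clause: the else block runs only if the loop finishes without hitting break, which makes a flag like idx_break unnecessary. A self-contained sketch with made-up numbers:

```python
# hypothetical cross-validated log-likelihoods per localisation radius L
llh_cv_sum = {1000: -250.0, 2000: -240.0, 3000: -245.0, 4000: -248.0}

llh_max = -float("inf")
L_sel = None
for L in llh_cv_sum:
    if llh_cv_sum[L] > llh_max:
        L_sel, llh_max = L, llh_cv_sum[L]
        print("Newly selected L =", L_sel)
    else:
        # likelihood stopped improving: keep the previous radius
        print("Final selected L =", L_sel)
        break
else:
    # runs only if the loop was never broken out of (all radii improved)
    print("Exhausted all radii, selected L =", L_sel)
```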
Thanks for the comments - I will update accordingly.
I extracted adjusting the empirical covariance matrix to its own function and tried to add the discussed comments in the docstring. I am not yet 100 % happy, but I suggest leaving it at that for the moment in order to move forward. Any objections?
isort . && black . && flake8
CHANGELOG.rst
This is a small refactoring of train_lv_AR1_sci and train_lv_find_localized_ecov before actually doing #153. It helps me understand what happens & should make it slightly faster.