-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GitHub Issue NOAA-EMC/GSI#539 Add options to tune cross-scale/variable/time covariances in EnVar #542
GitHub Issue NOAA-EMC/GSI#539 Add options to tune cross-scale/variable/time covariances in EnVar #542
Conversation
Is it possible to ask @TingLei-NOAA and @CatherineThomas-NOAA to review this PR? I think they are well-qualified because they are major contributers of #460 and #504. |
integer(i_kind) :: i_ensloccov4tim=0 | ||
integer(i_kind) :: i_ensloccov4var=0 | ||
integer(i_kind) :: i_ensloccov4scl=0 | ||
real(r_kind) :: r_ensloccov4tim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyokota , why are those options changed to real numbers? When they are supposed to be used as real numbers in some situations , we can always use real function to convert integer to real version. But their basic usages are used as options with many equality evaluations. While equality for real numbers are risky or at least not be encouraged to be used. So, any compelling reasons for this change ?
src/gsi/gsimod.F90
Outdated
! i_ensloccov4scl - flag of cross-scale localization | ||
! =0: cross-scale covariance is retained | ||
! =1: cross-scale covariance is zero | ||
! r_ensloccov4tim - factor multiplying to cross-temporal localization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, those documentation should be kept with any updates if needed. Here in gsimod.f90 is a good place to give explanation/documentation to those namelist parameters.
src/gsi/apply_scaledepwgts.f90
Outdated
@@ -68,7 +68,7 @@ subroutine init_mult_spc_wgts(jcap_in) | |||
integer(i_kind) :: l_sum_spc_weights | |||
|
|||
! Spectral scale decomposition is differernt between SDL-cross and SDL-nocross | |||
if( i_ensloccov4scl == 1 )then | |||
if( r_ensloccov4scl == zero )then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one example when equality evaluation is used for the option in the real number. As mentioned in another comment, this kind of use of real numbers are not safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing out the issue of equality evaluation. This issue occurs only here, so I modified "if( r_ensloccov4scl == zero )" to "if( r_ensloccov4scl < tiny_r_kind )". It is required to change these parameters from integer to real for tuning cross-scale/variable/time covariances flexibly. This is the main objective of this PR.
241db3d
to
38853c2
Compare
A general comment about using different names for those sdl related options, when developers are using scripts with the original option names, using GSI with this PR means those scripts won't work and we lose the important backward compatibility. |
ensloccov4var=one | ||
elseif (i_ensloccov4var==1)then | ||
ensloccov4var=zero | ||
if (r_ensloccov4var>=zero.and.r_ensloccov4var<=one) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The equality issue also exist for those ">= " and " <=". what will be the results when r_ensloccov4var = zero or one?
@TingLei-NOAA I changed the definition of these parameters in this PR. For example, previous i_ensloccov4scl=0 coresponds to r_ensloccov4scl=1 in this PR. So keeping only names of these parameters confuses developers. Or, are you suggesting not to remove i_ensloccov4{scl,var,tim} and just to add r_ensloccov4{scl,var,tim}? |
@shoyokota Thanks for keeping improving the codes. However, as I mentioned earlier, it is also important to keep the backward compatibility. That will be preferable if your changes could make it clearer as you want while won't let the current scripts fail. |
We have two choices. I don't come up with another choice. (i) remove i_ensloccov4{scl,var,tim} and add r_ensloccov4{scl,var,tim} (my current coding for this PR) If we should keep the backward compatibility and resolve the equality issue, I can code based on (ii). But if it is not necessarily required to resolve these issues, I think (i) is simpler and easier to understand the coding for developers. I will choose either based on the opinions of @TingLei-NOAA and @CatherineThomas-NOAA. Which is your preferable? |
@TingLei-NOAA @shoyokota While I understand the general importance of backwards compatibility, these changes are not in operations yet, so I think that now is the time to make a change before it becomes operational. |
@CatherineThomas-NOAA @shoyokota I prefer the option 2 to keep the backward compatibility . Though they are not in operational runs yet, RRFS developers at EMC/GSL have begun to use those scripts with verified setup using those "old" options. Those scripts are being "populated" among developers who are interested to use those functions. Hence, I don't think it is a good plan for developers to learn new set up standard. For the confusion in the current options names, more documentation could help. In fact, I have been expecting a longer doc on SDL/VDL options for GSI users/developers. |
I think now it is a "prone to errors" period for the community to apply the recent major developments about SDL/VDL. Those scripts from those PR submitters including @shoyokota are good starting examples for developers to test and use in their own systems like RRFS. A different setup of options will not help with this "learning and transition" process. |
I agree with @CatherineThomas-NOAA that option (i) is more intuitive. This makes the code more readable. However, as @TingLei-NOAA notes, we need to be careful when using floating point numbers in logical tests including equality. If we stick with option (i) we should consider revising logical tests which include equality to account for precision issues with floating point numbers (e.g., line 5606 of If we go with option (ii), we should consider replacing integer variable
When I'm not a reviewer on this PR so feel free to ignore my comments. |
@RussTreadon-NOAA Thanks for your rather complete analysis on this issue! |
Backward compatibility may not be important in the context of simplifying codes because this function has not yet been actually tested and used in RRFS's real-time parallel. However, I do understand it might create the additional effort for the current retro tests. Since this will not change the solution, the evaluation result will still be valid for the decision making of testing this in real time parallel. |
38853c2
to
bd4cf81
Compare
Thank you for discussing about options (i) and (ii). For now, I removed the part of "if ... else ..." for r_ensloccov4* in src/gsi/hybrid_ensemble_isotropic.F90 because it is not necessarily required. Now the equality issue is resolved even in the option (i). I think the remaining issue of the option (i) is only backward compatibility. I will select either options based on the discussion of backward compatibility by email. |
@shoyokota and @TingLei-NOAA Do you get enough feedback to make decision about keeping "backward compatibility"? |
@ShunLiu-NOAA The collaborators invited to the off-line discussion on this issue need more time to give their opinions. |
@TingLei-NOAA Any feedback from GSL? @hu5970 , @daviddowellNOAA , @chunhuazhou and, please let us know if you want to keep backward compatibility option, otherwise, we will proceed to approve this PR. |
Thanks to feedbacks we got. Developers including at GSL who are using or potentially to test/apply SDL functions are open to accommodate changes in the parameter names brought in by this PR and Sho could help in this process. Hence, @shoyokota , please make sure you take care of the suggestions in Russ's point 1 and, then, we can go ahead. |
@TingLei-NOAA @RussTreadon-NOAA I have already resolved Russ's concern for (i). (Please see here.) If this modification is not sufficient, could you let me know? Thanks. |
@shoyokota , a conflict with |
src/gsi/gsimod.F90
Outdated
! i_ensloccov4scl - flag of cross-scale localization | ||
! =0: cross-scale covariance is retained | ||
! =1: cross-scale covariance is zero | ||
! r_ensloccov4tim - factor multiplying to cross-temporal localization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyokota could you give a documentation on how the new parameter is used to control the sdl setup in place of previous parameters?
bd4cf81
to
dda2524
Compare
Thank you, @shoyokota , for resolving the conflict. |
@shoyokota Would you please add a concise documentation to where the previous doc about old parameter was? |
…e/time covariances in EnVar
dda2524
to
036cf39
Compare
@TingLei-NOAA I added documentation of the setting example of "r_ensloccov4{scl,var,tim}" in |
@CatherineThomas-NOAA Could you please check to see if the changes meet your needs? |
@ShunLiu-NOAA I've tested the SDL and it reproduces my previous tests. I just need to check the VDL and TDL, which I am working on today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have finished my global tests comparing the weighted scale/time/variable dependent localization to the original formulation. See figures here.
The increments and cost functions are as expected. The code looks good to me as well.
Thank you @TingLei-NOAA for coordinating with GSL on the backwards compatibility question.
@TingLei-NOAA @CatherineThomas-NOAA Thank you for reviewing and testing the changes. |
@TingLei-NOAA Could you please make a regression test on HERA or WCOSS2? |
@ShunLiu-NOAA My regression test on Orion was as follows.
It is impossible for me to pass the tests of 1-3. The other tests were passed except for non-essential failures (5,6,8,9) that are not related to this PR. |
@hu5970 Ming, Could you do a regression test on Orion? |
Yes, I will run regression test suite today. |
1/9 Test #1: [=[global_3dvar]=] ...............***Failed 300.86 sec 44% tests passed, 5 tests failed out of 9 Total Test time (real) = 1801.57 sec The following tests FAILED: Will rerun failed cases. |
Try three Orion failed cases again: Check the regression test results: There are not major failure on Orion. We can consider regression test passed on Orion. |
Update on the regression tests on hera: All tests passed. It happened again that I had to manually add some missing fix files like rrfs_glb_berror.l127y770.f77, though I am sure some specific order of "git submoduel" commands and the "git checkout " commands would avoid such an issue as I did with another previous PR. However, the "vulnerability " to such an issue is concerning, though I speculate this issue is maybe not related to this PR since this happened previously with other PR. |
@hu5970 and @TingLei-NOAA Thank you for regression test on Orion and Hera. Since regression test passed on the both machine, we can merge this PR to develop. |
@TingLei-NOAA , I executed the following on Orion
The first
This is is correct. The
The top level The cmake build prints a runtime message stating that GSI binary files files are being copied from the staged location
After
|
@RussTreadon-NOAA Following your steps, I did some extra "playing around" with this "missing fix files issue" . it seems from the EMC GSI , no problem would occur no matter if I did "git submodule update" before or after building. Seems the issue is specific to the fork involved in this PR. For example, the rrfs fix files are missing when git clone this PR , while it exists in the EMC GSI. But, since it will not affect the EMC GSI repository, I think we can stop here satisfied that EMC GSI does work well. |
@TingLei-NOAA , thank you for the update. I find nothing wrong with the fix submodule associated with The three step sequence
returns a I am unable to reproduce the missing fix file problem you report. I agree with you. It's time to move forward since this PR has been merged into NOAA-EMC/GSI |
@RussTreadon-NOAA . You can see /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-pr/dr-test/GSI/fix/list.txt_after_submodule_update, which lists the fix files before I ran build.sh . and you can compare with https://github.com/shoyokota/GSI.git after git submodule update. In the latter, several rrfs txt files like (anavinfo.rrfs ..) are missing (only rrfs_glb_berror.l127y770.f77 exists). I didn't try to dig more on Rcov* binary files issue this time, which is supposed to be copied in build.sh (while, before, I found those files would disappear after once copied previously). |
And, after checking out the branch with this PR : feature/PR_NOAA-EMC_EnVar-SDL2. Did "git submodule sync" and "git submodule update", It would be found the "anavinfo.rrfs convinfo.rrfs errtable.rrfs* hybens_info.rrfs" would exist but the previous "rrfs_glb_berror.l127y770.f77" would disappear. I think this is all about the appropriate git managing of the fix dir (submodule). But, since it is not an issue with the EMC GSI. I think I would be satisfied with the current situation. And I will be happy to learn if I missed anything on this problem. |
@TingLei-NOAA ,
Are these five rrfs files you what you expect to see? I did the following on Hera
A listing of fix shows that only
This makes sense because
Sho's PR is for his forked branch
The
Since the fix submodule points at a different repo, NOAA-EMC/GSI-fix, and not gerrit:GSI-fix, we need to resync the fix url via
A check of rrfs files in
The above listing only includes four rrfs ASCII GSI fix files. This is correct. NOAA-EMC/GSI-fix only contains ASCII GSI fix files. EIB stages binary GSI fix files on disk. Binary GSI fix files are pulled into After executing
This agrees with the listing you provided in file Sho's branch |
@RussTreadon-NOAA Thanks for your digging. So, it seems , after I finished all building in the feature/PR_NOAA-EMC_EnVar-SDL2, if I switched to develop and did " git submodule sync/update" (for some reasons) and, all rrfs txt files and binary fix files copied in build would be removed , even after I switched back to feature/PR_NOAA-EMC_EnVar-SDL2 branch. I think that was what happened in my case. I just repeated the above steps and confirmed this . |
…e/time covariances in EnVar (#542) This PR modifies options (i_ensloccov4{scl,var,tim} -> r_ensloccov4{scl,var,tim}) to tune cross-scale/variable/time covariances in EnVar (NOAA-EMC/GSI#539). Regression tests for global 3dvar/4denvar/4dvar are not completed yet, but for other tests, issues are not found except for "failed the scalability test" and "exceeded maximum allowable hardware memory limit" on Orion. Fixes #539 Co-authored-by: Sho Yokota <syokota@Orion-login-1.HPC.MsState.Edu>
This PR modifies options (i_ensloccov4{scl,var,tim} -> r_ensloccov4{scl,var,tim}) to tune cross-scale/variable/time covariances in EnVar (#539). Regression tests for global 3dvar/4denvar/4dvar are not completed yet, but for other tests, issues are not found except for "failed the scalability test" and "exceeded maximum allowable hardware memory limit" on Orion.
Fixes #539