GitHub Issue NOAA-EMC/GSI#539 Add options to tune cross-scale/variable/time covariances in EnVar #542

Merged

Collaborator

@shoyokota shoyokota commented Feb 21, 2023

This PR modifies options (i_ensloccov4{scl,var,tim} -> r_ensloccov4{scl,var,tim}) to tune cross-scale/variable/time covariances in EnVar (#539). Regression tests for global 3dvar/4denvar/4dvar are not completed yet, but for other tests, issues are not found except for "failed the scalability test" and "exceeded maximum allowable hardware memory limit" on Orion.

Fixes #539

@shoyokota
Collaborator Author

Could @TingLei-NOAA and @CatherineThomas-NOAA review this PR? I think they are well qualified because they are major contributors to #460 and #504.

-integer(i_kind) :: i_ensloccov4tim=0
-integer(i_kind) :: i_ensloccov4var=0
-integer(i_kind) :: i_ensloccov4scl=0
+real(r_kind) :: r_ensloccov4tim
Contributor

@shoyokota, why are these options changed to real numbers? If they need to be used as real numbers in some situations, we can always use the real function to convert the integer to a real value. But they are basically used as options, with many equality evaluations, and equality tests on real numbers are risky or at least discouraged. So, are there any compelling reasons for this change?

-! i_ensloccov4scl - flag of cross-scale localization
-! =0: cross-scale covariance is retained
-! =1: cross-scale covariance is zero
+! r_ensloccov4tim - factor multiplying to cross-temporal localization
Contributor

Also, this documentation should be kept and updated as needed. gsimod.f90 is a good place to explain and document these namelist parameters.

@@ -68,7 +68,7 @@ subroutine init_mult_spc_wgts(jcap_in)
integer(i_kind) :: l_sum_spc_weights

! Spectral scale decomposition is differernt between SDL-cross and SDL-nocross
-if( i_ensloccov4scl == 1 )then
+if( r_ensloccov4scl == zero )then
Contributor

Here is one example where an equality test is applied to the option as a real number. As mentioned in another comment, this use of real numbers is not safe.

Collaborator Author

Thank you for pointing out the issue of equality evaluation. This issue occurs only here, so I modified "if( r_ensloccov4scl == zero )" to "if( r_ensloccov4scl < tiny_r_kind )". It is required to change these parameters from integer to real for tuning cross-scale/variable/time covariances flexibly. This is the main objective of this PR.
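
The threshold test described in this reply can be sketched in stand-alone Fortran. This is a minimal sketch, not the code from the PR: r_kind, zero, and tiny_r_kind are redefined locally as assumptions that mimic the GSI names quoted in the thread, and is_effectively_zero is a hypothetical helper introduced only for illustration.

```fortran
program zero_threshold_sketch
  implicit none
  ! Assumption: local stand-ins for the GSI kind and constants named in the thread.
  integer, parameter :: r_kind = selected_real_kind(15)
  real(r_kind), parameter :: zero = 0.0_r_kind
  real(r_kind), parameter :: tiny_r_kind = tiny(zero)
  real(r_kind) :: r_ensloccov4scl

  ! A factor of exactly zero is detected without using "==".
  r_ensloccov4scl = zero
  print *, 'factor 0.0 treated as zero: ', is_effectively_zero(r_ensloccov4scl)

  ! A genuine tuning factor is not mistaken for zero.
  r_ensloccov4scl = 0.5_r_kind
  print *, 'factor 0.5 treated as zero: ', is_effectively_zero(r_ensloccov4scl)

contains

  ! Hypothetical helper: replaces "x == zero" with a threshold comparison.
  pure logical function is_effectively_zero(x)
    real(r_kind), intent(in) :: x
    is_effectively_zero = (x < tiny_r_kind)
  end function is_effectively_zero

end program zero_threshold_sketch
```

Note that x < tiny_r_kind is also true for negative values, so under this assumption a negative factor would be handled the same way as zero.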

@shoyokota shoyokota force-pushed the feature/PR_NOAA-EMC_EnVar-SDL2 branch from 241db3d to 38853c2 Compare March 1, 2023 21:06
@TingLei-NOAA
Contributor

TingLei-NOAA commented Mar 1, 2023

A general comment about using different names for these SDL-related options: when developers are using scripts with the original option names, using GSI with this PR means those scripts won't work, and we lose important backward compatibility.
Hence, I suggest going back to the original parameter names.

-ensloccov4var=one
-elseif (i_ensloccov4var==1)then
-ensloccov4var=zero
+if (r_ensloccov4var>=zero.and.r_ensloccov4var<=one) then
Contributor

The equality issue also exists for these ">=" and "<=" tests. What will the results be when r_ensloccov4var = zero or one?
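
One way to explore this question is a small stand-alone check of the inclusive range test. The sketch below is an illustration only: r_kind, zero, and one are local stand-ins for the GSI names, and in_unit_interval is a hypothetical helper. Because 0.0 and 1.0 are exactly representable in binary floating point, a value set directly to zero or one satisfies the inclusive bounds; a value produced by arithmetic may not land exactly on an endpoint.

```fortran
program range_check_sketch
  implicit none
  ! Assumption: local stand-ins for the GSI kind and constants named in the thread.
  integer, parameter :: r_kind = selected_real_kind(15)
  real(r_kind), parameter :: zero = 0.0_r_kind, one = 1.0_r_kind
  real(r_kind) :: candidates(4)
  integer :: k

  ! Two endpoints, one interior value, and one value outside the range.
  candidates = [zero, 0.5_r_kind, one, 1.5_r_kind]

  do k = 1, size(candidates)
    print '(f5.2,1x,l1)', candidates(k), in_unit_interval(candidates(k))
  end do

contains

  ! Hypothetical helper mirroring the quoted test:
  ! r_ensloccov4var>=zero.and.r_ensloccov4var<=one
  pure logical function in_unit_interval(x)
    real(r_kind), intent(in) :: x
    in_unit_interval = (x >= zero) .and. (x <= one)
  end function in_unit_interval

end program range_check_sketch
```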

@shoyokota
Collaborator Author

@TingLei-NOAA I changed the definition of these parameters in this PR. For example, the previous i_ensloccov4scl=0 corresponds to r_ensloccov4scl=1 in this PR. So keeping only the names of these parameters would confuse developers. Or are you suggesting not removing i_ensloccov4{scl,var,tim} and just adding r_ensloccov4{scl,var,tim}?

@TingLei-NOAA
Contributor

@shoyokota Thanks for continuing to improve the code. However, as I mentioned earlier, it is also important to keep backward compatibility. It would be preferable if your changes could make the options as clear as you want without causing the current scripts to fail.

@shoyokota
Collaborator Author

We have two choices. I cannot think of another.

(i) remove i_ensloccov4{scl,var,tim} and add r_ensloccov4{scl,var,tim} (my current coding for this PR)
(ii) not remove i_ensloccov4{scl,var,tim} and add r_ensloccov4{scl,var,tim} as options of i_ensloccov4{scl,var,tim}=0

If we should keep backward compatibility and resolve the equality issue, I can code based on (ii). But if resolving these issues is not strictly required, I think (i) is simpler and makes the code easier for developers to understand. I will choose one based on the opinions of @TingLei-NOAA and @CatherineThomas-NOAA. Which do you prefer?

@CatherineThomas-NOAA
Collaborator

CatherineThomas-NOAA commented Mar 2, 2023

@TingLei-NOAA @shoyokota
I prefer (i). I think this PR simplifies the code, makes it more intuitive, all while adding a new feature (tuning the cross covariance). The previous convention was: i_* = 0 includes cross covariance. This PR: r_* = 0 zeroes out the cross covariance, which makes more sense to me.

While I understand the general importance of backwards compatibility, these changes are not in operations yet, so I think that now is the time to make a change before it becomes operational.
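
The change of convention discussed in this thread (the old i_* = 0 retains the cross covariance, while the new r_* = 0 zeroes it out) can be written down as a small mapping. This is a sketch under stated assumptions: legacy_flag_to_factor is a hypothetical helper that does not exist in GSI, the kinds and constants are local stand-ins, and only the legacy values 0 and 1 mentioned in the thread are covered.

```fortran
program legacy_mapping_sketch
  implicit none
  ! Assumption: local stand-ins for the GSI kinds and constants named in the thread.
  integer, parameter :: i_kind = selected_int_kind(9)
  integer, parameter :: r_kind = selected_real_kind(15)
  real(r_kind), parameter :: zero = 0.0_r_kind, one = 1.0_r_kind

  print '(a,f4.1)', 'i_ensloccov4scl=0 -> r_ensloccov4scl=', legacy_flag_to_factor(0_i_kind)
  print '(a,f4.1)', 'i_ensloccov4scl=1 -> r_ensloccov4scl=', legacy_flag_to_factor(1_i_kind)

contains

  ! Hypothetical helper: translate an old integer flag into the new real factor.
  function legacy_flag_to_factor(i_flag) result(r_factor)
    integer(i_kind), intent(in) :: i_flag
    real(r_kind) :: r_factor
    select case (i_flag)
    case (0)
      r_factor = one    ! old convention: cross covariance retained
    case (1)
      r_factor = zero   ! old convention: cross covariance set to zero
    case default
      error stop 'only legacy values 0 and 1 are covered by this sketch'
    end select
  end function legacy_flag_to_factor

end program legacy_mapping_sketch
```

Intermediate factors between zero and one have no counterpart in the old integer flags, which is the tuning capability this PR adds.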

@TingLei-NOAA
Contributor

@CatherineThomas-NOAA @shoyokota I prefer option 2, to keep backward compatibility. Though these options are not in operational runs yet, RRFS developers at EMC/GSL have begun to use scripts with verified setups based on the "old" options. Those scripts are being "populated" among developers who are interested in using those functions. Hence, I don't think it is a good plan to make developers learn a new setup standard. As for the confusion in the current option names, more documentation could help. In fact, I have been expecting a longer doc on SDL/VDL options for GSI users/developers.

@TingLei-NOAA
Contributor

I think this is now an error-prone period for the community as it applies the recent major developments in SDL/VDL. The scripts from the PR submitters, including @shoyokota, are good starting examples for developers to test and use in their own systems like RRFS. A different setup of options will not help with this "learning and transition" process.

@RussTreadon-NOAA
Contributor

I agree with @CatherineThomas-NOAA that option (i) is more intuitive. This makes the code more readable. However, as @TingLei-NOAA notes, we need to be careful when using floating point numbers in logical tests including equality.

If we stick with option (i) we should consider revising logical tests which include equality to account for precision issues with floating point numbers (e.g., line 5606 of src/gsi/hybrid_ensemble_isotropic.F90)

If we go with option (ii), we should consider replacing integer variable i_ensloccov4scl with a logical variable. The use of a logical, call it l_ensloccov4scl, yields more readable code. Logical variable l_ensloccov4scl would be set as follows:

  • l_ensloccov4scl=.true. corresponds to i_ensloccov4scl=0 which means to include cross covariance
  • l_ensloccov4scl=.false. corresponds to i_ensloccov4scl=1 which means to NOT include cross covariance

When l_ensloccov4scl=.true., floating point variable r_ensloccov4var is the factor multiplying the cross-scale covariance.

I'm not a reviewer on this PR so feel free to ignore my comments.
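
The logical-switch variant suggested for option (ii) can be sketched as follows. This is only an illustration: cross_scale_weight is a hypothetical helper, the kind and constant are local stand-ins for the GSI names, and the sketch assumes the scale factor r_ensloccov4scl is the one paired with l_ensloccov4scl. Because the on/off decision is carried by the logical, no equality test on a real number is needed.

```fortran
program logical_switch_sketch
  implicit none
  ! Assumption: local stand-ins for the GSI kind and constant named in the thread.
  integer, parameter :: r_kind = selected_real_kind(15)
  real(r_kind), parameter :: zero = 0.0_r_kind
  logical :: l_ensloccov4scl
  real(r_kind) :: r_ensloccov4scl

  r_ensloccov4scl = 0.5_r_kind

  ! Cross covariance included: the factor is applied as given.
  l_ensloccov4scl = .true.
  print *, 'weight with switch on:  ', cross_scale_weight(l_ensloccov4scl, r_ensloccov4scl)

  ! Cross covariance excluded: the factor is ignored and the weight is zero.
  l_ensloccov4scl = .false.
  print *, 'weight with switch off: ', cross_scale_weight(l_ensloccov4scl, r_ensloccov4scl)

contains

  ! Hypothetical helper: the logical selects between the factor and zero.
  pure function cross_scale_weight(l_on, r_factor) result(w)
    logical, intent(in) :: l_on
    real(r_kind), intent(in) :: r_factor
    real(r_kind) :: w
    w = merge(r_factor, zero, l_on)
  end function cross_scale_weight

end program logical_switch_sketch
```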

@TingLei-NOAA
Contributor

@RussTreadon-NOAA Thanks for your rather complete analysis of this issue!
I will initiate an off-line discussion by email (including the participants here) to seek feedback, as soon as possible, from those who have used or are ready to use those functions. If most people think those changes will not be a problem, or that the benefits outweigh the resulting inconvenience, I will be happy to agree with option 1.

@ShunLiu-NOAA
Contributor

Backward compatibility may not be important in the context of simplifying the code because this function has not yet actually been tested and used in RRFS's real-time parallel. However, I do understand it might create additional effort for the current retro tests. Since this will not change the solution, the evaluation results will still be valid for deciding whether to test this in the real-time parallel.

@shoyokota shoyokota force-pushed the feature/PR_NOAA-EMC_EnVar-SDL2 branch from 38853c2 to bd4cf81 Compare March 6, 2023 16:54
@shoyokota
Collaborator Author

Thank you for discussing options (i) and (ii). For now, I removed the "if ... else ..." part for r_ensloccov4* in src/gsi/hybrid_ensemble_isotropic.F90 because it is not strictly required. Now the equality issue is resolved even in option (i). I think the only remaining issue with option (i) is backward compatibility. I will select one of the options based on the email discussion of backward compatibility.

@ShunLiu-NOAA ShunLiu-NOAA self-requested a review March 9, 2023 15:12
@ShunLiu-NOAA
Contributor

@shoyokota and @TingLei-NOAA Do you have enough feedback to make a decision about keeping "backward compatibility"?

@TingLei-NOAA
Contributor

@ShunLiu-NOAA The collaborators invited to the off-line discussion on this issue need more time to give their opinions.

@ShunLiu-NOAA
Contributor

@TingLei-NOAA Any feedback from GSL?

@hu5970, @daviddowellNOAA, and @chunhuazhou, please let us know if you want to keep the backward compatibility option; otherwise, we will proceed to approve this PR.

@TingLei-NOAA
Contributor

Thanks for the feedback we received. Developers, including those at GSL, who are using or may test/apply the SDL functions are open to accommodating the parameter name changes brought in by this PR, and Sho could help in this process. Hence, @shoyokota, please make sure you take care of the suggestions in Russ's point 1, and then we can go ahead.

@shoyokota
Collaborator Author

@TingLei-NOAA @RussTreadon-NOAA I have already resolved Russ's concern for (i). (Please see here.) If this modification is not sufficient, could you let me know? Thanks.

@RussTreadon-NOAA
Contributor

@shoyokota , a conflict with src/gsi/hybrid_ensemble_parameters.f90 needs to be resolved.

-! i_ensloccov4scl - flag of cross-scale localization
-! =0: cross-scale covariance is retained
-! =1: cross-scale covariance is zero
+! r_ensloccov4tim - factor multiplying to cross-temporal localization
Contributor

@shoyokota could you document how the new parameters are used to control the SDL setup in place of the previous parameters?

@shoyokota shoyokota force-pushed the feature/PR_NOAA-EMC_EnVar-SDL2 branch from bd4cf81 to dda2524 Compare March 22, 2023 20:39
@RussTreadon-NOAA
Contributor

Thank you, @shoyokota , for resolving the conflict.

@TingLei-NOAA
Contributor

@shoyokota Would you please add concise documentation where the previous documentation of the old parameters was?
Something like the following (image), an excerpt from your ppt.

@shoyokota shoyokota force-pushed the feature/PR_NOAA-EMC_EnVar-SDL2 branch from dda2524 to 036cf39 Compare March 24, 2023 22:04
@shoyokota
Collaborator Author

@TingLei-NOAA I added documentation of the setting example of "r_ensloccov4{scl,var,tim}" in src/gsi/gsimod.F90 and in src/gsi/hybrid_ensemble_parameters.f90. Thanks.

@ShunLiu-NOAA
Contributor

@CatherineThomas-NOAA Could you please check to see if the changes meet your needs?

@CatherineThomas-NOAA
Collaborator

@ShunLiu-NOAA I've tested the SDL and it reproduces my previous tests. I just need to check the VDL and TDL, which I am working on today.

Collaborator

@CatherineThomas-NOAA CatherineThomas-NOAA left a comment

I have finished my global tests comparing the weighted scale/time/variable dependent localization to the original formulation. See figures here.

The increments and cost functions are as expected. The code looks good to me as well.

Thank you @TingLei-NOAA for coordinating with GSL on the backwards compatibility question.

@ShunLiu-NOAA
Contributor

@TingLei-NOAA @CatherineThomas-NOAA Thank you for reviewing and testing the changes.

@ShunLiu-NOAA
Contributor

@TingLei-NOAA Could you please run the regression tests on Hera or WCOSS2?
@shoyokota Could you please run the regression tests on Orion?

@shoyokota
Collaborator Author

@ShunLiu-NOAA My regression test on Orion was as follows.

1/9 Test #1: global_3dvar .....................***Failed   83.88 sec
2/9 Test #2: global_4dvar .....................***Failed  173.09 sec
3/9 Test #3: global_4denvar ...................***Failed  146.22 sec
4/9 Test #4: hwrf_nmm_d2 ......................   Passed  677.82 sec
5/9 Test #5: hwrf_nmm_d3 ......................***Failed  740.92 sec
6/9 Test #6: rtma .............................***Failed  1273.05 sec
7/9 Test #7: rrfs_3denvar_glbens ..............   Passed  791.00 sec
8/9 Test #8: netcdf_fv3_regional ..............***Failed  604.80 sec
9/9 Test #9: global_enkf ......................***Failed  609.39 sec
  • The failure of 1-3: Because my access to some files belonging to group "rstprod" is not permitted.
  • The failure of 5-6: Because "The case has Failed the scalability test."
  • The failure of 8: Because "This has exceeded maximum allowable threshold time" and "This has exceeded maximum allowable hardware memory limit".
  • The failure of 9: Because "This has exceeded maximum allowable threshold time".

It is impossible for me to pass tests 1-3. The other tests passed except for non-essential failures (5, 6, 8, 9) that are not related to this PR.

@ShunLiu-NOAA
Contributor

@hu5970 Ming, Could you do a regression test on Orion?

@hu5970
Collaborator

hu5970 commented Mar 30, 2023

Yes, I will run regression test suite today.

@hu5970
Collaborator

hu5970 commented Mar 30, 2023

Start 1: [=[global_3dvar]=]
Start 2: [=[global_4dvar]=]
Start 3: [=[global_4denvar]=]
Start 4: [=[hwrf_nmm_d2]=]
Start 5: [=[hwrf_nmm_d3]=]
Start 6: [=[rtma]=]
Start 7: [=[rrfs_3denvar_glbens]=]
Start 8: [=[netcdf_fv3_regional]=]
Start 9: [=[global_enkf]=]

1/9 Test #1: [=[global_3dvar]=] ...............***Failed 300.86 sec
2/9 Test #9: [=[global_enkf]=] ................***Failed 670.81 sec
3/9 Test #7: [=[rrfs_3denvar_glbens]=] ........ Passed 726.46 sec
4/9 Test #4: [=[hwrf_nmm_d2]=] ................ Passed 727.67 sec
5/9 Test #8: [=[netcdf_fv3_regional]=] ........***Failed 786.59 sec
6/9 Test #5: [=[hwrf_nmm_d3]=] ................***Failed 1037.01 sec
7/9 Test #6: [=[rtma]=] .......................***Failed 1512.72 sec
8/9 Test #3: [=[global_4denvar]=] ............. Passed 1744.55 sec
9/9 Test #2: [=[global_4dvar]=] ............... Passed 1801.57 sec

44% tests passed, 5 tests failed out of 9

Total Test time (real) = 1801.57 sec

The following tests FAILED:
1 - [=[global_3dvar]=] (Failed)
5 - [=[hwrf_nmm_d3]=] (Failed)
6 - [=[rtma]=] (Failed)
8 - [=[netcdf_fv3_regional]=] (Failed)
9 - [=[global_enkf]=] (Failed)

Will rerun failed cases.

@hu5970
Collaborator

hu5970 commented Mar 30, 2023

Tried three of the failed Orion cases again:
1/1 Test #1: [=[global_3dvar]=] ............... Passed 2588.40 sec
1/1 Test #5: [=[hwrf_nmm_d3]=] ................***Failed 856.69 sec
1/1 Test #8: [=[netcdf_fv3_regional]=] ........***Failed 660.39 sec

Check the regression test results:
hwrf_nmm_d3, global_enkf, and rtma failed "resulting in Failure time-thresh of the regression test".
netcdf_fv3_regional: "The slope for the update (42.310395 seconds per node) is greater than or equal to that for the control (34.247575 seconds per node)."

There are no major failures on Orion. We can consider the regression tests passed on Orion.

@TingLei-NOAA
Contributor

Update on the regression tests on Hera: all tests passed. Once again I had to manually add some missing fix files such as rrfs_glb_berror.l127y770.f77, though I am sure some specific order of "git submodule" and "git checkout" commands would avoid this issue, as I found with a previous PR. However, the "vulnerability" to such an issue is concerning, though I suspect it is not related to this PR since it happened previously with another PR.

@ShunLiu-NOAA
Contributor

@hu5970 and @TingLei-NOAA Thank you for running the regression tests on Orion and Hera. Since the regression tests passed on both machines, we can merge this PR into develop.
@TingLei-NOAA Do you think we should open an issue for rrfs_glb_berror.l127y770.f77?

@ShunLiu-NOAA ShunLiu-NOAA merged commit 14ae595 into NOAA-EMC:develop Apr 3, 2023
@RussTreadon-NOAA
Contributor

@TingLei-NOAA , I executed the following on Orion

  1. cd /work/noaa/da/rtreadon/git/gsi
  2. mkdir test
  3. cd test
  4. git clone --recursive https://github.com/NOAA-EMC/GSI.git .
  5. ls -l fix/rrfs*
  6. cd ush
  7. ./build.sh
  8. cd ../
  9. ls -l fix/rrfs*

The first ls -l fix/rrfs* (step 5 above) returned nothing.

Orion-login-3:/work/noaa/da/rtreadon/git/gsi/test$ ls -l fix/rrfs*
ls: cannot access fix/rrfs*: No such file or directory
Orion-login-3:/work/noaa/da/rtreadon/git/gsi/test$

This is correct. The fix submodule in GSI develop points at NOAA-EMC/GSI-fix. The GSI-fix repository only contains ASCII GSI fix files. File fix/rrfs_glb_berror.l127y770.f77 is a binary GSI fix file. EIB stages binary GSI fix files on WCOSS2 and select NOAA RDHPCS machines. The path to the binary GSI fix files is specified in modulefiles/gsi_$machine.lua as shown below

gsi_hera.gnu.lua:pushenv("GSI_BINARY_SOURCE_DIR", "/scratch1/NCEPDEV/global/glopara/fix/gsi/20221128")
gsi_hera.intel.lua:pushenv("GSI_BINARY_SOURCE_DIR", "/scratch1/NCEPDEV/global/glopara/fix/gsi/20221128")
gsi_jet.lua:pushenv("GSI_BINARY_SOURCE_DIR", "/mnt/lfs4/HFIP/hfv3gfs/glopara/git/fv3gfs/fix/gsi/20221128")
gsi_orion.lua:pushenv("GSI_BINARY_SOURCE_DIR", "/work/noaa/global/glopara/fix/gsi/20221128")
gsi_s4.lua:pushenv("GSI_BINARY_SOURCE_DIR", "/data/prod/glopara/fix/gsi/20221128")
gsi_wcoss2.lua:pushenv("GSI_BINARY_SOURCE_DIR", "/lfs/h2/emc/global/noscrub/emc.global/FIX/fix/gsi/20221128")

The top level CMakeLists.txt and fix/CMakeLists.txt contain logic which populates fix with binary GSI fix files when ush/build.sh is executed (step 7 above).

The cmake build prints a runtime message stating that GSI binary fix files are being copied from the staged location

-- Found w3emc: /apps/contrib/NCEP/libs/hpc-stack/intel-2018.4/w3emc/2.9.1/lib64/libw3emc_4.a (found version "2.9.1")
-- Copying GSI binary fix files from /work/noaa/global/glopara/fix/gsi/20221128
-- RT: Control GSIexec Found: /work/noaa/da/rtreadon/git/gsi/develop/install/bin/gsi.x

After build.sh completes, ls -l fix/rrfs* shows that the rrfs file exists

Orion-login-3:/work/noaa/da/rtreadon/git/gsi/test$  ls  -l fix/rrfs*
-rw-r--r-- 1 rtreadon da 67363540 Apr  3 12:26 fix/rrfs_glb_berror.l127y770.f77
Orion-login-3:/work/noaa/da/rtreadon/git/gsi/test$

@TingLei-NOAA
Contributor

@RussTreadon-NOAA Following your steps, I did some extra "playing around" with this "missing fix files" issue. It seems that with EMC GSI no problem occurs whether I run "git submodule update" before or after building. The issue seems specific to the fork involved in this PR. For example, the rrfs fix files are missing when I git clone this PR, while they exist in EMC GSI. But since it does not affect the EMC GSI repository, I think we can stop here, satisfied that EMC GSI works well.
Thanks.

@RussTreadon-NOAA
Contributor

@TingLei-NOAA , thank you for the update. I find nothing wrong with the fix submodule associated with shoyokota:feature/PR_NOAA-EMC_EnVar-SDL2 .

The three step sequence

  1. git clone --recursive https://github.com/shoyokota/GSI.git .
  2. git checkout feature/PR_NOAA-EMC_EnVar-SDL2
  3. git submodule update

returns a fix pointing at 0be2697. This agrees with NOAA-EMC/GSI develop. This fix only contains ASCII GSI fix files. This is correct. Execution of ush/build.sh populates fix with GSI binary fix files.

I am unable to reproduce the missing fix file problem you report. I agree with you. It's time to move forward since this PR has been merged into NOAA-EMC/GSI develop

@TingLei-NOAA
Contributor

@RussTreadon-NOAA You can see /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-pr/dr-test/GSI/fix/list.txt_after_submodule_update, which lists the fix files before I ran build.sh, and you can compare it with https://github.com/shoyokota/GSI.git after git submodule update. In the latter, several rrfs text files (such as anavinfo.rrfs) are missing (only rrfs_glb_berror.l127y770.f77 exists). I didn't dig further into the Rcov* binary files issue this time; those files are supposed to be copied in build.sh (previously, I found they would disappear after having been copied).

@TingLei-NOAA
Contributor

And after checking out the branch for this PR, feature/PR_NOAA-EMC_EnVar-SDL2, and running "git submodule sync" and "git submodule update", "anavinfo.rrfs convinfo.rrfs errtable.rrfs* hybens_info.rrfs" exist but the previous "rrfs_glb_berror.l127y770.f77" disappears. I think this is all about appropriate git management of the fix dir (submodule). But since it is not an issue with EMC GSI, I am satisfied with the current situation. I will be happy to learn if I missed anything on this problem.

@RussTreadon-NOAA
Contributor

@TingLei-NOAA , grep rrfs /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-pr/dr-test/GSI/fix/list.txt_after_submodule_update returns the following

Hera(hfe08):/scratch1/NCEPDEV/da/Russ.Treadon/para_gfs/prufsatm$ grep rrfs /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-pr/dr-test/GSI/fix/list.txt_after_submodule_update
-rw-r--r-- 1 Ting.Lei fv3-cam      2468 Apr 10 15:12 anavinfo.rrfs
-rw-r--r-- 1 Ting.Lei fv3-cam     27424 Apr 10 15:12 convinfo.rrfs
-rwxr-xr-x 1 Ting.Lei fv3-cam    492800 Apr 10 15:12 errtable.rrfs*
-rw-r--r-- 1 Ting.Lei fv3-cam      2540 Apr 10 15:12 hybens_info.rrfs
-rw-r--r-- 1 Ting.Lei fv3-cam  67363540 Apr 10 15:35 rrfs_glb_berror.l127y770.f77

Are these the five rrfs files you expect to see?

I did the following on Hera

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ git clone --recursive https://github.com/shoyokota/GSI.git .
Cloning into '.'...
remote: Enumerating objects: 64001, done.
remote: Counting objects: 100% (81/81), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 64001 (delta 39), reused 53 (delta 29), pack-reused 63920
Receiving objects: 100% (64001/64001), 52.44 MiB | 15.09 MiB/s, done.
Resolving deltas: 100% (41988/41988), done.
Submodule 'fix' (gerrit:GSI-fix) registered for path 'fix'
Cloning into '/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting/fix'...
remote: Counting objects: 259, done
remote: Finding sources: 100% (258/258)
remote: Total 1436 (delta 127), reused 1414 (delta 127)
Receiving objects: 100% (1436/1436), 3.22 GiB | 38.18 MiB/s, done.
Resolving deltas: 100% (692/692), done.
Submodule path 'fix': checked out '023f81d4ba631d235156172e04a529a4bf273617'

A listing of fix shows that only rrfs_glb_berror.l127y770.f77 is found.

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ ls -l fix/ |grep rrfs
-rw-r--r-- 1 Russ.Treadon da  67363540 Apr 10 18:42 rrfs_glb_berror.l127y770.f77

This makes sense because git clone --recursive https://github.com/shoyokota/GSI.git . places us on develop in Sho's fork. His develop is at hash 1587c8c, which dates back to August 11, 2022. The fix/ associated with develop at this hash points at e23204e, which is in repository gerrit:GSI-fix.

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ git log fix | head
commit e23204e5ea8f73056ffb2d181ff340deaa0d2163
Author: michael.lueken <Michael.Lueken@noaa.gov>
Date:   Thu Jul 28 17:05:42 2022 +0000

    GitHub Issue NOAA-EMC/GSI#444. Addition of EFSOI block to global_anavinfo.l127.txt.

Sho's PR is for his forked branch feature/PR_NOAA-EMC_EnVar-SDL2. When we checkout his branch into the working copy of his develop, we get

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ git checkout feature/PR_NOAA-EMC_EnVar-SDL2
Checking out files: 100% (126/126), done.
M       fix
Branch 'feature/PR_NOAA-EMC_EnVar-SDL2' set up to track remote branch 'feature/PR_NOAA-EMC_EnVar-SDL2' from 'origin'.
Switched to a new branch 'feature/PR_NOAA-EMC_EnVar-SDL2'

The M fix tells us that the fix submodule has been modified. It has been updated to a new repository, NOAA-EMC/GSI-fix

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ git log fix |head
commit 43286a17da7243bbdc33c358224ad2e957f1f24d
Author: Rahul Mahajan <aerorahul@users.noreply.github.com>
Date:   Fri Jan 13 06:47:16 2023 -0500

    Deprecate gerrit and use github for fix files (#521)

Since the fix submodule points at a different repo, NOAA-EMC/GSI-fix, and not gerrit:GSI-fix, we need to resync the fix url via git submodule sync. After this we pull in fix from the NOAA-EMC/GSI-fix repository.

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ git submodule sync
Synchronizing submodule url for 'fix'
Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ git submodule update
warning: no common commits
remote: Enumerating objects: 796, done.
remote: Counting objects: 100% (796/796), done.
remote: Compressing objects: 100% (373/373), done.
remote: Total 796 (delta 439), reused 773 (delta 423), pack-reused 0
Receiving objects: 100% (796/796), 8.54 MiB | 3.27 MiB/s, done.
Resolving deltas: 100% (439/439), done.
From https://github.com/NOAA-EMC/GSI-fix
 * [new branch]      develop    -> origin/develop
Submodule path 'fix': checked out '0be26971f834fe9b1d5b118e1e0ffed53facf671'

A check of rrfs files in fix/ now returns

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ ls -l fix/ |grep rrfs
-rw-r--r-- 1 Russ.Treadon da    2468 Apr 10 18:57 anavinfo.rrfs
-rw-r--r-- 1 Russ.Treadon da   27424 Apr 10 18:57 convinfo.rrfs
-rwxr-xr-x 1 Russ.Treadon da  492800 Apr 10 18:57 errtable.rrfs
-rw-r--r-- 1 Russ.Treadon da    2540 Apr 10 18:57 hybens_info.rrfs

The above listing only includes four rrfs ASCII GSI fix files. This is correct. NOAA-EMC/GSI-fix only contains ASCII GSI fix files. EIB stages binary GSI fix files on disk. Binary GSI fix files are pulled into fix/ via the build step.

After executing ush/build.sh, a check of rrfs files in fix/ returns

Hera(hfe01):/scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/ting$ ls -l fix/ | grep rrfs
-rw-r--r-- 1 Russ.Treadon da      2468 Apr 10 18:57 anavinfo.rrfs
-rw-r--r-- 1 Russ.Treadon da     27424 Apr 10 18:57 convinfo.rrfs
-rwxr-xr-x 1 Russ.Treadon da    492800 Apr 10 18:57 errtable.rrfs
-rw-r--r-- 1 Russ.Treadon da      2540 Apr 10 18:57 hybens_info.rrfs
-rw-r--r-- 1 Russ.Treadon da  67363540 Apr 10 19:02 rrfs_glb_berror.l127y770.f77

This agrees with the listing you provided in file /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-pr/dr-test/GSI/fix/list.txt_after_submodule_update

Sho's branch feature/PR_NOAA-EMC_EnVar-SDL2 is fine.

@TingLei-NOAA
Contributor

@RussTreadon-NOAA Thanks for digging into this. So it seems that after I finished building in feature/PR_NOAA-EMC_EnVar-SDL2, if I switched to develop and ran "git submodule sync/update" (for some reason), all rrfs text files and the binary fix files copied in the build would be removed, even after I switched back to the feature/PR_NOAA-EMC_EnVar-SDL2 branch. I think that is what happened in my case. I just repeated the above steps and confirmed this.

TingLei-daprediction referenced this pull request in TingLei-daprediction/GSI May 23, 2023
Co-authored-by: Sho Yokota <syokota@Orion-login-1.HPC.MsState.Edu>