Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ensemble sdv to diag ( selected variables ) #635

Merged
merged 12 commits into from
Mar 23, 2023

Conversation

saraqzhang
Copy link
Contributor

add calculation of ensemble standard deviation and output to diag for selected variables ( in lndfcstand_Nx collection)

@saraqzhang saraqzhang requested a review from a team as a code owner March 8, 2023 22:17
@github-actions
Copy link

github-actions bot commented Mar 8, 2023

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

3 similar comments
@github-actions
Copy link

github-actions bot commented Mar 8, 2023

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

github-actions bot commented Mar 8, 2023

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

github-actions bot commented Mar 8, 2023

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@gmao-rreichle
Copy link
Contributor

@saraqzhang, thanks for putting this together. I've gone through the changes quickly. Here are my 2c, for further discussion:

  • Do we really need the ensemble stddev from both the GEOSens_GridComp and the GEOSlandassim_GridComp? The latter is an instantaneous stddev. I'm not entirely sure what we get if we compute the stddev from the GEOSens_GridComp when the output is a time-average. Would we get the ensemble stddev of the time-average or the time average of the ensemble stddev? This may be a question for @weiyuan-jiang. Perhaps we can just use the stddev from GEOSens_GridComp with the understanding that it should only be written for instantaneous output collections. Although in that case it's not entirely clear to me whether we would be getting the ens stddev of the forecast or the analysis. Maybe that's why we need the output from the GEOSens_GridComp and the GEOSlandassim_GridComp, after all, so we could get both the forecast and the analysis ens stddev -- if we actually need both.
  • The calculation of the ens stddev in the current implementation computes the statistic the population, i.e., var = sum[ x - mean]/N. I wonder if we shouldn't compute the sample variance, i.e., var = sum[x-mean]/(N-1). The difference is the normalization factor. The numerical implementation would be something like sqrt( SumOfSquares/(N-1) - N/(N-1)*mean*mean) (to be verified!)
  • The calculations in the GEOSlandassim_GridComp could probably be simplified by defining the **2 ("square") operator for the cat_diagn_S structure. See definition of (arguably simpler) operators in https://github.com/GEOS-ESM/GEOSldas/blob/develop/src/Components/GEOSldas_GridComp/Shared/catch_types.F90 Not sure, though, how exactly to do this for this operator.
  • Across the changes in the PR, we should settle on consistent string for abbreviating "ensemble standard deviation". Right now we have "sdv", "ens.sdv", "ensdv", and "enssdv". Unfortunately, there are also two different strings for the ensemble mean ("enavg" and "ensavg"), but that shouldn't keep us from using just one string consistently for the ensemble stddev. My preference would be to use "enstd" or "ensstd" for consistency with the existing nomenclature for the mean and stddev elsewhere in the code ("std" is used in the *ensprop.nml file).

For now the above is just some food for thought and further discussion. It's not meant to be a complete review.

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , the "before and after analysis" selection records fcst spread (ensGC) and anal spread (assimGC) , respectively. I put the specification for stdv as in "inst" collection in HISTORY.rc. If they are put in tavg collection, MAPL will calculate time average of the ens. sdtv.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , I made modifications corresponding to your comments earlier, and pushed the changes to the branch.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@gmao-rreichle
Copy link
Contributor

gmao-rreichle commented Mar 17, 2023

@saraqzhang, @weiyuan-jiang:

  • I added a few more commits that further clean up the variable names, add new operators for the cat_diagS type, properly protect against division by zero (I hope), and add the new ENSSTD outputs into the tile_bin2nc4.F90 utility.
  • @saraqzhang, please review the changes and test if the revised branch still works for the new outputs.
  • @weiyuan-jiang, please take a look at the changes and let me know if you see anything that may need work. Since the GEOSens_GridComp now does a little bit more than just compute the ens avg, I'm wondering if we should change the GridComp name in GEOS_LdasGridComp.F90 from ENSAVG to ENSSTATS. Would this be as simple as just replacing the string "ENSAVG" with "ENSSTATS" in GEOS_LdasGridComp.F90?
  • This PR needs to be coordinated with clean up variable and subroutine names for Ldas grid  #637, which touches on some of the same files. My preference would be to finish and merge clean up variable and subroutine names for Ldas grid  #637 first, and then merge the present PR, just in case.

PS: I recommend hiding white space differences when looking at the "Files Changed" (on the Files Changed tab, click on Settings, then check the button).

@gmao-rreichle gmao-rreichle marked this pull request as draft March 17, 2023 00:03
@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , I will pull the revised branch and run test.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@gmao-rreichle
Copy link
Contributor

@saraqzhang: Sorry, I just realized that there was no protection against (small) negative values in the ens std output. I added one more commit: 37dbf00

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , the revised branch passed test ( settings of ens_N > 1 ; assim ) , and outputs of std are verified. Should we also test the case of ens_N =1 & no assim ?

@gmao-rreichle
Copy link
Contributor

@gmao-rreichle , the revised branch passed test ( settings of ens_N > 1 ; assim ) , and outputs of std are verified. Should we also test the case of ens_N =1 & no assim ?

Thanks, @saraqzhang. Yes, it would be good to run a very short test for N_ens=1, just to verify that the ensstd output is indeed MAPL_UNDEF everywhere.

Another useful short test could be to request ens std output for N>1 but remove the ens avg variables from HISTORY. In this case, you should also get MAPL_UNDEF everywhere given the current implementation.

If we want to have ens std output in the absence of matching ens avg output, we have to modify the if conditions. I didn't deal with this yesterday but I think it may be straightforward by changing
if (associated(TSURF_enavg))
to
if (associated(TSURF_enavg) .or. associated(TSURF_enstd))
etc. But I'd have to take a closer look at the code to sort this out. Feel free to take a look yourself.

BTW, how exactly did you write ensstd output? Was it written in tile space? If so, please verify that the LONG_NAMES and units in the nc4 file are as expected. This has to go through tile_bin2nc4.F90, which hadn't been modified in the original PR, and I'm wondering how you were able to get good nc4 files in your earlier tests. (GEOSldas_HIST.rc had ens std variables only in the 1d lndfcstana collection, which is why I assume you wrote output in tile space.) I might be missing something.

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , I realized that I pushed the wrong version of GEOSldas_HIST.rc in my local branch. I had HISTORY.rc in the test run dir. that had std in grid output (Nx) . We should be able to write std in 1d tile collection though I have not tested that .
I will test the cases of std =MAPL_UNDEF.
I will need to take a closer look at the case of ens std output in the absence of matching ens avg output.

@gmao-rreichle
Copy link
Contributor

I pushed the wrong version of GEOSldas_HIST.rc

Is the version of GEOSldas_HIST.rc that is on the branch right now ok, then?

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , yes the current version of GEOSldas_HIST.rc works for both Nx and Nt outputs for ensstd .

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , the summary of tests for ensstd :
(1) N_ens=1 .
LDAS carries out a model run, ASSIM and ENSAVG GC are not participated. Thus no outputs from those GC should be in HISTORY.rc ( the same goes with "SMAP_L4_SM_gph" collection, which ldas_setup took care of not being in HISTORY.rc when N_ens =1, ASSIM =No ) . If a collection from ENSAVG and ASSIM is in HISTORY.rc when N_ens=1, MAPL HISTORY would crash.
(2) N_ens >1, and ensstd in HISTORY.rc without ensavg in the collection.
in this case the outputs are : fcst_ensstd ( from ENSAVG GC) are not calculated and output as MAPL_UNDEF ; ana_ensstd ( from ASSIM GC) are calculated and values outputted (no matter if the variables are or not in HISTORY.rc ). Since In ENSAVG GC the export variables ( definition and calculation) are controlled by HISTORY.rc, to output ensstd without ensavg from ENSAVG GC will require an internal defined ensavg calculation ( not defined in export and association) for the calculation of ensstd.

@gmao-rreichle
Copy link
Contributor

(1) N_ens=1 . LDAS carries out a model run, ASSIM and ENSAVG GC are not participated. Thus no outputs from those GC should be in HISTORY.rc

@saraqzhang: We do have a use case where ASSIM is engaged for N_ens=1. This is for the rule-based snow cover fraction assimilation that Lauren is working on. The functionality for using ASSIM with N_ens=1 may still be only on Lauren's branch, but it will be merged at some point in the future, so we need to protect against users requesting "ensstd" output when N_ens=1. I think the easiest way to do this is to simply set the export to MAPL_UNDEF if it cannot be computed because there is only one ensemble member.

(2) N_ens >1, and ensstd in HISTORY.rc without ensavg in the collection. in this case the outputs are : fcst_ensstd ( from ENSAVG GC) are not calculated and output as MAPL_UNDEF ; ana_ensstd ( from ASSIM GC) are calculated and values outputted

Re. ana_ensstd, the output for ensstd when N_ens=1 should be MAPL_UNDEF, based on the following lines of code:

Is this what's happening?

... values outputted (no matter if the variables are or not in HISTORY.rc ).

I'm confused. How would MAPL HISTORY write out a variable if it's not in HISTORY.rc?

Since In ENSAVG GC the export variables ( definition and calculation) are controlled by HISTORY.rc, to output ensstd without ensavg from ENSAVG GC will require an internal defined ensavg calculation ( not defined in export and association) for the calculation of ensstd.

Yes, indeed. The question is whether it's worth the extra work. Usually, I would expect that ensavg output is written whenever ensstd output is requested. Not sure if it's worth to implement getting ensstd output (other than MAPL_UNDEF) when ensavg is not in HISTORY.rc. The question is whether it's easier to properly document/explain this use case or to just implement it with the internal ensavg calcs.

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , "values outputted (no matter if the variables are or not in HISTORY.rc )." bad writing,. ens stats are carried out regardless HISTORY.rc, but the output is still controlled by HISTORY.rc.
I think it is reasonable to require always output ensavg and ensstd together for the case of ens_N> 1 . When ens_N =1 , we enforce in the code ensstd =UNDEF . How is ensavg handled in Lauren's branch? In our current branch, if ens_N=1, ldas_setup does not create ensavg dir ( ens_0001 only) .

@gmao-rreichle
Copy link
Contributor

How is ensavg handled in Lauren's branch?

@saraqzhang: By Lauren's branch I was referring to #512 (for which the branch is in my name, but Lauren is working on it).

The only modification in this PR that has anything to do with using ASSIM when N_ens=1 is in GEOS_LandPertGridComp.F90, and that's not really relevant here. I'm thinking that the infrastructure needed to make ASSIM work with N_ens=1 must already be on the GEOSldas "develop" branch. But I'm not sure if I'm completely off here or perhaps you missed the change in functionality.

…ut; rearrange do loop in GEOS_LandAssimGridComp.F90
@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@gmao-rreichle
Copy link
Contributor

@saraqzhang: I added on commit as discussed: 42f5e71
This should facilitate HISTORY output of ensstd even when ensavg is not written out.
I also rearranged one (new) do loop and hope I didn't mess things up.
Please take a look and re-test the functionality. Thanks!

@saraqzhang
Copy link
Contributor Author

will pull the revision and carry out tests.

@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , the test on ensstd without ensavg failed. I realize that memory allocations for those variables are controlled by MAPL HISTORY,. Without ensavg in the history collection list the pointer does not have a memory address to point to. Thus the run failed with segmentation fault. we can try locally allocated variables just for the case of ensstd without ensavg, though the code will not look as neat and efficient.

…put when ensavg HISTORY output is not also requested
@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@gmao-rreichle
Copy link
Contributor

@saraqzhang: Thanks for the update. It's too bad there's not an elegant solution for ensstd output when ensavg output is not simultaneously requested in HISTORY. I reverted the previous commit and think it's ok as is now. I also merged develop, which was just updated, into the branch and added some rudimentary documentation of the ENSSTD output into GEOSldas_HIST.rc. Please double-check the following two commits: f6a1b71 and 89a8cc5
@biljanaorescanin, when you get a chance, please run the usual set of LDAS Nighty Tests and let us know the results.
Thanks!

@github-actions
Copy link

This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged.

@biljanaorescanin
Copy link
Contributor

All Nightly Tests passed on the branch.

@gmao-rreichle gmao-rreichle marked this pull request as ready for review March 23, 2023 13:03
@gmao-rreichle gmao-rreichle merged commit 86bbdd2 into develop Mar 23, 2023
@gmao-rreichle gmao-rreichle deleted the feature/saraqzhang/enssdvfordiag branch March 23, 2023 13:04
@saraqzhang
Copy link
Contributor Author

@gmao-rreichle , tests passed : (1) ens_N >1, LDAS_ASSIM=YES output valid values of ensavg and ensstd
(2) ens_N=1, LDAS_ASSIM=YES ( Innov run) , output valid values of ensavg , UNDEF of ensstd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0-diff enhancement New feature or request output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants