
User defined top snow layer #792

Merged: 23 commits into ESCOMP:master, Sep 9, 2019

Conversation

slevis-lmwg
Contributor

@slevis-lmwg slevis-lmwg commented Aug 26, 2019

Description of changes

Instead of hardcoding these values, we now set
dzmin(1) = dzmin_1
dzmax_l(1) = dzmax_l
dzmax_u(1) = dzmax_u
where the right-hand-side variables get namelist-defined values, and the model calculates dzmin and dzmax_* for the remaining snow layers. The calculation is intended to match the hardcoded values, based on formulas written out in a file titled "A firn model for CLM" that is linked in #729.
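The derivation can be illustrated concretely. The recurrences below are inferred from the hardcoded arrays quoted later in this thread, not taken from the "A firn model for CLM" notes, so treat them as a sketch only; the function and variable names are hypothetical:

```python
# Sketch: derive dzmin and dzmax_u for layers 3+ from the layer-1/2 values,
# using recurrences inferred from the hardcoded CLM arrays quoted in this
# thread (NOT from the linked firn-model notes -- treat as illustrative).
NLEVSNO = 12

def derive_dz(dzmin_1=0.010, dzmin_2=0.015, dzmax_u_1=0.02, dzmax_u_2=0.05):
    dzmin = [dzmin_1, dzmin_2]
    dzmax_u = [dzmax_u_1, dzmax_u_2]
    for _ in range(2, NLEVSNO):
        # Inferred pattern: each layer's upper max doubles the previous one
        # and adds dzmin(1); each layer's minimum is half the previous
        # layer's upper max.
        dzmax_u.append(2.0 * dzmax_u[-1] + dzmin_1)
        dzmin.append(dzmax_u[-2] / 2.0)
    return dzmin, dzmax_u

dzmin, dzmax_u = derive_dz()
```

With the defaults, this reproduces the hardcoded dzmin sequence quoted in the review below (0.010, 0.015, 0.025, 0.055, ..., 15.355) to within roundoff; in the model itself, the bottom layer's dzmax values are additionally set to huge(1._r8).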

Specific notes

Contributors other than yourself, if any: @billsacks

CTSM Issues Fixed (include github issue #): #729

Are answers expected to change (and if so in what way)?
Realistically, we expected roundoff-level changes, but early testing suggests bit-for-bit (bfb) results!

Any User Interface Changes (namelist or namelist defaults changes)?
New namelist parameters dzmin, dzmax_l, dzmax_u with default values of 0.01, 0.03, 0.02.

Testing performed, if any:
SMS_D_Ly6_Mmpi-serial.1x1_smallvilleIA.IHistClm45BgcCropQianGs.cheyenne_intel.clm-cropMonthOutput -c ctsm1.0.dev059 PASS nlevsno = 12

SMS_P720x1_Ln3.hcru_hcru.I2000Clm50BgcCruGs.cheyenne_intel.clm-coldStart -c ctsm1.0.dev059 PASS nlevsno = 12

SMS_Ld1.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldasGs.cheyenne_gnu.clm-default -c ctsm1.0.dev059 PASS nlevsno = 5

SMS_Lm1_D.f10_f10_musgs.I2000Clm50BgcCrop.cheyenne_intel.clm-snowlayers_3_monthly -c ctsm1.0.dev059 PASS nlevsno = 3

ekluzek and others added 3 commits August 16, 2019 15:19
Continue adding water tracers to LakeHydrology

Finish implementing water tracers for initial snow-related code in
LakeHydrology. This involved refactoring code in order to reuse existing
code in SnowHydrology rather than having near-duplicates of that code in
LakeHydrology.

Resolves ESCOMP#775
Instead of hardcoding, we now make
dzmin(1) = dzmin_1
dzmax_l(1) = dzmax_l
dzmax_u(1) = dzmax_u
where the right-hand-side variables get namelist-defined values and
the model calculates dzmin and dzmax_* for the remaining snow layers.
The calculation is based on formulas written out in a file titled
"A firn model for CLM" that has been linked to ESCOMP#729.
@@ -800,12 +808,12 @@ subroutine BuildFilter_SnowpackInitialized(bounds, num_c, filter_c, &
! maintain answers as before.
snowpack_initialized(c) = ( &
snl(c) == 0 .and. &
Contributor Author

@billsacks I think the next four 0.01 values are what you wanted me to look for, right?

More generally speaking, I think this code is ready for review. Also I would like to know your preference in this case between new Unit Test and new standard test-suite test. I will be working on that next.

Member

Yes, these are the four places where I know this value is hard-coded. If you haven't already, then I also thought it would be worth grepping the code for 0.01 to see if there are any other references that I'm not aware of.

As for testing: With some upcoming suggestions that I'm going to make (in which you don't have hard-coded defaults for these values, but rather have them always explicitly set in the namelist), your new code will be sufficiently covered by existing system tests, so no need to add additional tests.

Contributor Author

I looked for all the instances of 0.01 at the time, and there were some more that looked unrelated, so I left them alone.

@slevis-lmwg
Contributor Author

Hobart test-suite (not complete, yet) does produce some diffs from dev059. Will investigate.

@slevis-lmwg
Contributor Author

slevis-lmwg commented Aug 26, 2019

From a total of 23 tests that returned DIFF, I spot checked the following and found:

SMS.f10_f10_musgs.I2000Clm50BgcCrop.hobart_gnu.clm-crop ROUNDOFF
SMS_D.f10_f10_musgs.I2000Clm50BgcCrop.hobart_intel.clm-crop ROUNDOFF
ERI_D_Ld9_P48x1.f10_f10_musgs.I2000Clm50BgcCruGs.hobart_nag.clm-reduceOutput ROUNDOFF
ERP_D_Ld5_P48x1.f10_f10_musgs.I1850Clm50Bgc.hobart_nag.clm-ciso ROUNDOFF
ERP_Ld5_P48x1.f10_f10_musgs.I2000Clm50BgcCruGs.hobart_nag.clm-noFUN_flexCN ROUNDOFF
SMS_Ld5_D_P48x1.f10_f10_musgs.IHistClm50Bgc.hobart_nag.clm-decStart > ROUNDOFF
SMS_Ld5_D_P48x1.f10_f10_musgs.IHistClm50Bgc.hobart_nag.clm-monthly ROUNDOFF
SMS_P48x1_D_Ld5.f10_f10_musgs.I2000Clm50Cn.hobart_nag.clm-default ROUNDOFF

The one with DIFFs > ROUNDOFF had RMS as high as 0.23. I suspect that this is still not due to a mistake, but simply the result of changing from hardwired to calculated dzmin and dzmax_*.

Would you like me to confirm roundoff for all tests that returned DIFF in the test-suite? Also I may be looking too quickly at the RMS values by scanning just the first page. Should I be scrolling through all the "by rms" pages?

I have now submitted the cheyenne test-suite, which also is returning many tests with DIFFs.

@billsacks
Member

Regarding the DIFFs, I would suggest the following: First, write out the three dz* arrays, manually inspect these and verify that they are no more than roundoff-level different from the original, hard-coded arrays. Second, run at least a few tests (maybe not necessary to run the whole test suite; maybe at least one Clm50 test, a Clm45 test, and an Nwp test?) where, after setting the three dz* arrays based on the namelist inputs, you have code like this:

  dzmin_orig = (/ 0.010_r8, 0.015_r8, 0.025_r8, 0.055_r8, 0.115_r8, 0.235_r8, &
                  0.475_r8, 0.955_r8, 1.915_r8, 3.835_r8, 7.675_r8, 15.355_r8 /)

  if (maxval(abs(dzmin(1:nlevsno) - dzmin_orig(1:nlevsno))) > 1.e-13_r8) then
     call endrun(...)
  else
     dzmin(1:nlevsno) = dzmin_orig(1:nlevsno)
  end if

And similarly for the two other dz arrays. Then verify that that is bfb - ensuring that the only differences arise from these roundoff-level diffs in the dz arrays. If that's the case, then no need to look closely at the RMS errors in the tests.
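The check-then-overwrite pattern can also be sketched in Python (the function name is hypothetical; the dzmin_orig values and the 1e-13 tolerance follow the Fortran snippet above):

```python
# Sketch: abort if the derived array differs from the original hardcoded one
# by more than roundoff; otherwise substitute the original values so that
# subsequent runs are bit-for-bit reproducible.
DZMIN_ORIG = [0.010, 0.015, 0.025, 0.055, 0.115, 0.235,
              0.475, 0.955, 1.915, 3.835, 7.675, 15.355]

def check_and_overwrite(dzmin, nlevsno, tol=1e-13):
    worst = max(abs(a - b) for a, b in zip(dzmin[:nlevsno], DZMIN_ORIG[:nlevsno]))
    if worst > tol:
        raise RuntimeError(f"derived dzmin differs from original by {worst}")
    return DZMIN_ORIG[:nlevsno]   # force bit-for-bit agreement
```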

@billsacks
Member

But you can hold off on doing that testing I just suggested until the final version of the code, because I'm about to suggest a lot of changes.

@billsacks
Member

Before I go into my more detailed review, I wanted to start with some overall comments / requests:

Based on your realization that dzmax_l(2) also follows a different formula, I'd suggest that we let the user specify six values: dzmin, dzmax_u and dzmax_l for the first two layers. Then derive layers 3+ from the specified layer 2 values. I now think this will be more straightforward and less error-prone than using special-purpose formulas for layer 2, particularly given that these special-purpose formulas likely won't work in all cases.

You should then have some error checking code to ensure that all values - both the user-specified ones and the generated values - satisfy some constraints:

(a) All three vectors should be monotonically increasing

(b) dzmin(n) < dzmax_u(n) < dzmax_l(n) for all n

There may be additional, tighter constraints needed as well. If you think you could do this without it becoming a huge undertaking, would you be able to look through the snow code - particularly the layer combination and division - to see if it looks like there should be any tighter constraints in order to avoid problematic situations? I'm imagining situations like: the code tries to divide a layer, but the various dz arrays are set up in such a way that the newly divided layers cannot hold this snow, so then the code would try to immediately recombine the layers, etc. (I'm not sure exactly what could cause a situation like this, but it seems theoretically possible that there could be such a pathological situation.) If you feel it would take you a long time to understand the snow code well enough to determine this, I guess it's not absolutely necessary, but in that case, we should attach appropriate warnings to these new namelist variables, that they should be used with caution. It does seem that, if these are going to be used, we'll want to go through this exercise of coming up with constraints at some point....
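Constraints (a) and (b) are mechanical to check. A minimal Python sketch, assuming a simple list representation (the function name and error messages are hypothetical):

```python
# Sketch of the proposed sanity checks on the dz arrays:
# (a) each vector is monotonically increasing, and
# (b) dzmin(n) < dzmax_u(n) < dzmax_l(n) for every layer n.
def validate_dz(dzmin, dzmax_u, dzmax_l):
    for name, vec in (("dzmin", dzmin), ("dzmax_u", dzmax_u), ("dzmax_l", dzmax_l)):
        for j in range(1, len(vec)):
            if vec[j] <= vec[j - 1]:
                raise ValueError(f"{name} not monotonically increasing at layer {j + 1}")
    for j, (mn, mu, ml) in enumerate(zip(dzmin, dzmax_u, dzmax_l), start=1):
        if not (mn < mu < ml):
            raise ValueError(f"need dzmin < dzmax_u < dzmax_l at layer {j}")
```

Checks of exactly this shape are what fire in the unit-test output quoted later in the thread, where all-zero inputs violate both constraints.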

@billsacks (Member) left a comment

Thanks for your initial work on this. See line comments below in addition to some overall comments I made in the PR.

@billsacks
Member

Two other thoughts:

  • In order for this to be only double-precision roundoff-level different, the namelist defaults should have d0 in them (e.g., 0.01d0 or 0.01d00: I see the latter in a few places in namelist_defaults_ctsm.xml, so I guess use that for consistency, even though the two should be the same).

  • Please write the derived dz vectors to the log file on the master processor (i.e., just to the lnd.log file).
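On the first point: without a d0 suffix, the literal may be stored at single precision, and the single-precision value of 0.01 differs from the double-precision one by far more than the 1e-13 roundoff tolerance discussed earlier. A quick stdlib-only Python check illustrating the magnitude:

```python
import struct

# Round 0.01 to IEEE single precision, then widen back to double -- this
# mimics a Fortran literal without the d0 suffix being stored at single
# precision before use in a double-precision expression.
single = struct.unpack('f', struct.pack('f', 0.01))[0]
double = 0.01

diff = abs(single - double)
# The single-precision representation is off by roughly 2e-10, well above
# the 1e-13 tolerance used in the bit-for-bit check.
assert diff > 1e-13
```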

3.74_r8, 7.57_r8, 15.24_r8, 30.59_r8, 61.3_r8, huge(1._r8) /)
real(r8), parameter :: dzmax_u(12) = & ! maximum thickness of layer when layers beneath
(/ 0.02_r8, 0.05_r8, 0.11_r8, 0.23_r8, 0.47_r8, 0.95_r8, &
1.91_r8, 3.83_r8, 7.67_r8, 15.35_r8, 30.71_r8, huge(1._r8) /)
Contributor Author

In reference to this comment, the original code assigned huge(1._r8) to dzmax_l(12) and dzmax_u(12). So to match the original behavior, I think I need to do the following:

if (nlevsno >= 12) then
dzmax_l = huge(1._r8)
dzmax_u = huge(1._r8)
end if

Is this what you were suggesting @billsacks ?

Contributor Author

More precisely...
if (j >= 12) then
dzmax_l(j) = ...
dzmax_u(j) = ...

Contributor Author

Changing mind:
I prefer the first if statement with these corrections:
dzmax_l(nlevsno) = ...
dzmax_u(nlevsno) = ...

This replicates the current behavior for nlevsno = 12. If some day we allow nlevsno > 12, this just sets the bottom layer to huge.
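The behavior being converged on here, sketched in Python (math.inf stands in for Fortran's huge(1._r8); the function name is hypothetical):

```python
import math

# Sketch: after deriving dzmax_u and dzmax_l, cap the bottom snow layer so
# it is never subdivided -- math.inf stands in for Fortran's huge(1._r8).
def cap_bottom_layer(dzmax_u, dzmax_l, nlevsno):
    dzmax_u = list(dzmax_u)
    dzmax_l = list(dzmax_l)
    dzmax_u[nlevsno - 1] = math.inf
    dzmax_l[nlevsno - 1] = math.inf
    return dzmax_u, dzmax_l
```

This applies regardless of nlevsno: it replicates the hardcoded behavior for nlevsno = 12 and extends naturally to other layer counts.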

Member

Yes, the last comment is what I was thinking (where the ... should be huge(1._r8)).

Member

i.e., the bottom layer should be set to huge, regardless of the number of snow layers. I now realize that that isn't exactly what the original code did, but I think that was the intent.

Member

I guess it would be good to verify that that does the right thing. Either (a) make that change and verify that it doesn't change answers for, say, a 5-layer case (a CLM45 case), or (b) don't make that change, and ensure that having dzmax_l and dzmax_u NOT set to huge for nlevsno gives the same answers as setting it to huge for a 12-layer case. I prefer (a), but don't feel strongly about it.

Contributor Author

I also prefer (a).

Contributor Author

I guess it would be good to verify that that does the right thing. Either (a) make that change and verify that it doesn't change answers for, say, a 5-layer case (a CLM45 case), or [...]

The hobart test-suite includes a few clm45 cases and they all PASS.

@slevis-lmwg
Contributor Author

There may be additional, tighter constraints needed as well. If you think you could do this without it becoming a huge undertaking [...]

This would probably take me a bit of time since I haven't delved into the snow code before. I will hold off for now and will attach warnings to the new namelist variables. And I'm also open to investigating tighter constraints if you give me the go-ahead.

New clm5 params file that has the bad date of 631 changed to 701, along with some simple bit-for-bit changes. Removes some problematic options and makes Dynamic Vegetation a deprecated option that requires you to use -ignore_warnings.
@slevis-lmwg
Contributor Author

slevis-lmwg commented Sep 2, 2019

[...] Second, run at least a few tests (maybe not necessary to run the whole test suite; maybe at least one Clm50 test, a Clm45 test, and an Nwp test?) where, after setting the three dz* arrays based on the namelist inputs, you have code like this:
[...]
Then verify that that is bfb - ensuring that the only differences arise from these roundoff-level diffs in the dz arrays. If that's the case, then no need to look closely at the RMS errors in the tests.

This test returned DIFF without the recommended code and PASS with the recommended code:
./create_test ERI_D_Ld9.f10_f10_musgs.I2000Clm50BgcCruGs.cheyenne_intel.clm-default -c /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev059
I will push my mods, including the temporary recommended code so that you can inspect if you wish. Also this way I will be able to pull on hobart and submit the full test-suite with the temporary recommended code.

PS. Reran the individual test after git merge ctsm1.0.dev061 and got PASS. Now running hobart's test-suite.

PS2. The hobart test-suite does not include Nwp tests, so I submitted a couple of those separately:
./create_test ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Ctsm50NwpBgcCropGswpGs.cheyenne_intel.clm-default -c /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev061 PASS
./create_test ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Ctsm50NwpSpGswpGs.cheyenne_intel.clm-default -c /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev061 PASS

Includes:
- Adding the 3 additional namelist variables needed for 2nd snow layer
- Renaming all 6 new namelist variables (1st & 2nd snow layer's vars)
- Moving the default values to namelist_defaults_ctsm.xml
- Adding error checks in CLMBuildNamelist.pm and SnowHydrologyMod.F90
- Reverting changes in clm_varcon.F90 and making the dzmin and dzmax_*
variables local to SnowHydrologyMod.F90
- Temporary testing code for review and for testing on Hobart, to be
removed when done testing
@slevis-lmwg
Contributor Author

[...] this way I will be able to pull on hobart and submit the full test-suite with the temporary recommended code.

The hobart test-suite:
./run_sys_tests -s aux_clm -c ctsm1.0.dev061 --testroot-base /scratch/cluster/slevis --skip-generate
returns PASS.

@slevis-lmwg
Contributor Author

@billsacks I have completed your suggested bfb testing successfully. I will leave the test code in there temporarily for your review.

@billsacks (Member) left a comment

Thanks a lot for your detailed notes responding to my comments and describing testing you performed. This looks very good now. I have just a few small comments below. I think it's fine not to have additional sanity checks for now, but just to have the warnings you put in place.

ekluzek and others added 5 commits September 4, 2019 16:35
Fire bug fix and DWT_SLASH change and fix urban stream years for clm4_5.
The latitude used in an expression for Bgc-Fire model Li2016 (used in Clm50)
had latitude in degrees rather than in radians as it should have been. The
term is used for ignition and the cosine was being taken in degrees rather
than radians, hence the spatial pattern for ignition with latitude was incorrect.

The history field DWT_SLASH_CFLUX was being calculated on the column, but
really should have been a patch level variable. In the same way other fields
were handled we made DWT_SLASH_CFLUX a grid-cell variable, and added
DWT_SLASH_CFLUX_PATCH on the patch level that could be added to output. The same
is true of the C13_ and C14_ versions of it.

Clm45 was setting urbantv to year 2000, rather than 1850 or historical as it should
have. The urbantv file for Clm45 doesn't actually change, so this doesn't actually
make a difference. But, it does make the namelist look wrong.

Note, that SP cases, Vic cases and Fates cases do NOT change answers. Or if fire is
off, or certain single-point cases. Some cold-start cases don't seem to be sensitive
to it either.
@slevis-lmwg
Contributor Author

@billsacks I had already merged dev063 with my branch before receiving your warning against it. What do you recommend I do now? Should I revert to ae415da and start over?

@slevis-lmwg
Contributor Author

Meanwhile the izumi test-suite has passed with the temporary code in place. I don't even see what used to be expected failures on hobart.

Also this Nwp test has passed on cheyenne:
ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Ctsm50NwpBgcCropGswpGs.cheyenne_intel.clm-default -c /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev063

@billsacks
Member

Okay, the new code looks good - thanks for that.

One thing that still seems untested (unless I'm missing something) is ensuring that setting dzmax_l(nlevsno) and dzmax_u(nlevsno) equal to huge doesn't change answers for a CLM45 case (where I think that wasn't being done before): Your new code correctly (I think) sets these values to huge, but then the bit-for-bit tests you have run so far overwrite these values with the originals. If you haven't already, can you do one CLM45 test where you confirm that you get the same answers with dzmax_l(nlevsno) = dzmax_u(nlevsno) = huge as you do when they are at the original values?

Other than that, all that seems needed at this point is the removal of the temporary test code, then you can go ahead with final testing.

Regarding the merge of the slightly problematic dev063: I actually don't think what you've done will cause any problems. To be safe, just merge master into this branch so you have the absolute latest.

And yes, there are no longer any expected failures on izumi (the expected failures on hobart were due to a pgi compiler bug that is fixed in the more recent compiler version on izumi).

@slevis-lmwg
Contributor Author

One thing that still seems untested (unless I'm missing something) is ensuring that setting dzmax_l(nlevsno) and dzmax_u(nlevsno) equal to huge doesn't change answers for a CLM45 case (where I think that wasn't being done before): Your new code correctly (I think) sets these values to huge, but then the bit-for-bit tests you have run so far overwrite these values with the originals. If you haven't already, can you do one CLM45 test where you confirm that you get the same answers with dzmax_l(nlevsno) = dzmax_u(nlevsno) = huge as you do when they are at the original values?

Ok, I ran this test and got PASS
ERS_D_Ld5.f10_f10_musgs.I2000Clm45Fates.izumi_nag.clm-FatesColdDef -c ctsm1.0.dev063

All along I had assumed that this would have changed answers...

@slevis-lmwg
Contributor Author

Repeating this test one more time (sorry).

Repeated a clm4.5 test and got PASS.
This commit includes updated wording in the ChangeLog.
@slevis-lmwg
Contributor Author

Regarding the merge of the slightly problematic dev063: I actually don't think what you've done will cause any problems. To be safe, just merge master into this branch so you have the absolute latest.

git fetch escomp
returned nothing and
git merge master
returned
Already up-to-date.
so I think I'm good.

@billsacks
Member

Thanks a lot for running that additional test: it makes me more comfortable knowing that this setting to huge doesn't change answers. This all looks good now and is ready for final testing as far as I'm concerned.

Regarding the merge, though: you should actually do git merge escomp/master, because your local master isn't actually up-to-date with the remote-tracking branch. If you do that, you'll merge in Erik's merge commit. However, I checked just now and that won't actually change anything, so it's completely optional whether you do that or not... I'm more letting you know for the future.

@slevis-lmwg
Contributor Author

On izumi only NLCOMP and DIFF tests fail as expected.

On cheyenne the same, except that I need to investigate one test:
FAIL FUNITCTSM_P1x1.f10_f10_musgs.I2000Clm50SpGs.cheyenne_intel RUN time=38

@slevis-lmwg
Contributor Author

The error is valid, and I need to think about it. May be as simple as adding an if statement asking if there is any snow at all:

12/41 Test #12: SnowHydrology ....................***Failed Required regular expression not found.Regex=[OK
] 0.12 sec
. ERROR at snow layer j = 2 because dzmin(j) =
0.000000000000000E+000 and dzmin(j-1) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 2 because dzmax_u(j) =
0.000000000000000E+000 and dzmax_u(j-1) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 2 because dzmax_l(j) =
0.000000000000000E+000 and dzmax_l(j-1) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 3 because dzmin(j) = 0.000000000000000E+000
and dzmin(j-1) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 1 because dzmin(j) = 0.000000000000000E+000
and dzmax_u(j) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 2 because dzmin(j) = 0.000000000000000E+000
and dzmax_u(j) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 1 because dzmax_u(j) =
0.000000000000000E+000 and dzmax_l(j) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 2 because dzmax_u(j) =
0.000000000000000E+000 and dzmax_l(j) = 0.000000000000000E+000
ENDRUN:
ERROR at snow layer j = 3 because dzmax_u(j) =
1.000000000000000E-002 and dzmax_l(j) = 1.000000000000000E-002
ENDRUN:
dzmin = 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
5.000000000000000E-003 1.500000000000000E-002
dzmax_l = 0.000000000000000E+000 0.000000000000000E+000
1.000000000000000E-002 4.000000000000000E-002 1.797693134862316E+308
dzmax_u = 0.000000000000000E+000 0.000000000000000E+000
1.000000000000000E-002 3.000000000000000E-002 1.797693134862316E+308
forrtl: severe (151): allocatable array is already allocated
Image PC Routine Line Source
test_SnowHydrolog 000000000164323B Unknown Unknown Unknown
test_SnowHydrolog 00000000012B1C7E snowhydrologymod_ 2290 SnowHydrologyMod.F90

@slevis-lmwg
Contributor Author

Actually, no. The condition is not whether there is any snow at all, but whether the user has requested dzmin and dzmax_* values of zero. I think...

@slevis-lmwg
Contributor Author

@billsacks I have successfully run the unit tests now:
100% tests passed, 0 tests failed out of 41
Thank you for sharing your insights. This would have taken me some time to figure out alone.

Next I will push from cheyenne, pull on izumi, run a nag test for sanity, and finally rerun the FUNIT test with the -g option (though you said this may not be necessary).

@billsacks
Member

Thanks @slevisconsulting - glad that fixed the unit tests. Your changes look good to me.

Minor point for the future: The style of your comments in SnowHydrologySetControlForTesting (where you have a multi-line comment that follows various lines of code) can be problematic to maintain over time, as code is added, deleted, moved, etc. I'd prefer that comments like this come on their own lines before the relevant lines of code. I don't feel you need to change it this time, but just an FYI for the future.

@slevis-lmwg
Contributor Author

./create_test SMS_P48x1_D_Ld5.f10_f10_musgs.I2000Clm50Cn.izumi_nag.clm-default PASS

./create_test FUNITCTSM_P1x1.f10_f10_musgs.I2000Clm50SpGs.cheyenne_intel -c /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev063 -g ctsm1.0.dev064 PASS

@billsacks billsacks merged commit 1d5ea12 into ESCOMP:master Sep 9, 2019
@slevis-lmwg slevis-lmwg deleted the user_defined_top_snow_layer branch September 9, 2019 22:39