Add the mosart runoff model #526

apcraig · 2015-12-08T18:29:08Z

These changes add a new model, mosart, as a runoff model.
The changes to the scripts and coupling support those changes.
In addition to the new source code under mosart, there were also
small changes to rtm for consistency and a few changes to the coupling fields.

A handful of new compsets were created to run with mosart, but
additional compsets will be required as needed for science.

LG-20
[non-BFB]

Add the mosart runoff model and two new land->runoff coupling fields. This version of mosart is identical to mosart1_0_00 in the NCAR CESM repository and the modifications match the changes in the CESM exp_tag mosart01_cesm1_3_beta04. Added the land to runoff coupling fields Flrl_rofgwl and Flrl_rofsub. These are glacier/wetland/lake runoff and subsurface runoff. Both are liquid water fluxes. In CLM, these are currently set to zero. The two fields are passed through the coupler with Flrl_rofl and Flrl_rofi. Both rtm and mosart receive these fields and add them to the rofl flux term at present. This commit provides the ability to run mosart and to couple the two new fields. Some science development is still required in CLM and mosart to fully leverage the new capabilities. Three new compsets were added to support testing of mosart, IMCN, BM1850C5, and BM1850CN. These are identical to ICN, B1850C5, and B1850CN except mosart is used instead of rtm. Development and testing was done on titan only. The following tests were run to verify technical correctness of the mosart implementation. *CME.T31_g37.IMCN.titan_pgi *ERS_D.T31_g37.BM1850C5.titan_pgi *ERS.T31_g37.BM1850C5.titan_pgi *NCK.T31_g37.BM1850C5.titan_pgi *NCK.T31_g37.IMCN.titan_pgi *SMS_D_E.T31_g37.IMCN.titan_pgi *SMS_E.T31_g37.IMCN.titan_pgi *SMS.T31_g37.IMCN.titan_pgi *CME.f19_g16.X.titan_pgi And the acme-development test suite was also run and shown to be BFB with the starting ACME version. There is also a bug fix in component_mod.F90 to fix an error in the esmf coupling interface that is unrelated to mosart. [BFB]

This change syncs up the ACME+mosart version with the CESM mosart03_cesm1_4_beta03 version. In this version, mosart exact restart is fixed, Flrl_rofl is renamed to Flrl_rofsur, and Flrl_volrmch is added. There are a few other changes to mosart science as well. These changes produce identical bit-for-bit results with respect to climate. Some of the runoff coupling field names have been changed and a few new coupling fields have been added. This may impact the comparison results, but the results should be identical when comparing. LG-19 *********1*********2*********3*********4*********5*********6*********7** Longer commit message body describing the commit. Can contain lists as follows: * Item 1 * Item 2 * Item 3 A good commit message should be written like an email, a subject followed by a blank line, followed by a more descriptive body. Can also contain a tag at the bottom describing if the commit is Non-BFB or Climate changing: [Non-BFB] [CC]

* Add WRM source code * new file: models/rof/mosart/src/wrm/WRM_modules.F90 * new file: models/rof/mosart/src/wrm/WRM_read_print.F90 * new file: models/rof/mosart/src/wrm/WRM_returnflow.F90 * new file: models/rof/mosart/src/wrm/WRM_start_op_year.F90 * new file: models/rof/mosart/src/wrm/WRM_subw_IO_mod.F90 * new file: models/rof/mosart/src/wrm/WRM_type_mod.F90 * Add NLDAS resolution, datm mode, and IM_2000_CN_NLDAS compset * Add constance machine port WRM is not yet fully validated and is off by default. This is just an initial commit to provide NLDAS capability and to allow some code review. This is BFB with prior versions with WRM off.

fix parallelization bug in nUp sum for non-basin decompositions refactor mosart to improve performance - loop restructuring - reduce number of operations - improve performance of expensive math ops (sqrt, **, sin) Non-BFB or Climate changing: [Non-BFB]

[BFB]

- code cleanup of rtmini and rtmrun - works with mosart input files with scrambled IDs - moved dto term into rtmrun - added direct-to-outlet tranfer capability - removed a bunch of old rtm code - fixed esmf interfaces and tested in DEBUG mode - added budget calculation (still being validated) - has a known exact restart error that introduces a roundoff difference at the first timestep at a handful of gridcells. This is probably not going to impact science, will be fixed next. [Non-BFB] LG-20

*********1*********2*********3*********4*********5*********6*********7** Code cleanup to remove dead code and make the code clearer. More could be done. Update the direct output computation. Add budget diagnostics to verify conservation. In short tests, the model conserves except for in the frozen term in the euler solver. There is still a small issue there that needs further diagnosing. Review history and restart files and checked many fields. Updated the MOSART input file to adjust the rotated Antarctica. LG-20 [Non-BFB]

new input file new direct terms new areatot update history files [Non-BFB] LG-20

[BFB] LG-20

to MOSART_Global_half_20151205a.nc [Non-BFB] LG-20

bishtgautam · 2015-12-08T19:02:37Z

Is this PR BFB?

apcraig · 2015-12-08T19:06:45Z

It should be BFB for configurations that don't use mosart. mosart is not currently part of the standard test suite. There may be some differences because we modified some of the coupling fields' names, but the results should ultimately be identically when running rtm. When you use mosart, you will have a new configuration with no baselines, and mosart will not be bit-for-bit with rtm.

bishtgautam · 2015-12-08T19:40:02Z

Added 'BFB'.

bishtgautam · 2015-12-09T02:33:01Z

@jgfouca , @rljacob, @douglasjacobsen:

The merge of this PR produces relatively small number of conflicts but I have few issues that I need help with.

The following gist shows the status after merge of this branch into next https://gist.github.com/bishtgautam/d0b804c0f2d272d14e51.
In this feature branch, there were few changes made to config_compilers.xml and config_machines.xml. I'm considering reverting all changes made to config_compilers.xml and config_machines.xml in this branch. What do you think?
Why git bailed on automatically merging changes made to the following files?
- models/lnd/clm/src/cpl/lnd_import_export.F90: Is it because next now has two lnd_import_export.F90 (i.e. models/lnd/clm/src/cpl/lnd_import_export.F90
  and /models/lnd/clm/src_clm40/main/lnd_import_export.F90)?
- scripts/ccsm_utils/Case.template/config_definition.xml: Is it because cime/scripts/Tools/config_definition.xml doesn't directly map back to scripts/ccsm_utils/Case.template/config_definition.xml?

@apcraig : I'm trying to determine if the changes made to models/lnd/clm/src/cpl/lnd_import_export.F90 should be made to both models/lnd/clm/src/cpl/lnd_import_export.F90
and /models/lnd/clm/src_clm40/main/lnd_import_export.F90. Is MOSART compatible with CLM4.0 and CLM4.5?

- The files are revert to ba21605, which was the hash of the master when this feature branch was created.

douglasjacobsen · 2015-12-09T15:45:11Z

@bishtgautam:

So, first you'll need to git mv models/rof/mosart components/mosart

But for the other conflicts... I can't tell completely, but it looks like the lnd_import_export.F90 file is more like the CLM4.0 version than the CLM4.5 version.

I think the changes in config_definition.xml should be pretty easy to "re-do" in the merge, in the right place. It looks like this is the diff: https://gist.github.com/douglasjacobsen/3dda495c3234140507a5

Here is the diff for the machines changes (i.e. config_compilers.xml, etc) https://gist.github.com/douglasjacobsen/5399fa13e64bb32d3bdd

Those also look either relatively small, or unneeded (I don't know what constance and cab are). But it seems like it also deletes things like hera, which I also am not sure if we need or not.

A bit about why git bailed on merging those files though.

The way git does inexact rename matching is by trying to compare the new file against all old files in the repo, and see which file it's most similar to. If it can't find a file that it thinks it's similar enough to, it won't try to "rename" it. Additionally, if it finds multiple files that it's equally similar to, it won't rename it.

In this case, I'm not exactly sure which the reason is, but it's probably one of those two.

bishtgautam · 2015-12-09T16:32:22Z

@douglasjacobsen : Thanks for the detail explanation.

apcraig · 2015-12-09T17:58:37Z

@bishtgautam
I think the best thing would be to make sure the clm mods are implemented in both clm4 and clm4.5. there are some science issues associated with that, so you might want to contact ruby or hongyi as well to check. thanks.

bishtgautam · 2015-12-10T14:27:43Z

@apcraig : Thanks. I will add clm mods in both clm40 and clm45.

bishtgautam · 2015-12-10T17:04:13Z

@apcraig : It appears that this is a non-BFB PR. There are differences in x2l_Flrr_volr, r2x_Forr_rofl, and r2x_Flrr_volr. Are these changes expected?

Below are testlogs of four tests done on Titan.

https://gist.github.com/bishtgautam/6629b19c5dfa205f9778
https://gist.github.com/bishtgautam/efb0f43b05e97477d6b4
https://gist.github.com/bishtgautam/004d619bec201d726473
https://gist.github.com/bishtgautam/33f5bb9899b633a0996b

apcraig · 2015-12-10T17:09:49Z

We will need to look at the results carefully. There are some new coupling fields, that might be the reason the compare is flagging those fields. If the results are not bit-for-bit, then we should carefully review the mods. There were some changes to the way the clm fields are pass out of clm, fields were split into different terms. That could introduce a roundoff difference, but I think in my testing, that did not occur. Is there a way for me to help review the code changes?

bishtgautam · 2016-01-04T04:31:13Z

Hi @apcraig,

A fully coupled case (1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB test. The BFB failure wasn't unusual as we had already earlier determined that this PR had non-BFB code modifications. But, suprisingly the test still failed BFB even after I made above mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at /project/projectdirs/acme/gbisht/pr-526-mosart/cases:

SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:
- Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:
- Comparison with above generated baseline
- lnd_import_export.F90: With no modifications
- Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:
- Comparison with above generated baseline
- lnd_import_export.F90: With modifications as suggested by you
- Result: BFB failure

Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947 had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file in each case corresponds when the case was run with info_debug=2.

apcraig · 2016-01-05T03:44:41Z

i'll try to have a look in the next few days, but i may not get to it
this week.
i'm on reduced hours right now due to holidays.

tony........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:
SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure
Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.

—
Reply to this email directly or view it on GitHub
#526 (comment).

apcraig · 2016-01-08T02:44:17Z

gautam,

i want to be able to duplicate your tests. can you tell me exactly what
version of the model you're using, how you set it up, what changes you
made to source code or namelist and so forth, in detail. i will try to look
at the problem this weekend.

thanks,

tony..........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:
SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure
Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.

—
Reply to this email directly or view it on GitHub
#526 (comment).

bishtgautam · 2016-01-08T04:36:59Z

Hi Tony,

I generated the baseline with 055722c and performed comparison with 865bc59

>git log -n 15 --oneline --graph
*   865bc59 Merge branch 'apcraig/mosart/add-mosart' into next (PR #526)
|\  
| * 7d3a782 Reverts changes made to compiler and machine files
| * ec03922 Update mosart input file
| * 2aea37a Update mosart ChangeLog
| * b25233e Update mosart to mosart1_0_10
| * 5f404ff mosart code cleanup, add budget diagnostics, review history files
| * 780d433 Update mosart to mosart1_0_07 version
| * fb5f4a2 refactor sqrt of slope data in mosart
| * 51a2751 refactor mosart to improve performance and parallelization
| * b9a7086 Add initial WRM model to mosart and add NLDAS configuration
| * f05d5e5 Update mosart coupling to mosart03_cesm1_4_beta03 version
| * 0282da9 Add mosart runoff model
* |   055722c Merge branch 'agsalin/mpasli/albany-porting' into next (PR #414)
|\ \  
| * | 5941f88 Fix incorrect logic for velocity solver selection
| * | b6e02dc Add CXX_LIBS for Melvin so Albany links

Generation of baseline:

>git checkout 055722c
>cd cime/scripts-acme
>./create_test \
SMS.ne30_m120.A_B1850CN \
--baseline-root /global/cscratch1/sd/gbisht/acme_baselines \
-p acme -g -b 055722c -v --no-build
>cd <test-case-directory>
>cat > user_nl_cpl <<\EOF
info_debug=2
EOF
>./cesm_setup
>./*.build
>./*.submit

Comparison against baseline with no modifications to lnd_import_export.F90:

>git checkout 865bc59
>cd cime/scripts-acme
>./create_test \
SMS.ne30_m120.A_B1850CN \
--baseline-root /global/cscratch1/sd/gbisht/acme_baselines \
-p acme -c -b 055722c -v --no-build
>cd <test-case-directory>
>cat > user_nl_cpl <<\EOF
info_debug=2
EOF
>./cesm_setup
>./*.build
>./*.submit

Comparison against baseline with modifications to lnd_import_export.F90:

>git checkout 865bc59
>cd cime/scripts-acme
>./create_test \
SMS.ne30_m120.A_B1850CN \
--baseline-root /global/cscratch1/sd/gbisht/acme_baselines \
-p acme -c -b 055722c -v --no-build
>cd <test-case-directory>
>cat > user_nl_cpl <<\EOF
info_debug=2
EOF
>./cesm_setup

# Modifed lnd_import_export.F90 is available on Cori in SourceMods/src.clm of
# /project/projectdirs/acme/gbisht/pr-526-mosart/cases/SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947.tar.gz
> cp <modified-lnd_import_export.F90> SourceMods/src.clm
>./*.build
>./*.submit

apcraig · 2016-01-11T20:26:55Z

Gautam,

Can you check permissions on

/project/projectdirs/acme/gbisht/pr-526-mosart/cases

looks like pr-526-mosart is not readable.

tony........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:
SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure
Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.

—
Reply to this email directly or view it on GitHub
#526 (comment).

bishtgautam · 2016-01-11T21:01:02Z

Hi Tony, I have changed permissions to those directories. Please try again.

apcraig · 2016-01-11T21:42:09Z

looking at the log files, the first real divergence in "G" vs "C-155" is in the ocean model after the first ocean coupling period (hour 2). that difference seems to be very small in just a couple fields and is almost certainly roundoff. Based on the global sums, all the forcing TO the ocean model before that coupling period is the identical in the two runs. That likely means that there are a few gridpoints that are roundoff different in the ocean forcing that the global sums are not picking up, but that are producing a tiny difference in the ocean solution.

The other interesting thing is that "G" vs "C-142" is that the ocean model diverges only on the third hour. The difference between C-155 and C-142 is that in one case, the sum of the land fields is done on the land side and in the other it's done on the runoff side. This seems to be enough to change answers by roundoff with C-142 actually being closer (with respect to ocean forcing) with the G case.

I would argue that the lnd-runoff-ocean coupling in these cases is fine. The differences are tiny and consistent with roundoff. The changes to the sum that we hoped would be bit-for-bit are not in this case even though they were in other cases. That makes sense. That mod is not guaranteed to be bit-for-bit, the fact that is was before was helpful. What we're seeing now though is roundoff differences between the three cases in the lnd-rof-ocean coupling, probably just at a few gridcells, and that's
fine. In the end, these tests validate the land-runoff-ocean coupling at being consistent between the three runs.

Having said all that, the most worrisome issue is the fact that the atm diverges in all three simulations on the second hour in a way that's MUCH larger than the land-rof-ocean coupling. That seems to have nothing to do with the runoff coupling and I do not understand it. If you look at the 2 atm.log files in any run, you actually see that the two runs do NOT produce the same diagnostics bit-for-bit in the atm.log
files which suggests that the model is NOT reproducible in the atm model. That seems like a huge issue that is unrelated to the changes in the runoff coupling. That's true even in the G case.

So, in summary, I would say the land-runoff-ocean coupling is fine as implemented. We can see that coupling separately from the atm divergence in the log files for at least the first few coupling periods. The land-runoff-ocean solutions are nearly identical in the three cases, are initially roundoff different at a few gridcells and that divergence grows. That seems fairly clear. But separate from that, there seems to be a huge problem in the atm model with reproducibility unrelated to the land-runoff-ocean
coupling. That is a separate issue that should be investigated!!

Let me know if you have any questions. thanks,

tony........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:
SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure
Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.

—
Reply to this email directly or view it on GitHub
#526 (comment).

bishtgautam · 2016-01-11T22:32:18Z

But separate from that, there seems to be a
huge problem in the
atm model with reproducibility unrelated to the land-runoff-ocean
coupling.

The non-BFB differences in the atmosphere must be related to the merge of this branch, right? Could the use of MPAS-O be an issue? Could the small changes in the ocean be causing large changes in the atmosphere?

mt5555 · 2016-01-11T22:57:23Z

@bishtgautam : your comparison of 055722c with 865bc59 could be comparing models with different atmosphere? What about comparing 7d3a782 with it's branch point from master? (git has a command that will tell you the common parent of 7d3a782 and master).

Then you could isolate the BFB changes introduced just by this code. That comparison may remove the atmosphere differences Tony mentioned, and verify that the branch only has the expected BFB differences.

(just a suggestion based on reading Tony's message - please ignore if I'm missing something).

bishtgautam · 2016-01-11T23:14:00Z

055722c is one of the parent of 865bc59.

>git log -n 15 --oneline --graph
*   865bc59 Merge branch 'apcraig/mosart/add-mosart' into next (PR #526)
|\  
| * 7d3a782 Reverts changes made to compiler and machine files
| * ec03922 Update mosart input file
| * 2aea37a Update mosart ChangeLog
| * b25233e Update mosart to mosart1_0_10
| * 5f404ff mosart code cleanup, add budget diagnostics, review history files
| * 780d433 Update mosart to mosart1_0_07 version
| * fb5f4a2 refactor sqrt of slope data in mosart
| * 51a2751 refactor mosart to improve performance and parallelization
| * b9a7086 Add initial WRM model to mosart and add NLDAS configuration
| * f05d5e5 Update mosart coupling to mosart03_cesm1_4_beta03 version
| * 0282da9 Add mosart runoff model
* |   055722c Merge branch 'agsalin/mpasli/albany-porting' into next (PR #414)
|\ \  
| * | 5941f88 Fix incorrect logic for velocity solver selection
| * | b6e02dc Add CXX_LIBS for Melvin so Albany links

Since, no change to atmosphere code was made in this PR, I expect the atmosphere model to be exactly the same in 055722c and 865bc59. Thus, the non-BFB differences are baffling to me.

mt5555 · 2016-01-11T23:17:43Z

Are you sure? I read that graph as 055722c as a point on next that is not in the history of branch 7d3a782. (your branch is represented by the "|" at that point, while 055722c is refering to the "*" on next).

apcraig · 2016-01-11T23:20:14Z

My main concern is that each case was run twice, once with
info_debug set to 1 and another with it set to 2. These just
turn on some coupling diagnostics. These "identical" runs
were NOT producing identical results in the atmosphere
for either version of ACME. There are questions about whether
the parent and branch are the same. But a bigger question
first is whether either the parent or the branch generate
reproducible results when run more than once. Based on the
results I'm seeing from these tests, the answer is no. It would
be good to make sure someone else sees the same thing.

tony.......

On 1/11/16 3:14 PM, Gautam Bisht wrote:

055722c
055722c
is one of the parent of 865bc59
865bc59.

|>git log -n 15 --oneline --graph * 865bc59 Merge branch
'apcraig/mosart/add-mosart' into next (PR #526) |\ | * 7d3a782 Reverts
changes made to compiler and machine files | * ec03922 Update mosart
input file | * 2aea37a Update mosart ChangeLog | * b25233e Update
mosart to mosart1_0_10 | * 5f404ff mosart code cleanup, add budget
diagnostics, review history files | * 780d433 Update mosart to
mosart1_0_07 version | * fb5f4a2 refactor sqrt of slope data in mosart
| * 51a2751 refactor mosart to improve performance and parallelization
| * b9a7086 Add initial WRM model to mosart and add NLDAS
configuration | * f05d5e5 Update mosart coupling to
mosart03_cesm1_4_beta03 version | * 0282da9 Add mosart runoff model *
| 055722c Merge branch 'agsalin/mpasli/albany-porting' into next (PR
#414) |\ \ | * | 5941f88 Fix incorrect logic for velocity solver
selection | * | b6e02dc Add CXX_LIBS for Melvin so Albany links |

Since, no change to atmosphere code was made in this PR, I expect the
atmosphere model to be exactly the same in 055722c
055722c
and 865bc59
865bc59.
Thus, the non-BFB differences are baffling to me.

—
Reply to this email directly or view it on GitHub
#526 (comment).

bishtgautam · 2016-01-11T23:35:26Z

Hi Tony,

My main concern is that each case was run twice, once with
info_debug set to 1 and another with it set to 2. These just
turn on some coupling diagnostics. These "identical" runs
were NOT producing identical results in the atmosphere
for either version of ACME.

Ok, I now understand the point you are making.

But a bigger question
first is whether either the parent or the branch generate
reproducible results when run more than once.

To test this out, how about the using only 055722c to do the following:

Baseline_1: Generate the baseline with info_debug=1,
Baseline_2: Generate the baseline with info_debug=2,

Comparison_1: Compare with Baseline_1 with info_debug=1.
Comparison_2: Compare with Baseline_2 with info_debug=2.
Comparison_3: Compare with Baseline_1 with info_debug=2.
Comparison_4: Compare with Baseline_2 with info_debug=1.

What do you think?

apcraig · 2016-01-11T23:39:31Z

info_debug should not change answers. if it does, there is
something wrong. I think running the parent 4-6 times with
info_debug=1 and info_debug=2 and comparing them all would
be a good test to do right now.

baseline_1 and baseline_2 should be identical and in some ways,
you've already done those and they aren't. It will be good to see
if after 4-6 runs, each solution is different or if it's reproducible
with respect to at least info_debug. Keep me posted.

thanks,

tony......

On 1/11/16 3:35 PM, Gautam Bisht wrote:

Hi Tony,
My main concern is that each case was run twice, once with
info_debug set to 1 and another with it set to 2. These just
turn on some coupling diagnostics. These "identical" runs
were NOT producing identical results in the atmosphere
for either version of ACME.
Ok, I now understand the point you are making.
But a bigger question
first is whether either the parent or the branch generate
reproducible results when run more than once.
To test this out, how about the using only 055722c
055722c
to do the following:

Baseline_1: Generate the baseline with |info_debug=1|,
Baseline_2: Generate the baseline with |info_debug=2|,

Comparison_1: Compare with Baseline_1 with |info_debug=1|.
Comparison_2: Compare with Baseline_2 with |info_debug=2|.
Comparison_3: Compare with Baseline_1 with |info_debug=2|.
Comparison_4: Compare with Baseline_2 with |info_debug=1|.

What do you think?

—
Reply to this email directly or view it on GitHub
#526 (comment).

bishtgautam · 2016-01-11T23:43:24Z

@mt5555

I read that graph as 055722c as a point on next that is not in the history of branch 7d3a782.
I agree with the above statement.

And, since only 055722c and 865bc59 are on next, we should be generating and comparing using those commits.

mt5555 · 2016-01-11T23:48:02Z

Thanks - I understand now :-)

055722c vs 865bc59 is a much better idea than what I was suggesting.

rljacob · 2016-01-12T00:28:33Z

One related issue that may be relevant is that coupled cases that still used POP and CICE did not have any BFB differences with baselines after adding the mosart pieces. That's what I recall from the cdash results although its hard to trace. Might be worth checking BFB with baselines using the above 2 commits, an 1850_CAM5_CLM45%CN_CICE_POP2_RTM_SGLC_SWAV compset and ne30_g16 resolution.

The atmosphere consistently passes ERS tests which should catch any problems with reproducibility.

bishtgautam · 2016-01-13T17:21:28Z

Hi Tony, My jobs on Cori has been sitting in the queue. Today I'm moving my testing to Titan and will keep you updated.

These changes add a new model, mosart, as a runoff model. The changes to the scripts and coupling support those changes. In addition to the new source code under mosart, there were also small changes to rtm for consistency and a few changes to the coupling fields. A handful of new compsets were created to run with mosart, but additional compsets will be required as needed for science. LG-20 [non-BFB] Conflicts: cime/components/data_comps/datm/bld/namelist_files/namelist_definition_datm.xml cime/scripts/Tools/config_compsets.xml components/clm/bld/namelist_files/namelist_defaults_clm4_0.xml models/lnd/clm/src/cpl/lnd_import_export.F90 scripts/ccsm_utils/Case.template/config_definition.xml

bishtgautam · 2016-01-14T19:03:03Z

@jgfouca : With this PR merged into master, I expect the following tests to fails on master during our nightly tests:

SMS.f19_f19.I1850CLM45CN
SMS.f09_g16.I1850CLM45CN
SMS.hcru_hcru.I1850CRUCLM45CN
SMS.f09_g16_a.IGCLM45_MLI
PET_PT.f19_g16.X
SMS.ne30_m120.A_B1850CN

jgfouca · 2016-01-14T19:04:06Z

With a DIFF?

bishtgautam · 2016-01-14T19:04:34Z

Yes, with DIFF

These changes add a new model, mosart, as a runoff model. The changes to the scripts and coupling support those changes. In addition to the new source code under mosart, there were also small changes to rtm for consistency and a few changes to the coupling fields. A handful of new compsets were created to run with mosart, but additional compsets will be required as needed for science. LG-20 [non-BFB] Conflicts: cime/components/data_comps/datm/bld/namelist_files/namelist_definition_datm.xml cime/scripts/Tools/config_compsets.xml components/clm/bld/namelist_files/namelist_defaults_clm4_0.xml models/lnd/clm/src/cpl/lnd_import_export.F90 scripts/ccsm_utils/Case.template/config_definition.xml

Changes to F90 files are suspect. I just went with the CSEG side for all conflicts. Config archive was changed to support MOSART on both sides, but in very different ways. Again, went with CSEG. * acme_master: (25 commits) bless_test_results: Finally add a full regression test for this tool Fix reference to unassigned variable. Better use of return codes in key ACME tools. Fix failing regression test. Add gnu modules to env for redsky and skybridge Add total test time to CDash field 'OS Version' Merge branch 'apcraig/mosart/add-mosart' (PR #526) Fixing odd number of tasks to handle Further fixes to the PEA test Add dynamic ozone treatment to ATMMOD compset Update r2o mapping file for ne30_ec60 grid Fixing link issues on Cetus+Mira after Albany build changes Turn on baseline comparison for redsky Implementation of linoz_mam4_resus_mom and hooks for v1. Merge branch 'singhbalwinder/atm/gold2718-new-infrastructure'(PR #343) Fixing PEA_P1_M.f45_g37_rx1.A.edison_intel Updates to BUILD_THREADED Updating default walltimes to use Edison's values Slurm time and partition directives Fixing cray-mpich module version to 7.3.0 ... Conflicts: driver_cpl/driver/prep_rof_mod.F90 driver_cpl/driver/seq_diag_mct.F90 driver_cpl/shr/seq_flds_mod.F90 machines/config_pes.xml scripts-python/jenkins_generic_job scripts/Testing/Testcases/config_tests.xml scripts/Tools/case.build scripts/Tools/cesm_setup scripts/Tools/config_archive.xml scripts/Tools/config_definition.xml scripts/Tools/config_grid.xml scripts/Tools/st_archive scripts/create_newcase utils/perl5lib/Batch/BatchMaker.pm

apcraig added 10 commits June 6, 2015 15:01

refactor sqrt of slope data in mosart

fb5f4a2

[BFB]

Update mosart to mosart1_0_10

b25233e

new input file new direct terms new areatot update history files [Non-BFB] LG-20

Update mosart ChangeLog

2aea37a

[BFB] LG-20

Update mosart input file

ec03922

to MOSART_Global_half_20151205a.nc [Non-BFB] LG-20

apcraig added enhancement Coupler compset BFB PR leaves answers BFB River labels Dec 8, 2015

apcraig assigned rljacob Dec 8, 2015

rljacob assigned bishtgautam and unassigned rljacob Dec 8, 2015

rljacob added this to the v1.0 Alpha milestone Dec 8, 2015

Reverts changes made to compiler and machine files

7d3a782

- The files are revert to ba21605, which was the hash of the master when this feature branch was created.

jgfouca force-pushed the master branch from cb31a69 to fbc2bb2 Compare January 14, 2016 00:37

bishtgautam merged commit d0e97d6 into master Jan 14, 2016

rljacob mentioned this pull request Jan 20, 2016

Diffs from baseline in SMS.ne30_m120.A_B1850CN after adding Mosart #634

Closed

jgfouca deleted the apcraig/mosart/add-mosart branch August 31, 2016 02:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the mosart runoff model #526

Add the mosart runoff model #526

apcraig commented Dec 8, 2015

bishtgautam commented Dec 8, 2015

apcraig commented Dec 8, 2015

bishtgautam commented Dec 8, 2015

bishtgautam commented Dec 9, 2015

douglasjacobsen commented Dec 9, 2015

bishtgautam commented Dec 9, 2015

apcraig commented Dec 9, 2015

bishtgautam commented Dec 10, 2015

bishtgautam commented Dec 10, 2015

apcraig commented Dec 10, 2015

bishtgautam commented Jan 4, 2016

apcraig commented Jan 5, 2016

apcraig commented Jan 8, 2016

bishtgautam commented Jan 8, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

mt5555 commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

mt5555 commented Jan 11, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

mt5555 commented Jan 11, 2016

rljacob commented Jan 12, 2016

bishtgautam commented Jan 13, 2016

bishtgautam commented Jan 14, 2016

jgfouca commented Jan 14, 2016

bishtgautam commented Jan 14, 2016

Add the mosart runoff model #526

Add the mosart runoff model #526

Conversation

apcraig commented Dec 8, 2015

bishtgautam commented Dec 8, 2015

apcraig commented Dec 8, 2015

bishtgautam commented Dec 8, 2015

bishtgautam commented Dec 9, 2015

douglasjacobsen commented Dec 9, 2015

bishtgautam commented Dec 9, 2015

apcraig commented Dec 9, 2015

bishtgautam commented Dec 10, 2015

bishtgautam commented Dec 10, 2015

apcraig commented Dec 10, 2015

bishtgautam commented Jan 4, 2016

apcraig commented Jan 5, 2016

apcraig commented Jan 8, 2016

bishtgautam commented Jan 8, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

mt5555 commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

mt5555 commented Jan 11, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

apcraig commented Jan 11, 2016

bishtgautam commented Jan 11, 2016

mt5555 commented Jan 11, 2016

rljacob commented Jan 12, 2016

bishtgautam commented Jan 13, 2016

bishtgautam commented Jan 14, 2016

jgfouca commented Jan 14, 2016

bishtgautam commented Jan 14, 2016