Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the mosart runoff model #526

Merged
merged 12 commits into from
Jan 14, 2016
Merged

Add the mosart runoff model #526

merged 12 commits into from
Jan 14, 2016

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Dec 8, 2015

These changes add a new model, mosart, as a runoff model.
The changes to the scripts and coupling support those changes.
In addition to the new source code under mosart, there were also
small changes to rtm for consistency and a few changes to the coupling fields.

A handful of new compsets were created to run with mosart, but
additional compsets will be required as needed for science.

LG-20
[non-BFB]

apcraig added 10 commits June 6, 2015 15:01
Add the mosart runoff model and two new land->runoff coupling fields.

This version of mosart is identical to mosart1_0_00 in the NCAR
CESM repository and the modifications match the changes in
the CESM exp_tag mosart01_cesm1_3_beta04.

Added the land to runoff coupling fields Flrl_rofgwl and
Flrl_rofsub.  These are glacier/wetland/lake runoff and
subsurface runoff.  Both are liquid water fluxes.  In CLM, these
are currently set to zero.  The two fields are passed through the
coupler with Flrl_rofl and Flrl_rofi.  Both rtm and mosart receive
these fields and add them to the rofl flux term at present.  This
commit provides the ability to run mosart and to couple the two
new fields.  Some science development is still required in CLM and
mosart to fully leverage the new capabilities.

Three new compsets were added to support testing of mosart, IMCN,
BM1850C5, and BM1850CN.  These are identical to ICN, B1850C5,
and B1850CN except mosart is used instead of rtm.

Development and testing was done on titan only.  The following tests
were run to verify technical correctness of the mosart implementation.
 *CME.T31_g37.IMCN.titan_pgi
 *ERS_D.T31_g37.BM1850C5.titan_pgi
 *ERS.T31_g37.BM1850C5.titan_pgi
 *NCK.T31_g37.BM1850C5.titan_pgi
 *NCK.T31_g37.IMCN.titan_pgi
 *SMS_D_E.T31_g37.IMCN.titan_pgi
 *SMS_E.T31_g37.IMCN.titan_pgi
 *SMS.T31_g37.IMCN.titan_pgi
 *CME.f19_g16.X.titan_pgi
And the acme-development test suite was also run and shown to be
BFB with the starting ACME version.

There is also a bug fix in component_mod.F90 to fix an error in
the esmf coupling interface that is unrelated to mosart.

[BFB]
This change syncs up the ACME+mosart version with the
CESM mosart03_cesm1_4_beta03 version.  In this version, mosart
exact restart is fixed, Flrl_rofl is renamed to Flrl_rofsur,
and Flrl_volrmch is added.  There are a few other changes
to mosart science as well.

These changes produce identical bit-for-bit results with respect
to climate.  Some of the runoff coupling field names have been
changed and a few new coupling fields have been added.  This
may impact the comparison results, but the results should be
identical when comparing.

LG-19

*********1*********2*********3*********4*********5*********6*********7**
Longer commit message body describing the commit.
Can contain lists as follows:
	* Item 1
	* Item 2
	* Item 3

A good commit message should be written like an email, a subject
followed by a blank line, followed by a more descriptive body.

Can also contain a tag at the bottom describing if the commit is
Non-BFB or Climate changing:
[Non-BFB]
[CC]
* Add WRM source code
*    new file:   models/rof/mosart/src/wrm/WRM_modules.F90
*    new file:   models/rof/mosart/src/wrm/WRM_read_print.F90
*    new file:   models/rof/mosart/src/wrm/WRM_returnflow.F90
*    new file:   models/rof/mosart/src/wrm/WRM_start_op_year.F90
*    new file:   models/rof/mosart/src/wrm/WRM_subw_IO_mod.F90
*    new file:   models/rof/mosart/src/wrm/WRM_type_mod.F90
* Add NLDAS resolution, datm mode, and IM_2000_CN_NLDAS compset
* Add constance machine port

WRM is not yet fully validated and is off by default.
This is just an initial commit to provide NLDAS capability
and to allow some code review.
This is BFB with prior versions with WRM off.
fix parallelization bug in nUp sum for non-basin decompositions
refactor mosart to improve performance
 - loop restructuring
 - reduce number of operations
 - improve performance of expensive math ops (sqrt, **, sin)

Non-BFB or Climate changing:
[Non-BFB]
- code cleanup of rtmini and rtmrun
- works with mosart input files with scrambled IDs
- moved dto term into rtmrun
- added direct-to-outlet tranfer capability
- removed a bunch of old rtm code
- fixed esmf interfaces and tested in DEBUG mode
- added budget calculation (still being validated)
- has a known exact restart error that introduces a roundoff
  difference at the first timestep at a handful of gridcells.
  This is probably not going to impact science, will be fixed
  next.

[Non-BFB]

LG-20
*********1*********2*********3*********4*********5*********6*********7**

Code cleanup to remove dead code and make the code clearer.  More could
be done.

Update the direct output computation.

Add budget diagnostics to verify conservation.  In short tests, the
model conserves except for in the frozen term in the euler solver.
There is still a small issue there that needs further diagnosing.

Review history and restart files and checked many fields.

Updated the MOSART input file to adjust the rotated Antarctica.

LG-20

[Non-BFB]
new input file
new direct terms
new areatot
update history files

[Non-BFB]

LG-20
to MOSART_Global_half_20151205a.nc

[Non-BFB]

LG-20
@bishtgautam
Copy link
Contributor

Is this PR BFB?

@rljacob rljacob added this to the v1.0 Alpha milestone Dec 8, 2015
@apcraig
Copy link
Contributor Author

apcraig commented Dec 8, 2015

It should be BFB for configurations that don't use mosart. mosart is not currently part of the standard test suite. There may be some differences because we modified some of the coupling fields' names, but the results should ultimately be identically when running rtm. When you use mosart, you will have a new configuration with no baselines, and mosart will not be bit-for-bit with rtm.

@bishtgautam
Copy link
Contributor

Added 'BFB'.

@bishtgautam
Copy link
Contributor

@jgfouca , @rljacob, @douglasjacobsen:

The merge of this PR produces relatively small number of conflicts but I have few issues that I need help with.

  • The following gist shows the status after merge of this branch into next https://gist.github.com/bishtgautam/d0b804c0f2d272d14e51.
  • In this feature branch, there were few changes made to config_compilers.xml and config_machines.xml. I'm considering reverting all changes made to config_compilers.xml and config_machines.xml in this branch. What do you think?
  • Why git bailed on automatically merging changes made to the following files?
    • models/lnd/clm/src/cpl/lnd_import_export.F90: Is it because next now has two lnd_import_export.F90 (i.e. models/lnd/clm/src/cpl/lnd_import_export.F90
      and /models/lnd/clm/src_clm40/main/lnd_import_export.F90)?
    • scripts/ccsm_utils/Case.template/config_definition.xml: Is it because cime/scripts/Tools/config_definition.xml doesn't directly map back to scripts/ccsm_utils/Case.template/config_definition.xml?

@apcraig : I'm trying to determine if the changes made to models/lnd/clm/src/cpl/lnd_import_export.F90 should be made to both models/lnd/clm/src/cpl/lnd_import_export.F90
and /models/lnd/clm/src_clm40/main/lnd_import_export.F90. Is MOSART compatible with CLM4.0 and CLM4.5?

- The files are revert to ba21605, which was the hash of
  the master when this feature branch was created.
@douglasjacobsen
Copy link
Member

@bishtgautam:

So, first you'll need to git mv models/rof/mosart components/mosart

But for the other conflicts... I can't tell completely, but it looks like the lnd_import_export.F90 file is more like the CLM4.0 version than the CLM4.5 version.

I think the changes in config_definition.xml should be pretty easy to "re-do" in the merge, in the right place. It looks like this is the diff: https://gist.github.com/douglasjacobsen/3dda495c3234140507a5

Here is the diff for the machines changes (i.e. config_compilers.xml, etc) https://gist.github.com/douglasjacobsen/5399fa13e64bb32d3bdd

Those also look either relatively small, or unneeded (I don't know what constance and cab are). But it seems like it also deletes things like hera, which I also am not sure if we need or not.

A bit about why git bailed on merging those files though.

The way git does inexact rename matching is by trying to compare the new file against all old files in the repo, and see which file it's most similar to. If it can't find a file that it thinks it's similar enough to, it won't try to "rename" it. Additionally, if it finds multiple files that it's equally similar to, it won't rename it.

In this case, I'm not exactly sure which the reason is, but it's probably one of those two.

@bishtgautam
Copy link
Contributor

@douglasjacobsen : Thanks for the detail explanation.

@apcraig
Copy link
Contributor Author

apcraig commented Dec 9, 2015

@bishtgautam
I think the best thing would be to make sure the clm mods are implemented in both clm4 and clm4.5. there are some science issues associated with that, so you might want to contact ruby or hongyi as well to check. thanks.

@bishtgautam
Copy link
Contributor

@apcraig : Thanks. I will add clm mods in both clm40 and clm45.

@bishtgautam
Copy link
Contributor

@apcraig : It appears that this is a non-BFB PR. There are differences in x2l_Flrr_volr, r2x_Forr_rofl, and r2x_Flrr_volr. Are these changes expected?

Below are testlogs of four tests done on Titan.

https://gist.github.com/bishtgautam/6629b19c5dfa205f9778
https://gist.github.com/bishtgautam/efb0f43b05e97477d6b4
https://gist.github.com/bishtgautam/004d619bec201d726473
https://gist.github.com/bishtgautam/33f5bb9899b633a0996b

@apcraig
Copy link
Contributor Author

apcraig commented Dec 10, 2015

We will need to look at the results carefully. There are some new coupling fields, that might be the reason the compare is flagging those fields. If the results are not bit-for-bit, then we should carefully review the mods. There were some changes to the way the clm fields are pass out of clm, fields were split into different terms. That could introduce a roundoff difference, but I think in my testing, that did not occur. Is there a way for me to help review the code changes?

@bishtgautam
Copy link
Contributor

Hi @apcraig,

A fully coupled case (1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB test. The BFB failure wasn't unusual as we had already earlier determined that this PR had non-BFB code modifications. But, suprisingly the test still failed BFB even after I made above mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at /project/projectdirs/acme/gbisht/pr-526-mosart/cases:

  • SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:
    • Generation of baseline
  • SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:
    • Comparison with above generated baseline
    • lnd_import_export.F90: With no modifications
    • Result: BFB failure
  • SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:
    • Comparison with above generated baseline
    • lnd_import_export.F90: With modifications as suggested by you
    • Result: BFB failure

Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947 had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file in each case corresponds when the case was run with info_debug=2.

@apcraig
Copy link
Contributor Author

apcraig commented Jan 5, 2016

i'll try to have a look in the next few days, but i may not get to it
this week.
i'm on reduced hours right now due to holidays.

tony........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:

SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure

Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.


Reply to this email directly or view it on GitHub
#526 (comment).

@apcraig
Copy link
Contributor Author

apcraig commented Jan 8, 2016

gautam,

i want to be able to duplicate your tests. can you tell me exactly what
version of the model you're using, how you set it up, what changes you
made to source code or namelist and so forth, in detail. i will try to look
at the problem this weekend.

thanks,

tony..........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:

SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure

Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.


Reply to this email directly or view it on GitHub
#526 (comment).

@bishtgautam
Copy link
Contributor

Hi Tony,

I generated the baseline with 055722c and performed comparison with 865bc59

>git log -n 15 --oneline --graph
*   865bc59 Merge branch 'apcraig/mosart/add-mosart' into next (PR #526)
|\  
| * 7d3a782 Reverts changes made to compiler and machine files
| * ec03922 Update mosart input file
| * 2aea37a Update mosart ChangeLog
| * b25233e Update mosart to mosart1_0_10
| * 5f404ff mosart code cleanup, add budget diagnostics, review history files
| * 780d433 Update mosart to mosart1_0_07 version
| * fb5f4a2 refactor sqrt of slope data in mosart
| * 51a2751 refactor mosart to improve performance and parallelization
| * b9a7086 Add initial WRM model to mosart and add NLDAS configuration
| * f05d5e5 Update mosart coupling to mosart03_cesm1_4_beta03 version
| * 0282da9 Add mosart runoff model
* |   055722c Merge branch 'agsalin/mpasli/albany-porting' into next (PR #414)
|\ \  
| * | 5941f88 Fix incorrect logic for velocity solver selection
| * | b6e02dc Add CXX_LIBS for Melvin so Albany links
  • Generation of baseline:
>git checkout 055722c
>cd cime/scripts-acme
>./create_test \
SMS.ne30_m120.A_B1850CN \
--baseline-root /global/cscratch1/sd/gbisht/acme_baselines \
-p acme -g -b 055722c -v --no-build
>cd <test-case-directory>
>cat > user_nl_cpl <<\EOF
info_debug=2
EOF
>./cesm_setup
>./*.build
>./*.submit
  • Comparison against baseline with no modifications to lnd_import_export.F90:
>git checkout 865bc59
>cd cime/scripts-acme
>./create_test \
SMS.ne30_m120.A_B1850CN \
--baseline-root /global/cscratch1/sd/gbisht/acme_baselines \
-p acme -c -b 055722c -v --no-build
>cd <test-case-directory>
>cat > user_nl_cpl <<\EOF
info_debug=2
EOF
>./cesm_setup
>./*.build
>./*.submit
  • Comparison against baseline with modifications to lnd_import_export.F90:
>git checkout 865bc59
>cd cime/scripts-acme
>./create_test \
SMS.ne30_m120.A_B1850CN \
--baseline-root /global/cscratch1/sd/gbisht/acme_baselines \
-p acme -c -b 055722c -v --no-build
>cd <test-case-directory>
>cat > user_nl_cpl <<\EOF
info_debug=2
EOF
>./cesm_setup

# Modifed lnd_import_export.F90 is available on Cori in SourceMods/src.clm of
# /project/projectdirs/acme/gbisht/pr-526-mosart/cases/SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947.tar.gz
> cp <modified-lnd_import_export.F90> SourceMods/src.clm
>./*.build
>./*.submit

@apcraig
Copy link
Contributor Author

apcraig commented Jan 11, 2016

Gautam,

Can you check permissions on

/project/projectdirs/acme/gbisht/pr-526-mosart/cases

looks like pr-526-mosart is not readable.

tony........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:

SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure

Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.


Reply to this email directly or view it on GitHub
#526 (comment).

@bishtgautam
Copy link
Contributor

Hi Tony, I have changed permissions to those directories. Please try again.

@apcraig
Copy link
Contributor Author

apcraig commented Jan 11, 2016

looking at the log files, the first real divergence in "G" vs "C-155" is in the ocean model after the first ocean coupling period (hour 2). that difference seems to be very small in just a couple fields and is almost certainly roundoff. Based on the global sums, all the forcing TO the ocean model before that coupling period is the identical in the two runs. That likely means that there are a few gridpoints that are roundoff different in the ocean forcing that the global sums are not picking up, but that are producing a tiny difference in the ocean solution.

The other interesting thing is that "G" vs "C-142" is that the ocean model diverges only on the third hour. The difference between C-155 and C-142 is that in one case, the sum of the land fields is done on the land side and in the other it's done on the runoff side. This seems to be enough to change answers by roundoff with C-142 actually being closer (with respect to ocean forcing) with the G case.

I would argue that the lnd-runoff-ocean coupling in these cases is fine. The differences are tiny and consistent with roundoff. The changes to the sum that we hoped would be bit-for-bit are not in this case even though they were in other cases. That makes sense. That mod is not guaranteed to be bit-for-bit, the fact that is was before was helpful. What we're seeing now though is roundoff differences between the three cases in the lnd-rof-ocean coupling, probably just at a few gridcells, and that's
fine. In the end, these tests validate the land-runoff-ocean coupling at being consistent between the three runs.

Having said all that, the most worrisome issue is the fact that the atm diverges in all three simulations on the second hour in a way that's MUCH larger than the land-rof-ocean coupling. That seems to have nothing to do with the runoff coupling and I do not understand it. If you look at the 2 atm.log files in any run, you actually see that the two runs do NOT produce the same diagnostics bit-for-bit in the atm.log
files which suggests that the model is NOT reproducible in the atm model. That seems like a huge issue that is unrelated to the changes in the runoff coupling. That's true even in the G case.

So, in summary, I would say the land-runoff-ocean coupling is fine as implemented. We can see that coupling separately from the atm divergence in the log files for at least the first few coupling periods. The land-runoff-ocean solutions are nearly identical in the three cases, are initially roundoff different at a few gridcells and that divergence grows. That seems fairly clear. But separate from that, there seems to be a huge problem in the atm model with reproducibility unrelated to the land-runoff-ocean
coupling. That is a separate issue that should be investigated!!

Let me know if you have any questions. thanks,

tony........

On 1/3/16 8:31 PM, Gautam Bisht wrote:

Hi @apcraig https://github.com/apcraig,

A fully coupled case
(1850_CAM5_CLM45%CN_MPASCICE%SSMI_MPASO_RTM_SGLC_SWAV) failed BFB
test. The BFB failure wasn't unusual as we had already earlier
determined that this PR had non-BFB code modifications. But,
suprisingly the test still failed BFB even after I made above
mentioned modifications to lnd_import_export.F90.

The tar files for cases are available on NERSC at
/project/projectdirs/acme/gbisht/pr-526-mosart/cases:

SMS.ne30_m120.A_B1850CN.corip1_intel.G.20151223_053832:

  o Generation of baseline
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_142909:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With no modifications
  o Result: BFB failure
SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947:

  o Comparison with above generated baseline
  o lnd_import_export.F90: With modifications as suggested by you
  o Result: BFB failure

Any idea why SMS.ne30_m120.A_B1850CN.corip1_intel.C.20151223_155947
had a BFB failure?

Note: Each of the three cases have two logfiles. The second log file
in each case corresponds when the case was run with info_debug=2.


Reply to this email directly or view it on GitHub
#526 (comment).

@bishtgautam
Copy link
Contributor

But separate from that, there seems to be a
huge problem in the
atm model with reproducibility unrelated to the land-runoff-ocean
coupling.

The non-BFB differences in the atmosphere must be related to the merge of this branch, right? Could the use of MPAS-O be an issue? Could the small changes in the ocean be causing large changes in the atmosphere?

@mt5555
Copy link
Contributor

mt5555 commented Jan 11, 2016

@bishtgautam : your comparison of 055722c with 865bc59 could be comparing models with different atmosphere? What about comparing 7d3a782 with it's branch point from master? (git has a command that will tell you the common parent of 7d3a782 and master).

Then you could isolate the BFB changes introduced just by this code. That comparison may remove the atmosphere differences Tony mentioned, and verify that the branch only has the expected BFB differences.

(just a suggestion based on reading Tony's message - please ignore if I'm missing something).

@bishtgautam
Copy link
Contributor

055722c is one of the parent of 865bc59.

>git log -n 15 --oneline --graph
*   865bc59 Merge branch 'apcraig/mosart/add-mosart' into next (PR #526)
|\  
| * 7d3a782 Reverts changes made to compiler and machine files
| * ec03922 Update mosart input file
| * 2aea37a Update mosart ChangeLog
| * b25233e Update mosart to mosart1_0_10
| * 5f404ff mosart code cleanup, add budget diagnostics, review history files
| * 780d433 Update mosart to mosart1_0_07 version
| * fb5f4a2 refactor sqrt of slope data in mosart
| * 51a2751 refactor mosart to improve performance and parallelization
| * b9a7086 Add initial WRM model to mosart and add NLDAS configuration
| * f05d5e5 Update mosart coupling to mosart03_cesm1_4_beta03 version
| * 0282da9 Add mosart runoff model
* |   055722c Merge branch 'agsalin/mpasli/albany-porting' into next (PR #414)
|\ \  
| * | 5941f88 Fix incorrect logic for velocity solver selection
| * | b6e02dc Add CXX_LIBS for Melvin so Albany links

Since, no change to atmosphere code was made in this PR, I expect the atmosphere model to be exactly the same in 055722c and 865bc59. Thus, the non-BFB differences are baffling to me.

@mt5555
Copy link
Contributor

mt5555 commented Jan 11, 2016

Are you sure? I read that graph as 055722c as a point on next that is not in the history of branch 7d3a782. (your branch is represented by the "|" at that point, while 055722c is refering to the "*" on next).

@apcraig
Copy link
Contributor Author

apcraig commented Jan 11, 2016

My main concern is that each case was run twice, once with
info_debug set to 1 and another with it set to 2. These just
turn on some coupling diagnostics. These "identical" runs
were NOT producing identical results in the atmosphere
for either version of ACME. There are questions about whether
the parent and branch are the same. But a bigger question
first is whether either the parent or the branch generate
reproducible results when run more than once. Based on the
results I'm seeing from these tests, the answer is no. It would
be good to make sure someone else sees the same thing.

tony.......

On 1/11/16 3:14 PM, Gautam Bisht wrote:

055722c
055722c
is one of the parent of 865bc59
865bc59.

|>git log -n 15 --oneline --graph * 865bc59 Merge branch
'apcraig/mosart/add-mosart' into next (PR #526) |\ | * 7d3a782 Reverts
changes made to compiler and machine files | * ec03922 Update mosart
input file | * 2aea37a Update mosart ChangeLog | * b25233e Update
mosart to mosart1_0_10 | * 5f404ff mosart code cleanup, add budget
diagnostics, review history files | * 780d433 Update mosart to
mosart1_0_07 version | * fb5f4a2 refactor sqrt of slope data in mosart
| * 51a2751 refactor mosart to improve performance and parallelization
| * b9a7086 Add initial WRM model to mosart and add NLDAS
configuration | * f05d5e5 Update mosart coupling to
mosart03_cesm1_4_beta03 version | * 0282da9 Add mosart runoff model *
| 055722c Merge branch 'agsalin/mpasli/albany-porting' into next (PR
#414) |\ \ | * | 5941f88 Fix incorrect logic for velocity solver
selection | * | b6e02dc Add CXX_LIBS for Melvin so Albany links |

Since, no change to atmosphere code was made in this PR, I expect the
atmosphere model to be exactly the same in 055722c
055722c
and 865bc59
865bc59.
Thus, the non-BFB differences are baffling to me.


Reply to this email directly or view it on GitHub
#526 (comment).

@bishtgautam
Copy link
Contributor

Hi Tony,

My main concern is that each case was run twice, once with
info_debug set to 1 and another with it set to 2. These just
turn on some coupling diagnostics. These "identical" runs
were NOT producing identical results in the atmosphere
for either version of ACME.

Ok, I now understand the point you are making.

But a bigger question
first is whether either the parent or the branch generate
reproducible results when run more than once.

To test this out, how about the using only 055722c to do the following:

Baseline_1: Generate the baseline with info_debug=1,
Baseline_2: Generate the baseline with info_debug=2,

Comparison_1: Compare with Baseline_1 with info_debug=1.
Comparison_2: Compare with Baseline_2 with info_debug=2.
Comparison_3: Compare with Baseline_1 with info_debug=2.
Comparison_4: Compare with Baseline_2 with info_debug=1.

What do you think?

@apcraig
Copy link
Contributor Author

apcraig commented Jan 11, 2016

info_debug should not change answers. if it does, there is
something wrong. I think running the parent 4-6 times with
info_debug=1 and info_debug=2 and comparing them all would
be a good test to do right now.

baseline_1 and baseline_2 should be identical and in some ways,
you've already done those and they aren't. It will be good to see
if after 4-6 runs, each solution is different or if it's reproducible
with respect to at least info_debug. Keep me posted.

thanks,

tony......

On 1/11/16 3:35 PM, Gautam Bisht wrote:

Hi Tony,

My main concern is that each case was run twice, once with
info_debug set to 1 and another with it set to 2. These just
turn on some coupling diagnostics. These "identical" runs
were NOT producing identical results in the atmosphere
for either version of ACME.

Ok, I now understand the point you are making.

But a bigger question
first is whether either the parent or the branch generate
reproducible results when run more than once.

To test this out, how about the using only 055722c
055722c
to do the following:

Baseline_1: Generate the baseline with |info_debug=1|,
Baseline_2: Generate the baseline with |info_debug=2|,

Comparison_1: Compare with Baseline_1 with |info_debug=1|.
Comparison_2: Compare with Baseline_2 with |info_debug=2|.
Comparison_3: Compare with Baseline_1 with |info_debug=2|.
Comparison_4: Compare with Baseline_2 with |info_debug=1|.

What do you think?


Reply to this email directly or view it on GitHub
#526 (comment).

@bishtgautam
Copy link
Contributor

@mt5555

I read that graph as 055722c as a point on next that is not in the history of branch 7d3a782.
I agree with the above statement.

And, since only 055722c and 865bc59 are on next, we should be generating and comparing using those commits.

@mt5555
Copy link
Contributor

mt5555 commented Jan 11, 2016

Thanks - I understand now :-)

055722c vs 865bc59 is a much better idea than what I was suggesting.

@rljacob
Copy link
Member

rljacob commented Jan 12, 2016

One related issue that may be relevant is that coupled cases that still used POP and CICE did not have any BFB differences with baselines after adding the mosart pieces. That's what I recall from the cdash results although its hard to trace. Might be worth checking BFB with baselines using the above 2 commits, an 1850_CAM5_CLM45%CN_CICE_POP2_RTM_SGLC_SWAV compset and ne30_g16 resolution.

The atmosphere consistently passes ERS tests which should catch any problems with reproducibility.

@bishtgautam
Copy link
Contributor

Hi Tony, My jobs on Cori has been sitting in the queue. Today I'm moving my testing to Titan and will keep you updated.

@bishtgautam bishtgautam merged commit d0e97d6 into master Jan 14, 2016
bishtgautam pushed a commit that referenced this pull request Jan 14, 2016
These changes add a new model, mosart, as a runoff model.
The changes to the scripts and coupling support those changes.
In addition to the new source code under mosart, there were also
small changes to rtm for consistency and a few changes to the coupling fields.

A handful of new compsets were created to run with mosart, but
additional compsets will be required as needed for science.

LG-20
[non-BFB]

Conflicts:
	cime/components/data_comps/datm/bld/namelist_files/namelist_definition_datm.xml
	cime/scripts/Tools/config_compsets.xml
	components/clm/bld/namelist_files/namelist_defaults_clm4_0.xml
	models/lnd/clm/src/cpl/lnd_import_export.F90
	scripts/ccsm_utils/Case.template/config_definition.xml
@bishtgautam
Copy link
Contributor

@jgfouca : With this PR merged into master, I expect the following tests to fails on master during our nightly tests:

  1. SMS.f19_f19.I1850CLM45CN
  2. SMS.f09_g16.I1850CLM45CN
  3. SMS.hcru_hcru.I1850CRUCLM45CN
  4. SMS.f09_g16_a.IGCLM45_MLI
  5. PET_PT.f19_g16.X
  6. SMS.ne30_m120.A_B1850CN

@jgfouca
Copy link
Member

jgfouca commented Jan 14, 2016

With a DIFF?

@bishtgautam
Copy link
Contributor

Yes, with DIFF

@jgfouca jgfouca deleted the apcraig/mosart/add-mosart branch August 31, 2016 02:49
jgfouca pushed a commit that referenced this pull request Feb 27, 2018
These changes add a new model, mosart, as a runoff model.
The changes to the scripts and coupling support those changes.
In addition to the new source code under mosart, there were also
small changes to rtm for consistency and a few changes to the coupling fields.

A handful of new compsets were created to run with mosart, but
additional compsets will be required as needed for science.

LG-20
[non-BFB]

Conflicts:
	cime/components/data_comps/datm/bld/namelist_files/namelist_definition_datm.xml
	cime/scripts/Tools/config_compsets.xml
	components/clm/bld/namelist_files/namelist_defaults_clm4_0.xml
	models/lnd/clm/src/cpl/lnd_import_export.F90
	scripts/ccsm_utils/Case.template/config_definition.xml
rljacob pushed a commit that referenced this pull request Apr 16, 2021
These changes add a new model, mosart, as a runoff model.
The changes to the scripts and coupling support those changes.
In addition to the new source code under mosart, there were also
small changes to rtm for consistency and a few changes to the coupling fields.

A handful of new compsets were created to run with mosart, but
additional compsets will be required as needed for science.

LG-20
[non-BFB]

Conflicts:
	cime/components/data_comps/datm/bld/namelist_files/namelist_definition_datm.xml
	cime/scripts/Tools/config_compsets.xml
	components/clm/bld/namelist_files/namelist_defaults_clm4_0.xml
	models/lnd/clm/src/cpl/lnd_import_export.F90
	scripts/ccsm_utils/Case.template/config_definition.xml
rljacob pushed a commit that referenced this pull request Apr 16, 2021
These changes add a new model, mosart, as a runoff model.
The changes to the scripts and coupling support those changes.
In addition to the new source code under mosart, there were also
small changes to rtm for consistency and a few changes to the coupling fields.

A handful of new compsets were created to run with mosart, but
additional compsets will be required as needed for science.

LG-20
[non-BFB]

Conflicts:
	cime/components/data_comps/datm/bld/namelist_files/namelist_definition_datm.xml
	cime/scripts/Tools/config_compsets.xml
	components/clm/bld/namelist_files/namelist_defaults_clm4_0.xml
	models/lnd/clm/src/cpl/lnd_import_export.F90
	scripts/ccsm_utils/Case.template/config_definition.xml
rljacob pushed a commit that referenced this pull request Apr 16, 2021
Changes to F90 files are suspect. I just went with the CSEG side for
all conflicts.

Config archive was changed to support MOSART on both sides, but in very
different ways. Again, went with CSEG.

* acme_master: (25 commits)
  bless_test_results: Finally add a full regression test for this tool
  Fix reference to unassigned variable.
  Better use of return codes in key ACME tools.
  Fix failing regression test.
  Add gnu modules to env for redsky and skybridge
  Add total test time to CDash field 'OS Version'
  Merge branch 'apcraig/mosart/add-mosart' (PR #526)
  Fixing odd number of tasks to handle
  Further fixes to the PEA test
  Add dynamic ozone treatment to ATMMOD compset
  Update r2o mapping file for ne30_ec60 grid
  Fixing link issues on Cetus+Mira after Albany build changes
  Turn on baseline comparison for redsky
  Implementation of linoz_mam4_resus_mom and hooks for v1.
  Merge branch 'singhbalwinder/atm/gold2718-new-infrastructure'(PR #343)
  Fixing PEA_P1_M.f45_g37_rx1.A.edison_intel
  Updates to BUILD_THREADED
  Updating default walltimes to use Edison's values
  Slurm time and partition directives
  Fixing cray-mpich module version to 7.3.0
  ...

Conflicts:
	driver_cpl/driver/prep_rof_mod.F90
	driver_cpl/driver/seq_diag_mct.F90
	driver_cpl/shr/seq_flds_mod.F90
	machines/config_pes.xml
	scripts-python/jenkins_generic_job
	scripts/Testing/Testcases/config_tests.xml
	scripts/Tools/case.build
	scripts/Tools/cesm_setup
	scripts/Tools/config_archive.xml
	scripts/Tools/config_definition.xml
	scripts/Tools/config_grid.xml
	scripts/Tools/st_archive
	scripts/create_newcase
	utils/perl5lib/Batch/BatchMaker.pm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants