
Ocean conservation check accumulated sums set to zero at restart. #6642

Closed
mark-petersen opened this issue Sep 24, 2024 · 6 comments · Fixed by #6643

Comments

mark-petersen (Contributor) commented Sep 24, 2024

The ocean conservation check analysis member overwrites the first entry in the file with a zero after restarts for some variables. The result is that the first day's entry in a monthly conservation check file is correct for continuous runs but zero for a run with a restart break, for some variables.

mark-petersen (Contributor, Author):

@xylar can you describe the problem in more detail here?

I ran tests but I don't see what you are seeing. I ran QU240 with config_am_conservationcheck_enable = .true. in all cases, with everything else standard from the repo, for four cases:

  1. master, 2 months
  2. master, 1 month + restart + 1 month
  3. PR, 2 months
  4. PR, 1 month + restart + 1 month

writing restarts every month. For reference, I used

./create_newcase -case $CASE_ROOT/$E3SM_CASE   -compiler gnu   -mach chrysalis   -project e3sm \
   -compset GMPAS-IAF -res T62_oQU240 --walltime 00:30:00

I can’t find any zeroing behavior in any variable with master at the month break (case 2). For example,

$ pwd
/lcrc/group/e3sm/ac.mpetersen/scratch/chrys/240924f.ConsCheckMasterRestart.T62_oQU240.GMPAS-IAF.chr/run

$ ncdump -v accumulatedRainFlux  *mpaso.hist.am.conservationCheck*01-02-01.nc |tail

 accumulatedRainFlux = 12129038639.0062, 12114183191.7286, 12117918123.2804,
   ...
    12056372275.0261, 12054120109.5554, 12063708472.0842, 12081661853.1934,
    12074698994.5417 ;
}

ends with exactly the same value with or without the restart (cases 1 and 2 above) for month 2. Also, these values are not increasing, so each flux is accumulated within a daily interval and then zeroed after that daily value is written. Users therefore have to accumulate the sums themselves.
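Since the analysis member zeroes its accumulator after each daily write, a running total has to be reconstructed in post-processing. A minimal sketch with numpy, using illustrative values rather than real model output:

```python
import numpy as np

# Illustrative per-day accumulated flux values (NOT real model output):
# each entry is the sum over one daily interval; the accumulator is
# zeroed after each write, so the series is not monotonically increasing.
daily_sums = np.array([10.0, 12.0, 11.0, 13.0])

# Users must accumulate the running total themselves:
running_total = np.cumsum(daily_sums)
print(running_total)  # [10. 22. 33. 46.]
```

In practice the `daily_sums` array would be read from the conservation check file, e.g. with netCDF4 or xarray.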

What variable(s) did you see it in? Could you provide a path?

xylar (Contributor) commented Sep 25, 2024

Thanks @mark-petersen. I didn't appreciate that the conservation check only computes daily totals. In that case, I seem to have been mistaken that restart files are needed.

However, your run does show the same issues I was seeing. It isn't for all variables, and it now becomes clear that it might not be expected for all of them. But many variables in your output file above do show the strange behavior I was seeing after restarts:

$ ncdump *mpaso.hist.am.conservationCheck*01-02-01.nc

As one example:

$ ncdump -v accumulatedRiverRunoffTemperatureFlux *mpaso.hist.am.conservationCheck*01-02-01.nc | tail

 accumulatedRiverRunoffTemperatureFlux = 0, 18605091.7352952, 
    18615620.3741064, 18626816.3199467, 18629576.2448211, 18639342.9743262, 
    18655790.2099141, 18669977.626512, 18682829.8076654, 18696249.6768278, 
    18693466.4480733, 18698002.2772534, 18716545.9092317, 18730335.0354511, 
    18735442.8992608, 18741043.1198336, 20070962.3187527, 20085510.9296013, 
    20097497.8781152, 20119988.1101842, 20148696.7620016, 20165564.3783535, 
    20182018.390221, 20181173.3055113, 20191062.4939655, 20217130.2113496, 
    20248905.5051626, 20262064.0200586 ;

Many, many variables show zeros at the beginning of the compute interval. I now believe this may be a truncate-vs.-append issue similar to the one we fixed in #6434.

xylar (Contributor) commented Sep 25, 2024

I believe we always want:

 config_am_conservationcheck_compute_on_startup = .false.
 config_am_conservationcheck_write_on_startup = .false.

I think the odd values we're seeing above are related to certain conservation terms not being computed correctly on startup.

Then, I think we also want clobber_mode="append":

<stream name="conservationCheckOutput"
...
        clobber_mode="append"
        packages="conservationCheckAMPKG">

Otherwise, the entry with xtime at the start of a given month gets clobbered on restarts.
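The clobber behavior can be pictured with plain file open modes (an analogy only; MPAS streams write NetCDF, but the truncate-vs-append distinction works the same way):

```python
import os
import tempfile

# Hypothetical illustration: "truncate" discards what a previous run
# segment wrote to the month's file, while "append" preserves it.
path = os.path.join(tempfile.mkdtemp(), "conservationCheck.txt")

with open(path, "w") as f:   # first segment of the run writes day 1
    f.write("0001-06-01\n")

with open(path, "w") as f:   # restart with clobber_mode="truncate"
    f.write("0001-06-02\n")  # the 06-01 entry is lost

with open(path, "a") as f:   # clobber_mode="append" keeps prior entries
    f.write("0001-06-03\n")

with open(path) as f:
    print(f.read())          # only 06-02 and 06-03 survive
```

This matches the symptom below: with truncate, the first entry of the month is clobbered on restart.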

I ran two QU240 G-cases for 10 months with a restart after 5 months. In both cases, I was not computing/writing on startup. In one case I used truncate and in the other append as the clobber_mode. For the June output, the append-mode run can be seen to have the correct xtime values:

$ pwd
/lcrc/group/e3sm/ac.xylar/scratch/chrys/20240925.GMPAS-IAF.T62_oQU240.append.chrysalis/run

$ ncdump -v xtime *mpaso.hist.am.conservationCheck*01-06-01.nc | tail -n 40
data:

 xtime =
  "0001-06-01_00:00:00",
  "0001-06-02_00:00:00",
  "0001-06-03_00:00:00",
  "0001-06-04_00:00:00",
  "0001-06-05_00:00:00",
  "0001-06-06_00:00:00",
  "0001-06-07_00:00:00",
  "0001-06-08_00:00:00",
  "0001-06-09_00:00:00",
  "0001-06-10_00:00:00",
  "0001-06-11_00:00:00",
  "0001-06-12_00:00:00",
  "0001-06-13_00:00:00",
  "0001-06-14_00:00:00",
  "0001-06-15_00:00:00",
  "0001-06-16_00:00:00",
  "0001-06-17_00:00:00",
  "0001-06-18_00:00:00",
  "0001-06-19_00:00:00",
  "0001-06-20_00:00:00",
  "0001-06-21_00:00:00",
  "0001-06-22_00:00:00",
  "0001-06-23_00:00:00",
  "0001-06-24_00:00:00",
  "0001-06-25_00:00:00",
  "0001-06-26_00:00:00",
  "0001-06-27_00:00:00",
  "0001-06-28_00:00:00",
  "0001-06-29_00:00:00",
  "0001-06-30_00:00:00" ;
}

For the one with truncate, the first entry is missing:

$ pwd
/lcrc/group/e3sm/ac.xylar/scratch/chrys/20240925.GMPAS-IAF.T62_oQU240.truncate.chrysalis/run
$ ncdump -v xtime *mpaso.hist.am.conservationCheck*01-06-01.nc | tail -n 40
data:

 xtime =
  "0001-06-02_00:00:00",
  "0001-06-03_00:00:00",
  "0001-06-04_00:00:00",
  "0001-06-05_00:00:00",
  "0001-06-06_00:00:00",
  "0001-06-07_00:00:00",
  "0001-06-08_00:00:00",
  "0001-06-09_00:00:00",
  "0001-06-10_00:00:00",
  "0001-06-11_00:00:00",
  "0001-06-12_00:00:00",
  "0001-06-13_00:00:00",
  "0001-06-14_00:00:00",
  "0001-06-15_00:00:00",
  "0001-06-16_00:00:00",
  "0001-06-17_00:00:00",
  "0001-06-18_00:00:00",
  "0001-06-19_00:00:00",
  "0001-06-20_00:00:00",
  "0001-06-21_00:00:00",
  "0001-06-22_00:00:00",
  "0001-06-23_00:00:00",
  "0001-06-24_00:00:00",
  "0001-06-25_00:00:00",
  "0001-06-26_00:00:00",
  "0001-06-27_00:00:00",
  "0001-06-28_00:00:00",
  "0001-06-29_00:00:00",
  "0001-06-30_00:00:00" ;
}

Other months are complete in both runs, so the issue is clearly related to restarts.
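A quick way to spot which monthly files are affected is to compare the days present in xtime against the expected calendar. A sketch (the helper name is hypothetical, not part of any MPAS tooling):

```python
def missing_days(xtimes, days_in_month=30):
    """Return day-of-month numbers absent from a list of MPAS xtime strings.

    Assumes xtime strings of the form 'YYYY-MM-DD_hh:mm:ss'.
    """
    present = {int(t.split("_")[0].split("-")[2]) for t in xtimes}
    return sorted(set(range(1, days_in_month + 1)) - present)

# The truncated June file above starts at 06-02, so day 1 is missing:
truncated = ["0001-06-%02d_00:00:00" % d for d in range(2, 31)]
print(missing_days(truncated))  # [1]
```

Running this over each month's file would flag exactly the post-restart months.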

mark-petersen (Contributor, Author):

Thanks @xylar for the clarification. I was thinking the same thing, and will change these three items in the PR.

 config_am_conservationcheck_compute_on_startup = .false.
 config_am_conservationcheck_write_on_startup = .false.

<stream name="conservationCheckOutput"
...
        clobber_mode="append"
        packages="conservationCheckAMPKG">

xylar (Contributor) commented Sep 25, 2024

@mark-petersen, please consider just using my branch (https://github.com/xylar/E3SM/commits/mark-petersen/mpas/correct-conservation-check/) rather than duplicating efforts.

mark-petersen (Contributor, Author):

Updated the description.

jonbob added a commit that referenced this issue on Sep 26, 2024 (PR #6643):

Correct ocean conservation check settings

Currently, the ocean conservation check analysis member overwrites the
first entry in the file with a zero after restarts for some variables.
This PR corrects this behavior so that the first day's entry in a
monthly conservation check file is identical between continuous runs and
a run with a restart break. Adds a new mpaso testdef and corresponding
stealth test.

Fixes #6642

[NML] for some mpaso resolutions
[BFB]
jonbob closed this as completed in b12eb68 on Sep 27, 2024.