failure to restart-reproduce if using a restart from 15th of month #2588
Comments
I've created a reproducer branch which reproduces this error in the control_p8 test.
It can be run using
will compare restarts from the control vs the 1418 runs. And
will compare the control and the 1500 runs. In this case, the nccmp result shows
@yangfanglin Since @DeniseWorthen's tests suggest that this only happens with restarts on the 15th, which is the date climatology fields are read in, do you think this might be a bug in the climo file read logic?
Good catch. In the middle of the month, the GVF is updated with a new value based on the monthly climatology. It is very likely that when you restart from the 15th of the month, the model bypasses that step.
Just as a test, the runs should reproduce if you set wei1m to 1 in sfcsub.f (i.e., do not change any fixed fields on the 15th of the month).
Would you be able to do any debugging on this issue? The reproducer branch is all set up to use control_p8 and then run 3 different FHROT values.
For a warm start, the model assumes everything it needs is in the restart files and does not go through sfcsub.f.
These are all restarting using the checkpoint restarts written at a particular time. They are using 'warm_start=true'.
Update: I manually set wei1m in sfcsub.F90 to 1 and obtained the same result: the model restart reproduces using a restart from 04-14-180000 but does not restart reproduce using a restart from 04-15-000000. The idea here was to check whether (contrary to expectations) the model does go into sfcsub for some field on the 15th.
@DeniseWorthen Thanks for testing. Could you try one more test: turn off the call to sfcsub.f and see whether the run reproduces from the 04-15-000000 restart?
@HelinWei-NOAA Where exactly would I need to turn off the call to sfcsub?
@DeniseWorthen Comment out "CALL SFCCYCLE" in gcycle.F90 (ufs-weather-model/FV3/ccpp/physics/physics/Interstitials/UFS_SCM_NEPTUNE)
@HelinWei-NOAA I commented out sfccycle and I get the same results (using the 04-14-180000 restart reproduces but 04-15-000000 does not). I'm currently writing ATM restarts every timestep (
Can I run w/o any LSM?
@DeniseWorthen I compared input.nml (the namelist file) between the control and restart runs; there are some differences that are likely not related to cold/warm start. Do you know why?
@HelinWei-NOAA These are the settings required for ATM to be a restart vs a 'cold start'. They are used in both coupled and standalone configurations. ufs-weather-model/tests/tests/control_restart_p8 Lines 63 to 69 in 8933749
@HelinWei-NOAA In case you don't know...the restart tests in the RT system utilize checkpoint restarts. This means that the control run will write restarts at a specified interval and then continue to the end of the fhmax. The restart test uses those checkpoint restarts to restart and also run forward to fhmax. The results of the control and restart test are compared; they must be identical if the model "restart reproduced". All the restarts produced by a control run are B4B w/ the restarts produced by the restart run.
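For anyone less familiar with the idea, the control-vs-restart comparison described above can be sketched roughly as follows. This is only an illustration, not the RT system's actual comparison code: the function names and directory layout are hypothetical, and a raw byte comparison is stricter than a data-only comparison such as nccmp, since it also flags differences in file metadata.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def restart_reproduces(control_dir: Path, restart_dir: Path) -> list[str]:
    """Return names of files that are missing or differ; empty means B4B.

    Only files present in the restart run are checked, because the control
    run also holds checkpoints written before the restart time.
    """
    mismatches = []
    for rst_file in sorted(restart_dir.iterdir()):
        if not rst_file.is_file():
            continue
        ctl_file = control_dir / rst_file.name
        if not ctl_file.is_file() or file_digest(ctl_file) != file_digest(rst_file):
            mismatches.append(rst_file.name)
    return mismatches
```

An empty list corresponds to "restart reproduced"; any entry is a file written by the restart run that is not bit-identical to the control run's copy.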
@DeniseWorthen Thanks for the explanation. It is weird to me that neither of the restarts after 1500 reproduces.
@HelinWei-NOAA I've been able to get restart repro at day 15, hour 00 by switching to
@DeniseWorthen Great work! It looks like the model reads some aerosol data in the middle of the month if we turn on MERRA2 (iaer=1011).
@AnningCheng-NOAA Anning, could you please help check why the model is not reproducing if iaer=1011 (MERRA2 climatology)?
Hi Fanglin, I will take a look.
Which branch or tag is used to reproduce the restart issue now? I need to reproduce the error for debugging.
It looks like I do not have permission to access the branch on hera.
Never mind, I have just cloned the branch and will post what I find soon.
The issue has been fixed by a few lines of code change in the aerosol interpolation code, which make sure the continuous run reads the forcing file the same way as the restart run. Formerly, the continuous run did not read the previous record. My test code is located at /scratch1/NCEPDEV/global/Anning.Cheng/tmp/ufs-weather-model. PRs have been created to merge my changes to trunk:
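To illustrate why the choice of records matters at the mid-month boundary, here is a minimal sketch of time interpolation between monthly climatology records. It is not the actual aerosol interpolation code: the function names, the assumption that each monthly record is anchored at 00Z on the 15th, and the linear weighting are all hypothetical. The point is only that the bracketing pair is derived from the current time alone, so a run that starts at a given time and a run that steps up to it select the same two records.

```python
from datetime import datetime


def mid_month(year: int, month: int) -> datetime:
    """Anchor time of a monthly record: 00Z on the 15th (an assumption)."""
    return datetime(year, month, 15)


def bracket(t: datetime) -> tuple[int, int, float]:
    """Return (prev_month, next_month, weight_of_prev) for time t.

    The result depends on t only, not on any cached state from earlier steps.
    """
    if t >= mid_month(t.year, t.month):
        y1, m1 = t.year, t.month
        y2, m2 = (t.year + 1, 1) if t.month == 12 else (t.year, t.month + 1)
    else:
        y2, m2 = t.year, t.month
        y1, m1 = (t.year - 1, 12) if t.month == 1 else (t.year, t.month - 1)
    t1, t2 = mid_month(y1, m1), mid_month(y2, m2)
    w1 = (t2 - t).total_seconds() / (t2 - t1).total_seconds()
    return m1, m2, w1


def interpolate(records: dict[int, float], t: datetime) -> float:
    """Linearly interpolate monthly values (keyed by month number) to time t."""
    m1, m2, w1 = bracket(t)
    return w1 * records[m1] + (1.0 - w1) * records[m2]
```

With this layout, 04-14 18Z falls between the March and April records, while 04-15 00Z falls between April and May. If a continuous run and a restart run disagree about which pair is loaded at that boundary, their results can diverge from the 15th onward.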
Description
As part of debugging Issue #2562, I was passed a run directory for the SFS C192mx025 by @ShanSunNOAA. While working on that issue, I found I was not able to restart-reproduce if I used a restart file from the middle of the month (specifically on 2005-11-15-00).
I then set up a test case using a modified cpld_control_sfs test and the HR4 tag (fcc9f84). The modifications were to align w/ the run directory I was debugging for C192-mx025 (no waves, atm-thread=2).
I ran that test case out long enough to capture restarts every 24h through to 2021-04-26-06. I found that I was able to reproduce using the restart at 04-14-06, but not at 04-15-06.
To enable easier debugging, I set up cpld_control_sfs cases using artificially advanced start times, i.e., I set the start year/date to 04-13-06 and wrote restarts every 6 hours. I found I was able to restart-reproduce using the restart at 04-14-18 but not at 04-15-00.
I repeated the test using an executable which did not have -D32BIT=ON -DHYDRO=ON and the restart again failed to reproduce using a restart on 04-15-00.
Using mediator history files, I find that none of the fields imported from the ATM on restart are B4B using a restart from 04-15-00.
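As an aside on what "B4B" means at the field level, a field-by-field check can be sketched as below. This is a hypothetical illustration, not the tool used on the mediator history files: the field containers and the function names are assumptions, and a real check would read the arrays from the history files first.

```python
import struct


def bits(x: float) -> bytes:
    """Raw IEEE-754 double-precision representation of x."""
    return struct.pack("<d", x)


def non_b4b_fields(control: dict[str, list[float]],
                   restart: dict[str, list[float]]) -> list[str]:
    """Return names of control fields that are not bit-identical in restart."""
    bad = []
    for name in sorted(control):
        a, b = control[name], restart.get(name)
        if b is None or len(a) != len(b) or any(
            bits(x) != bits(y) for x, y in zip(a, b)
        ):
            bad.append(name)
    return bad
```

Comparing bit patterns rather than using a tolerance is deliberate: values that agree to many digits (for example `0.1 + 0.2` and `0.3`) still count as a failure, which matches the "must be identical" requirement of a restart test.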
To Reproduce:
Currently all test cases reside in my own sandboxes on hera
/scratch1/NCEPDEV/stmp2/Denise.Worthen/sfs.restart
Additional context
I am currently testing the develop branch using the control_c48, control_p8 and the cpld_control_p8 tests.