HAFS nested tests failed on several platforms #1085
Comments
@jkbk2004 Would you please take a look at this issue on Orion? Thanks
@junwang-noaa I will look into that on Orion.
@junwang-noaa It seems ./rt.sh -c -e went through OK on Orion: /work/noaa/epic-ps/jongkim/rt-blcheck/stmp/jongkim/FV3_RT/REGRESSION_TEST_INTEL
@jkbk2004 Yes, the baseline creation is OK. Can you test whether "./rt.sh -m -e" also runs successfully? Thanks
@junwang-noaa I used the debug queue to test it. It seems to run OK: /work/noaa/epic-ps/jongkim/rt-blcheck/stmp/jongkim/FV3_RT/rt_53203/hafs_regional_telescopic_2nests_atm
@jkbk2004 Can you try the full RT to see if this is still an issue? Thanks
@junwang-noaa The full RT tests are also successful: /work/noaa/epic-ps/jongkim/UFS-RT-tests/rt-blcheck/tests. We can close the PR. If we see the issue later, we can re-open it then.
hafs_regional_storm_following_1nest_atm failed on Hera in PR #909.
@junwang-noaa, it looks to me like these HAFS nesting RT failures are related to corrupted/incomplete atm/sfc.nest??.fhhh.nc files (even though the file sizes look normal). Is it possible that the write grid component did not properly close the netCDF files? Any comments/suggestions/ideas on what might be causing these corrupted/incomplete history output files will be much appreciated. Thanks!
@BinLiu-NOAA Code managers have been tracking this error for a while. This is what we observed:
Currently we suspect it might be a netCDF issue. It would be good to test with netCDF 4.8.1 (the current version is 4.7.4).
The hafs_regional_storm_following_1nest_atm test on hera.intel was turned off in PR #909 (comment). We need a follow-up PR to switch from netcdf-parallel to netcdf in all the nested HAFS tests and then turn the test back on.
Description
The RT test hafs_regional_telescopic_2nests_atm failed a couple of times on Orion when a new baseline was created. The baseline itself was created successfully, but the RT test against that baseline failed with the error:
Comparing atmf006.nc .........OK
Comparing sfcf006.nc .........OK
Comparing atm.nest02.f006.nc ............ALT CHECK......ERROR
It turned out that compare_ncfile.py failed when comparing the atm.nest02.f006.nc from the baseline with the one from the RT test:
compare_ncfile.py atm.nest02.f006.nc /work/noaa/stmp/bcurtis/stmp/bcurtis/FV3_RT/rt_410503/hafs_regional_telescopic_2nests_atm/atm.nest02.f006.nc
Traceback (most recent call last):
  File "/work/noaa/nems/emc.nemspara/autort/pr/867250832/20220304164615/ufs-weather-model/tests/compare_ncfile.py", line 14, in <module>
    if np.shape(nc1[varname][:])!=np.shape(nc2[varname][:]):
  File "netCDF4/_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5352, in netCDF4._netCDF4.Variable._get
  File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
When the test is rerun without baseline creation, there is no issue with the file comparison.
To Reproduce:
On Orion, check out the model code:
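Since the traceback shows the failure happening while a variable's data is read (`nc1[varname][:]`), one way to narrow things down is to read every variable in a suspect file and record which reads raise. The following is only a minimal sketch: the helper names `unreadable_variables` and `check_file` are hypothetical and not part of compare_ncfile.py, and it assumes the netCDF4 Python package is available.

```python
def unreadable_variables(dataset):
    """Map each variable name to the error raised when its data is read.

    `dataset` only needs a `.variables` mapping whose values support `[:]`,
    which is the interface netCDF4.Dataset provides.
    """
    failures = {}
    for name, var in dataset.variables.items():
        try:
            var[:]  # force a full read, like nc1[varname][:] in the traceback
        except Exception as exc:  # read errors from netCDF4 surface as exceptions
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def check_file(path):
    """Open `path` with netCDF4 and report its unreadable variables.

    Hypothetical helper; netCDF4 is imported lazily so the function above
    can be used without it.
    """
    from netCDF4 import Dataset  # third-party package, assumed to be installed

    with Dataset(path) as ds:
        return unreadable_variables(ds)
```

Calling `check_file("atm.nest02.f006.nc")` on both the baseline copy and the RT copy would show whether one file, and which variables in it, cannot be read. An empty result means every variable was readable.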
cd ufs-weather-model/tests
./rt.sh -c -e
./rt.sh -m -e
The Orion log file shows:
FAILED TESTS:
Test hafs_regional_telescopic_2nests_atm 104 failed in run_test failed
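When the same check is repeated across several runs, it can help to pull the failed test names out of the log automatically. The sketch below assumes the log always uses the layout shown above (a `FAILED TESTS:` header followed by `Test <name> <number> failed in <stage> ...` lines); the function name `failed_tests` and the reading of `104` as a test number are assumptions, not something rt.sh documents here.

```python
import re

# Matches lines such as:
#   Test hafs_regional_telescopic_2nests_atm 104 failed in run_test failed
_FAILED_LINE = re.compile(r"^Test\s+(\S+)\s+(\d+)\s+failed in\s+(\S+)")


def failed_tests(log_text):
    """Return (name, number, stage) tuples listed after 'FAILED TESTS:'."""
    results = []
    in_section = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("FAILED TESTS:"):
            in_section = True
            continue
        if in_section:
            match = _FAILED_LINE.match(line)
            if match:
                name, number, stage = match.groups()
                results.append((name, int(number), stage))
    return results
```

Lines before the `FAILED TESTS:` header are ignored, so passing the whole log text is fine.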