MPAS-O running on mira / cetus #260
Comments
@jayeshkrishna I only assigned you because you're the POC. I'll help make sure it's working, but we can have the discussions about it here. |
Thanks. |
Unfortunately, the 2-day run on Cetus failed. @douglasjacobsen, does anything in particular stand out to you (in the logs) as the source of the failure? |
@jayeshkrishna Not really, but it looks like it's dying in finalization (since it's getting through almost two days' worth of time steps). What command did you use to create the case? I want to try it out on cetus, but previously it was dying when it was trying to init the model for me. |
I used the following command to create the case (this is the same error that I got when I was interacting with Mark for #173):
|
I found two versions of ESMF_TimeMod in the repo (see below), one of which is possibly the source of one of the error messages in the cesm log (format specification error),
|
@jayeshkrishna Yeah, the one in the ocean isn't built as part of this. (i.e. models/ocn/mpas-o/model/src/external/esmf_time_f90). Instead, we use the one from models/utils/esmf_wrf_timemgr. So, unless the bottom format specifier looks like the incorrect one, I'd be surprised if this is actually causing an issue. |
@jayeshkrishna I had a run using the same command you used that made it through the queue over the long weekend, but it didn't make it as far as your run did. I'm going to keep looking into it, but I'm not sure what to do about this. I had built this run in debug mode, but since it failed differently from how yours failed, I'm going to try it without debug mode. |
@jayeshkrishna I can get through a run, but now when it's deallocating everything, I get the following error message:
It specifically happens on this line: https://github.com/ACME-Climate/MPAS/blob/ocean/develop/src/framework/mpas_pool_routines.F#L252 I tried changing the compiler version to see if it was a bug, but that didn't work. Any ideas? |
Was (dptr % r2) NULLified before use? Some compilers don't nullify ptrs by default. |
Also, I added an |
How about r2 % array (not NULLified in the decl - mpas_field_types.inc)? |
I haven't been paying any attention to this, so please excuse the possibly stupid comment, but have you tried running with fewer processes per node (and more nodes), just to eliminate any possibility that this is a memory corruption issue due to maxing out the memory? |
@jayeshkrishna @worleyph I've not had good luck trying to change the number of processes on mira / cetus. So, I'm just using the default for now. However, this grid shouldn't run into memory issues. It can easily run on 64 processors (slowly at least), and this case is running on 2048. |
I am assuming that deallocating an unassociated ptr might have corrupted data structures. |
@jayeshkrishna It wasn't unassociated though. It was an actually allocated pointer (meaning |
Ok, I was thinking along the lines of "r2%array not NULLified" => associated() returns true even though the ptr is unassociated => deallocate(r%array) corrupts memory. It might be worthwhile just trying to print a couple of elements of the array to make sure it is what you expect it to be. |
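To illustrate the concern above, here is a minimal sketch, assuming a hypothetical type `field2d_demo` (it is not the actual declaration in mpas_field_types.inc). A pointer component without default initialization has an undefined association status, so `associated()` cannot be relied on before the pointer is nullified or allocated. Default-initializing the component with `=> null()` makes the status well defined, and the guarded `deallocate` then only runs on a pointer that really is associated.

```fortran
program pointer_status_demo
   implicit none

   ! Hypothetical stand-in for an MPAS field type; not the real declaration.
   type :: field2d_demo
      ! "=> null()" gives the component a defined (disassociated) initial status.
      real, dimension(:,:), pointer :: array => null()
   end type field2d_demo

   type(field2d_demo) :: r2

   ! The status is well defined here because of the default initialization.
   print *, 'associated before allocate: ', associated(r2 % array)

   allocate(r2 % array(2, 3))
   r2 % array = 1.0

   ! Print a couple of elements to check the contents are what we expect.
   print *, 'first elements: ', r2 % array(1, 1), r2 % array(2, 1)

   ! Guarded cleanup: deallocate only when the pointer is associated.
   if (associated(r2 % array)) then
      deallocate(r2 % array)
   end if

   ! A successful deallocate leaves the pointer disassociated.
   print *, 'associated after deallocate: ', associated(r2 % array)
end program pointer_status_demo
```

This only demonstrates the general Fortran behavior; whether it is what is going wrong in the pool deallocation here is still an open question.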
|
Not sure if it is relevant here but I got into a lot of unassociated On Wed, Jul 8, 2015 at 10:57 AM, jayeshkrishna notifications@github.com
|
You could also try using the "-qinitauto" compiler option to see if that helps. |
@jayeshkrishna Yeah, the ESMF change needs to happen within ACME either way to prevent that message. I think the MPAS version has that change though, but within ACME the MPAS version doesn't get used so we don't need to modify it at all. I tried adding a |
@jayeshkrishna Adding |
I was able to get past the init failure with DEBUG turned on. However, the run did not complete (it crashed). |
Thanks @jayeshkrishna. I'll look into it with debug on now and see if it gets us any further. |
Hi guys, I think you may just need to decrease the time step to 10 minutes. Or has it run on other platforms with a 20-minute step? -mat |
@maltrud It runs fine with a 20 minute time step. The error on mira is some weird fortran + xlf issue that we're trying to figure out. |
Can this issue be closed? |
This is fixed now. |
MPAS-O has not been confirmed to run correctly on mira or cetus yet.