ocean_only resting/z and resting/layer don't run #298
Comments
As I've mentioned to Nic, it looks like too many CPUs are being allocated to this case, and it may be resolved by using fewer.
I think @angus-g is right. I'm not sure what is meant to happen within halo updates when the halo becomes larger than the computational domain.
Hi, I tested the case with (nx=20, ny=2) and layout=(16,2) in test_mpp_domains. The current FMS cannot support a halo size greater than nx or ny. I will [...]
Greetings, Zhi
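As a back-of-the-envelope check of the case quoted above, the sketch below (illustrative Python, not part of FMS or MOM6; the halo width of 4 is only an assumed value for illustration) compares each PE's compute-domain extent with a requested halo width:

```python
# Sketch: compare per-PE compute-domain extents with a requested halo width.
# Mirrors the test quoted above: global (nx, ny) = (20, 2), layout = (16, 2).
# The halo width is an assumption for illustration only.

def per_pe_extents(n, nprocs):
    """Split n global points across nprocs ranks as evenly as possible."""
    base, rem = divmod(n, nprocs)
    return [base + 1 if r < rem else base for r in range(nprocs)]

nx, ny = 20, 2
layout = (16, 2)
halo = 4  # illustrative; the real value comes from the model configuration

x_sizes = per_pe_extents(nx, layout[0])  # 1-2 points per PE in x
y_sizes = per_pe_extents(ny, layout[1])  # 1 point per PE in y

print("x compute sizes:", x_sizes)
print("y compute sizes:", y_sizes)
print("halo fits in x on every PE:", all(s >= halo for s in x_sizes))
print("halo fits in y on every PE:", all(s >= halo for s in y_sizes))
```

With only one or two compute points per PE, any halo wider than the local domain would need data from ranks beyond the immediate neighbours, which is consistent with the limitation described above.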
Thanks @Zhi-Liang. @Hallberg-NOAA thought you might support halos wider than the compute domain because of the wide-halo work you both did in the shallow water model and barotropic solver. It could be that we have never pushed the decomposition fine enough to test this limit. I can certainly see why you wouldn't want to support it (since it needs repeated communication). We could consider a special flag at the MOM framework level to indicate homogeneity along an axis; we have several 1d and 2d tests where we "mimic" a half dimension by using identical values at both j=1 and j=2. However, this is probably overkill for a problem whose easiest and simplest solution is not to use too many PEs.
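As a rough guide to "not too many PEs", the layout along each axis can be bounded so that every PE's compute domain is at least as wide as the halo. A hedged sketch (`max_layout` is a hypothetical helper, not an FMS or MOM6 routine):

```python
def max_layout(n_global, halo):
    """Largest rank count along one axis such that every rank's compute
    domain (roughly n_global / nprocs points) is still >= halo wide."""
    return max(1, n_global // halo)

# Illustrative only: a 20 x 2 point domain with a halo width of 4
# would be limited to at most 5 ranks in x and 1 rank in y.
print(max_layout(20, 4), max_layout(2, 4))
```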
According to build 27, this case is passing. I am not able to reproduce the problem interactively with the latest on dev/master.
These two experiments don't appear to run under any compiler/build/domain configuration.
Example output:
https://climate-cms.nci.org.au/jenkins/job/mom-ocean.org/job/MOM6_run/build=DEBUG,compiler=intel,experiment=ocean_only-resting-z,memory_type=dynamic/
Error output for vanilla run:
[r2816:7741] *** An error occurred in MPI_Wait
[r2816:7741] *** reported by process [47732917534721,140720308486158]
[r2816:7741] *** on communicator MPI_COMM_WORLD
[r2816:7741] *** MPI_ERR_TRUNCATE: message truncated
[r2816:7741] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[r2816:7741] *** and potentially your MPI job)
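For what the MPI error itself means: MPI_ERR_TRUNCATE is raised when a receive completes into a buffer smaller than the incoming message. A minimal mpi4py sketch (unrelated to the MOM6/FMS code paths above; buffer sizes and tag are arbitrary, purely to illustrate the error class):

```python
# Run with: mpirun -n 2 python truncate_demo.py
# Rank 0 sends 8 doubles; rank 1 posts a receive for only 4, so the wait
# on the receive fails with MPI_ERR_TRUNCATE.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.Send([np.zeros(8, dtype=np.float64), MPI.DOUBLE], dest=1, tag=7)
elif rank == 1:
    recv_buf = np.empty(4, dtype=np.float64)  # too small for the 8 values sent
    req = comm.Irecv([recv_buf, MPI.DOUBLE], source=0, tag=7)
    req.Wait()  # error surfaces here, analogous to the MPI_Wait in the log above
```

In the halo-update context above, a mismatch between the message sizes the sending and receiving ranks compute for the exchange would surface in exactly this way.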
Valgrind reports an error like this just before the crash, but I don't know if it is related:
==1376== by 0x66956A4: walk_type_array (libmpiwrap.c:908)
==1376== by 0x66956A4: make_mem_defined_if_addressable (libmpiwrap.c:1015)
==1376== by 0x66956A4: maybe_complete (libmpiwrap.c:1359)
==1376== by 0x6696AF6: PMPI_Wait (libmpiwrap.c:1463)
==1376== by 0x7E10CDF: PMPI_WAIT (pwait_f.c:74)
==1376== by 0x481B58E: mpp_mod_mp_mpp_sync_self_ (mpp_util_mpi.inc:223)
==1376== by 0x416294D: mpp_domains_mod_mp_mpp_do_update_r8_3d_ (mpp_do_update.h:245)
==1376== by 0x4024582: mpp_domains_mod_mp_mpp_update_domain2d_r8_3d_ (mpp_update_domains2D.h:145)
==1376== by 0x2B50479: mom_domains_mp_pass_var_3d_ (MOM_domains.F90:157)
==1376== by 0x10355C1: mom_state_initialization_mp_mom_initialize_state_ (MOM_state_initialization.F90:253)
==1376== by 0x2DD1864: mom_mp_initialize_mom_ (MOM.F90:1800)
==1376== by 0x1BA5C3E: MAIN__ (MOM_driver.F90:263)
Full valgrind output can be found at:
https://climate-cms.nci.org.au/jenkins/job/mom-ocean.org/job/MOM6_runtime_analyzer/analyzer=valgrind,build=DEBUG,compiler=intel,experiment=ocean_only-resting-z,memory_type=dynamic/37/console