Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build issue in mpas-o with Intel v17 on cori #1176

Closed
ndkeen opened this issue Dec 13, 2016 · 12 comments · Fixed by #1257
Closed

Build issue in mpas-o with Intel v17 on cori #1176

ndkeen opened this issue Dec 13, 2016 · 12 comments · Fixed by #1257
Assignees

Comments

@ndkeen
Copy link
Contributor

ndkeen commented Dec 13, 2016

On Cori, I ran into a build issue with Intel v17 (both versions on Cori) in the ocean. (See error message pasted in below). Looking closer in the code, there does not seem to be a problem at all! So assuming it was a compiler oddity, I found a simple work-around and continued working. The compiler is complaining that it thinks the integer variable ierr is brought in from a module and then also declared locally. But it is not brought in from a module. I simply renamed the local variable from ierr to ierrz and it compiled/worked. To fix this for the next time, I modified components/mpas-o/model/src/tools/registry/gen_inc.c to insert the code with this renamed local variable (ierrz instead of ierr). I don't see this issue with Intel v16. I'm including the source file as it's one of those auto-generated ones and it's good to see exactly what the compiler is trying to compile.
mpas_ocn_core_interface.f90.txt

I also brought this up to Intel staff member Martyn Corden. I was granted permission to give this one source file to him. He replied:

 
It struck me that you were re-USE-ing the same modules over and over from different subroutines, even though they were used at the head of the module ocn_core_interface. This isn’t necessary and can be counterproductive. For example, mpas_derived_types was USEd >150 times.
 
         I suggest an experiment, where you comment out (or remove) every instance of the following statements that appears after the contains statement:
 
   use mpas_derived_types
   use mpas_pool_routines
   use mpas_dmpar
   use mpas_io_units
 
and see 1) if the problem persists, and 2) whether the compilation goes any faster.
(This used to be a problem for compilation time in older compilers, but that may have been fixed by now. It’s also possible that the fix might be related to the problem in 17.0).
ftn  -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source -xMIC-AVX512 -diag-disable 10121 -O2 -qno-opt-dynamic-align -DMPAS_NO_LOG_REDIRECT -DMPAS_NO_ESMF_INIT -DMPAS_ESM_SHR_CONST -DMPAS_PERF_MOD_TIMERS -c mpas_ocn_core_interface.f90
...
mpas_ocn_core_interface.f90(85608): error #6405: The same named entity from different modules and/or program units cannot be refe
renced.   [IERR]
         read(unitNumber, AM_pointwiseStats, iostat=ierr)
----------------------------------------------------^
compilation aborted for mpas_ocn_core_interface.f90 (code 1)
Makefile:18: recipe for target 'mpas_ocn_core_interface.o' failed
make[3]: *** [mpas_ocn_core_interface.o] Error 1
make[3]: Leaving directory '/global/cscratch1/sd/ndk/acme_scratch/SMS.T62_oEC60to30.GMPAS.cori-knl_intel.m21n041/bld/ocn/source/c
ore_ocean/driver'
Makefile:8: recipe for target 'all' failed
make[2]: *** [all] Error 2
make[2]: Leaving directory '/global/cscratch1/sd/ndk/acme_scratch/SMS.T62_oEC60to30.GMPAS.cori-knl_intel.m21n041/bld/ocn/source/c
ore_ocean'
Makefile:39: recipe for target 'dycore' failed
@ndkeen
Copy link
Contributor Author

ndkeen commented Dec 15, 2016

OK, the change Martyn suggested does appear to get us past this issue. Can we make it?

That is, in components/mpas-o/model/src/tools/registry/gen_inc.c

<     //fortprintf(fd, "      use mpas_derived_types\n");
<     //fortprintf(fd, "      use mpas_dmpar\n");
<     //fortprintf(fd, "      use mpas_pool_routines\n");
<     //fortprintf(fd, "      use mpas_io_units\n");
---
>     fortprintf(fd, "      use mpas_derived_types\n");
>     fortprintf(fd, "      use mpas_dmpar\n");
>     fortprintf(fd, "      use mpas_pool_routines\n");
>     fortprintf(fd, "      use mpas_io_units\n");

@mark-petersen
Copy link
Contributor

@ndkeen thanks for posting this.
I'm mirroring this issue on the MPAS repo, where the alteration will actually need to take place:
MPAS-Dev/MPAS#1175

@rljacob
Copy link
Member

rljacob commented Jan 16, 2017

@mark-petersen what is the status of this issue?

@amametjanov
Copy link
Member

+1
@mark-petersen, this is needed soon too.

The issue shows up with Intel v17 on these tests

  • ERS_Ld5.T62_oQU120.CMPASO-NYF
  • SMS.T62_oQU120_ais20.MPAS_LISIO_TEST

@mark-petersen
Copy link
Contributor

@amametjanov and @rljacob I will do this in the next few days.

@mark-petersen
Copy link
Contributor

@amametjanov and @ndkeen I am not able to see this error on the machines I typically use. @ndkeen could you please post the code diff that solved the problem so I get it right? If I remember right, you simply changed the ierr variable in the line above do some other dummy variable.

@mark-petersen
Copy link
Contributor

We will also look at the reusing of USE statements, but I think that issue is separate from this simple ierr error. I will change ierr first, and work on the USE statement later.

@amametjanov
Copy link
Member

Here is what solved it for me on Anvil with Intel compiler v17:

azamat@blogin1:model$ pwd
/lcrc/group/acme/azamat/repos/ACME-next/components/mpas-o/model
azamat@blogin1:model$ git diff
diff --git a/src/tools/registry/gen_inc.c b/src/tools/registry/gen_inc.c
index edd3c92..c17e403 100644
--- a/src/tools/registry/gen_inc.c
+++ b/src/tools/registry/gen_inc.c
@@ -588,10 +588,10 @@ int parse_namelist_records_from_registry(ezxml_t registry)/*{{{*/
 
                // Start defining new subroutine for namelist record.
                fortprintf(fd, "   subroutine %s_setup_nmlrec_%s(configPool, unitNumber, dminfo)\n", core_string, nmlrecname);
-               fortprintf(fd, "      use mpas_derived_types\n");
-               fortprintf(fd, "      use mpas_dmpar\n");
-               fortprintf(fd, "      use mpas_pool_routines\n");
-               fortprintf(fd, "      use mpas_io_units\n");
+               //fortprintf(fd, "      use mpas_derived_types\n");
+               //fortprintf(fd, "      use mpas_dmpar\n");
+               //fortprintf(fd, "      use mpas_pool_routines\n");
+               //fortprintf(fd, "      use mpas_io_units\n");
                fortprintf(fd, "      implicit none\n");
                fortprintf(fd, "      type (mpas_pool_type), intent(inout) :: configPool\n");
                fortprintf(fd, "      integer, intent(in) :: unitNumber\n");

@ndkeen
Copy link
Contributor Author

ndkeen commented Jan 27, 2017

@mark-petersen I did post the code above. Same as what Az has.

@rljacob
Copy link
Member

rljacob commented Jan 27, 2017

@mark-petersen, as an ACME developer, you should get accounts on both cori and anvil. Then you can exactly reproduce the issue.

@ndkeen
Copy link
Contributor Author

ndkeen commented Jan 27, 2017

It's true the compiler should be able to handle this, but the generating code does not need to put those extra USE statements at all. This fixes the issue, and it's cleaner, and the Intel guy said it might compile faster as well. This is the easiest way to fix it. If you wanted to try something local, I also was able to get it to compile by simply renaming the variable to be something else (like ierrzz), but as this is generated code, you would probably make the change in the same gen_inc.c and it would be propagated into all of those subroutines.

@mark-petersen
Copy link
Contributor

FYI, I have a PR into MPAS-Dev/develop for those four lines in src/tools/registry/gen_inc.c. There are four cores, and all need to review. I tested in all four cores myself, and made a corresponding PR into the atmospheric core for subsequent use statements needed due to this PR. I will be able to discuss it at a Tuesday MPAS telecon.

jonbob added a commit that referenced this issue Feb 9, 2017
Bring in new versions of MPAS framework and components.

This PR will bring in new versions of the MPAS framework and all three MPAS
components.
This PR fulfills the following Pull Requests:
* Enable threaded ocean and ice tests (1038)
* Ensures MPAS AM write/compute startup steps are performed (1191)
* Open streams to redirect per-process stream output to /dev/null (ACME/MPAS 5)
* Fix slow compilation of threaded MPAS-CICE on Mira (ACME/MPAS 6)
* Fix slow compilation of threaded MPAS-O on Mira (ACME/MPAS 7)
This PR fixes the following issues:
* Restore threading tests for MPAS when threading working (828)
* PET_Ln9.T62_oEC60to30.CMPASO-NYF Fails without baseline comparisons (1088)
* write to stderr in MPAS framework file generating redundant output (1105)
* option to build mpas-o without checking that using the correct cvmix (1111)
* issues with MPAS components writing from all ranks (1133)
* water cycle benchmark not deterministic when using threading in MPAS-O (1137)
* Build issue in mpas-o with Intel v17 on cori (1176)
It has been tested with:
* PET_Ln9.T62_oQU240.GMPAS-NYF.edison_intel (now passes)
* SMS.ne30_oEC.A_WCYCL2000.edison_intel (bfb with baseline)
* SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.edison_intel (bfb with baseline)
* SMS.T62_oEC60to30.GMPAS-NYF.wolf_gnu (bfb with baseline)

Fixes #828, Fixes #1088, Fixes #1105, Fixes #1111, Fixes #1133, Fixes #1137, Fixes #1176

[BFB]
[NML]
jonbob added a commit that referenced this issue Feb 13, 2017
Bring in new versions of MPAS framework and components.

This PR will bring in new versions of the MPAS framework and all three MPAS
components.
This PR fulfills the following Pull Requests:
* Enable threaded ocean and ice tests (1038)
* Ensures MPAS AM write/compute startup steps are performed (1191)
* Open streams to redirect per-process stream output to /dev/null (ACME/MPAS 5)
* Fix slow compilation of threaded MPAS-CICE on Mira (ACME/MPAS 6)
* Fix slow compilation of threaded MPAS-O on Mira (ACME/MPAS 7)
This PR fixes the following issues:
* Restore threading tests for MPAS when threading working (828)
* PET_Ln9.T62_oEC60to30.CMPASO-NYF Fails without baseline comparisons (1088)
* write to stderr in MPAS framework file generating redundant output (1105)
* option to build mpas-o without checking that using the correct cvmix (1111)
* issues with MPAS components writing from all ranks (1133)
* water cycle benchmark not deterministic when using threading in MPAS-O (1137)
* Build issue in mpas-o with Intel v17 on cori (1176)

Fixes #828, Fixes #1088, Fixes #1105, Fixes #1111, Fixes #1133, Fixes #1137, Fixes #1176

[BFB]
[NML]
agsalin pushed a commit that referenced this issue Apr 13, 2017
Improve create_test output
Setup cs.status as the first action instead of last.

When a case is completed, add a little extra text for the user so they can estimate progress

Test suite: scripts_regression_tests --fast
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1176

User interface changes?: cs.status produced early, minor create_test output changes

Code review: @jedwards4b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants