Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usr_mech_infile option doesn't build chemistry preprocessor on cori #3536

Open
tangq opened this issue Apr 3, 2020 · 30 comments
Open

usr_mech_infile option doesn't build chemistry preprocessor on cori #3536

tangq opened this issue Apr 3, 2020 · 30 comments

Comments

@tangq
Copy link
Contributor

tangq commented Apr 3, 2020

The CAM_CONFIG_OPTS option "-usr_mech_infile" is really useful for testing new chemistry mechanisms and it is much needed for the NGD AtmPhys tasks of new chemistry mechanism development.

It worked in July 2019 as documented here to recreate the E3SMv1 chemistry code, but it doesn't work on cori with the current master.

It complains about missing the tar output from the pre-processor when building the namelists. I tracked down the error and found that the chemistry pre-processor failed to build with components/cam/chem_proc/src/Makefile (COMPILE=pgf95). However, my offline build with components/cam/chem_proc/src/make_chempp.cori (COMPILER=gfortran) is successful.

The namelist seems created by a perl script (components/cam/bld/configure). I don't know perl enough to fix it.

@singhbalwinder , you used it successfully last year as mentioned above. Do you have any ideas why it fails now? Do you have a fix for it? It seems to me that we need to use make_chempp.cori instead of Makefile on cori, but I am not sure if that will cause problems on other machines.

@wlin7 , I know you use perl. Do you have any suggestions?

Thanks.

@singhbalwinder
Copy link
Contributor

It was working fine on Cori at some point (like you mentioned), I am not sure what has changed between then and now. There is no test for this preprocessor, therefore there is no way to know when this functionality breaks (due to machine changes or the code changes).

I will have to look at the code to see what went wrong. The Makefile defaults to choosing pgf95 as the default compiler. As a temporary fix, I sometimes replace "pgf95" with "ifort", when I am using Intel compiler to compile the code. It might work on Cori. On Compy, I recently just made the following change to make it work:

ifeq ($(UNAMES),Linux)
  COMPILER := pgf95
endif

to:

ifeq ($(UNAMES),Linux)
  COMPILER := ifort
#pgf95
endif

We should find a permanent solution to fix this in the code.

@wlin7
Copy link
Contributor

wlin7 commented Apr 3, 2020

@tangq , in cam/bld/configure file, indeed the only thing it does is to specify a compiler (then call chem_preprocess routine, which is in another perl module).

If you think gfortran will do it, you can temporarily change line 1408 the configure file (current master) to see if it works

($chem_nadv) = chem_preprocess($cfg_ref,$print,"gnu");

I can update the configure file later in a more appropriate way.

Correction: fc_type should be gnu in the call to chem_preprocess.

@tangq
Copy link
Contributor Author

tangq commented Apr 3, 2020

Thank you, @singhbalwinder for the suggestions. However, on cori when I change COMPILER from pgf95 to gfortran, it gives:
'*** Chem preprocessor FAILED.

I noticed that under components/cam/chem_proc/src, the pre-processor can build with make_chempp.cori, but not Makefile directly.

Looking into make_chempp.cori, it has some module unload and load commands, which are not in Makefile. However, adding these module commands as:

ifeq ($(UNAMES),Linux)
	module unload PrgEnv-pgi
	module unload PrgEnv-cray
	module unload PrgEnv-gnu
	module unload PrgEnv-intel
	module load PrgEnv-gnu
	COMPILER := gfortran
endif

It still fails with the same error.

In addition, for my offline tests, after compiling successfully with ./make_chempp.cori. I have to run those module commands above before running the pre-processor. I suspect the failure after changing the compiler to gfortran is because those model commands are not executed. But I don't know how to test it.

Any suggestions?

I will test on compy as well.

@tangq
Copy link
Contributor Author

tangq commented Apr 3, 2020

@wlin7 and @singhbalwinder , I did a couple of more tests on cori and found the following.

  • Change line 1408 in the configure file to "gnu": The model still tries to use pgf95 and fails.
  • Change pgf95 to gfortran in the Makefile: cannot find gfortran (because haven't run module load PrgEnv-gnu)
  • Add module commands and change to use gfortan in Makefile: make doesn't understand the module command, so fails again.

So, based on all the tests (online and offline) I've done so far, it looks like 2 changes are needed:

  1. Let Makefile of pre-processor "see" gfortran to build it correctly.
  2. Run those module commands before running the pre-processor (Otherwise, it will complain about missing some library file).

@singhbalwinder
Copy link
Contributor

If you are changing CAM_CONFIG_OPTS, the preprocessor should create a file called MAKE.out in your case directory at Buildconf/camconf/chem_proc/. Would you please share the contents of that file when you change the compiler to gfortran? Have you tried "ifort" (after loading Intel specific modules)?

@singhbalwinder
Copy link
Contributor

I have been running the preprocessor successfully on Compy by just changing pgf95 to ifort.

@tangq
Copy link
Contributor Author

tangq commented Apr 3, 2020

@singhbalwinder , I used the MAKE.out to tell where went wrong in my previous comment.

You can find them at cori:/global/cscratch1/sd/tang30/chem_proc_log
MAKE.out.gfortran - change Makefile to use gfortan
MAKE.out.ifort - change Makefile to use ifort
MAKE.out.gfortran+module - change Makefile to use gfortan and add module commands
MAKE.out.configure - change configure file to use gnu

All 4 tests failed on cori.

@singhbalwinder
Copy link
Contributor

I just looked at your MAKE.out.ifort and I do not see any errors in that file. Did something break after the compilation?

@tangq
Copy link
Contributor Author

tangq commented Apr 3, 2020

It is not obvious for the ifort test. I only got the errors:

  2020-04-03 16:13:32 atm
   Calling /global/u2/t/tang30/E3SM_code/20200403/components/cam/cime_config/buildnml
ERROR: Command: '/global/u2/t/tang30/E3SM_code/20200403/components/cam/bld/configure -s -ccsm_seq -ice none -ocn docn -caseroot /global/cscratch1/sd/tang30/E3SM_simulations/tst2_usr_mech_infile.ne30_oECv3.cori-knl/case_scripts -comp_intf mct  -spmd -spmd -smp -smp -dyn se -dyn_target preqx -res ne30np4   -phys cam5 -clubb_sgs -microphys mg2 -chem linoz_mam4_resus_mom_soag -rain_evap_to_coarse_aero -nlev 72 -bc_dep_to_snow_updates -usr_mech_infile /global/homes/t/tang30/E3SM_code/20200324_UCI-chem/components/cam/chem_proc/inputs/pp_chemUCI.inp ' failed with error '*** Chem preprocessor FAILED.

@tangq
Copy link
Contributor Author

tangq commented Apr 3, 2020

Same thing seems happened on compy with ifort as well. Both compiled "campp" successfully...

On cori $casedir = /global/cscratch1/sd/tang30/E3SM_simulations/tst2_usr_mech_infile.ne30_oECv3.cori-knl

On compy $casedir = /compyfs/tang338/E3SM_simulations/20200403.tst_usr_mech_infile.ne30_oECv3.compy

@singhbalwinder
Copy link
Contributor

With ifort your code compiled fine as you got the executable campp (which is great!). My guess is that something might be wrong in your chem mechanism file (just a guess). Unfortunately, the error due to incompatible chemistry input file do not show up in the logs. You might get the scripts to print out these errors if you use --debug flag, like ./case.build --debug. Store the output in a log file like ./case.build --debug |& tee log. I am not 100% sure that it will be helpful but it doesn't hurt to try.

@singhbalwinder
Copy link
Contributor

Just to confirm if it is actually due to the chemistry input file, you can use the model's chemistry input file to see if it works.

@tangq
Copy link
Contributor Author

tangq commented Apr 3, 2020

I did test if the pre-processor likes the new mechanism input file or not by running the campp interactively:

[tang30@cori20 bin (master)]$ ./cam_chempp 
 Enter filespec of input file
../inputs/pp_chemUCI.inp
  
 ================================================
 CAM-Chem preprocessor has successfully completed
 ================================================

And the tar file is created successfully

[tang30@cori20 bin (master)]$ ls ../output/
cam.subs.tar  chemUCI.dat  chemUCI.doc  moz.mat.F90  moz.mods.F90  moz.subs.F90  params.h

I think this means my chemistry input file is not the cause. Can you send me an input file which works for you to confirm it?

@singhbalwinder
Copy link
Contributor

I just did the comparison between yours and mine and it seems like the file which works for me doesn't have the following at the top (I know this is specific to your input file):

BEGSIM
output_unit_number = 7
output_file        = chemUCI.doc
procout_path       = ../output/
src_path           = ../bkend/
procfiles_path     = ../procfiles/cam/
sim_dat_path       = ../output/
sim_dat_filename   = chemUCI.dat

Comments
     "Modified by Qi Tang & Michael Prather"
     "chemUCI standalone for E3SM"
     "last touch 20200401 - MJP"
End Comments

My files starts with SPECIES. I vaguely remember that I have seen this behavior previously as well where if the chemistry preprocessor is run via CAM_CONFIG_OPTS, the file should not have the BEGSIM block at the top (not 100% sure). You can try removing that block at the top. My file is at: /compyfs/sing201/delete/pp_linoz_mam5_resus_mom_soag.inp

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

Ah..That's right. When running online, we don't need the paths at the top. I confirmed it with the default E3SMv1 chem_mech.in file. I will test it with the new input file now.

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

When removing the path session below, my new chemistry input file works! Thank you @singhbalwinder and @wlin7 .

I will manually change the Makefile to use ifort for now.

If you have suggestions to solve this issue for long term, I'd be happy to test them as well.

BEGSIM
output_unit_number = 7
output_file        = chemUCI.doc
procout_path       = ../output/
src_path           = ../bkend/
procfiles_path     = ../procfiles/cam/
sim_dat_path       = ../output/
sim_dat_filename   = chemUCI.dat

@singhbalwinder
Copy link
Contributor

great! thanks @tangq ! I will let you know if I have something for you to try.

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

@singhbalwinder , I am also interested in adding a routine test for the chemistry pre-processor. But I haven't added tests before, so don't know how to implement it.

There are many chemistry/aerosol development tasks planned for the NGD AtmPhys. I expect more frequent updates and modifications to the chemistry mechanisms, so we really need a test for it.

We can use the chemistry mechanism input file of E3SMv1 and feed it to the test running the pre-processor. The test will generate a set of fortran code. We can then diff this new set of code with the code used by E3SMv1 (located at components/cam/src/chemistry/pp_linoz_mam4_resus_mom_soag) to tell if the pre-processor generates the same fortran code files.

Is it possible to add such a test? Or do you think there is a better way to do it? Thank you!

@wlin7
Copy link
Contributor

wlin7 commented Apr 4, 2020

Hi @tangq , @singhbalwinder ,

From cam/bld/configure, given an fc_type (e.g., intel), if would set env variable USER_FC in perl5lib/Build/ChemPreprocess.pm. This env variable USER_FC however is not being used in Makefile

Add the following to Makefile will do it, which will then use the same compiler for building preprocessor and the model. However, be cautious on a system using cross compiler. I kind of remember with BlueGene L, executable built on frontend but can only run on compute nodes. Anyway, here it is.

# guess default compiler 

# use USER_FC is defined

ifneq ($(USER_FC),$(null))
  COMPILER := $(USER_FC)
endif
 

Let me know if we want a PR to make it permanent.

@singhbalwinder
Copy link
Contributor

@tangq : Adding a test is a great idea. I will have to think how to create a test like you suggested as it is very different from already existing tests. In the existing tests, we run and model and compare netcdf files using cprnc tool (not F90 files). They have pretty fixed workflows. However, they do allow us to configure the model in any way we want.

Without too much effort, we can add a simple test which would invoke chemistry preprocessor using CAM_CONFIG_OPTS in the code and build the model based on that. This test will fail if there is something broken in the chemistry preprocessor. Does that sound reasonable?

Thanks @wlin7 for suggesting this fix. @tangq, if possible, would you please test @wlin7 's fix to build chemistry preprocessor? If it works, it will make it much easier to add a test for chemistry preprocessor.

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

On cori, I tested the fix below as @wlin7 suggested - The chemistry pre-processor (PP) created the fortran code at Buildconf/camconf/chem_proc. I couldn't find the log file for building the PP. It seems that the MAKE.out file is only created when PP fails. This process is completed by ./case.setup, for which I am not sure if we have a log file.

Anyway, the fact that the fortran code is created should mean it is successful. This is probably the best solution, because 1) using the same compiler for the PP as for the model ensures the PP compiler is supported; 2) it allows an easier test for the PP as @singhbalwinder mentioned above.

I will make a PR to implement it. Thanks @wlin7 for the great suggestion!

diff --git a/components/cam/chem_proc/src/Makefile b/components/cam/chem_proc/src/Makefile
index 1d1718978..a572cd99a 100644
--- a/components/cam/chem_proc/src/Makefile
+++ b/components/cam/chem_proc/src/Makefile
@@ -101,6 +101,11 @@ endif
 
 endif
 
+# use USER_FC (if defined) to overwrite default compiler
+ifneq ($(USER_FC),$(null))
+       COMPILER := $(USER_FC)
+endif
+
 #------------------------------------------------------------------------
 #------------------------------------------------------------------------
 # set compiler flags ...

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

@singhbalwinder , thanks for explaining how the existing tests work and your ideas make perfect sense to me.

I think we can add the test within the current testing structure as:

  1. Run an F-case (resolution doesn't matter, so ne4 would be fine to be efficient) with -chem linoz_mam4_resus_mom_soag as one of the CAM_CONFIG_OPTS as the baseline run. We should already have such a run for other tests. Right?
  2. Run the same F-case above but with -usr_mech_infile ${step 1 casedir}/Buildconf/camconf/chem_mech.in added for CAM_CONFIG_OPTS.

To determine if the test is successful or not, we can a) compare the netcdf files using cprnc of these two runs, or b) compare the fortran code files at E3SM/components/cam/src/chemistry/pp_linoz_mam4_resus_mom_soag and ${step 2 casedir}/Buildconf/camconf/chem_proc/source with diff.

a) requires less work to implement. b) is more efficient as the run in step 2 only need to finish ./case.setup - don't even need to build the model.

@singhbalwinder , what do you think? Is it hard to add such a test? Thanks.

@wlin7
Copy link
Contributor

wlin7 commented Apr 4, 2020

@tangq , for your reference: when chem preprocessor completes successfully, some intermediate files will be removed, but MAKE.out, if you want to take a look, is still saved under Buildconf/camconf/chem_proc.

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

@wlin7 , I am testing the PR (#3537) to implement your suggestions in Makefile. It seems that when successful MAKE.out is saved on compy but not on cori.

cori dir: /global/cscratch1/sd/tang30/E3SM_simulations/20200324_UCI-chem.tst.chemUCI.ne30_oECv3/case_scripts/Buildconf/camconf/chem_proc

compy dir: /compyfs/tang338/E3SM_simulations/20200123.tst_usr_mech_infile.ne30_oECv3.compy/case_scripts/Buildconf/camconf/chem_proc

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

Interesting...I reran the test on compy and this time MAKE.out was NOT saved. Is it some kind of random behavior?

@tangq
Copy link
Contributor Author

tangq commented Apr 4, 2020

I looked at perl5lib/Build/ChemPreprocess.pm. It seems that no command in that file removes MAKE.out.

@singhbalwinder
Copy link
Contributor

Hi @tangq, Unless I am missing something, I don't think any of the existing tests can be setup like you described above. We will have to create a new test type for this. I am tagging @jgfouca, in case I am forgetting some exiting test type which can help here.

What we can easily do is we can run a test like smoke test (SMS with ne4) where we change the CAM_CONFIG_OPTS to add user_mech_infile. If this tests passes (i.e. runs fine), we would know that the chemistry preprocessor has worked fine. This test would not check if the answers are same as with the default model.

If it is possible to use baselines of another test, we can define a test for chemistry preprocessor which is exactly the same of an existing test but the chemistry preprocessor test uses CAM_CONFIG_OPTS to run the preprocessor. This approach may work but it creates a dependency which can be a problem as baseline test has to run first. These kind of loose dependencies can be hard to maintain overtime.

@tangq
Copy link
Contributor Author

tangq commented Apr 6, 2020

@singhbalwinder , I think what you suggested is fine for now, because the main purpose is to test if the pre-processor can generate the fortran code successfully. Without the baseline test dependency, it should be straightforward.

Can you submit a PR for this test? I'd be happy to review it. Thanks!

@singhbalwinder
Copy link
Contributor

I can add a test to your PR (fixing the compiler issue), if that is okay with you.

@tangq
Copy link
Contributor Author

tangq commented Apr 6, 2020

Sounds great to me. Thanks, @singhbalwinder

wlin7 added a commit that referenced this issue May 4, 2020
Use the same compiler for chemistry pre-processor and the model

This PR solves the issue #3536 to use the same compiler
for the chemistry pre-processor (PP) and the model.

This PR also makes it easier to implement a routine test for the PP.

The tests are successful on both cori and compy. But the log file
(MAKE.out) is only saved on compy. Need to add some changes to this
PR to save MAKE.out on all machines.

[BFB]
wlin7 added a commit that referenced this issue May 6, 2020
Use the same compiler for chemistry pre-processor and the model

This PR solves the issue #3536 to use the same compiler for the
chemistry pre-processor (PP) and the model.

This PR also makes it easier to implement a routine test for the PP.

A new test, SMS_Ln1.ne4_ne4.FC5AV1C-L.cam-chem_pp, is added.

[BFB]
rljacob pushed a commit that referenced this issue Apr 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants