Merge pull request #2005 from jedwards4b/external_process_interface
External process interface

Improve interface to external scripts PRERUN_SCRIPT, POSTRUN_SCRIPT, DATA_ASSIMILATION_SCRIPT. If DATA_ASSIMILATION is true but no script is named, throw an error. Allow Python scripts to be called directly instead of through the shell. Improve handling of arguments and log files.

Test suite: scripts_regression_tests.py, hand testing
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #1953

User interface changes?:

Update gh-pages html (Y/N)?: Y

Code review:
jgfouca authored Nov 6, 2017
2 parents a84b875 + 5b57163 commit 0027c84
Showing 5 changed files with 132 additions and 69 deletions.
146 changes: 99 additions & 47 deletions doc/source/users_guide/running-a-case.rst
@@ -12,16 +12,16 @@ Calling **case.submit**

Before you submit the case using **case.submit**, make sure
the batch queue variables are set correctly for your run.
Those variables are contained in the file **$CASEROOT/env_batch.xml**
under the XML ``<group id="case.run">`` and ``<group id="case.st_archive">``
elements.

Make sure that you have appropriate account numbers (``PROJECT``), time limits
(``JOB_WALLCLOCK_TIME``), and queue (``JOB_QUEUE``) for those groups.
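If you want to review these values programmatically before submitting, the Python sketch below reads them from **env_batch.xml** using only the standard library. It is an illustration, not a CIME tool. The function name is hypothetical, and the sketch assumes each setting is stored as an ``<entry id="..." value="...">`` element inside the ``<group>`` elements named above, so compare it against your own file. The case's own ``xmlquery`` script remains the usual way to inspect these values.

```python
import xml.etree.ElementTree as ET

# Batch settings this guide asks you to check, and the groups they live in.
BATCH_VARS = ("PROJECT", "JOB_WALLCLOCK_TIME", "JOB_QUEUE")
BATCH_GROUPS = ("case.run", "case.st_archive")


def read_batch_settings(xml_text):
    """Return {group_id: {variable: value}} for the batch groups above.

    Assumes each setting is an <entry id="..." value="..."> element
    somewhere below the matching <group id="..."> element.
    """
    root = ET.fromstring(xml_text)
    settings = {}
    for group in root.iter("group"):
        gid = group.get("id")
        if gid not in BATCH_GROUPS:
            continue
        settings[gid] = {
            entry.get("id"): entry.get("value")
            for entry in group.iter("entry")
            if entry.get("id") in BATCH_VARS
        }
    return settings


if __name__ == "__main__":
    # Hypothetical excerpt; a real env_batch.xml contains much more.
    sample = """
<file id="env_batch.xml">
  <group id="case.run">
    <entry id="JOB_QUEUE" value="regular"/>
    <entry id="JOB_WALLCLOCK_TIME" value="01:30:00"/>
  </group>
  <group id="case.st_archive">
    <entry id="JOB_QUEUE" value="share"/>
  </group>
</file>
"""
    for gid, values in read_batch_settings(sample.strip()).items():
        print(gid, values)
```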

Also modify **$CASEROOT/env_run.xml** for your case using :ref:`xmlchange<modifying-an-xml-file>`.

Once you have executed **case.setup** and **case.build**, run **case.submit**
to submit the run to your machine's batch queue system.
::

@@ -40,7 +40,7 @@ When called, the **case.submit** script will:

- Run **preview_namelists**, which in turn will run each component's **buildnml**.

- Run :ref:`check_input_data<input_data>` to verify that the required
data are present.

- Submit the job to the batch queue, which in turn will run the **case.run** script.
@@ -52,7 +52,7 @@ Upon successful completion of the run, **case.run** will:

- Copy log files back to ``$LOGDIR``.

- Submit the short-term archiver script **case.st_archive**
to the batch queue if ``$DOUT_S`` is TRUE.

- Resubmit **case.run** if ``$RESUBMIT`` > 0.
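The decisions **case.run** makes after a successful run, as listed above, fit in a few lines. This is a hypothetical Python sketch of that logic, not CIME's implementation. The function name and the action strings are illustrative only.

```python
def post_run_actions(dout_s, resubmit):
    """Return the follow-up steps after a successful run (illustrative).

    dout_s   -- value of $DOUT_S (True if short-term archiving is on)
    resubmit -- value of $RESUBMIT (number of remaining resubmissions)
    """
    actions = ["copy log files to $LOGDIR"]
    if dout_s:
        actions.append("submit case.st_archive")
    if resubmit > 0:
        actions.append("resubmit case.run")
    return actions


if __name__ == "__main__":
    print(post_run_actions(dout_s=True, resubmit=2))
    print(post_run_actions(dout_s=False, resubmit=0))
```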
@@ -95,21 +95,21 @@ messages:
.. note::
After a successful first run, set the **env_run.xml** variable
``$CONTINUE_RUN`` to ``TRUE`` before resubmitting or the job will not
progress.

You may also need to modify the **env_run.xml** variables
``$STOP_OPTION``, ``$STOP_N`` and/or ``$STOP_DATE`` as well as
``$REST_OPTION``, ``$REST_N`` and/or ``$REST_DATE``, and ``$RESUBMIT``
before resubmitting.

See :ref:`the basic example<use-cases-basic-example>` for a complete example
of how to run a case.

---------------------------------
Troubleshooting a job that fails
---------------------------------

There are several places to look for information if a job fails.
Start with the **STDOUT** and **STDERR** file(s) in **$CASEROOT**.
If you don't find an obvious error message there, the
**$RUNDIR/$model.log.$datestamp** files will probably give you a
@@ -126,14 +126,14 @@ problems<troubleshooting>` for more information.
Input data
====================================================

The **check_input_data** script determines if the required data files
for your case exist on local disk in the appropriate subdirectory of
``$DIN_LOC_ROOT``. It automatically downloads missing data.
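At its core, this check asks whether each required file exists under ``$DIN_LOC_ROOT``. The Python sketch below illustrates only that idea and, unlike **check_input_data**, downloads nothing. The helper name is hypothetical, and the ``name = path`` line format it parses is an assumption. Check the files in **$CASEROOT/Buildconf** for the actual format.

```python
import os
import tempfile


def missing_input_files(list_text, din_loc_root):
    """Return the paths named in list_text that do not exist on disk.

    Assumes (hypothetically) one "name = path" pair per line; relative
    paths are resolved against din_loc_root.
    """
    missing = []
    for line in list_text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        path = line.split("=", 1)[1].strip()
        if not os.path.isabs(path):
            path = os.path.join(din_loc_root, path)
        if not os.path.exists(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stands in for $DIN_LOC_ROOT
    open(os.path.join(root, "present.nc"), "w").close()
    listing = "file1 = present.nc\nfile2 = absent.nc\n"
    print(missing_input_files(listing, root))  # only absent.nc is reported
```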

The required input data sets needed for each component are found in the
**$CASEROOT/Buildconf** directory. These files are generated by a call
to **preview_namelists** and are in turn created by each component's
**buildnml** script. For example, for compsets consisting only of data
models (``A`` compsets), the following files are created:
::

@@ -163,12 +163,12 @@ Controlling starting, stopping and restarting a run
====================================================

The file **env_run.xml** contains variables that may be modified at
initialization or any time during the course of a model run. Among
other features, the variables comprise coupler namelist settings for
the model stop time, restart frequency, coupler history frequency, and
a flag that determines whether the run is a continuation run.

At a minimum, you will need to set the variables ``$STOP_OPTION`` and
``$STOP_N``. Other driver namelist settings then will have consistent and
reasonable default values. The default settings guarantee that
restart files are produced at the end of the model run.
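To illustrate how ``$STOP_OPTION`` and ``$STOP_N`` work together, the Python sketch below computes the date a run would reach. It is a hypothetical helper, not part of CIME. It handles only the ``ndays``, ``nmonths`` and ``nyears`` options, assumes a 365-day (no-leap) calendar, and clamps the day when a month-based step lands in a shorter month. Check your case's calendar setting and the full list of ``$STOP_OPTION`` values in **env_run.xml**.

```python
# Days per month in a 365-day (no-leap) calendar, assumed for this sketch.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def stop_date(start, stop_option, stop_n):
    """Return the (year, month, day) reached by advancing start by stop_n units.

    Only the ndays, nmonths and nyears options are handled here.
    """
    year, month, day = start
    if stop_option == "nyears":
        return (year + stop_n, month, day)
    if stop_option == "nmonths":
        total = (month - 1) + stop_n
        new_month = total % 12 + 1
        # This sketch clamps the day if the target month is shorter.
        new_day = min(day, DAYS_IN_MONTH[new_month - 1])
        return (year + total // 12, new_month, new_day)
    if stop_option == "ndays":
        remaining = stop_n
        while remaining > 0:
            days_left = DAYS_IN_MONTH[month - 1] - day  # days left this month
            if remaining <= days_left:
                return (year, month, day + remaining)
            remaining -= days_left + 1  # jump to the 1st of the next month
            day = 1
            month += 1
            if month > 12:
                month, year = 1, year + 1
        return (year, month, day)
    raise ValueError("STOP_OPTION not handled by this sketch: " + stop_option)


if __name__ == "__main__":
    print(stop_date((1, 1, 1), "ndays", 5))      # (1, 1, 6)
    print(stop_date((1, 11, 15), "nmonths", 3))  # (2, 2, 15)
```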
@@ -203,10 +203,10 @@ The case initialization type is set using the ``$RUN_TYPE`` variable in

``startup``
In a startup run (the default), all components are initialized using
baseline states. These states are set independently by each component
and can include the use of restart files, initial files, external
observed data files, or internal initialization (that is, a "cold start").
In a startup run, the coupler sends the start date to the components
at initialization. In addition, the coupler does not need an input data file.
In a startup initialization, the ocean model does not start until the second
ocean coupling step.
@@ -231,14 +231,14 @@ The case initialization type is set using the ``$RUN_TYPE`` variable in
type of run. ``$RUN_REFCASE`` and ``$RUN_REFDATE`` are required for
branch runs. To set up a branch run, locate the restart tar file or
restart directory for ``$RUN_REFCASE`` and ``$RUN_REFDATE`` from a
previous run, then place those files in the ``$RUNDIR`` directory.
See :ref:`setting up a branch
run<setting-up-a-branch-run>`.

``hybrid``
A hybrid run is initialized like a startup run, but it uses
initialization data sets from a previous case. It is similar
to a branch run with relaxed restart constraints.
A hybrid run allows users to bring together
combinations of initial/restart files from a previous case
(specified by ``$RUN_REFCASE``) at a given model output date
@@ -259,10 +259,10 @@ run, the ``$CONTINUE_RUN`` variable is set to TRUE, and the model
restarts exactly using input files in a case, date, and bit-for-bit
continuous fashion.

The variable ``$RUN_STARTDATE`` is the start date (in yyyy-mm-dd format)
for either a startup run or a hybrid run. If the run is targeted to be
a hybrid or branch run, you must specify values for ``$RUN_REFCASE`` and
``$RUN_REFDATE``.
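These requirements translate directly into a pre-submission check. The Python sketch below is hypothetical and is not CIME's own validation. It takes the relevant **env_run.xml** values as strings and returns a list of problems, which is empty when the settings look consistent.

```python
import re

# RUN_STARTDATE must look like yyyy-mm-dd.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_run_type(run_type, run_startdate="", run_refcase="", run_refdate=""):
    """Return a list of problems with the run-type settings (empty if none)."""
    if run_type not in ("startup", "branch", "hybrid"):
        return ["unknown RUN_TYPE: {}".format(run_type)]
    problems = []
    # Startup and hybrid runs both take their start date from RUN_STARTDATE.
    if run_type in ("startup", "hybrid") and not DATE_RE.match(run_startdate):
        problems.append("RUN_STARTDATE must be in yyyy-mm-dd format")
    # Branch and hybrid runs need a reference case and date.
    if run_type in ("branch", "hybrid"):
        if not run_refcase:
            problems.append("RUN_REFCASE is required for a {} run".format(run_type))
        if not run_refdate:
            problems.append("RUN_REFDATE is required for a {} run".format(run_type))
    return problems


if __name__ == "__main__":
    print(check_run_type("hybrid", run_startdate="0001-01-01"))
```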

.. _controlling-output-data:

@@ -303,13 +303,13 @@ Also:

- Users generally should turn off short-term archiving when developing new code.

Standard output generated from each component is saved in ``$RUNDIR``
in a *log file*. Each time the model is run, a single coordinated datestamp
is incorporated into the filename of each output log file.
The run script generates the datestamp in the form YYMMDD-hhmmss, indicating
the year, month, day, hour, minute and second that the run began
(ocn.log.040526-082714, for example). Log files are copied to a user-specified
directory using the variable ``$LOGDIR`` in **env_run.xml**. The default is a "logs"
subdirectory in the **$CASEROOT** directory.
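Because the datestamp is part of every log filename, the time a run began can be recovered from the name alone. The following is a minimal Python sketch. It assumes the ``<component>.log.YYMMDD-hhmmss`` pattern shown above, with no extra suffix on the name. The helper name is hypothetical.

```python
from datetime import datetime


def parse_log_name(filename):
    """Split e.g. 'ocn.log.040526-082714' into its component and start time."""
    component, marker, stamp = filename.split(".", 2)
    if marker != "log":
        raise ValueError("not a component log file name: " + filename)
    # YYMMDD-hhmmss; %y maps 04 to 2004.
    return component, datetime.strptime(stamp, "%y%m%d-%H%M%S")


if __name__ == "__main__":
    names = ["atm.log.040526-082714", "atm.log.040527-101500"]
    newest = max(names, key=lambda n: parse_log_name(n)[1])
    print(newest)  # atm.log.040527-101500
```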

By default, each component also periodically writes history files
@@ -339,23 +339,23 @@ for a description of output data filenames.
Restarting a run
======================

Active components (and some data components) write restart files
at intervals that are dictated by the driver via the setting of the
``$REST_OPTION`` and ``$REST_N`` variables in **env_run.xml**. Restart
files allow the model to stop and then start again with bit-for-bit
exact capability; the model output is exactly the same as if the model
had not stopped. The driver coordinates the writing of restart
files as well as the time evolution of the model.

Runs that are initialized as branch or hybrid runs require
restart/initial files from previous model runs (as specified by the
variables ``$RUN_REFCASE`` and ``$RUN_REFDATE``). Pre-stage these
files to the case ``$RUNDIR`` (normally ``$EXEROOT/run``)
before the model run starts. Normally this is done by copying the contents
of the relevant **$RUN_REFCASE/rest/$RUN_REFDATE.00000** directory.
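The copy can be scripted. The Python sketch below is a hypothetical helper, not a CIME tool. It takes the source restart directory and ``$RUNDIR`` as arguments rather than building the paths itself, so point it at whatever restart directory your reference case actually contains.

```python
import os
import shutil
import tempfile


def stage_restart_files(restart_dir, rundir):
    """Copy every regular file from restart_dir into rundir.

    Files already in rundir with the same name (for example rpointer
    files) are overwritten. Returns the names of the copied files.
    """
    os.makedirs(rundir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(restart_dir)):
        src = os.path.join(restart_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(rundir, name))
            copied.append(name)
    return copied


if __name__ == "__main__":
    src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
    with open(os.path.join(src, "rpointer.drv"), "w") as f:
        f.write("example.cpl.r.0001-01-01-00000.nc\n")
    print(stage_restart_files(src, dst))  # ['rpointer.drv']
```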

Whenever a component writes a restart file, it also writes a restart
pointer file in the format **rpointer.$component**. Upon a restart, each
component reads the pointer file to determine which file to read in
order to continue the run. These are examples of pointer files created
for a component set using full active model components.
@@ -382,27 +382,27 @@ Backing up to a previous restart
---------------------------------

If a run encounters problems and crashes, you will normally have to
back up to a previous restart. If short-term archiving is enabled,
find the latest **$DOUT_S_ROOT/rest/yyyy-mm-dd-sssss/** directory
and copy its contents into your run directory (``$RUNDIR``).

Make sure that the new restart pointer files overwrite older files in
``$RUNDIR`` or the job may not restart in the correct place. You can
then continue the run using the new restarts.
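The restart set names use fixed-width, zero-padded yyyy-mm-dd-sssss fields, so sorting them as strings also sorts them chronologically. The hypothetical Python helper below relies on that property to find the latest set under ``$DOUT_S_ROOT/rest``. You still need to copy that set's contents, including the rpointer files, into ``$RUNDIR``.

```python
import os
import re
import tempfile

# Restart set names look like 0001-02-01-00000 (yyyy-mm-dd-sssss).
REST_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{5}$")


def latest_restart_dir(dout_s_root):
    """Return the newest $DOUT_S_ROOT/rest/yyyy-mm-dd-sssss path, or None."""
    rest = os.path.join(dout_s_root, "rest")
    names = [n for n in os.listdir(rest)
             if REST_NAME.match(n) and os.path.isdir(os.path.join(rest, n))]
    # Fixed-width, zero-padded fields make string order chronological.
    return os.path.join(rest, max(names)) if names else None


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stands in for $DOUT_S_ROOT
    for name in ("0001-02-01-00000", "0001-03-01-00000", "notes"):
        os.makedirs(os.path.join(root, "rest", name))
    print(latest_restart_dir(root))  # ends with rest/0001-03-01-00000
```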

Occasionally, when a run has problems restarting, it is because the
pointer and restart files are out of sync. The pointer files
are text files that can be edited to match the correct dates
of the restart and history files. All of the restart files should
have the same date.
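Because all restart files in a set should share the same date, a quick check of the pointer files can catch this problem early. The Python sketch below rests on assumptions you should verify against your own files. It assumes each **rpointer.*** file names a restart file on its first line, and that this filename embeds a yyyy-mm-dd-sssss date. The helper names are hypothetical.

```python
import glob
import os
import re
import tempfile

# Dates in restart filenames look like 0001-02-01-00000 (assumed convention).
DATE_IN_NAME = re.compile(r"\d{4}-\d{2}-\d{2}-\d{5}")


def pointer_dates(rundir):
    """Map each rpointer.* file in rundir to the date on its first line."""
    dates = {}
    for path in sorted(glob.glob(os.path.join(rundir, "rpointer.*"))):
        with open(path) as f:
            match = DATE_IN_NAME.search(f.readline())
        dates[os.path.basename(path)] = match.group(0) if match else None
    return dates


def pointers_consistent(rundir):
    """True if pointer files exist, all contain a date, and the dates agree."""
    values = set(pointer_dates(rundir).values())
    return len(values) == 1 and None not in values


if __name__ == "__main__":
    rundir = tempfile.mkdtemp()  # stands in for $RUNDIR
    for comp, date in (("atm", "0001-02-01-00000"), ("ocn", "0001-01-01-00000")):
        with open(os.path.join(rundir, "rpointer." + comp), "w") as f:
            f.write("example.{}.r.{}.nc\n".format(comp, date))
    print(pointer_dates(rundir))
    print(pointers_consistent(rundir))  # False: the dates differ
```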

============================
Archiving model output data
============================

When a job has run successfully, the component log files are copied
to the directory specified by the **env_run.xml** variable ``$LOGDIR``,
which is set to **$CASEROOT/logs** by default. If the job aborts, log
files are NOT copied out of the ``$RUNDIR`` directory.

The output data flow from a successful run depends on whether or not
@@ -421,7 +421,7 @@ Short-term archiving

If short-term archiving is enabled, component output files are moved
to the short-term archiving area on local disk, as specified by
``$DOUT_S_ROOT``. The directory normally is **$EXEROOT/../archive/$CASE**
and has the following directory structure: ::

rest/yyyy-mm-dd-sssss/
@@ -444,7 +444,7 @@ The **rest/** subdirectory contains a subset of directories that each contains
a *consistent* set of restart files, initial files and rpointer
files. Each subdirectory has a unique name corresponding to the model
year, month, day and seconds into the day when the files were created.
The contents of any restart directory can be used to create a branch run
or a hybrid run or to back up to a previous restart date.

---------------------
@@ -457,4 +457,56 @@ long-term archiver tool that supported mass tape storage and HPSS systems.
However, with the industry migration away from tape archives, it is no longer
feasible for CIME to support all the possible archival schemes available.

================================================
Data assimilation and other external processing
================================================

CIME can run a user-defined task on the compute nodes either immediately
before or immediately after the model run. CIME also provides a data
assimilation capability that alternates between running the model and
running a user-defined task, for a user-specified number of cycles.


-------------------------
Pre- and post-run scripts
-------------------------

The variables ``PRERUN_SCRIPT`` and ``POSTRUN_SCRIPT`` can each be used to name
a script to be executed within the batch environment, immediately before
the CESM executable starts or immediately after it completes. The script
is expected to be found in the case directory and receives one argument:
the full path to that directory. If the script is written in Python and
contains a function with the same name as the script, that function is
called directly instead of running the script as an external shell script.
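To make the calling convention concrete, the Python sketch below shows one way such a dispatcher could work. It is not CIME's implementation. The name ``run_external`` is hypothetical, and the sketch treats a script as Python simply when its filename ends in ``.py``. When the module defines a function named after the file, that function is called in-process. Otherwise, the script runs as a separate process with the same arguments.

```python
import importlib.util
import os
import subprocess
import sys
import tempfile


def run_external(script, *args):
    """Call the script's same-named function if it has one, else execute it.

    A script counts as Python here simply if its name ends in ".py".
    """
    name = os.path.splitext(os.path.basename(script))[0]
    if script.endswith(".py"):
        spec = importlib.util.spec_from_file_location(name, script)
        module = importlib.util.module_from_spec(spec)
        # Importing runs top-level code, so scripts should keep their
        # command-line actions under an `if __name__ == "__main__":` guard.
        spec.loader.exec_module(module)
        func = getattr(module, name, None)
        if callable(func):
            return func(*args)  # in-process call; arguments keep their types
        cmd = [sys.executable, script]
    else:
        cmd = [script]
    # Fallback: run as a separate process; arguments become strings.
    subprocess.check_call(cmd + [str(a) for a in args])
    return None


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    path = os.path.join(tmp, "myprerun.py")
    with open(path, "w") as f:
        f.write("def myprerun(caseroot):\n    return 'prerun in ' + caseroot\n")
    print(run_external(path, "/path/to/case"))  # prerun in /path/to/case
```

Calling the function in-process avoids starting a new interpreter and lets arguments such as a cycle number keep their Python types. The subprocess fallback keeps shell scripts and other executables working unchanged.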

-------------------------
Data assimilation scripts
-------------------------

The variables ``DATA_ASSIMILATION``, ``DATA_ASSIMILATION_SCRIPT``, and
``DATA_ASSIMILATION_CYCLES`` can also be used to control model evolution
externally. If ``DATA_ASSIMILATION`` is TRUE, ``DATA_ASSIMILATION_SCRIPT``
is run each time the model completes, and the sequence of a model run
followed by the script is repeated ``DATA_ASSIMILATION_CYCLES`` times.
If ``DATA_ASSIMILATION`` is TRUE but no script is named, CIME reports an
error. The script is expected to be found in the case directory and
receives two arguments: the full path to that directory and the cycle
number. If the script is written in Python and contains a function with
the same name as the script, that function is called directly instead of
running the script as an external shell script.
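The cycling can be pictured as a simple loop. In the Python sketch below, ``run_model`` and ``run_script`` are hypothetical callables that stand in for launching the model and the script. The sketch numbers cycles from 0, which is an assumption. Its check for a missing script mirrors the error this change adds when ``DATA_ASSIMILATION`` is TRUE but no script is named.

```python
def cycle_model(run_model, run_script, caseroot, data_assimilation,
                da_script, da_cycles):
    """Run the model da_cycles times, calling the DA script after each run.

    run_model(cycle) and run_script(script, caseroot, cycle) are hypothetical
    stand-ins for launching the model and the user's script.
    """
    if data_assimilation and not da_script:
        raise ValueError("DATA_ASSIMILATION is TRUE but no "
                         "DATA_ASSIMILATION_SCRIPT is named")
    for cycle in range(da_cycles):
        run_model(cycle)
        if data_assimilation:
            run_script(da_script, caseroot, cycle)


if __name__ == "__main__":
    calls = []
    cycle_model(lambda c: calls.append(("model", c)),
                lambda script, root, c: calls.append((script, c)),
                "/path/to/case", True, "myda.py", 2)
    print(calls)  # [('model', 0), ('myda.py', 0), ('model', 1), ('myda.py', 1)]
```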

A simple example pre-run script, named **myprerun.py** so that its ``myprerun`` function is called directly:

::

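    #!/usr/bin/env python
    import sys
    from CIME.case import Case

    def myprerun(caseroot):
        # Called directly by CIME because the function name matches the
        # script name; open the case and report its run directory.
        with Case(caseroot) as case:
            print("rundir is ", case.get_value("RUNDIR"))

    if __name__ == "__main__":
        # Run as a standalone script: the case directory is the first argument.
        caseroot = sys.argv[1]
        myprerun(caseroot)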
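A data assimilation script follows the same pattern but receives the cycle number as a second argument. In the hypothetical sketch below, the script would be saved as **myda.py** so that its ``myda`` function can be called directly. The body only echoes its arguments, where a real script would modify the model state. A command-line argument always arrives as a string, so the sketch converts the cycle number with ``int``, which also accepts an integer from a direct call.

```python
#!/usr/bin/env python
import sys


def myda(caseroot, cycle):
    # cycle may be an int (direct call) or a string (command line).
    cycle = int(cycle)
    print("caseroot = {}".format(caseroot))
    print("cycle = {}".format(cycle))
    # A real script would read model output from RUNDIR and modify it here.
    return cycle


if __name__ == "__main__" and len(sys.argv) == 3:
    # Invoked as: myda.py CASEROOT CYCLE
    myda(sys.argv[1], sys.argv[2])
```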
3 changes: 2 additions & 1 deletion scripts/lib/CIME/SystemTests/dae.py
@@ -7,6 +7,7 @@
 import os.path
 import logging
 import glob
+import gzip

 import CIME.XML.standard_module_setup as sms
 from CIME.SystemTests.system_tests_compare_two import SystemTestsCompareTwo
@@ -93,7 +94,7 @@ def run_phase(self): # pylint: disable=arguments-differ
         for fname in da_files:
             found_caseroot = False
             found_cycle = False
-            with open(fname) as dfile:
+            with gzip.open(fname, "r") as dfile:
                 for line in dfile:
                     expect(line[0:5] != 'ERROR', "ERROR, error line found in {}".format(fname))
                     if line[0:8] == 'caseroot':
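The switch to ``gzip.open`` implies that the **da.log** files are gzip-compressed by the time the test reads them. One portability detail is worth noting. In Python 3, ``gzip.open`` in mode ``"r"`` yields ``bytes`` lines, so comparisons with ``str`` literals such as ``'ERROR'`` never match. Text mode ``"rt"`` yields ``str`` instead. The sketch below is a simplified, hypothetical version of the test's scan, written in text mode.

```python
import gzip
import os
import tempfile


def scan_da_log(fname):
    """Return (found_caseroot, found_cycle) for a gzip-compressed DA log.

    Raises RuntimeError if any line starts with "ERROR".
    """
    found_caseroot = found_cycle = False
    with gzip.open(fname, "rt") as dfile:  # "rt" decodes lines to str
        for line in dfile:
            if line.startswith("ERROR"):
                raise RuntimeError("ERROR, error line found in {}".format(fname))
            if line.startswith("caseroot"):
                found_caseroot = True
            elif line.startswith("cycle"):
                found_cycle = True
    return found_caseroot, found_cycle


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "da.log.example")
    with gzip.open(path, "wt") as f:
        f.write("caseroot = /path/to/case\ncycle = 0\n")
    print(scan_da_log(path))  # (True, True)
```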