batch sys -> job runner #180

Merged (5 commits) on Dec 17, 2020

1 change: 1 addition & 0 deletions .gitignore
@@ -24,3 +24,4 @@ src/appendices/command-ref.rst
# auto-generated documentation
src/user-guide/plugins/main-loop/built-in
src/user-guide/batch-sys-handlers
src/user-guide/job-runner-handlers
1 change: 1 addition & 0 deletions Makefile
@@ -18,6 +18,7 @@ clean:
# remove auto-generated content
rm -rf src/user-guide/plugins/main-loop/built-in
rm -rf src/user-guide/batch-sys-handlers
rm -rf src/user-guide/job-runner-handlers

cleanall:
(cd doc; echo [0-9]*.*)
65 changes: 56 additions & 9 deletions src/glossary.rst
@@ -523,16 +523,29 @@ Glossary
* :term:`job submission number`

job host
The job host is the compute platform that a :term:`job` runs on. For
example ``some-host`` would be the job host for the task ``some-task`` in
the following suite:
The job host is the compute resource that a :term:`job` runs on. For
example ``node_1`` would be one of two possible job hosts on the
:term:`platform` ``my_hpc`` for the task ``some-task`` in the
following workflow:

.. code-block:: cylc
:caption: global.cylc

[platforms]
[[my_hpc]]
hosts = node_1, node_2
job runner = slurm

.. code-block:: cylc
:caption: flow.cylc

[runtime]
[[some-task]]
[[[remote]]]
host = some-host
platform = my_hpc

See also:

* :term:`platform`

job submission number
Cylc may run multiple :term:`jobs <job>` per :term:`task` (e.g. if the
@@ -545,9 +558,13 @@ Glossary
* :term:`job`
* :term:`job script`

job runner
batch system
A batch system or job scheduler is a system for submitting
:term:`jobs <job>` onto a compute platform.
A job runner (also known as batch system or job scheduler) is a system
for submitting :term:`jobs <job>` to a :term:`job platform <platform>`.

Job runners are set on a per-platform basis in
:cylc:conf:`global.cylc[platforms][<platform name>]job runner`.
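
For example, a minimal sketch reusing the ``my_hpc`` platform and ``slurm``
job runner that appear in the examples elsewhere in this glossary:

.. code-block:: cylc
   :caption: global.cylc

   [platforms]
       [[my_hpc]]
           job runner = slurm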

See also:

@@ -556,15 +573,45 @@ Glossary
* :term:`directive`

directive
Directives are used by :term:`batch systems <batch system>` to determine
Directives are used by :term:`job runners <job runner>` to determine
what a :term:`job's <job>` requirements are, e.g. how much memory
it requires.

Directives are set in :cylc:conf:`[runtime][<namespace>][directives]`.
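
For example, a brief sketch of setting directives for a task (the Slurm-style
memory and CPU requests shown here are hypothetical):

.. code-block:: cylc
   :caption: flow.cylc

   [runtime]
       [[some-task]]
           platform = my_hpc
           [[[directives]]]
               # hypothetical Slurm-style resource requests
               --mem = 2G
               --ntasks = 4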

See also:

* :term:`batch system`
* :term:`job runner`

platform
job platform
A configured setup for running :term:`jobs <job>` on (usually remotely).
Platforms are primarily defined by the combination of a
:term:`job runner` and a group of :term:`hosts <job host>`
(which share a file system).

For example ``my_hpc`` could be the platform for the task ``some-task``
in the following workflow:

.. code-block:: cylc
:caption: global.cylc

[platforms]
[[my_hpc]]
hosts = node_1, node_2
job runner = slurm

.. code-block:: cylc
:caption: flow.cylc

[runtime]
[[some-task]]
platform = my_hpc

See also:

* :term:`job host`
* :term:`job runner`

scheduler
When we say that a :term:`suite` is "running" we mean that the cylc
12 changes: 6 additions & 6 deletions src/suite-design-guide/general-principles.rst
@@ -273,7 +273,7 @@ submission until the expected data arrival time:
Clock-triggered tasks typically have to handle late data arrival. Task
execution *retry delays* can be used to simply retrigger the task at
intervals until the data is found, but frequently retrying small tasks probably
should not go to a batch scheduler, and multiple task failures will be logged
should not go to a :term:`job runner`, and multiple task failures will be logged
for what is essentially a normal condition (at least it is normal until the
data is really late).

@@ -300,9 +300,9 @@ so be sure to configure a reasonable interval between polls.
Task Execution Time Limits
--------------------------

Instead of setting job wall clock limits directly in batch scheduler
Instead of setting job wall clock limits directly in :term:`job runner`
directives, use the ``execution time limit`` suite config item.
Cylc automatically derives the correct batch scheduler directives from this,
Cylc automatically derives the correct job runner directives from this,
and it is also used to run ``background`` and ``at`` jobs via
the ``timeout`` command, and to poll tasks that haven't reported as
finished by the configured time limit.
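
For example, a minimal sketch (the task name ``model`` is hypothetical, and the
exact location of the ``execution time limit`` item depends on the Cylc
version):

.. code-block:: cylc

   [runtime]
       [[model]]
           # Cylc derives the appropriate wall clock directive for this
           # task's job runner from the value below.
           execution time limit = PT3H
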
@@ -439,8 +439,8 @@ by the vast majority of tasks. Over-sharing via root, particularly of
environment variables, is a maintenance risk because it can be very
difficult to be sure which tasks are using which global variables.

Any :cylc:conf:`[runtime]` settings can be shared - scripting, host
and batch scheduler configuration, environment variables, and so on - from
Any :cylc:conf:`[runtime]` settings can be shared - scripting, platform
configuration, environment variables, and so on - from
single items up to complete task or app configurations. At the latter extreme,
it is quite common to have several tasks that inherit the same complete
job configuration followed by minor task-specific additions:
@@ -618,7 +618,7 @@ graph:
RUN_LEN = PT12H

The few differences between ``short_fc`` and ``long_fc``,
including batch scheduler resource requests, can be configured after common
including :term:`job runner` resource requests, can be configured after common
settings are inherited.
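
For example, a hedged sketch (the family name ``FC``, the script name, and the
Slurm-style ``--time`` requests are illustrative, not taken from the suite):

.. code-block:: cylc

   [runtime]
       [[FC]]  # settings shared by both forecast tasks
           script = run-forecast.sh
       [[short_fc]]
           inherit = FC
           [[[directives]]]
               --time = 00:30:00
       [[long_fc]]
           inherit = FC
           [[[directives]]]
               --time = 03:00:00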

At Start-Up
2 changes: 2 additions & 0 deletions src/suite-design-guide/portable-suites.rst
@@ -3,6 +3,8 @@
Portable Suites
===============

.. TODO - platformise all the examples in here

A *portable* or *interoperable* suite can run "out of the box" at
different sites, or in different environments such as research and operations
within a site. For convenience we just use the term *site portability*.
1 change: 1 addition & 0 deletions src/suites/inherit/single/one/flow.cylc
@@ -15,6 +15,7 @@
OBS:succeed-all => bar
"""

# TODO: platformise
[runtime]
[[root]] # base namespace for all tasks (defines suite-wide defaults)
[[[job]]]
4 changes: 2 additions & 2 deletions src/tutorial/runtime/introduction.rst
@@ -136,9 +136,9 @@ Tasks And Jobs
Submitted
When a :term:`task's <task>` dependencies have been met it is ready for
submission. During this phase the :term:`job script` is created.
The :term:`job` is then submitted to the specified batch system.
The :term:`job` is then submitted to the specified :term:`job runner`.
There is more about this in the :ref:`next section
<tutorial-batch-system>`.
<tutorial-job-runner>`.
Running
A :term:`task` is in the "Running" state as soon as the :term:`job` is
executed.
19 changes: 10 additions & 9 deletions src/tutorial/runtime/runtime-configuration.rst
@@ -3,6 +3,8 @@
Runtime Configuration
=====================

.. TODO - platformise all the examples in here

In the last section we associated tasks with scripts and ran a simple suite. In
this section we will look at how we can configure these tasks.

@@ -48,7 +50,7 @@ Environment Variables
* ``CYLC_TASK_CYCLE_POINT``


.. _tutorial-batch-system:
.. _tutorial-job-runner:

Job Submission
--------------
@@ -77,8 +79,8 @@ Job Submission

Cylc also executes jobs as `background processes`_ by default.
When we are running jobs on other compute hosts we will often want to
use a :term:`batch system` (`job scheduler`_) to submit our job.
Cylc supports the following :term:`batch systems <batch system>`:
use a :term:`job runner` to submit our job.
Cylc supports the following :term:`job runners <job runner>`:

* at
* loadleveler
@@ -92,9 +94,9 @@

.. ifnotslides::

:term:`Batch systems <batch system>` typically require
:term:`Job runners <job runner>` typically require
:term:`directives <directive>` in some form. :term:`Directives <directive>`
inform the :term:`batch system` of the requirements of a :term:`job`, for
inform the job runner of the requirements of a :term:`job`, for
example how much memory a given job requires or how many CPUs the job will
run on. For example:

@@ -108,7 +110,7 @@ Job Submission
[[[remote]]]
host = big-computer

# Submit the job using the "slurm" batch system.
# Submit the job using the "slurm" job runner.
[[[job]]]
batch system = slurm

@@ -196,7 +198,7 @@ Start, Stop, Restart
``cylc stop --kill``
When the ``--kill`` option is used Cylc will kill all running jobs
before stopping. *Cylc can kill jobs on remote hosts and uses the
appropriate command when a* :term:`batch system` *is used.*
appropriate command when a* :term:`job runner` *is used.*
``cylc stop --now --now``
When the ``--now`` option is used twice Cylc stops straight away, leaving
any jobs running.
@@ -286,7 +288,7 @@ Start, Stop, Restart

Run `cylc validate` to check for any errors::

cylc validate .
cylc validate .

#. **Add Runtime Configuration For The** ``get_observations`` **Tasks.**

@@ -492,4 +494,3 @@ Start, Stop, Restart
i.e. the final cycle point.
* ``task-name`` - set this to "forecast".
* ``submission-number`` - set this to "01".

26 changes: 13 additions & 13 deletions src/user-guide/remote-job-management.rst
@@ -11,9 +11,9 @@ SSH-free Job Management?

Some sites may want to restrict access to job hosts by whitelisting SSH
connections to allow only ``rsync`` for file transfer, and allowing job
execution only via a local batch system that sees the job hosts [1]_ .
execution only via a local :term:`job runner` that sees the job hosts [1]_ .
We are investigating the feasibility of SSH-free job management when a local
batch system is available, but this is not yet possible unless your suite
job runner is available, but this is not yet possible unless your suite
and job hosts also share a filesystem, which allows Cylc to treat jobs as
entirely local [2]_ .

@@ -25,14 +25,14 @@ Cylc does not have persistent agent processes running on job hosts to act on
instructions received over the network [3]_ so instead we execute job
management commands directly on job hosts over SSH. Reasons for this include:

- It works equally for batch system and background jobs.
- SSH is *required* for background jobs, and for batch jobs if the
batch system is not available on the suite host.
- Querying the batch system alone is not sufficient for full job
- It works equally for :term:`job runner` and background jobs.
- SSH is *required* for background jobs, and for jobs in other job runners if the
job runner is not available on the suite host.
- Querying the job runner alone is not sufficient for full job
polling functionality.

- This is because jobs can complete (and then be forgotten by
the batch system) while the network, suite host, or :term:`scheduler` is
the job runner) while the network, suite host, or :term:`scheduler` is
down (e.g. between suite shutdown and restart).
- To handle this we get the automatic job wrapper code to write
job messages and exit status to *job status files* that are
@@ -41,7 +41,7 @@ management commands directly on job hosts over SSH. Reasons for this include:
- Job status files reside on the job host, so the interrogation
is done over SSH.

- Job status files also hold batch system name and job ID; this is
- Job status files also hold job runner name and job ID; this is
written by the job submit command, and read by job poll and kill commands


@@ -56,10 +56,10 @@ Other Cases Where Cylc Uses SSH Directly


.. [1] A malicious script could be ``rsync``'d and run from a batch
job, but batch jobs are considered easier to audit.
job, but jobs in job runners are considered easier to audit.
.. [2] The job ID must also be valid to query and kill the job via the local
batch system. This is not the case for Slurm, unless the ``--cluster``
option is explicitly used in job query and kill commands, otherwise
the job ID is not recognized by the local Slurm instance.
:term:`job runner`. This is not the case for Slurm, unless the
``--cluster`` option is explicitly used in job query and kill commands,
otherwise the job ID is not recognized by the local Slurm instance.
.. [3] This would be a more complex solution, in terms of implementation,
administration, and security.
22 changes: 12 additions & 10 deletions src/user-guide/running-suites.rst
@@ -3,6 +3,8 @@
Running Suites
==============

.. TODO - platformise

This chapter currently features a diverse collection of topics related
to running suites.

@@ -203,7 +205,7 @@ not automatically resubmitted at restart in case the underlying problem has not
been addressed yet.

Tasks recorded in the submitted or running states are automatically polled on
restart, to see if they are still waiting in a batch queue, still running, or
restart, to see if they are still waiting in a :term:`job runner` queue, still running, or
if they succeeded or failed while the suite was down. The suite state will be
updated automatically according to the poll results.

@@ -256,9 +258,9 @@ Authentication Files
Cylc uses `CurveZMQ <http://curvezmq.org/page:read-the-docs/>`_ to ensure that
any data, sent between the :term:`scheduler <scheduler>` and the client,
remains protected during transmission. Public keys are used to encrypt the
data, private keys for decryption.
data, private keys for decryption.

Authentication files will be created in your
Authentication files will be created in your
``$HOME/cylc-run/WORKFLOW/.service/`` directory at start-up. You can expect to
find one client public key per file system for remote jobs.

@@ -304,7 +306,7 @@ outage prevents task success or failure messages getting through, or if the
:term:`scheduler` itself is down when tasks finish execution.

To poll a task job the :term:`scheduler` interrogates the
batch system, and the ``job.status`` file, on the job host. This
:term:`job runner`, and the ``job.status`` file, on the job host. This
information is enough to determine the final task status even if the
job finished while the :term:`scheduler` was down or unreachable on
the network.
@@ -457,7 +459,7 @@ As a suite runs, its task proxies may pass through the following states:
- **ready** - ready to run (prerequisites satisfied) and
handed to cylc's job submission sub-system.
- **submitted** - submitted to run, but not executing yet
(could be waiting in an external batch scheduler queue).
(could be waiting in an external :term:`job runner` queue).
- **submit-failed** - job submission failed *or*
submitted job killed (cancelled) before commencing execution.
- **submit-retrying** - job submission failed, but a submission retry
@@ -838,11 +840,11 @@ started running, and they still appear in the resource manager queue).
Loadleveler jobs that are preempted by kill-and-requeue ("job vacation") are
automatically returned to the submitted state by Cylc. This is possible
because Loadleveler sends the SIGUSR1 signal before SIGKILL for preemption.
Other batch schedulers just send SIGTERM before SIGKILL as normal, so Cylc
Other :term:`job runners <job runner>` just send SIGTERM before SIGKILL as normal, so Cylc
cannot distinguish a preemption job kill from a normal job kill. After this the
job will poll as failed (correctly, because it was killed, and the job status
file records that). To handle this kind of preemption automatically you could
use a task failed or retry event handler that queries the batch scheduler queue
use a task failed or retry event handler that queries the job runner queue
(after an appropriate delay if necessary) and then, if the job has been
requeued, uses ``cylc reset`` to reset the task to the submitted state.

@@ -1052,10 +1054,10 @@ run lengths.
Limitations Of Suite Simulation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dummy mode ignores batch scheduler settings because Cylc does not know which
Dummy mode ignores :term:`job runner` settings because Cylc does not know which
job resource directives (requested memory, number of compute nodes, etc.) would
need to be changed for the dummy jobs. If you need to dummy-run jobs on a
batch scheduler manually comment out ``script`` items and modify
job runner, manually comment out ``script`` items and modify
directives in your live suite, or else use a custom live mode test suite.

.. note::
@@ -1108,7 +1110,7 @@ a cylc upgrade will not break your own complex
suites - the triggering check will catch any bug that causes a task to
run when it shouldn't, for instance; even in a dummy mode reference
test the full task job script (sans ``script`` items) executes on the
proper task host by the proper batch system.
proper task host by the proper :term:`job runner`.

Reference tests can be configured with the following settings:
