Platforms.eval platform cmd #3791

wxtim · 2020-08-26T14:50:33Z

main response to #3672

Synopsis

As per the deprecation examples in the platforms proposal Cylc 8 is to be able to evaluate either:

[TASK]platform = $(some-shell-script.sh)
[TASK][remote]host = $(some-shell-script.sh)
In a similar way at job submission time.

This PR attempts to create this behavior.

Work undertaken

Modified the logic to work with an example suite.
Written functional tests for each of the main scenarios this logic should handle:
- Not interfering with platform = name and host = name.
- Correctly expanding subshell host and platform settings.
- Throwing a wobbly (raise Exception) if one tries to set both host and platform before attempting to expand subshells to avoid unnecessary calls to remote_host_select.

Follow on work not undertaken

No attempt to update nomenclature of items in remote_host_select - should be straightforward, but would confuse the issue.
No attempt to stop using backticks for platform.

cylc/flow/task_job_mgr.py

wxtim · 2020-09-04T14:00:13Z

tests/functional/validate/74-host-to-platform-upgrade-fail-inherit.t

-# Check that platform upgrader will fail if, after inheritance but not
-# before a task has both old and new settings - This should be a fail on
-# Job Submit.
+# Parent and child tasks are both valid, before inheritance calculated.


Unrelated, but not worth a full, separate PR.

oliver-sanders

Looks fine, testing to follow.

One legitimate test failure - tests/f/job-submission/18-platform_select.t

cylc/flow/platforms.py

wxtim · 2020-09-04T16:00:11Z

tests/f/job-submission/18-platform_select.t

My fault - Dave suggested that I was raising the wrong error, so I changed the error I raised...

tests/functional/job-submission/15-garbage-host-command-2.t

hjoliver · 2020-09-14T00:01:50Z

One test failing - fixed by #3806?

Update cylc/flow/platforms.py Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk> fix broken test testfix

wxtim · 2020-09-14T07:58:10Z

One test failing - fixed by #3806?

Looks very much like it. 👍🏼

Can I request re-review from @oliver-sanders and @hjoliver

oliver-sanders · 2020-09-15T11:47:42Z

Closes #3672?

oliver-sanders · 2020-09-15T11:58:02Z

Looks good, testing I spotted some unexpected behaviour:

A non existent platform will result in a PlatformLookupError being raised which causes the suite to shutdown, I would have expected a submit failure:

platform = $(echo 'elephant')

2020-09-15T12:54:30+01:00 ERROR - PlatformLookupError: No matching platform "elephant" found

There seems to be some unintended pattern matching going on, for instance localhost-xyz-mess seems to get evaluated as localhost rather than resulting in a submit failure:

platform = $(echo 'localhost-xyz-mess')

2020-09-15T12:53:40+01:00 DEBUG - for task a.1: platform = $(echo 'localhost-xyz-mess') evaluated as localhost

wxtim · 2020-09-15T14:37:01Z

cylc/flow/task_job_mgr.py

-            _ = rtconfig[itask.tdef.run_mode + ' mode']['disable retries']
-        except KeyError:
-            retry = True
+        if (


This should be faster since in most cases the try was going to fail.

wxtim · 2020-09-15T14:38:42Z

A non existent platform will result in a PlatformLookupError being raised which causes the suite to shutdown, I would have expected a submit failure:

I think I've fixed this in commit 3175fee

There seems to be some unintended pattern matching going on, for instance localhost-xyz-mess seems to get evaluated as localhost rather than resulting in a submit failure:

I think I have fixed this in commit e6838a6

oliver-sanders

Getting a submit failure now for platform=$(echo elephant) 👍. Sadly also getting this (harmless) traceback:

2020-09-15T15:55:08+01:00 ERROR - [Errno 2] No such file or directory:
	'/Users/oliver/cylc-run/foo/log/job/1/foo/01/job-activity.log'
	Traceback (most recent call last):
	  File "/Users/oliver/cylc-flow/cylc/flow/task_job_mgr.py", line 865, in
	_prep_submit_task_job
	    platform = get_platform(rtconfig)
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 82, in get_platform
	    output = platform_from_name(task_conf['platform'])
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 184, in
	platform_from_name
	    f"No matching platform \"{platform_name}\" found")
	cylc.flow.exceptions.PlatformLookupError: No matching platform "elephant" found
	During handling of the above exception, another exception occurred:
	Traceback (most recent call last):
	  File "/Users/oliver/cylc-flow/cylc/flow/task_events_mgr.py", line 87, in
	log_task_job_activity
	    with open(os.path.expandvars(job_activity_log), "ab") as handle:
	FileNotFoundError: [Errno 2] No such file or directory: '/Users/oliver/cylc-
	run/foo/log/job/1/foo/01/job-activity.log'

I think the old logic may have created the job activity log on failure to avoid this?

wxtim · 2020-09-17T09:49:03Z

Yes, @oliver-sanders pointed out that at some point I had deleted the line self._create_job_log_path(suite, itask) in task_job_mgr which is required to create the job-activity.log

oliver-sanders

All good I think.

The log messages contain a bit of duplicated info and could do with being tidied up a bit:

DEBUG - ['bash', '-c', 'echo
	localhost']
DEBUG - [remote-host-select cmd] bash -c
	'echo localhost'
	[remote-host-select ret_code] 0
	[remote-host-select out] localhost
DEBUG - for task foo.1: platform =
	$(echo localhost) evaluated as localhost

One message should suffice for this case, note the second log message comes from the subprocpool and contains all the required info.

DEBUG - Traceback (most recent call
	last):
	  File "/Users/oliver/cylc-flow/cylc/flow/task_job_mgr.py", line 866, in
	_prep_submit_task_job
	    platform = get_platform(rtconfig)
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 82, in get_platform
	    output = platform_from_name(task_conf['platform'])
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 184, in
	platform_from_name
	    f"No matching platform \"{platform_name}\" found")
	cylc.flow.exceptions.PlatformLookupError: No matching platform "elephant" found

ERROR - No matching platform "elephant" found
ERROR - [jobs-submit cmd] (platform not defined)
	[jobs-submit ret_code] 1
	[jobs-submit err] No matching platform "elephant" found

One message should suffice for this case, the traceback isn't helpful here as we already know where the exception was raised from.

hjoliver

Looks good, thanks @wxtim 👍

hjoliver · 2020-09-23T03:54:16Z

@wxtim - can you address the log duplication as a follow-up?

wxtim self-assigned this Aug 26, 2020

wxtim marked this pull request as draft August 26, 2020 14:50

wxtim force-pushed the platforms.eval_platform_cmd_II branch from 98927a3 to 4c187ad Compare September 2, 2020 09:39

wxtim marked this pull request as ready for review September 2, 2020 11:03

wxtim marked this pull request as draft September 2, 2020 11:03

wxtim commented Sep 2, 2020

View reviewed changes

cylc/flow/task_job_mgr.py Outdated Show resolved Hide resolved

oliver-sanders added this to the cylc-8.0a3 milestone Sep 4, 2020

wxtim commented Sep 4, 2020

View reviewed changes

wxtim force-pushed the platforms.eval_platform_cmd_II branch 2 times, most recently from 03e7abe to 78f5559 Compare September 4, 2020 15:17

wxtim marked this pull request as ready for review September 4, 2020 15:22

wxtim requested review from datamel, MetRonnie and hjoliver and removed request for MetRonnie September 4, 2020 15:23

oliver-sanders reviewed Sep 4, 2020

View reviewed changes

cylc/flow/platforms.py Show resolved Hide resolved

cylc/flow/platforms.py Outdated Show resolved Hide resolved

oliver-sanders removed the request for review from datamel September 4, 2020 15:40

wxtim force-pushed the platforms.eval_platform_cmd_II branch 2 times, most recently from bf13226 to 87436b9 Compare September 4, 2020 15:54

wxtim force-pushed the platforms.eval_platform_cmd_II branch from 7880997 to d1082d2 Compare September 4, 2020 16:28

oliver-sanders reviewed Sep 11, 2020

View reviewed changes

tests/functional/job-submission/15-garbage-host-command-2.t Outdated Show resolved Hide resolved

wxtim added 2 commits September 14, 2020 08:53

evaluate platform subshell commands

cef4604

Update cylc/flow/platforms.py Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk> fix broken test testfix

remove unwanted test-platform

a015191

wxtim force-pushed the platforms.eval_platform_cmd_II branch from d1082d2 to a015191 Compare September 14, 2020 07:54

wxtim requested a review from oliver-sanders September 14, 2020 07:58

Rename garbage host to garbage platform tests

0f95934

fixed: PlatformLookUpError at job submit acts like job submit error.

3175fee

wxtim commented Sep 15, 2020

View reviewed changes

oliver-sanders reviewed Sep 15, 2020

View reviewed changes

wxtim added 2 commits September 15, 2020 16:01

fix host check in subshell_eval

e6838a6

fixed failure to create job log

e4fa018

wxtim mentioned this pull request Sep 18, 2020

Platforms.eval platform cmd.no backticks #3826

Merged

oliver-sanders approved these changes Sep 22, 2020

View reviewed changes

hjoliver approved these changes Sep 23, 2020

View reviewed changes

hjoliver merged commit 130950d into cylc:master Sep 23, 2020

This was referenced Sep 23, 2020

Cleaned up logging a little #3836

Merged

platforms: allow sub-shell platform names #3672

Closed

hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021

MetRonnie mentioned this pull request May 27, 2021

this tests appears to have been inoperative for some time #4229

Merged

7 tasks

wxtim deleted the platforms.eval_platform_cmd_II branch March 22, 2022 16:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Platforms.eval platform cmd #3791

Platforms.eval platform cmd #3791

wxtim commented Aug 26, 2020 •

edited

Loading

wxtim Sep 4, 2020

oliver-sanders left a comment •

edited

Loading

wxtim commented Sep 4, 2020

hjoliver commented Sep 14, 2020

wxtim commented Sep 14, 2020 •

edited

Loading

oliver-sanders commented Sep 15, 2020

oliver-sanders commented Sep 15, 2020

wxtim Sep 15, 2020

wxtim commented Sep 15, 2020 •

edited

Loading

oliver-sanders left a comment •

edited

Loading

wxtim commented Sep 17, 2020

oliver-sanders left a comment

hjoliver left a comment

hjoliver commented Sep 23, 2020

Platforms.eval platform cmd #3791

Platforms.eval platform cmd #3791

Conversation

wxtim commented Aug 26, 2020 • edited Loading

Synopsis

Work undertaken

Follow on work not undertaken

wxtim Sep 4, 2020

Choose a reason for hiding this comment

oliver-sanders left a comment • edited Loading

Choose a reason for hiding this comment

wxtim commented Sep 4, 2020

hjoliver commented Sep 14, 2020

wxtim commented Sep 14, 2020 • edited Loading

oliver-sanders commented Sep 15, 2020

oliver-sanders commented Sep 15, 2020

wxtim Sep 15, 2020

Choose a reason for hiding this comment

wxtim commented Sep 15, 2020 • edited Loading

oliver-sanders left a comment • edited Loading

Choose a reason for hiding this comment

wxtim commented Sep 17, 2020

oliver-sanders left a comment

Choose a reason for hiding this comment

hjoliver left a comment

Choose a reason for hiding this comment

hjoliver commented Sep 23, 2020

wxtim commented Aug 26, 2020 •

edited

Loading

oliver-sanders left a comment •

edited

Loading

wxtim commented Sep 14, 2020 •

edited

Loading

wxtim commented Sep 15, 2020 •

edited

Loading

oliver-sanders left a comment •

edited

Loading