Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Platforms.eval platform cmd #3791

Merged
merged 6 commits into from
Sep 23, 2020
Merged

Conversation

wxtim
Copy link
Member

@wxtim wxtim commented Aug 26, 2020

main response to #3672

Synopsis

As per the deprecation examples in the platforms proposal Cylc 8 is to be able to evaluate either:

  • [TASK]platform = $(some-shell-script.sh)
  • [TASK][remote]host = $(some-shell-script.sh)
    In a similar way at job submission time.

This PR attempts to create this behavior.

Work undertaken

  • Modified the logic to work with an example suite.
  • Written functional tests for each of the main scenarios this logic should handle:
    • Not interfering with platform = name and host = name.
    • Correctly expanding subshell host and platform settings.
    • Throwing a wobbly (raise Exception) if one tries to set both host and platform before attempting to expand subshells to avoid unnecessary calls to remote_host_select.

Follow on work not undertaken

  • No attempt to update nomenclature of items in remote_host_select - should be straightforward, but would confuse the issue.
  • No attempt to stop using backticks for platform.

@wxtim wxtim self-assigned this Aug 26, 2020
@wxtim wxtim marked this pull request as draft August 26, 2020 14:50
@wxtim wxtim force-pushed the platforms.eval_platform_cmd_II branch from 98927a3 to 4c187ad Compare September 2, 2020 09:39
@wxtim wxtim marked this pull request as ready for review September 2, 2020 11:03
@wxtim wxtim marked this pull request as draft September 2, 2020 11:03
@oliver-sanders oliver-sanders added this to the cylc-8.0a3 milestone Sep 4, 2020
# Check that platform upgrader will fail if, after inheritance but not
# before a task has both old and new settings - This should be a fail on
# Job Submit.
# Parent and child tasks are both valid, before inheritance calculated.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but not worth a full, separate PR.

@wxtim wxtim force-pushed the platforms.eval_platform_cmd_II branch 2 times, most recently from 03e7abe to 78f5559 Compare September 4, 2020 15:17
@wxtim wxtim marked this pull request as ready for review September 4, 2020 15:22
@wxtim wxtim requested review from datamel, MetRonnie and hjoliver and removed request for MetRonnie September 4, 2020 15:23
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, testing to follow.

One legitimate test failure - tests/f/job-submission/18-platform_select.t

@oliver-sanders oliver-sanders removed the request for review from datamel September 4, 2020 15:40
@wxtim wxtim force-pushed the platforms.eval_platform_cmd_II branch 2 times, most recently from bf13226 to 87436b9 Compare September 4, 2020 15:54
@wxtim
Copy link
Member Author

wxtim commented Sep 4, 2020

tests/f/job-submission/18-platform_select.t

My fault - Dave suggested that I was raising the wrong error, so I changed the error I raised...

@wxtim wxtim force-pushed the platforms.eval_platform_cmd_II branch from 7880997 to d1082d2 Compare September 4, 2020 16:28
@hjoliver
Copy link
Member

One test failing - fixed by #3806?

Update cylc/flow/platforms.py

Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>

fix broken test

testfix
@wxtim wxtim force-pushed the platforms.eval_platform_cmd_II branch from d1082d2 to a015191 Compare September 14, 2020 07:54
@wxtim
Copy link
Member Author

wxtim commented Sep 14, 2020

One test failing - fixed by #3806?

Looks very much like it. 👍🏼

Can I request re-review from @oliver-sanders and @hjoliver

@oliver-sanders
Copy link
Member

Closes #3672?

@oliver-sanders
Copy link
Member

Looks good, testing I spotted some unexpected behaviour:

A non existent platform will result in a PlatformLookupError being raised which causes the suite to shutdown, I would have expected a submit failure:

platform = $(echo 'elephant')
2020-09-15T12:54:30+01:00 ERROR - PlatformLookupError: No matching platform "elephant" found

There seems to be some unintended pattern matching going on, for instance localhost-xyz-mess seems to get evaluated as localhost rather than resulting in a submit failure:

platform = $(echo 'localhost-xyz-mess')
2020-09-15T12:53:40+01:00 DEBUG - for task a.1: platform = $(echo 'localhost-xyz-mess') evaluated as localhost

_ = rtconfig[itask.tdef.run_mode + ' mode']['disable retries']
except KeyError:
retry = True
if (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be faster since in most cases the try was going to fail.

@wxtim
Copy link
Member Author

wxtim commented Sep 15, 2020

A non existent platform will result in a PlatformLookupError being raised which causes the suite to shutdown, I would have expected a submit failure:

I think I've fixed this in commit 3175fee

There seems to be some unintended pattern matching going on, for instance localhost-xyz-mess seems to get evaluated as localhost rather than resulting in a submit failure:

I think I have fixed this in commit e6838a6

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting a submit failure now for platform=$(echo elephant) 👍. Sadly also getting this (harmless) traceback:

2020-09-15T15:55:08+01:00 ERROR - [Errno 2] No such file or directory:
	'/Users/oliver/cylc-run/foo/log/job/1/foo/01/job-activity.log'
	Traceback (most recent call last):
	  File "/Users/oliver/cylc-flow/cylc/flow/task_job_mgr.py", line 865, in
	_prep_submit_task_job
	    platform = get_platform(rtconfig)
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 82, in get_platform
	    output = platform_from_name(task_conf['platform'])
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 184, in
	platform_from_name
	    f"No matching platform \"{platform_name}\" found")
	cylc.flow.exceptions.PlatformLookupError: No matching platform "elephant" found
	During handling of the above exception, another exception occurred:
	Traceback (most recent call last):
	  File "/Users/oliver/cylc-flow/cylc/flow/task_events_mgr.py", line 87, in
	log_task_job_activity
	    with open(os.path.expandvars(job_activity_log), "ab") as handle:
	FileNotFoundError: [Errno 2] No such file or directory: '/Users/oliver/cylc-
	run/foo/log/job/1/foo/01/job-activity.log'

I think the old logic may have created the job activity log on failure to avoid this?

@wxtim
Copy link
Member Author

wxtim commented Sep 17, 2020

Yes, @oliver-sanders pointed out that at some point I had deleted the line self._create_job_log_path(suite, itask) in task_job_mgr which is required to create the job-activity.log

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good I think.

The log messages contain a bit of duplicated info and could do with being tidied up a bit:

DEBUG - ['bash', '-c', 'echo
	localhost']
DEBUG - [remote-host-select cmd] bash -c
	'echo localhost'
	[remote-host-select ret_code] 0
	[remote-host-select out] localhost
DEBUG - for task foo.1: platform =
	$(echo localhost) evaluated as localhost

One message should suffice for this case, note the second log message comes from the subprocpool and contains all the required info.

DEBUG - Traceback (most recent call
	last):
	  File "/Users/oliver/cylc-flow/cylc/flow/task_job_mgr.py", line 866, in
	_prep_submit_task_job
	    platform = get_platform(rtconfig)
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 82, in get_platform
	    output = platform_from_name(task_conf['platform'])
	  File "/Users/oliver/cylc-flow/cylc/flow/platforms.py", line 184, in
	platform_from_name
	    f"No matching platform \"{platform_name}\" found")
	cylc.flow.exceptions.PlatformLookupError: No matching platform "elephant" found

ERROR - No matching platform "elephant" found
ERROR - [jobs-submit cmd] (platform not defined)
	[jobs-submit ret_code] 1
	[jobs-submit err] No matching platform "elephant" found

One message should suffice for this case, the traceback isn't helpful here as we already know where the exception was raised from.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @wxtim 👍

@hjoliver hjoliver merged commit 130950d into cylc:master Sep 23, 2020
@hjoliver
Copy link
Member

@wxtim - can you address the log duplication as a follow-up?

@hjoliver hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021
@wxtim wxtim deleted the platforms.eval_platform_cmd_II branch March 22, 2022 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants