fix an error handling bug in Cylc Scan #5196

Merged
9 changes: 9 additions & 0 deletions CHANGES.md
@@ -30,6 +30,15 @@ updated. Only the first match gets replaced, so it's fine to leave the old
ones in. -->

-------------------------------------------------------------------------------
## __cylc-8.0.4 (<span actions:bind='release-date'>Pending YYYY-MM-DD</span>)__

Maintenance release.

### Fixes

[#5196](https://github.com/cylc/cylc-flow/pull/5196) - Replace traceback
with warning for scan error messages where Workflow is stopped.

## __cylc-8.0.3 (<span actions:bind='release-date'>Released 2022-10-17</span>)__

Maintenance release.
2 changes: 1 addition & 1 deletion cylc/flow/network/client.py
@@ -297,7 +297,7 @@ def _timeout_handler(workflow: str, host: str, port: Union[int, str]):
host (str): host name
port (Union[int, str]): port number
Raises:
- ClientError: if the workflow has already stopped.
+ WorkflowStopped: if the workflow has already stopped.
"""
if workflow is None:
return
4 changes: 4 additions & 0 deletions cylc/flow/network/scan.py
@@ -451,6 +451,7 @@ async def graphql_query(flow, fields, filters=None):
"""
query = f'query {{ workflows(ids: ["{flow["name"]}"]) {{ {fields} }} }}'
try:

client = WorkflowRuntimeClient(
flow['name'],
# use contact_info data if present for efficiency
@@ -468,6 +469,9 @@ async def graphql_query(flow, fields, filters=None):
'variables': {}
}
)
except WorkflowStopped:
LOG.warning(f'Workflow not running: {flow["name"]}')
return False
except ClientTimeout:
LOG.exception(
f'Timeout: name: {flow["name"]}, '
1 change: 0 additions & 1 deletion cylc/flow/scheduler.py
@@ -183,7 +183,6 @@ class Scheduler:
is_restart: bool

# directories
workflow_dir: str
workflow_log_dir: str
workflow_run_dir: str
workflow_share_dir: str
47 changes: 46 additions & 1 deletion tests/integration/test_scan_api.py
@@ -263,7 +263,7 @@ async def test_scan_cleans_stuck_contact_files(
rmtree(tmp_dir)

# the old contact file check uses the CLI command that the flow was run
- # with to check that whether the flow is running. Because this is an
+ # with to check whether the flow is running. Because this is an
# integration test the process is the pytest process and it is still
# running so we need to change the command so that Cylc sees the flow as
# having crashed
@@ -289,3 +289,48 @@ async def test_scan_cleans_stuck_contact_files(

# the contact file should have been removed by the scan
assert not cont.exists()


async def test_scan_fail_well_when_client_unreachable(
start,
scheduler,
flow,
one_conf,
run_dir,
test_dir,
caplog,
):
"""It handles WorkflowRuntimeClient.async_request raising a WorkflowStopped
elegantly.
"""
# create a flow
reg = flow(one_conf, name='-crashed-')
schd = scheduler(reg)
srv_dir = Path(run_dir, reg, WorkflowFiles.Service.DIRNAME)
tmp_dir = test_dir / 'srv'

# run the flow, copy the contact, stop the flow, copy back the contact
async with start(schd):
copytree(srv_dir, tmp_dir)
rmtree(srv_dir)
copytree(tmp_dir, srv_dir)
rmtree(tmp_dir)

# the old contact file check uses the CLI command that the flow was run
# with to check whether the flow is running. Because this is an
# integration test the process is the pytest process and it is still
# running so we need to change the command so that Cylc sees the flow as
# having crashed
contact_info = load_contact_file(reg)
contact_info[ContactFileFields.COMMAND] += 'xyz'
dump_contact_file(reg, contact_info)

# Run Cylc Scan
opts = ScanOptions(states='all', format='rich', ping=True)
flows = []
await main(opts, write=flows.append, scan_dir=test_dir)

# Check that the records contain a message but not an error
rec = caplog.records[-1]
assert not rec.exc_text
assert 'Workflow not running' in rec.msg
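
Note on the pattern in this change: the new `except WorkflowStopped` branch logs with `LOG.warning`, which attaches no exception info to the log record, whereas the existing `except ClientTimeout` branch uses `LOG.exception`, which does. That difference is what the new test asserts via `rec.exc_text`. The following is a minimal standalone sketch of that behaviour, not the cylc-flow implementation: the exception classes, `fake_request`, `query`, and `ListHandler` are hypothetical stand-ins, and it inspects `exc_info` on the raw record rather than the formatted `exc_text`.

```python
import asyncio
import logging

LOG = logging.getLogger("scan-sketch")
LOG.propagate = False  # keep the sketch's records out of the root logger


class WorkflowStopped(Exception):
    """Hypothetical stand-in: the workflow is no longer running."""


class ClientTimeout(Exception):
    """Hypothetical stand-in: the request timed out."""


class ListHandler(logging.Handler):
    """Collect log records in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
LOG.addHandler(handler)


async def fake_request(name, mode):
    """Hypothetical request that fails according to `mode`."""
    if mode == "stopped":
        raise WorkflowStopped(name)
    if mode == "timeout":
        raise ClientTimeout(name)
    return {"workflows": [{"name": name}]}


async def query(name, mode="ok"):
    """Mirror the shape of the try/except in the diff above."""
    try:
        return await fake_request(name, mode)
    except WorkflowStopped:
        # expected situation: a plain warning, no traceback
        LOG.warning(f"Workflow not running: {name}")
        return False
    except ClientTimeout:
        # unexpected situation: LOG.exception records the traceback
        LOG.exception(f"Timeout: name: {name}")
        return False


if __name__ == "__main__":
    asyncio.run(query("one", mode="stopped"))
    asyncio.run(query("one", mode="timeout"))
    for rec in handler.records:
        print(rec.levelname, rec.getMessage(), rec.exc_info is not None)
```

In this sketch the stopped case yields a `WARNING` record with `exc_info` unset, while the timeout case yields an `ERROR` record carrying the exception.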