Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contact: replace ps system call with psutil #4421

Merged
merged 2 commits into from
Sep 27, 2021
Merged

Conversation

oliver-sanders
Copy link
Member

@oliver-sanders oliver-sanders commented Sep 21, 2021

Closes #4416
Closes #3965

  • Replaces the compound pid:command contact file fields with separate
    pid & contact fields.
  • Replaces system ps call with Python psutil.
  • Fixes a bug reported on Alpine Linux & improves portability.
  • Reviewed and updated the detection of stale contact files.
  • Added tests.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.py and
    conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • No documentation update required.

@oliver-sanders oliver-sanders added this to the cylc-8.0b3 milestone Sep 21, 2021
@oliver-sanders oliver-sanders self-assigned this Sep 21, 2021
@oliver-sanders oliver-sanders added the bug Something is wrong :( label Sep 21, 2021
@oliver-sanders oliver-sanders force-pushed the 4416 branch 3 times, most recently from 0861e15 to 2fad30e Compare September 21, 2021 12:42
cmd = ['cylc', 'psutil']
metric = f'[["Process", {pid}]]'
if is_remote_host(host):
cmd = _construct_ssh_cmd(cmd, host)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: construct_ssh_cmd takes platform as an argument and respects the platform config (incl SSH settings) whereas _construct_ssh_cmd takes host as an argument and uses the config for localhost.

We were doing construct_ssh_cmd(get_platform()) before which is effectively the same.

@@ -98,7 +97,6 @@ def daemonize(schd):
"host": schd.host,
"url": workflow_url,
"pub_url": pub_url,
"ps_opts": PS_OPTS,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed one field from the daemonization output.

Comment on lines 1007 to 1010
fields.PID:
str(proc.pid),
fields.COMMAND:
' '.join(proc.cmdline()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the PROCESS contact file field into PID and COMMAND.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look through, looks good, and much clearer than the old code 👍

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. I'm on Windows at the moment, so will test tomorrow with that alpine container from my Linux box 👍 thanks!

"""The process ID of the running workflow on ``CYLC_WORKFLOW_HOST``."""

COMMAND = 'CYLC_WORKFLOW_COMMAND'
"""The command that was used to run the workflow on ``CYLC_WORKFLOW_HOST```.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CYLC_WORKFLOW_HOST has one more quote on its right hand side?

@oliver-sanders oliver-sanders marked this pull request as draft September 22, 2021 14:24
@oliver-sanders
Copy link
Member Author

Integration test is failing on Mac OS (GitHub), putting this on hold until resolved.

Co-authored-by: Bruno P. Kinoshita <kinow@users.noreply.github.com>
@oliver-sanders oliver-sanders marked this pull request as ready for review September 23, 2021 14:04
@oliver-sanders
Copy link
Member Author

The MacOS failure was related to the parallelism of tests on the GH runner, the PID test I had written was flagging false positives which couldn't be safely fixed so I have removed it.

@kinow
Copy link
Member

kinow commented Sep 23, 2021

The MacOS failure was related to the parallelism of tests on the GH runner, the PID test I had written was flagging false positives which couldn't be safely fixed so I have removed it.

A GH macos job is still failing 😰

@oliver-sanders
Copy link
Member Author

Different test thankfully, should pass on rerun.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on Alpine container with this diff:

diff --git a/cylc-flow/8.0b2/Dockerfile b/cylc-flow/8.0b2/Dockerfile
index 34ce558..804c351 100644
--- a/cylc-flow/8.0b2/Dockerfile
+++ b/cylc-flow/8.0b2/Dockerfile
@@ -43,7 +43,7 @@ RUN apk update && \
     g++ \
     linux-headers \
     python3-dev && \
-    pip install --no-cache-dir "cylc-flow==${cylc_docker_version}" && \
+    pip install --no-cache-dir git+https://github.com/oliver-sanders/cylc-flow.git@7377b3cffe57ac01102cbc77e0ae669266bd95cf#egg=cylc-flow && \
     apk del build-dependencies
 
 # Add a non-root user
@@ -69,7 +69,7 @@ RUN chown -R ${USER_NAME}:${USER_GROUP} ${USER_ROOT} && \
 
 # FIXME: https://github.com/cylc/cylc-flow/issues/4416
 #        hacking due to incompatible ps args
-RUN sed -i 's/wopid/opid/' /usr/local/lib/python3.7/site-packages/cylc/flow/workflow_files.py
+# RUN sed -i 's/wopid/opid/' /usr/local/lib/python3.7/site-packages/cylc/flow/workflow_files.py

Workflow ran with no issues this time 🎉 Thanks!

@hjoliver
Copy link
Member

tests/f/restart/00-pre-initial.t failing here

Comment on lines +293 to +296
raise CylcError(
f'The workflow is no longer running at {host}:{port}\n'
f'It has moved to {contact_host}:{contact_port}'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes cylc stop --poll to fail if the target workflow got restarted quickly, hence failure of tests/f/restart/00-pre-initial.t. I'll push a quick fix and merge the PR if it works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The fix is to intercept CylcError in the poller, not to avoid raising it here).

@hjoliver hjoliver merged commit 5d11aa7 into cylc:master Sep 27, 2021
@oliver-sanders oliver-sanders deleted the 4416 branch September 27, 2021 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
3 participants