ssh task comms method broken #2463

hjoliver · 2017-10-26T19:57:40Z

The ssh task communication method is broken in 7.5.0 and current master.

For remote command re-invocation by ssh, "--env" options are added to the command line at the remote end, to pass environment variables. In the case of ssh task messaging, this passes crucial elements of the task job environment to "cylc message" on the suite host. These options were being processed by bin/cylc but were omitted from recent refactor (Python to Bash) of that script.

This did not show up in our tests because if ssh task comms fails we automatically fall back to the normal HTTPS method, which works fine at our sites - so aside from some messages in task job.err from the initial failed messaging attempt everything works as expected.

Bug reported by @MartinDix

The text was updated successfully, but these errors were encountered:

hjoliver · 2017-10-26T20:05:44Z

A note on the reason for our general ssh client command re-invocation on remote hosts (reminding myself): it is required to support ssh task messaging, because task jobs may need to execute other client commands beyond just "cylc message" - and if messaging can't go back to the suite host via HTTPS then neither can anything else.

(At this point I somewhat regret the ssh comms method - it might have been more sensible to simply say if you want to use Cylc then get the right ports opened!)

matthewrmshin · 2017-10-26T20:52:00Z

Sorry for the breakage. I suppose:

Our current test is not adequate. I remember adding logic to prevent the test from using Pyro communication via backdoor. I guess the logic no longer works in our latest setup?
Unfortunately, SSH communication is still relevant because cylc suites use a range of ports, so this communication method is the only single port solution for inter-host communication. It is easy to ask a site to open up a single port, but may be more difficult to ask them to open a range of ports.

matthewrmshin · 2017-10-26T20:55:19Z

See also #2214 and #2302.

hjoliver · 2017-10-26T21:44:28Z

The fallback logic is in the program code. If you added logic to the ssh comms tests to somehow prevent fallback, then another possibility is that no one ran remote host tests locally before the 7.5.0 release - we're relying too much on Travis CI...

matthewrmshin · 2017-10-27T05:45:03Z

There should be a test. I run remote tests on our site regularly. I cannot fully trust Travis tests as we have very different environment.

hjoliver · 2017-10-27T06:00:20Z

I agree! Not quite sure how we missed this one, although I have looked.at the tests yet ... ⁣Sent from BlueMail

…

On 27 Oct 2017, 6:45 PM, at 6:45 PM, Matt Shin ***@***.***> wrote: There should be a test. I run remote tests on our site regularly. I cannot fully trust Travis tests as we have very different environment. -- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: #2463 (comment)

matthewrmshin · 2017-10-27T07:55:20Z

Just looked at the test. We used to rely on the passphrase being installed manually on remote job hosts or by Rose. It is now done automatically by cylc job management. The comment in the test suggests that we rely on the old behaviour for it to test the right thing, (hence don't install the passphrase file). Definitely need a better test.

matthewrmshin · 2017-10-27T19:45:50Z

I wonder if we should combine #2214 (with essence of #2302) into the fix for this.

Can we assume that we don't need to support SSH communication from hosts with shared file system with suite host? This will make it very easy to do client commands in tasks: detect ssh comms #2214.
We should move the SSH+HTTP(S) communication logic from the messaging module to the network API client module. We'll introduce a simple command on the suite host to handle any method in the API client.
- The command will be expected to be invoked via SSH (although not necessarily), perhaps with STDIN being used to accept the API method arguments encoded in JSON. (Note: not quite Suite API - On-The-Fly #2123!)
- In theory, this should allow even the GUI on a remote host to do SSH+HTTP(S) communication with a suite.

matthewrmshin · 2017-10-30T15:40:48Z

(I have decided on a basic fix to restore the functionality for now.)

hjoliver · 2017-10-31T08:11:04Z

(yes, agreed that a basic fix is best for next release)

hjoliver · 2017-10-31T08:13:52Z

Can we assume that we don't need to support SSH communication from hosts with shared file system with suite host?

How do you figure that? Implement a new task comms method that uses the filesystem?

hjoliver · 2017-10-31T08:15:10Z

I like the suggested shift to network API

matthewrmshin · 2017-10-31T09:18:21Z

What I really meant was that, could we assume that a host with a shared file system with the suite host would be able to use HTTP(S) communication? (So we don't get into all sort of complicated scenarios discussed in #2214.)

hjoliver · 2017-10-31T09:33:05Z

I guess we could recommend if you have a shared FS between suite and task host, do not use ssh task comms. But obviously presence of a shared FS does not actually have any bearing on whether or or the HTTPS comms ports are open between the hosts. Or are you suggesting that it implies the two hosts are in the same security zone and thus getting the ports opened shouldn't be a problem?

matthewrmshin · 2017-10-31T09:51:10Z

Yes. In other word, can we get away with supporting SSH+HTTP(S) communication only in the case where the remote host does not have a shared file system with the suite host? Otherwise, we'll need to find ways to get around the complication discussed in #2214.

(Otherwise, we should consider ways to make HTTP(S) communication friendlier to sites that do not want to open up a range of ports.)

hjoliver · 2017-10-31T10:07:51Z

OK, I understand why we'd rather not support ssh comms if there is a shared filesystem, I'm just concerned that a shared filesystem does not automatically imply HTTPS comms will be allowed (and so so we can't assume that). However, I think it is more likely to be allowed at such sites, and it should be fine to simply document why we don't support ssh comms for that case.

hjoliver · 2017-10-31T20:34:40Z

(unless we can think of an easy way to support ssh comms even with a shared filesystem)

matthewrmshin · 2017-10-31T22:33:31Z

I suppose we can have a system such that when SSH+HTTP(S) communication is specified in the contact file, it will be used regardless unless the current host is the suite host itself. It may overkill (or not work?) on a mixed network, but that should be very unusual.

hjoliver · 2017-10-31T22:53:50Z

If the solution is overly complex (with ongoing support implications etc.) - and especially if it fails on a mixed network - then I'd actually rather not do it, and just document to users why they cannot use ssh comms in this situtation.

hjoliver added the bug Something is wrong :( label Oct 26, 2017

hjoliver added this to the next release milestone Oct 26, 2017

matthewrmshin self-assigned this Oct 30, 2017

matthewrmshin mentioned this issue Oct 30, 2017

Fix cylc message by SSH #2464

Merged

oliver-sanders closed this as completed in #2464 Nov 3, 2017

matthewrmshin mentioned this issue Nov 3, 2017

client commands in tasks: detect ssh comms #2214

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ssh task comms method broken #2463

ssh task comms method broken #2463

hjoliver commented Oct 26, 2017 •

edited

Loading

hjoliver commented Oct 26, 2017

matthewrmshin commented Oct 26, 2017

matthewrmshin commented Oct 26, 2017

hjoliver commented Oct 26, 2017

matthewrmshin commented Oct 27, 2017

hjoliver commented Oct 27, 2017 via email

matthewrmshin commented Oct 27, 2017

matthewrmshin commented Oct 27, 2017

matthewrmshin commented Oct 30, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 31, 2017

matthewrmshin commented Oct 31, 2017

hjoliver commented Oct 31, 2017

matthewrmshin commented Oct 31, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 31, 2017

matthewrmshin commented Oct 31, 2017

hjoliver commented Oct 31, 2017

ssh task comms method broken #2463

ssh task comms method broken #2463

Comments

hjoliver commented Oct 26, 2017 • edited Loading

hjoliver commented Oct 26, 2017

matthewrmshin commented Oct 26, 2017

matthewrmshin commented Oct 26, 2017

hjoliver commented Oct 26, 2017

matthewrmshin commented Oct 27, 2017

hjoliver commented Oct 27, 2017 via email

matthewrmshin commented Oct 27, 2017

matthewrmshin commented Oct 27, 2017

matthewrmshin commented Oct 30, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 31, 2017

matthewrmshin commented Oct 31, 2017

hjoliver commented Oct 31, 2017

matthewrmshin commented Oct 31, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 31, 2017

matthewrmshin commented Oct 31, 2017

hjoliver commented Oct 31, 2017

hjoliver commented Oct 26, 2017 •

edited

Loading