-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ssh task comms method broken #2463
Comments
A note on the reason for our general ssh client command re-invocation on remote hosts (reminding myself): it is required to support ssh task messaging, because task jobs may need to execute other client commands beyond just "cylc message" - and if messaging can't go back to the suite host via HTTPS then neither can anything else. (At this point I somewhat regret the ssh comms method - it might have been more sensible to simply say if you want to use Cylc then get the right ports opened!) |
Sorry for the breakage. I suppose:
|
The fallback logic is in the program code. If you added logic to the ssh comms tests to somehow prevent fallback, then another possibility is that no one ran remote host tests locally before the 7.5.0 release - we're relying too much on Travis CI... |
There should be a test. I run remote tests on our site regularly. I cannot fully trust Travis tests as we have very different environment. |
I agree! Not quite sure how we missed this one, although I have looked.at the tests yet ...
Sent from BlueMail
…On 27 Oct 2017, 6:45 PM, at 6:45 PM, Matt Shin ***@***.***> wrote:
There should be a test. I run remote tests on our site regularly. I
cannot fully trust Travis tests as we have very different environment.
--
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
#2463 (comment)
|
Just looked at the test. We used to rely on the passphrase being installed manually on remote job hosts or by Rose. It is now done automatically by cylc job management. The comment in the test suggests that we rely on the old behaviour for it to test the right thing, (hence don't install the passphrase file). Definitely need a better test. |
I wonder if we should combine #2214 (with essence of #2302) into the fix for this.
|
(I have decided on a basic fix to restore the functionality for now.) |
(yes, agreed that a basic fix is best for next release) |
How do you figure that? Implement a new task comms method that uses the filesystem? |
I like the suggested shift to network API |
What I really meant was that, could we assume that a host with a shared file system with the suite host would be able to use HTTP(S) communication? (So we don't get into all sort of complicated scenarios discussed in #2214.) |
I guess we could recommend if you have a shared FS between suite and task host, do not use ssh task comms. But obviously presence of a shared FS does not actually have any bearing on whether or or the HTTPS comms ports are open between the hosts. Or are you suggesting that it implies the two hosts are in the same security zone and thus getting the ports opened shouldn't be a problem? |
Yes. In other word, can we get away with supporting SSH+HTTP(S) communication only in the case where the remote host does not have a shared file system with the suite host? Otherwise, we'll need to find ways to get around the complication discussed in #2214. (Otherwise, we should consider ways to make HTTP(S) communication friendlier to sites that do not want to open up a range of ports.) |
OK, I understand why we'd rather not support ssh comms if there is a shared filesystem, I'm just concerned that a shared filesystem does not automatically imply HTTPS comms will be allowed (and so so we can't assume that). However, I think it is more likely to be allowed at such sites, and it should be fine to simply document why we don't support ssh comms for that case. |
(unless we can think of an easy way to support ssh comms even with a shared filesystem) |
I suppose we can have a system such that when SSH+HTTP(S) communication is specified in the contact file, it will be used regardless unless the current host is the suite host itself. It may overkill (or not work?) on a mixed network, but that should be very unusual. |
If the solution is overly complex (with ongoing support implications etc.) - and especially if it fails on a mixed network - then I'd actually rather not do it, and just document to users why they cannot use ssh comms in this situtation. |
The ssh task communication method is broken in 7.5.0 and current master.
For remote command re-invocation by ssh, "--env" options are added to the command line at the remote end, to pass environment variables. In the case of ssh task messaging, this passes crucial elements of the task job environment to "cylc message" on the suite host. These options were being processed by
bin/cylc
but were omitted from recent refactor (Python to Bash) of that script.This did not show up in our tests because if ssh task comms fails we automatically fall back to the normal HTTPS method, which works fine at our sites - so aside from some messages in task job.err from the initial failed messaging attempt everything works as expected.
Bug reported by @MartinDix
The text was updated successfully, but these errors were encountered: