Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return maxrc properly for Rsh Worker #448

Merged
merged 6 commits into from
Sep 24, 2020
Merged

Conversation

hawartens
Copy link
Contributor

The current version of the Rsh Worker does not properly return
the max error code with the -S option. This commit adds in a little
"magic" to the end of the rsh command to pass back the return code
when requested. Basically works exactly as pdsh does to extract the
return code.

The current version of the Rsh Worker does not properly return
the max error code with the -S option. This commit adds in a little
"magic" to the end of the rsh command to pass back the return code
when requested. Basically works exactly as pdsh does to extract the
return code.
@degremont
Copy link
Collaborator

Thanks @hawartens for your PR.

I'm seeing that your code change is actually adding this little "magic" and also adding maxrc as new clush.conf option. This is probably worthing adding in the commit or make 2 commits of that.

I'm not against adding this option in config file is you think this is useful, but it seems you did this to propagate the maxrc flag value down to the Worker code. If this is just for this, you can simply add your echo command all the time whatever the maxrc value is.
The problem is I don't understand why maxrc is not working for Rsh. It looks to me rsh is a simple command working like any command runnable with an ExecWorker or like a SshWorker.
Could you explain to me (and maybe in the commit message) what was the problem you are solving with this code?

Checking the code, that would means ExecClient._close() is not working as expected for rsh. Could you clarify what you are seeing?

@hawartens
Copy link
Contributor Author

Thanks @degremont. Correct I added the maxrc option so that I could propagate it nicely down to the worker code a little more cleanly. I wanted the code to behave more like pdsh in this respect and not add the magic unless it was necessary. It is possible to grab maxrc in the Worker by looking at self.worker.eh._display.maxrc but it felt that would be violating your intent. If you prefer, I can package this as two commits (I did it together for the exact reason you surmised).

@degremont
Copy link
Collaborator

Understood for the maxrc config change. But what about the original problem?
Could you explain in details what is not working with RshWorker which is working with ssh or exec ?

@hawartens
Copy link
Contributor Author

Understood for the maxrc config change. But what about the original problem?
Could you explain in details what is not working with RshWorker which is working with ssh or exec ?

Sure. rsh does not correctly set return codes. As far as I know, it never has. See example below:

> cat /tmp/foo.sh
#!/bin/bash
exit 1

> ssh localhost /tmp/foo.sh
> echo $?
1

> rsh localhost /tmp/foo.sh
> echo $?
0

This manage shows the reason:
https://docs.oracle.com/cd/E36784_01/html/E36870/rsh-1.html

rsh does not return the exit status code of command.

@degremont
Copy link
Collaborator

Oh! I understood now how this trick makes sense.
But, in this case, why not simply apply this trick all the time and just update the local rc for each rsh command, as this is done for ssh or exec?
that way, the -S will be automatically handled.

@hawartens
Copy link
Contributor Author

Oh! I understood now how this trick makes sense.
But, in this case, why not simply apply this trick all the time and just update the local rc for each rsh command, as this is done for ssh or exec?
that way, the -S will be automatically handled.

I patterned it after the way it works in Pdsh. We only do the "magic" if the user requests it. This way we do not do any extra work if it is not necessary. If you would prefer to have it done all the time, that is okay with me, was just trying to avoid doing it unless it was specifically asked for.

Copy link
Collaborator

@thiell thiell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR. I would prefer to avoid adding a new clush option maxrc which would be confusing, and rather fix the max return code with the Rsh worker. Users of library could also benefit from this when the Rsh worker is used, and thus get proper return codes in that case. I think the overhead is minimal. Let us know if you can update your PR in that sense.

lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
@degremont
Copy link
Collaborator

I add some coding guidelines for this feature. I'm inline with @thiell and limit this PR to the rc feature. Don't include maxrc change here.

The current version of the Rsh Worker does not properly return
the max error code with the -S option. This commit adds in a little
"magic" to the end of the rsh command to pass back the return code.
@hawartens hawartens requested a review from thiell September 21, 2020 14:24
Copy link
Collaborator

@degremont degremont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! The principles are what I was looking for. I requested small changes.

lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
Make sure to call parent ExecClient._on_nodeset_msgline() instead of
self.worker._on_node_msgline() to ensure special handling takes place.

Make sure to call parent ExecClient._on_nodeset_close() instead of
self.worker._on_node_close() to ensure special handling takes place.

Override _on_nodeset_start() as well.  Wanted to ensure that instance
variable rsh_rc would be set to None (seemed cleaner than overriding
__init__).

Fix error where magic string could be seen by client.
@hawartens hawartens requested a review from degremont September 22, 2020 10:22
Copy link
Collaborator

@degremont degremont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm ok. Just a minor comment. I will let @thiell double check, especially the utf thing.

lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
Move initialization of rsh_rc to __init__()
Copy link
Collaborator

@thiell thiell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good overall. I would just fix the decode() method to avoid the explicit utf-8 encoding. Thanks so much.

lib/ClusterShell/Worker/Rsh.py Outdated Show resolved Hide resolved
@thiell thiell merged commit f46f718 into cea-hpc:master Sep 24, 2020
@hawartens hawartens deleted the maxrc branch September 25, 2020 00:26
@thiell thiell added this to the 1.8.4 milestone Nov 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants