Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profile.sh misdetects login shell #473

Closed
mw-a opened this issue Sep 12, 2022 · 3 comments
Closed

profile.sh misdetects login shell #473

mw-a opened this issue Sep 12, 2022 · 3 comments
Labels
Milestone

Comments

@mw-a
Copy link

mw-a commented Sep 12, 2022

Describe the bug

We're seeing error messages from OpenPBS jobs as generated by the reproducer further down. This happens for users whose login shell is /bin/tcsh but override their job to use /bin/bash for executing the job script. It can be traced to /etc/profile.d/modules.sh which sources /usr/share/Modules/init/tcsh because it wrongly detects the executing shell to be tcsh. Further investigation reveals this to be caused by the optimisation/fix introduced in #173 to use the $BASH variable to make detection more efficient and robust. Unfortunately this runs afoul of two very specific circumstances in our scenario:

  • PBS executes /bin/bash with argv[0] set to -bash in order to get it to behave like a login shell and then feeds the path to the job script to execute to it via stdin
  • bash in turn defaults to using the login shell value from getpwent() when called as a login shell (without further information)

To Reproduce

Steps to reproduce the behavior:

  1. Create a user with login shell tcsh:
$ getent passwd $USER
testuser:*:237:1120:test user:/home/testuser:/bin/tcsh
  1. Have them execute the following minimal reproducer command:
$ echo echo '$BASH' | bash -c 'exec -a -bash /bin/bash'
/usr/share/Modules/init/tcsh: eval: line 3: syntax error near unexpected token `alias'
/usr/share/Modules/init/tcsh: eval: line 3: `if ( $?histchars && $?prompt ) alias module 'set _histchars = $histchars; unset histchars; set _prompt=$prompt:q; set prompt=""; eval "`/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl tcsh \!*:q`"; set _exit="$status"; set histchars = $_histchars; unset _histchars; set prompt=$_prompt:q; unset _prompt; test 0 = $_exit' ;'
/usr/share/Modules/init/tcsh: line 15: syntax error near unexpected token `alias'
/usr/share/Modules/init/tcsh: line 15: `      if ( $?histchars && $?prompt ) alias module 'set _histchars = $histchars; unset histchars; set _prompt=$prompt:q; set prompt=""; eval `/usr/share/Modules/libexec/modulecmd-compat tcsh \!*`; set _exit="$status";  set histchars = $_histchars; unset _histchars; set prompt=$_prompt:q; unset _prompt; test 0 = $_exit' ;'
/bin/tcsh

No modulerc or modulefiles are involved in this problem because it's the initialisation process that's failing.

Expected behavior

The shell should execute the command without error messages.

Error and debugging information

It could be argued whether bash's or PBS's behaviour are strictly correct here. On the other hand, changing their established behaviour, particularly that of bash, is likely to break other things. So any ideas how to work around this problem on the modules level would be very much appreciated.

Modules version and configuration

$ module --version
Modules Release 4.5.2 (2020-07-30)
$ module config --dump-state
Modules Release 4.5.2 (2020-07-30)

- Config. name ---------.- Value (set by if default overridden) ---------------
advanced_version_spec     0
auto_handling             0
avail_indepth             1
avail_report_dir_sym      1
avail_report_mfile_sym    1
collection_pin_version    0
collection_target         <undef>
color                     never
colors                    hi=1:db=2:se=2:er=91:wa=93:me=95:in=94:mp=1;94:di=94:al=96:sy=95:de=4:cm=92
contact                   root@localhost
extended_default          0
extra_siteconfig          <undef>
home                      /usr/share/Modules (env-var)
icase                     never
ignored_dirs              CVS RCS SCCS .svn .git .SYNC .sos
implicit_default          1
locked_configs
ml                        1
pager                     /usr/bin/less -eFKRX
rcfile                    <undef>
run_quarantine            <undef>
search_match              starts_with
set_shell_startup         0
silent_shell_debug        <undef>
siteconfig                /etc/environment-modules/siteconfig.tcl
tcl_ext_lib               /usr/lib64/libtclenvmodules.so
term_background           dark
unload_match_order        returnlast
verbosity                 normal
wa_277                    0

- State name -----------.- Value ----------------------------------------------
cmdline                   /usr/share/Modules/libexec/modulecmd.tcl tcsh config --dump-state --no-pager
extra_siteconfig_loaded   0
force                     0
is_stderr_tty             1
is_win                    0
machine                   x86_64
os                        Linux 4.18.0-372.9.1.el8.x86_64
pager_started             0
paginate                  0
path_separator            :
rc_loaded                 <undef>
siteconfig_loaded         1
shell                     tcsh
shelltype                 csh
subcmd                    config
subcmd_args               --dump-state
tcl_ext_lib_loaded        1
tcl_version               8.6.8
term_columns              158

- Env. variable --------.- Value ----------------------------------------------
LOADEDMODULES
MODULEPATH                /app/modulefiles:/app/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles
MODULEPATH_modshare       /usr/share/modulefiles:1:/usr/share/Modules/modulefiles:1:/etc/modulefiles:1
MODULESHOME               /usr/share/Modules
MODULES_CMD               /usr/share/Modules/libexec/modulecmd.tcl
MODULES_USE_COMPAT_VERSION 0

Additional context

  • We're experiencing this on RHEL 8.6 but have no reason to believe this wouldn't affect any newer version of modules as well.
  • In our use-case the error messages are purely cosmetic but annoying and generate support churn. It's also likely that there are cases where this additional error output might confuse software which is parsing the output of commands it runs in a similar fashion.
@mw-a mw-a added the bug label Sep 12, 2022
@xdelaruelle
Copy link
Collaborator

Many thanks for your detailed report and for the reproducer.

As spotted, issue comes from the following lines of sh profile script:

https://github.com/cea-hpc/modules/blob/b5c3c30477fee50c1ef51b62f7a56ead4c70f88c/init/profile.sh.in#L5-L6

Code may be fixed by restraining the possibilities when detecting that the BASH variable is set. If BASH is found set, I assume we need to source either sh or bash version of the Modules initialization script. So I think the following code change should fix the issue:

--- orig	2022-09-13 06:55:10.807140403 +0200
+++ /etc/profile.d/modules.sh	2022-09-13 06:55:42.119445769 +0200
@@ -1,7 +1,11 @@
 # get current shell name by querying shell variables or looking at parent
 # process name
 if [ -n "${BASH:-}" ]; then
-   shell=${BASH##*/}
+   if [ "${BASH##*/}" = 'sh' ]; then
+      shell='sh'
+   else
+      shell='bash'
+   fi
 elif [ -n "${ZSH_NAME:-}" ]; then
    shell=$ZSH_NAME
 else

@mw-a
Copy link
Author

mw-a commented Sep 14, 2022

Thanks for the quick response and sorry for not responding as quickly.

I have now tested the proposed fix in our setup and can confirm that it fixes the problem with the minimal reproducer as well as for PBS jobs. Thank you!

@lzaoral
Copy link
Contributor

lzaoral commented Oct 5, 2022

@mw-a, thank you for this detailed report!
@xdelaruelle, thank you for letting me know that you have created a fix for this bug (rhbz#1815047)!

I've already fixed it in CentOS Stream 8 & 9 so this will be fixed in upcoming RHEL releases.

@xdelaruelle xdelaruelle added this to the 5.2 milestone Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants