-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port illumos' shell linter improvements #353
Conversation
This commit ports over two improvements to the shell linter from illumos (original patch written by Andy Fiddaman). Links to the relevant bug reports and the original patch: https://www.illumos.org/issues/13601 https://www.illumos.org/issues/13631 illumos/illumos-gate@c7b656f The first improvement is to the lint warning for arithmetic operators in [[ ... ]]. The ksh linter now suggests the correct equivalent operator to use in ((...)). Example: $ ksh -nc '[[ 30 -gt 25 ]]' # Original warning warning: line 1: -gt within [[ ... ]] obsolete, use ((...)) # New warning warning: line 1: [[ ... -gt ... ]] obsolete, use ((... > ...)) The second improvement pertains to variable expansion in arithmetic expressions. The ksh linter now suggests referencing variable names directly: $ ksh -nc 'integer foo=40; (($foo < 50 ))' # Old warning warning: line 1: variable expansion makes arithmetic evaluation less efficient # New warning warning: line 1: in '(($foo < 50))', using '$' is unnecessary, incurs a penalty and can introduce rounding errors src/cmd/ksh93/{data/lexstates,sh/lex,sh/parse}.c: - Port the improved shell lint warnings from illumos to ksh93u+m. - The original checks for arithmetic operators involved a bunch of if statements with inefficient calls to strcmp(3). These were replaced with a more efficient switch statement that avoids strcmp.
Here is a further optimization if you wish. When I examined src/cmd/ksh93/include/test.h, all the six TESTs aligned up in a row thus an array can be employed avoiding all the branching just need to substract 36 (octal 040+4=36) to adjust to position 0. ksh/src/cmd/ksh93/include/test.h Lines 38 to 47 in 6d3796c
diff --git a/NEWS b/NEWS
index 613058d..014bad5 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,11 @@ For full details, see the git log at: https://github.com/ksh93/ksh
Any uppercase BUG_* names are modernish shell bug IDs.
+2021-11-28:
+
+- The shell linter's warnings for obsolete arithmetic operators in [[ ... ]]
+ and unnecessary variable expansion in ((...)) have been improved.
+
2021-11-24:
- The --posix mode was amended to stop the '.' command (but not 'source') from
diff --git a/src/cmd/ksh93/data/lexstates.c b/src/cmd/ksh93/data/lexstates.c
index 2465e26..bf490fe 100644
--- a/src/cmd/ksh93/data/lexstates.c
+++ b/src/cmd/ksh93/data/lexstates.c
@@ -430,13 +430,13 @@ const char e_lexsyntax2[] = "syntax error: `%s' %s";
const char e_lexsyntax3[] = "syntax error at line %d: duplicate label %s";
const char e_lexsyntax4[] = "syntax error at line %d: invalid reference list";
const char e_lexsyntax5[] = "syntax error at line %d: `<<%s' here-document not contained within command substitution";
-const char e_lexwarnvar[] = "line %d: variable expansion makes arithmetic evaluation less efficient";
+const char e_lexwarnvar[] = "line %d: in '((%s))', using '$' is unnecessary, incurs a penalty and can introduce rounding errors";
const char e_lexlabignore[] = "line %d: label %s ignored";
const char e_lexlabunknown[] = "line %d: %s unknown label";
const char e_lexobsolete1[] = "line %d: `...` obsolete, use $(...)";
const char e_lexobsolete2[] = "line %d: -a obsolete, use -e";
const char e_lexobsolete3[] = "line %d: '=' obsolete, use '=='";
-const char e_lexobsolete4[] = "line %d: %s within [[ ... ]] obsolete, use ((...))";
+const char e_lexobsolete4[] = "line %d: [[ ... %s ... ]] obsolete, use ((... %s ...))";
const char e_lexobsolete5[] = "line %d: set %s obsolete";
const char e_lexobsolete6[] = "line %d: `{' instead of `in' is obsolete";
const char e_lexusebrace[] = "line %d: use braces to avoid ambiguities with $id[...]";
diff --git a/src/cmd/ksh93/include/version.h b/src/cmd/ksh93/include/version.h
index 6b18caf..2d48982 100644
--- a/src/cmd/ksh93/include/version.h
+++ b/src/cmd/ksh93/include/version.h
@@ -21,7 +21,7 @@
#define SH_RELEASE_FORK "93u+m" /* only change if you develop a new ksh93 fork */
#define SH_RELEASE_SVER "1.1.0-alpha" /* semantic version number: https://semver.org */
-#define SH_RELEASE_DATE "2021-11-24" /* must be in this format for $((.sh.version)) */
+#define SH_RELEASE_DATE "2021-11-28" /* must be in this format for $((.sh.version)) */
#define SH_RELEASE_CPYR "(c) 2020-2021 Contributors to ksh " SH_RELEASE_FORK
/* Scripts sometimes field-split ${.sh.version}, so don't change amount of whitespace. */
diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index c30c2d8..0fd25f1 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -1435,7 +1435,11 @@ breakloop:
if(lp->lex.testop2)
{
if(lp->lexd.warn && (c&TEST_ARITH))
- errormsg(SH_DICT,ERROR_warn(0),e_lexobsolete4,shp->inlineno,state);
+ {
+ /* test.h: octal 040+4=36, (TEST_EQ, TEST_GE, TEST_GT, TEST_LE, TEST_LT, TEST_NE) */
+ static const char *alt[] = { "==", ">=", ">", "<=", "<", "!=" };
+ errormsg(SH_DICT, ERROR_warn(0), e_lexobsolete4, shp->inlineno, state, alt[c-36]);
+ }
if(c&TEST_STRCMP)
lp->lex.incase = 1;
else if(c==TEST_REP)
diff --git a/src/cmd/ksh93/sh/parse.c b/src/cmd/ksh93/sh/parse.c
index 7f55ce5..74c5a75 100644
--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -304,7 +304,7 @@ static Shnode_t *getanode(Lex_t *lp, struct argnod *ap)
else
{
if(sh_isoption(SH_NOEXEC) && (ap->argflag&ARG_MAC) && paramsub(ap->argval))
- errormsg(SH_DICT,ERROR_warn(0),e_lexwarnvar,lp->sh->inlineno);
+ errormsg(SH_DICT,ERROR_warn(0),e_lexwarnvar,lp->sh->inlineno,ap->argval);
t->ar.arcomp = 0;
}
return(t); |
I prefer |
Here's some relevant reading: From Switch Statement Down to Machine Code |
This commit also adds some comments to improve readability.
Good point. I'll revert the patch to use a more readable
The performance gain probably is negligible, but any performance gain to be had isn't from using a |
Right, that was actually your version I prefer, sorry about that. I hadn't actually looked at the illumos version. Even though compilers do inline and optimize |
I've two comments about that warning:
|
Here's the script I used to try to produce a rounding error. The minimum value representable in ksh arithmetic is 0.00000000000000001. Add a zero and it rounds to zero. So you'd think adding that value repeatedly should eventually result in a rounding error, but the decimal fraction keeps corresponding to the number of iterations. check()
{
print -n "i==$i x==$x"
tr=$i
while [[ $tr == *0 ]]; do tr=${tr%0}; done
[[ $x == *"$tr" ]] || print -n " <=== ROUNDING ERROR"
print
}
LC_NUMERIC=C
#typeset -F17 x y
x="1.00000000000000000"
y="0.00000000000000001"
integer i=0
(
trap 'check' USR1
while ((++i)); do
x=$(( $x + $y ))
#(( x += y ))
done
) &
trap 'kill %1' EXIT
while sleep 1; do
kill -s USR1 %1
done A fragment of output on my system:
Very occasionally a rounding error occurs. But it does not propagate. Each iteration's value depends on the previous one being correct, so it must be an output problem only. What's more, if we replace the Now, if we declare
But again, it makes no difference whether you use the dreaded dollar sign or not; the results are identical, except the values increase a great deal faster with the float declaration (supporting the performance argument).
So, I want to see some actual evidence that this warning is correct before committing it. |
The wording there was based on the OpenSolaris shell guide (which I think was originally written by Roland Mainz) - there's a copy at http://www.fiddaman.net/shell/#avoid_unnecessary_string_number_conversions |
I can provide evidence along with other detail. |
I will try to provide some evidence to help understand what is going on when working with ksh floats and how that relates to ((...))s. I cannot explain it all and I am sure I will skip over something but the following I believe should help people understand that generally avoiding a
The
I do not agree 100% with this part as the coder may want to round a numeric value to a certain precision.
Let me provide some insights: For a In ksh, if you want to dump out the full decimal representation of a float numeric then issue $((var_name)).
If your precision level exceeds maximum allowable safe digits for the platform then ksh will output all the digits it can up to that limit and then tack on zeroes for the rest.
Now, let's see what happens with ksh when we try to add .1 and .2 together, see reference:
Output from x86_64:
Output from arrach64:
Swinging back to the impact of the $var_name expansion. More work is performed to round numeric variable x to its precision and then reintroduce its value back into the arithmetic expression thus incurring rounding and conversation errors. By using the variable name without expansion, ksh will operate on a defined numeric's stored raw value.
Now, for the generalization of the error message applying to text variables. Since variable a is text and not defined as a numeric, ksh is going to have to do a conversion no matter what to a long double on my platforms. Still, ksh is going to have to do less work as it does not have to insert the text value via the $ expansion then pull it back out and operate on it.
That does it for me on this. Hopefully this helps. I could be wrong but I have tried. |
Thanks @hyenias, I really appreciate that, I learned a few things. |
This commit ports over two improvements to the shell linter from illumos (original patch written by Andy Fiddaman). Links to the relevant bug reports and the original patch: https://www.illumos.org/issues/13601 https://www.illumos.org/issues/13631 illumos/illumos-gate@c7b656f The first improvement is to the lint warning for arithmetic operators in [[ ... ]]. The ksh linter now suggests the correct equivalent operator to use in ((...)). Example: $ ksh -nc '[[ 30 -gt 25 ]]' # Original warning warning: line 1: -gt within [[ ... ]] obsolete, use ((...)) # New warning warning: line 1: [[ ... -gt ... ]] obsolete, use ((... > ...)) The second improvement pertains to variable expansion in arithmetic expressions. The ksh linter now suggests referencing variable names directly: $ ksh -nc 'integer foo=40; (($foo < 50 ))' # Old warning warning: line 1: variable expansion makes arithmetic evaluation less efficient # New warning warning: line 1: in '(($foo < 50))', using '$' is slower and can introduce rounding errors src/cmd/ksh93/{data/lexstates,sh/lex,sh/parse}.c: - Port the improved shell lint warnings from illumos to ksh93u+m. - The original checks for arithmetic operators involved a bunch of if statements with inefficient calls to strcmp(3). These were replaced with a more efficient switch statement that avoids strcmp.
This commit adds onto <ksh93#353> by porting over two additional improvements to the shell linter: 1) The changes in the aforementioned pull request were merged into illumos-gate with an additional change.[*] The illumos revision of the patch improved the warning for (( $foo = $? )) to specify '$foo' causes the warning.[**] Example: $ ksh -n -c '(( $? != $bar ))' ksh: warning: line 1: in '(( $? != $bar ))', using '$' as in '$bar' is slower and can introduce rounding errors While I was porting the illumos patch I did notice one problem. The string it uses from paramsub() skips over the initial '{' in '${var}', resulting in the warning printing '$var}' instead: $ ksh -n -c '(( ${.sh.pid} != $$ ))' ... in '(( ${.sh.pid} != $$ ))', using '$' as in '$.sh.pid}' is slower ... This was fixed by including the missing '{' in the string returned by paramsub for ${var} variables. 2) In ksh93v-, parsing x=$((expr)) with the shell linter will cause ksh to warn the user x=$((expr)) is slower than ((x=expr)). This improvement has been backported with a modified warning: # Result from this commit $ ksh -n -c 'x=$((1 + 2))' ksh: warning: line 1: x=$((1 + 2)) is slower than ((x=1 + 2)) # Result from ksh93v- $ ksh93v -n -c 'x=$((1 + 2))' ksh93v: warning: line 1: ((x=1 + 2)) is more efficient than x=$((1 + 2)) Minor note: the ksh93v- patch had an invalid use of memcmp; this version of the patch uses strncmp instead. References: illumos/illumos-gate@be548e8 https://code.illumos.org/c/illumos-gate/+/1834/comment/65722363_22fdf8e7/
This commit adds onto <#353> by porting over two additional improvements to the shell linter: 1) The changes in the aforementioned pull request were merged into illumos-gate with an additional change.[*] The illumos revision of the patch improved the warning for (( $foo = $? )) to specify '$foo' causes the warning.[**] Example: $ ksh -n -c '(( $? != $bar ))' ksh: warning: line 1: in '(( $? != $bar ))', using '$' as in '$bar' is slower and can introduce rounding errors While I was porting the illumos patch I did notice one problem. The string it uses from paramsub() skips over the initial '{' in '${var}', resulting in the warning printing '$var}' instead: $ ksh -n -c '(( ${.sh.pid} != $$ ))' ... in '(( ${.sh.pid} != $$ ))', using '$' as in '$.sh.pid}' is slower ... This was fixed by including the missing '{' in the string returned by paramsub for ${var} variables. 2) In ksh93v-, parsing x=$((expr)) with the shell linter will cause ksh to warn the user x=$((expr)) is slower than ((x=expr)). This improvement has been backported with a modified warning: # Result from this commit $ ksh -n -c 'x=$((1 + 2))' ksh: warning: line 1: x=$((1 + 2)) is slower than ((x=1 + 2)) # Result from ksh93v- $ ksh93v -n -c 'x=$((1 + 2))' ksh93v: warning: line 1: ((x=1 + 2)) is more efficient than x=$((1 + 2)) Minor note: the ksh93v- patch had an invalid use of memcmp; this version of the patch uses strncmp instead. References: illumos/illumos-gate@be548e8 https://code.illumos.org/c/illumos-gate/+/1834/comment/65722363_22fdf8e7/
This commit adds onto <#353> by porting over two additional improvements to the shell linter: 1) The changes in the aforementioned pull request were merged into illumos-gate with an additional change.[*] The illumos revision of the patch improved the warning for (( $foo = $? )) to specify '$foo' causes the warning.[**] Example: $ ksh -n -c '(( $? != $bar ))' ksh: warning: line 1: in '(( $? != $bar ))', using '$' as in '$bar' is slower and can introduce rounding errors While I was porting the illumos patch I did notice one problem. The string it uses from paramsub() skips over the initial '{' in '${var}', resulting in the warning printing '$var}' instead: $ ksh -n -c '(( ${.sh.pid} != $$ ))' ... in '(( ${.sh.pid} != $$ ))', using '$' as in '$.sh.pid}' is slower ... This was fixed by including the missing '{' in the string returned by paramsub for ${var} variables. 2) In ksh93v-, parsing x=$((expr)) with the shell linter will cause ksh to warn the user x=$((expr)) is slower than ((x=expr)). This improvement has been backported with a modified warning: # Result from this commit $ ksh -n -c 'x=$((1 + 2))' ksh: warning: line 1: x=$((1 + 2)) is slower than ((x=1 + 2)) # Result from ksh93v- $ ksh93v -n -c 'x=$((1 + 2))' ksh93v: warning: line 1: ((x=1 + 2)) is more efficient than x=$((1 + 2)) Minor note: the ksh93v- patch had an invalid use of memcmp; this version of the patch uses strncmp instead. References: illumos/illumos-gate@be548e8 https://code.illumos.org/c/illumos-gate/+/1834/comment/65722363_22fdf8e7/
This pull request ports over two improvements to the shell linter from illumos (original patch written by Andy Fiddaman). Links to the relevant bug reports and the original patch:
https://www.illumos.org/issues/13601
https://www.illumos.org/issues/13631
illumos/illumos-gate@c7b656f
The first improvement is to the lint warning for arithmetic operators in
[[ ... ]]
. The ksh linter now suggests the correct equivalent operator to use in((...))
. Example:The second improvement pertains to variable expansion in arithmetic expressions. The ksh linter now suggests referencing variable names directly:
src/cmd/ksh93/{data/lexstates,sh/lex,sh/parse}.c:
- Port the improved shell lint warnings from illumos to ksh93u+m.
- The original checks for arithmetic operators involved a bunch of if statements with inefficient calls to
strcmp
. These were replaced with a more efficient switch statement that avoidsstrcmp
.