
[SPARK-17256][Deploy, Windows]Check before adding double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character such as mysql connection string. #14807

Closed
wants to merge 2 commits

Conversation

@qualiu commented Aug 25, 2016

What changes were proposed in this pull request?

Remove double quotes in 3 cmd files in spark/bin to avoid cutting off arguments that contain special characters and are double quoted.

How was this patch tested?

Just copy and paste the following command and execute it in spark/bin:
spark-submit.cmd --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table

It will not even start; the MySQL connection-string argument is cut off:

bin\spark-submit2.cmd" --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user' is not recognized as an internal or external command,
operable program or batch file.
'password' is not recognized as an internal or external command,
operable program or batch file.

This is a conservative fix that keeps the cmd /V /E /C convention and preserves everything that currently works.

Using cmd /V /E /C to avoid polluting the environment is not ideal, because enabling /V /E /C restricts argument passing:
(1) XXX.cmd itself cannot be started if its full path contains spaces, even when quoted:
cmd /V /E /C "%~dp0XXX.cmd" xxx
(2) Double-quoted arguments cannot be passed to spark-submit.cmd (as mentioned above) under:
cmd /V /E /C
Maybe it would be better to explicitly set the variables that need protection from pollution, since it is difficult to require users to keep using "SetLocal EnableDelayedExpansion" and the like in the batch files (.cmd/.bat).
By the way, I didn't change the pyspark, sparkR, beeline, etc. scripts because they seem to have worked fine for a long time.
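
For context, the cut-off follows from the quote-handling rules documented in cmd /?: with /C, when the command string contains more than one pair of quote characters, cmd strips only the first and the last quote and leaves the rest in place. A sketch of what happens to the wrapper line (the install path here is hypothetical):

```bat
:: Wrapper line in bin\spark-submit.cmd:
::   cmd /V /E /C "%~dp0spark-submit2.cmd" %*
::
:: With a quoted jdbc argument there are more than two quotes, so cmd /C
:: strips only the FIRST and LAST quote of the whole command string:
::
::   cmd /V /E /C "C:\spark\bin\spark-submit2.cmd" --jars x "jdbc:mysql://host/db?user=u&password=p"
::
:: is executed as:
::
::   C:\spark\bin\spark-submit2.cmd" --jars x "jdbc:mysql://host/db?user=u&password=p
::
:: The & now sits outside any quotes, so cmd treats password=p as a second
:: command -- which produces the "'password' is not recognized" error above.
```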

What's more

A tool to quickly change/restore the files in spark/bin, if you like: https://github.com/qualiu/lzmw

  • Remove quotes (deleting the --nf xxx part will modify all matched files):

    lzmw -i -t "(^cmd.*?/[VCE])\s+\"+(%~dp0\S+\.cmd)\"+" -o "$1 $2" --nf "pyspark|sparkR|beeline|example" -p . -R
  • Add/restore quotes:
    lzmw -it "\"*(%~dp0\S+\.cmd)\"*" -o "\"$1\"" -p . -R
  • Or remove the head:
    lzmw -f "\.cmd$" -it "^cmd /V /E /C " -o "" -p %CD% -R

@AmplabJenkins commented

Can one of the admins verify this patch?

@srowen (Member) commented Aug 25, 2016

I don't think we can do that, because then paths with spaces don't work, for example. Don't you just need to escape your argument?

@qualiu (Author) commented Aug 25, 2016

@srowen: thanks for your reply.
But as I mentioned above, a path with spaces is not working now either:
run spark-submit.cmd from its directory "D:\opengit\spark\bin - Copy", which has spaces in its name, and "D:\opengit\spark\bin - Copy" is cut off by spark-submit.cmd to 'D:\opengit\spark\bin', as follows:

D:\opengit\spark\bin - Copy>spark-submit.cmd --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table
'D:\opengit\spark\bin' is not recognized as an internal or external command,
operable program or batch file.
'password' is not recognized as an internal or external command,
operable program or batch file.

And the unchanged batch files have quotes, as shown in the attached screenshot.

@qualiu (Author) commented Aug 25, 2016

What's more, the current spark-submit.cmd causes an extra "The input line is too long." error if you call spark-submit.cmd from a batch file in some cases:

XXX.cmd --jar xxx  --xxx *** Pi* d:\tmp "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123"
The input line is too long.
'password' is not recognized as an internal or external command,
operable program or batch file.

@srowen: What I did was a conservative fix (as already described in the original text) that does not break currently working cases and lets more cases work.

@srowen (Member) commented Aug 25, 2016

I see; your example shows, I think, submitting an invalid JAR file, right? You've specified your JAR with --jars but then don't specify a main JAR. That may be the root cause.

I see your point that your path has spaces though.
See #10789 and CC @tritab
I believe it was supposed to fix that.

Am I right that this is then about spaces in path? I don't yet see a case about special characters or quoting here.

@qualiu (Author) commented Aug 25, 2016

@srowen: Sorry, there was something misleading in my example regarding --jars.

The key points in short are :

  • The double quotes around spark-submit.cmd do not work if the full path of spark-submit.cmd has spaces.
    • Proven by the above example, "D:\opengit\spark\bin - Copy".
    • The reason is the cmd /V /E /C at the head of the line in spark-submit.cmd:

      cmd /V /E /C "%~dp0spark-submit2.cmd" %*
  • My example with the invalid jar in --jars just-to-start was only meant to show that:
    • spark-submit.cmd cannot even be started if any argument is double quoted.

Essential problem

  • cmd /V /E /C "~xxx.cmd" conflicts with, and cannot accept, a "double quoted argument"

Possible solution for discussion

  • Keep the quotes and remove cmd /V /E /C, just like the current bin/spark-submit2.cmd:
    "%~dp0spark-class2.cmd" %CLASS% %*
    • Then everything will work fine,
    • except for possible pollution of the environment in the current cmd window and any derived ones (e.g. via start cmd).
  • Take my conservative fix:
    • Keeps working: everything that works now keeps working.
    • Keeps not working: still cannot start spark-submit.cmd if its full path has spaces, as shown above.
    • Will work with this fix: an argument that has special characters and is double quoted will no longer be cut off.
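
As an illustration of the "check before adding quotes" idea, a minimal sketch of what such a wrapper could look like (this is not the exact patch; the variable name is made up):

```bat
@echo off
rem Sketch only: quote the child script's path ONLY when it contains a space,
rem so a quoted user argument is not combined with the wrapper's own quotes
rem and lost to cmd /C's first/last-quote stripping.
set "SUBMIT2=%~dp0spark-submit2.cmd"

rem Does the path contain a space?
echo %SUBMIT2%| findstr /C:" " >nul
if errorlevel 1 (
  rem No space: safe to pass the path unquoted.
  cmd /V /E /C %SUBMIT2% %*
) else (
  rem Space present: wrap the whole command in one extra outer pair of
  rem quotes, the workaround described in cmd /?.
  cmd /V /E /C ""%SUBMIT2%" %*"
)
```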

@srowen (Member) commented Aug 25, 2016

I can't evaluate this since I don't use Windows, but maybe others who made the original change can. You should check https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark, BTW.

@qualiu (Author) commented Aug 25, 2016

@srowen: Thank you! Glad to hear opinions; maybe this needs more discussion.

@qualiu qualiu changed the title Remove double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character. [Deploy, Windows]Remove double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character. Aug 25, 2016
lqm678 added 2 commits August 26, 2016 08:10
…guments that double quoted as contain special character.

To simply validate (with, for example, a mysql connection string) — it cannot even be started:
spark-submit.cmd --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table

After this fix:
(1) Keeps not working: a full path with spaces, like "D:\opengit\spark\bin - Copy\spark-submit.cmd" --jars any-jars "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table
(2) Keeps working: whatever currently works.
(3) Will work with this fix: quoted arguments, provided the full path has no spaces.
By the way, I didn't change the pyspark, sparkR, beeline, etc. scripts because they seem to have worked fine for a long time.

In addition, a tool to quickly change the files in spark/bin if you like: https://github.com/qualiu/lzmw
(1) Remove quotes : lzmw -i -t "(^cmd.*?/[VCE])\s+\"+(%~dp0\S+\.cmd)\"+" -o "$1 $2" --nf "pyspark|sparkR|beeline|example" -p . -R
(2) Add/Restore   : lzmw -it "\"*(%~dp0\S+\.cmd)\"*" -o "\"$1\"" -p . -R
(3) Or remove the head : lzmw -f "\.cmd$" -it "^cmd /V /E /C " -o "" -p %CD% -R
…mit.cmd files. In fact this has the same effect as the last commit, because currently it doesn't work if the full path to spark-submit.cmd has spaces.
@qualiu qualiu force-pushed the remove-quotes-in-bin-cmd branch from 685bd47 to 7059639 Compare August 26, 2016 00:20
@qualiu qualiu changed the title [Deploy, Windows]Remove double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character. [Deploy, Windows]Check before adding double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character such as mysql connection string. Aug 26, 2016
@qualiu (Author) commented Aug 26, 2016

@srowen @tsudukim @tritab @andrewor14: Hello, I've updated this to a more conservative fix; please review it, thanks!
I hadn't pushed this former fix (now pushed) earlier because the two in fact have the same effect, but the former involves more line changes.

JIRA issue : https://issues.apache.org/jira/browse/SPARK-17256

@qualiu qualiu changed the title [Deploy, Windows]Check before adding double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character such as mysql connection string. [SPARK-17256][Deploy, Windows]Check before adding double quotes in spark/bin batch files to avoid cutting off arguments that double quoted as contain special character such as mysql connection string. Aug 26, 2016
@tsudukim (Contributor) commented

Hi @qualiu, I had a quick look.
I believe a spark-submit.cmd whose path contains spaces worked fine when #10789 was merged, so I wonder if this is a problem with cmd /V /E /C.

@qualiu (Author) commented Aug 29, 2016

@tsudukim: Yes, you're right (for detail, see the "Possible solution" in the discussion above):

One solution is removing cmd /V /E /C, like the current bin/spark-submit2.cmd does.

I ran the validation mentioned above again for the path-with-spaces case; see the attached screenshot.

@tritab (Contributor) commented Aug 29, 2016

Does it work if you escape the internal jdbc quotes with a caret ^ ?


@qualiu (Author) commented Aug 29, 2016

@tritab: Thanks for your reply! I've updated the picture.
Yes, I tried that last week but didn't include it in the context:
(1) It cannot start if there are spaces in spark-submit.cmd's full path.
(2) It cannot start, and the quoted argument is cut off regardless of spaces, even when escaped with carets inside the double quotes. (See the attached screenshot.)

@roryodonnell commented Feb 24, 2017

This causes issues starting Spark via SparkLauncher under Windows, but only when one sets an additional config parameter using the SparkLauncher.setConf method. I needed setConf to set SparkLauncher.DRIVER_EXTRA_CLASSPATH and SparkLauncher.EXECUTOR_EXTRA_CLASSPATH.

c:\somepath\Spark-2.0.2\bin\spark-submit2.cmd" --verbose --master local[1] --conf "spark.executor.extraClassPath' is not recognized as an internal or external command,
operable program or batch file.

If I change the spark-submit.cmd line from
cmd /V /E /C "%~dp0spark-submit2.cmd" %*
to
cmd /V /E /C %~dp0spark-submit2.cmd %*

it works. Just thought you'd like to know; I was pulling my hair out for 10 hours locating this, and I didn't have spaces in any of my folder names or environment paths.
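
This is consistent with the cmd /C quote-stripping behavior discussed earlier in the thread. For completeness, an alternative that cmd /? itself suggests (sketched here, not verified against every Spark script) is to keep the inner quotes but add one extra outer pair, so the stripping consumes the outer pair instead:

```bat
:: Dropping the quotes works only while the path has no spaces:
::   cmd /V /E /C %~dp0spark-submit2.cmd %*
:: Wrapping the whole command string in one extra pair of quotes lets
:: cmd /C strip that outer pair and leave the inner quotes intact:
cmd /V /E /C ""%~dp0spark-submit2.cmd" %*"
```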

@tritab (Contributor) commented Feb 24, 2017 via email

@gatorsmile (Member) commented

We are closing this due to inactivity. Please do reopen it if you want to push it forward. Thanks!

@asfgit closed this in b32bd00 on Jun 27, 2017
8 participants