[SPARK-17256][Deploy, Windows] Check before adding double quotes in spark/bin batch files, to avoid cutting off double-quoted arguments that contain special characters, such as a MySQL connection string. #14807
Conversation
Can one of the admins verify this patch?
I don't think we can do that, because then paths with spaces don't work, for example. Don't you just need to escape your argument?
@srowen : Thanks for your reply.
And what's more, the current cmd /V /E /C head restrains argument passing.
@srowen : What I did was a conservative fix (already described in the original text) that doesn't break currently working cases, and lets more cases work.
I see, your example shows, I think, submitting an invalid JAR file, right? You've specified your JAR with --jars but then don't specify a main JAR. That may be the root cause. I see your point that your path has spaces, though. Am I right that this is then about spaces in paths? I don't yet see a case about special characters or quoting here.
@srowen : Sorry, there's something misleading in my example about --jars. The key points, in short, are:
Essential problem
Possible solution for discussion
I can't evaluate this since I don't use Windows, but maybe others who made the original change can. You should check https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark BTW.
@srowen : Thank you! Glad to hear opinions; maybe this needs more discussion.
…guments that are double quoted and contain special characters.
To simply validate (for example, with a MySQL connection string), you cannot even start this:
spark-submit.cmd --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table
After this fix:
(1) Still not working: a full path with spaces, like "D:\opengit\spark\bin - Copy\spark-submit.cmd" --jars any-jars "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table
(2) Still working: whatever currently works.
(3) Will work with this fix: arguments containing quotes, provided the full path has no spaces.
By the way, I didn't change the pyspark, R, beeline etc. scripts because they seem to have worked fine for a long time.
In addition, a tool to quickly change the files in spark/bin if you like: https://github.com/qualiu/lzmw
(1) Remove quotes: lzmw -i -t "(^cmd.*?/[VCE])\s+\"+(%~dp0\S+\.cmd)\"+" -o "$1 $2" --nf "pyspark|sparkR|beeline|example" -p . -R
(2) Add/Restore: lzmw -it "\"*(%~dp0\S+\.cmd)\"*" -o "\"$1\"" -p . -R
(3) Or remove the head: lzmw -f "\.cmd$" -it "^cmd /V /E /C " -o "" -p %CD% -R
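For illustration, a minimal sketch of the "check before adding double quotes" idea in the title, written as a hypothetical wrapper script (this is not the actual patch):
@echo off
rem Hypothetical sketch: quote the target script only when its directory
rem actually contains a space, so that quoted arguments survive the re-parse.
set "SCRIPT_DIR=%~dp0"
if "%SCRIPT_DIR%"=="%SCRIPT_DIR: =%" (
  rem No space in the path: leave the command unquoted.
  cmd /V /E /C %SCRIPT_DIR%spark-submit2.cmd %*
) else (
  rem Space in the path: quoting the script is unavoidable.
  cmd /V /E /C "%SCRIPT_DIR%spark-submit2.cmd" %*
)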
…mit.cmd files. In fact, this has the same effect as the last commit, because it currently doesn't work anyway if the full path to spark-submit.cmd has spaces.
(force-pushed from 685bd47 to 7059639)
@srowen @tsudukim @tritab @andrewor14 : Hello, I've updated this to a more conservative fix; please review it, thanks! JIRA issue: https://issues.apache.org/jira/browse/SPARK-17256
@tsudukim : Yes, you're right (for details, see the previous discussion above under "Possible solution for discussion"): one solution is removing the cmd /V /E /C head.
Does it work if you escape the internal JDBC quotes with a caret ^?
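For reference, the caret escape being asked about would look something like this (hypothetical; since the wrapper re-parses the line under cmd /V /E /C, the caret may need doubling, or may not survive at all):
spark-submit.cmd --jars just-to-start jdbc:mysql://localhost:3306/lzdb?user=guest^&password=abc123 my_table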
@tritab : Thanks for your reply! Updated the picture.
This causes issues starting Spark using the SparkLauncher under Windows, but only when one sets an additional config parameter using the SparkLauncher.setConf method. I needed to call setConf to set SparkLauncher.DRIVER_EXTRA_CLASSPATH and SparkLauncher.EXECUTOR_EXTRA_CLASSPATH. The failure is:
'c:\somepath\Spark-2.0.2\bin\spark-submit2.cmd" --verbose --master local[1] --conf "spark.executor.extraClassPath' is not recognized as an internal or external command, operable program or batch file.
If I remove the quotes in spark-submit.cmd, changing
cmd /V /E /C "%~dp0spark-submit2.cmd" %*
to
cmd /V /E /C %~dp0spark-submit2.cmd %*
it works. Just thought you'd like to know. I was pulling my hair out for 10 hours locating this. And I didn't have spaces in any of my folder names, or env folder names.
It would probably be a good idea to get some unit tests for these scenarios. Would anyone be willing to write some tests?
We are closing it due to inactivity. Please reopen it if you want to push it forward. Thanks!
What changes were proposed in this pull request?
Remove double quotes in 3 cmd files in spark/bin, to avoid cutting off double-quoted arguments that contain special characters.
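The shape of the change, using spark-submit.cmd as the example (a sketch; the quoted and unquoted lines are taken from the discussion above, and the other files presumably follow the same pattern):
Before: cmd /V /E /C "%~dp0spark-submit2.cmd" %*
After: cmd /V /E /C %~dp0spark-submit2.cmd %*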
How was this patch tested?
Just copy and paste the following command, then execute it in spark/bin:
spark-submit.cmd --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table
It will not even start; the MySQL connection string argument is cut off:
This is a conservative fix that keeps the rule of using cmd /V /E /C, and lets every case work that can work under it.
Using the cmd /V /E /C head to avoid polluting the environment (and to enable delayed expansion and extensions) is not the best idea, because it restrains argument passing:
(1) Cannot start XXX.cmd itself if its full path has white space, even when quoted:
cmd /V /E /C "%~dp0XXX.cmd" xxx
(2) Cannot pass double-quoted arguments to spark-submit.cmd (as mentioned above) under cmd /V /E /C; an illustration follows.
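For illustration, my reading of why limitation (2) cuts the argument off (based on cmd's documented /C quote handling, not on text from this patch): with
cmd /V /E /C "%~dp0spark-submit2.cmd" --jars just-to-start "jdbc:mysql://localhost:3306/lzdb?user=guest&password=abc123" my_table
the text after /C starts with a quote and contains more than two quote characters, so cmd strips the first and last quotes and re-pairs the rest. The jdbc:mysql://... part then falls outside any quoted region, where the & acts as a command separator: the connection string is cut off at the &, and cmd tries to run the tail password=abc123" my_table as a second command, which matches the "is not recognized as an internal or external command" error reported above.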
Maybe it's better to explicitly set the variables that need protection from pollution, since it's difficult to require users to keep writing "SetLocal EnableDelayedExpansion" and the like in the batch files (.cmd/.bat).
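A minimal sketch of that alternative (hypothetical, not part of this patch): drop the cmd /V /E /C head and let each script protect the environment itself:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem ... original script body; use !VAR! where delayed expansion is needed ...
endlocal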
By the way, I didn't change the pyspark, R, beeline etc. scripts, because they seem to have worked fine for a long time.
In addition, a tool to quickly change/restore the files in spark/bin, if you like: https://github.com/qualiu/lzmw
(1) Remove quotes: lzmw -i -t "(^cmd.*?/[VCE])\s+\"+(%~dp0\S+\.cmd)\"+" -o "$1 $2" --nf "pyspark|sparkR|beeline|example" -p . -R
(2) Add/Restore: lzmw -it "\"*(%~dp0\S+\.cmd)\"*" -o "\"$1\"" -p . -R
(3) Or remove the head: lzmw -f "\.cmd$" -it "^cmd /V /E /C " -o "" -p %CD% -R
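For example, after running command (3) above, the one-line spark-submit.cmd wrapper would read (a sketch, assuming the stock layout where spark-submit.cmd delegates to spark-submit2.cmd):
"%~dp0spark-submit2.cmd" %*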