[SPARK-20741][Spark Submit] Added cleanup of JARs archive generated by SparkSubmit #17986

liorregev · 2017-05-15T13:59:18Z

What changes were proposed in this pull request?

Deleted generated JARs archive after distribution to HDFS
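
For readers unfamiliar with the pattern, here is a minimal sketch of "build a local archive, distribute it, then delete the local copy". It is not the code from this patch: the class `ArchiveCleanupSketch`, the method `stageAndCleanUp`, the `jars-archive-` prefix, and the `upload` callback (standing in for the copy to HDFS) are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ArchiveCleanupSketch {

    // Hypothetical helper: builds a temporary zip, hands it to `upload`
    // (a stand-in for the distribution step), and deletes the local copy
    // whether or not the upload succeeded. Returns the now-deleted path.
    static Path stageAndCleanUp(Consumer<Path> upload) {
        Path archive;
        try {
            archive = Files.createTempFile("jars-archive-", ".zip");
            try (OutputStream out = Files.newOutputStream(archive);
                 ZipOutputStream zip = new ZipOutputStream(out)) {
                // Placeholder entry; a real archive would contain the JARs.
                zip.putNextEntry(new ZipEntry("placeholder.jar"));
                zip.write(new byte[] {1, 2, 3});
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            upload.accept(archive);
        } finally {
            // Clean up right after distribution instead of at process exit.
            try {
                Files.deleteIfExists(archive);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return archive;
    }
}
```

The `finally` block is the point of the sketch: the local file is removed as soon as it is no longer needed, which matters when the submitting process is long-lived.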

How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

srowen · 2017-05-15T14:23:32Z

That seems OK to me. It might be a good time to address similar issues elsewhere. For instance, look at Client.createConfArchive. The one place it's called, I think the file can be deleted after it's uploaded. There are a few other potential situations like this we could clean up.

vanzin · 2017-05-15T16:45:03Z

I don't think this is really necessary. These files are created in Utils.getLocalDir, which on the launcher side is a temporary directory (see Utils.getOrCreateLocalRootDirsImpl). Meaning that as soon as the launcher exits, these files will be deleted.

If you really want to fix this instance, it may be better to follow Sean's suggestion and fix all instances, creating an explicit temporary directory where the files are stored. All this is going to do, though, is to delete the files earlier - they'd still be deleted when the process exits.
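
The "explicit temporary directory" idea could look roughly like the sketch below, which is an assumption about the shape of such a fix and not Spark's implementation. `StagingDir`, `newFile`, and the `staging-` prefix are hypothetical names; every generated file lives under one directory that is deleted when the caller is done, not when the JVM exits.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical holder for all temp files generated during one submission.
final class StagingDir implements AutoCloseable {
    private final Path root;

    StagingDir() {
        try {
            root = Files.createTempDirectory("staging-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path root() {
        return root;
    }

    // Creates a new empty file inside the staging directory.
    Path newFile(String prefix, String suffix) {
        try {
            return Files.createTempFile(root, prefix, suffix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the directory and everything in it.
    @Override
    public void close() {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parents.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Used in a try-with-resources block, this removes all staged files at the end of each submission, so a process that never exits does not accumulate them.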

liorregev · 2017-05-15T16:52:45Z

Actually, I ran into a problem with this not getting cleaned up.
After your explanation, I understand why it wasn't deleted.
I am running Spark on EMR, and the easiest way to submit applications to the cluster programmatically was to create an HTTP service that accepts the application details and calls SparkSubmit.main, so the process never really exits.
I managed to solve this with spark.yarn.archive, so I don't need this implemented; I just figured it would have been a better solution.

vanzin · 2017-05-15T16:56:18Z

> I just figured it would have been a better solution.

It might be a good idea to do it, but then you can't just add this one line, you have to look at all the temp files that Client.scala generates.

srowen · 2017-05-19T06:44:09Z

@liorregev if you'll take care of a couple other cases like this here, it looks OK to merge. Proactively cleaning up seems reasonable.

SparkQA · 2017-05-23T16:56:10Z

Test build #3751 has finished for PR 17986 at commit cb03d8a.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2017-05-24T07:28:44Z

Test build #3753 has finished for PR 17986 at commit cb03d8a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2017-05-25T16:08:45Z

Merged to master/2.2. It's a win and on second look it wasn't obvious that there's another instance of this that can safely be cleaned up.

…y SparkSubmit

## What changes were proposed in this pull request?

Deleted generated JARs archive after distribution to HDFS

## How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Lior Regev <lioregev@gmail.com>

Closes #17986 from liorregev/master.

(cherry picked from commit 7306d55)
Signed-off-by: Sean Owen <sowen@cloudera.com>

[SPARK-20741] Added cleanup of JARs archive generated by SparkSubmit

cb03d8a

asfgit closed this in 7306d55 May 25, 2017
