Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20741][Spark Submit] Added cleanup of JARs archive generated by SparkSubmit #17986

Closed
wants to merge 1 commit into from

Conversation

liorregev
Copy link
Contributor

What changes were proposed in this pull request?

Deleted generated JARs archive after distribution to HDFS

How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

@srowen
Copy link
Member

srowen commented May 15, 2017

That seems OK to me. It might be a good time to address similar issues elsewhere. For instance, look at Client.createConfArchive. The one place it's called, I think the file can be deleted after it's uploaded. There are a few other potential situations like this we could clean up.

@vanzin
Copy link
Contributor

vanzin commented May 15, 2017

I don't think this is really necessary. These files are created in Utils.getLocalDir, which on the launcher side is a temporary directory (see Utils.getOrCreateLocalRootDirsImpl). Meaning that as soon as the launcher exits, these files will be deleted.

If you really want to fix this instance, it may be better to follow Sean's suggestion and fix all instances, creating an explicit temporary directory where the files are stored. All this is going to do, though, is to delete the files earlier - they'd still be deleted when the process exits.

@liorregev
Copy link
Contributor Author

liorregev commented May 15, 2017

Actually I ran into a problem with this not getting cleaned up.
After your explanation I can understand why it wasn't deleted.
I am running spark on EMR and the easiest way to programmatically submit applications to the cluster was to create an HTTP service that accepts the application details and programmatically calls SparkSubmit.main so the process never really exits.
I managed to solve this with spark.yarn.archive so I don't need this implemented, I just figured it would have been a better solution.

@vanzin
Copy link
Contributor

vanzin commented May 15, 2017

I just figured it would have been a better solution.

It might be a good idea to do it, but then you can't just add this one line, you have to look at all the temp files that Client.scala generates.

@srowen
Copy link
Member

srowen commented May 19, 2017

@liorregev if you'll take care of a couple other cases like this here, it looks OK to merge. Proactively cleaning up seems reasonable.

@SparkQA
Copy link

SparkQA commented May 23, 2017

Test build #3751 has finished for PR 17986 at commit cb03d8a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 24, 2017

Test build #3753 has finished for PR 17986 at commit cb03d8a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 25, 2017

Merged to master/2.2. It's a win and on second look it wasn't obvious that there's another instance of this that can safely be cleaned up.

asfgit pushed a commit that referenced this pull request May 25, 2017
…y SparkSubmit

## What changes were proposed in this pull request?

Deleted generated JARs archive after distribution to HDFS

## How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Lior Regev <lioregev@gmail.com>

Closes #17986 from liorregev/master.

(cherry picked from commit 7306d55)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in 7306d55 May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants