Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this" #2163

Closed
wants to merge 1 commit into from

Conversation

vidaha
Copy link
Contributor

@vidaha vidaha commented Aug 27, 2014

... copy the spark_cluster_tag from a spot instance requests over to the instances.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@vidaha
Copy link
Contributor Author

vidaha commented Aug 27, 2014

@JoshRosen

@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@JoshRosen
Copy link
Contributor

Does this solve the "launch more like this" issue that we discussed, where the "Name" tag wasn't being copied to the new instances?

@JoshRosen
Copy link
Contributor

Jenkins, add to whitelist.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have started for PR 2163 at commit cfd0391.

  • This patch merges cleanly.

@vidaha vidaha changed the title Add a spark-cluster-tag tag instead of overloading the cluster name, and... Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this" Aug 27, 2014
@vidaha
Copy link
Contributor Author

vidaha commented Aug 27, 2014

@JoshRosen

Yes, this pull request does fix that issue.

Although it was not the "Name" tag that was causing the problem - I still made that change though to tag with Spark Cluster Tag since I think that's better.

It turns out the problem is - When using "Launch more like this", the tags get associated with the spot request instead of the instance. There is now code in get_existing_cluster that checks spot_requests for the Spark Cluster Tag, and if it finds that, copies that tag over to the associated instances.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2163 at commit cfd0391.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

One small concern is users who are testing spark-ec2 using the master branch and have already launched clusters that are identified by the Name tag. After this patch, will those users be able to login to or shut down those clusters?

@JoshRosen
Copy link
Contributor

For complete backwards-compatibility, maybe we should also set the Name tag? Or just stick with using Name and keep the rest of this PR (the copying of the tag from spot requests to instances, etc)?

@jkbradley
Copy link
Member

Yep, I was just finding that the updated script can't find my old clusters.

@vidaha
Copy link
Contributor Author

vidaha commented Aug 27, 2014

@jkbradley

Oh right - I launched with this version of the script as well...updated - please try one more time - fingers crossed!

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have started for PR 2163 at commit d4f9fae.

  • This patch merges cleanly.

@jkbradley
Copy link
Member

The latest update to the script lets it find my old machines when I start them. However, it does not find spot instances created from the slave machine.

… 'Launch More Like This' and using Spot Requests
@vidaha
Copy link
Contributor Author

vidaha commented Aug 27, 2014

@jkbradley

one more time - i mixed up state and status. ugh.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have started for PR 2163 at commit 5070a70.

  • This patch merges cleanly.

@jkbradley
Copy link
Member

@vidaha Yep! It finds the spot instances created from my old machines now. Thanks a lot!

@pwendell
Copy link
Contributor

Is "launch more like this" only a feature for spot instances? Or do we need a fix also for on-demand instances?

@jkbradley
Copy link
Member

You can launch both spot and on-demand instances using "launch more like this." I haven't tried on-demand ones yet, but I can do that now.

@JoshRosen
Copy link
Contributor

I talked to Vida, who said that this only affects spot instances, since she couldn't reproduce the issue when launching on-demand instances through "launch more like this." I guess the issue is that the tags are copied from the running instance to the spot request, but not to the launched spot instance. This might be an EC2 bug / issue, since this behavior is counter-intuitive.

@pwendell
Copy link
Contributor

ah okay

@JoshRosen
Copy link
Contributor

This looks good to me; I'm going to merge it into master and branch-1.1. Thanks!

@vidaha
Copy link
Contributor Author

vidaha commented Aug 27, 2014

Awesome - thanks - sorry that took a long time, but I think we have settled on the right long term solution IMHO!

@asfgit asfgit closed this in 7faf755 Aug 27, 2014
asfgit pushed a commit that referenced this pull request Aug 27, 2014
…th "Launch More like this"

... copy the spark_cluster_tag from a spot instance requests over to the instances.

Author: Vida Ha <vida@databricks.com>

Closes #2163 from vidaha/vida/spark-3213 and squashes the following commits:

5070a70 [Vida Ha] Spark-3214 Fix issue with spark-ec2 not detecting slaves created with 'Launch More Like This' and using Spot Requests

(cherry picked from commit 7faf755)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
@pwendell
Copy link
Contributor

Thanks @vidaha and @jkbradley for helping with this!

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2163 at commit d4f9fae.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2163 at commit 5070a70.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 2, 2014
This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups.  The original motivation for this patch was to allow multiple clusters to run in the same security group.

Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of `spark-ec2` launching instances and crashing before they can be tagged, effectively orphaning those instances.  The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up.

Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters.  For the 1.1.0 release, though, I propose that we just revert this patch.

Author: Josh Rosen <joshrosen@apache.org>

Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits:

0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group"
c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…th "Launch More like this"

... copy the spark_cluster_tag from a spot instance requests over to the instances.

Author: Vida Ha <vida@databricks.com>

Closes apache#2163 from vidaha/vida/spark-3213 and squashes the following commits:

5070a70 [Vida Ha] Spark-3214 Fix issue with spark-ec2 not detecting slaves created with 'Launch More Like This' and using Spot Requests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants