-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this" #2163
Conversation
Can one of the admins verify this patch? |
Jenkins, this is ok to test. |
Does this solve the "launch more like this" issue that we discussed, where the "Name" tag wasn't being copied to the new instances? |
Jenkins, add to whitelist. |
QA tests have started for PR 2163 at commit
|
Yes, this pull request does fix that issue. Although it was not the "Name" tag that was causing the problem - I still made that change though to tag with Spark Cluster Tag since I think that's better. It turns out the problem is - When using "Launch more like this", the tags get associated with the spot request instead of the instance. There is now code in get_existing_cluster that checks spot_requests for the Spark Cluster Tag, and if it finds that, copies that tag over to the associated instances. |
QA tests have finished for PR 2163 at commit
|
One small concern is users who are testing spark-ec2 using the master branch and have already launched clusters that are identified by the Name tag. After this patch, will those users be able to login to or shut down those clusters? |
For complete backwards-compatibility, maybe we should also set the Name tag? Or just stick with using Name and keep the rest of this PR (the copying of the tag from spot requests to instances, etc)? |
Yep, I was just finding that the updated script can't find my old clusters. |
cfd0391
to
d4f9fae
Compare
Oh right - I launched with this version of the script as well...updated - please try one more time - fingers crossed! |
Jenkins, retest this please. |
QA tests have started for PR 2163 at commit
|
The latest update to the script lets it find my old machines when I start them. However, it does not find spot instances created from the slave machine. |
… 'Launch More Like This' and using Spot Requests
d4f9fae
to
5070a70
Compare
one more time - i mixed up state and status. ugh. |
QA tests have started for PR 2163 at commit
|
@vidaha Yep! It finds the spot instances created from my old machines now. Thanks a lot! |
Is "launch more like this" only a feature for spot instances? Or do we need a fix also for on-demand instances? |
You can launch both spot and on-demand instances using "launch more like this." I haven't tried on-demand ones yet, but I can do that now. |
I talked to Vida, who said that this only affects spot instances, since she couldn't reproduce the issue when launching on-demand instances through "launch more like this." I guess the issue is that the tags are copied from the running instance to the spot request, but not to the launched spot instance. This might be an EC2 bug / issue, since this behavior is counter-intuitive. |
ah okay |
This looks good to me; I'm going to merge it into |
Awesome - thanks - sorry that took a long time, but I think we have settled on the right long term solution IMHO! |
…th "Launch More like this" ... copy the spark_cluster_tag from a spot instance requests over to the instances. Author: Vida Ha <vida@databricks.com> Closes #2163 from vidaha/vida/spark-3213 and squashes the following commits: 5070a70 [Vida Ha] Spark-3214 Fix issue with spark-ec2 not detecting slaves created with 'Launch More Like This' and using Spot Requests (cherry picked from commit 7faf755) Signed-off-by: Josh Rosen <joshrosen@apache.org>
Thanks @vidaha and @jkbradley for helping with this! |
QA tests have finished for PR 2163 at commit
|
QA tests have finished for PR 2163 at commit
|
This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups. The original motivation for this patch was to allow multiple clusters to run in the same security group. Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of `spark-ec2` launching instances and crashing before they can be tagged, effectively orphaning those instances. The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up. Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters. For the 1.1.0 release, though, I propose that we just revert this patch. Author: Josh Rosen <joshrosen@apache.org> Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits: 0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group" c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
…th "Launch More like this" ... copy the spark_cluster_tag from a spot instance requests over to the instances. Author: Vida Ha <vida@databricks.com> Closes apache#2163 from vidaha/vida/spark-3213 and squashes the following commits: 5070a70 [Vida Ha] Spark-3214 Fix issue with spark-ec2 not detecting slaves created with 'Launch More Like This' and using Spot Requests
... copy the spark_cluster_tag from a spot instance requests over to the instances.