[SPARK-3332] Revert spark-ec2 patch that identifies clusters using tags #2225
Conversation
Both patches reverted cleanly.
QA tests have started for PR 2225. This patch merges cleanly.
QA results for PR 2225:
Since the original two commits, there were only two intervening commits to `ec2/spark_ec2.py` (see https://github.com/JoshRosen/spark/commits/revert-ec2-cluster-naming/ec2/spark_ec2.py)
LGTM. Hmm - this is unfortunate, and something that I feared given the flaky experiences I have had with tags before. I checked the intervening commits too, and the reverts look good to me. Could you also manually test by launching an EC2 cluster, just to be doubly sure?
I launched a cluster earlier using this version of the script and shut it down using the v1.0.2 one. |
FYI: It looks like branch-1.1 has an out-of-date version of `spark_ec2.py`.
Okay, cool - let's merge this then (it isn't tested by Jenkins anyway), although it would be nice to update that script in branch-1.1.
I've merged this into branch-1.1. |
This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups. The original motivation for this patch was to allow multiple clusters to run in the same security group.

Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of `spark-ec2` launching instances and crashing before they can be tagged, effectively orphaning those instances. The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up.

Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters. For the 1.1.0 release, though, I propose that we just revert this patch.

Author: Josh Rosen <joshrosen@apache.org>

Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits:

0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group"
c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
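The core problem described above - that launching instances and tagging them are two separate API calls, so a crash in between leaves untagged "orphan" instances that a tag-based lookup can never find - can be illustrated with a small self-contained sketch. This is not the actual `spark-ec2` code; `FakeEC2`, `launch_cluster`, and the `spark-cluster` tag key are hypothetical stand-ins used only to demonstrate the race:

```python
class FakeEC2:
    """Minimal stand-in for an EC2-style API where launching and
    tagging instances are separate, non-atomic calls."""

    def __init__(self):
        self.instances = {}   # instance_id -> tags dict
        self._next_id = 0

    def run_instances(self, count):
        """Launch `count` instances; they start out with no tags."""
        ids = []
        for _ in range(count):
            iid = f"i-{self._next_id:04d}"
            self._next_id += 1
            self.instances[iid] = {}
            ids.append(iid)
        return ids

    def create_tags(self, ids, tags):
        """Tag already-launched instances (a second, separate call)."""
        for iid in ids:
            self.instances[iid].update(tags)

    def find_by_tag(self, key, value):
        """Tag-based cluster lookup, as the reverted patches used."""
        return [i for i, t in self.instances.items() if t.get(key) == value]


def launch_cluster(ec2, name, count, crash_before_tagging=False):
    ids = ec2.run_instances(count)
    if crash_before_tagging:
        # Simulates the script dying between launch and tagging.
        raise RuntimeError("crashed before instances were tagged")
    ec2.create_tags(ids, {"spark-cluster": name})
    return ids


ec2 = FakeEC2()
try:
    launch_cluster(ec2, "my-cluster", 3, crash_before_tagging=True)
except RuntimeError:
    pass

# Three instances exist, but a tag-based query finds none of them,
# so a tag-driven destroy operation could never clean them up.
orphans = set(ec2.instances) - set(ec2.find_by_tag("spark-cluster", "my-cluster"))
```

Identifying clusters by security group does not have this gap, because the security group is assigned as part of the launch call itself, which is why the revert restores that behavior.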