Feat/ec2 fleet integration #471

Merged
merged 26 commits into from
Oct 14, 2021
d26db7d Carve instanceManager from instances.go into instance_manager.go (cristim, Aug 24, 2021)
557589e Move instanceManager tests to instance_manager_test.go (cristim, Aug 24, 2021)
2a7c511 Move read-only instance query functions to instance_queries.go (cristim, Aug 24, 2021)
1ebed81 Move additional code to instance_queries.go and instance_queries_test.go (cristim, Aug 24, 2021)
8c107ab Move OD->Spot conversion helpers to instance_conversion.go and instan… (cristim, Aug 24, 2021)
a9dd78c Fix linter issues in autoscaling_test.go (cristim, Aug 25, 2021)
c821b95 Fix linter issues in mock_test.go (cristim, Aug 25, 2021)
fb6709c Fix linter issues in region.go and spot_price.go (cristim, Aug 25, 2021)
2cb7eb6 Convert RunInstances to instant EC2 Fleet API call (cristim, Aug 25, 2021)
6885f3a Convert tests for createRunInstancesInput to createLaunchTemplateData (cristim, Aug 26, 2021)
4d9505b Small log message fix (cristim, Sep 16, 2021)
d7805f3 Add additional EC2 mocks (cristim, Sep 16, 2021)
b35c48f Implement support for configurable Spot allocation strategies (cristim, Sep 16, 2021)
30d7fa3 Move small utility functions to util.go (cristim, Sep 16, 2021)
2c64c16 Fix codeclimate issue (cristim, Sep 16, 2021)
99bbb46 Extract complex if condition into its own function, pass instance typ… (cristim, Sep 16, 2021)
6d4e40d Further simplifications for codeclimate (cristim, Sep 16, 2021)
5fde799 Use latest version of golang and build for amd64 (cristim, Sep 16, 2021)
8812c3f Expose spot_allocation_strategy on CloudFormation (cristim, Sep 16, 2021)
21b09e7 Ensure the AMI ID comes from the LaunchConfiguration/Template (cristim, Sep 16, 2021)
d6e6814 Add required IAM permissions (cristim, Sep 16, 2021)
46dd23f Set priority for capacity-optimized-prioritized (cristim, Sep 16, 2021)
13fcb35 Pass missing LaunchTemplate fields, such as UserData and KeyName (cristim, Sep 29, 2021)
af23f69 Expand test coverage (cristim, Oct 1, 2021)
54ffb80 Document the capacity-optimized prioritized (cristim, Oct 1, 2021)
9207c67 Small readme changes (cristim, Oct 3, 2021)
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,8 +1,8 @@
-FROM golang:1.16-alpine as golang
+FROM golang:alpine as golang
 RUN apk add -U --no-cache ca-certificates git make
 COPY . /src
 WORKDIR /src
-RUN FLAVOR=nightly CGO_ENABLED=0 GOPROXY=direct make
+RUN GOARCH=amd64 FLAVOR=nightly CGO_ENABLED=0 GOPROXY=direct make

 FROM scratch
 COPY LICENSE BINARY_LICENSE THIRDPARTY /
2 changes: 1 addition & 1 deletion Dockerfile.build
@@ -1,4 +1,4 @@
-FROM golang:1.16-alpine
+FROM golang:alpine

 ARG flavor

4 changes: 2 additions & 2 deletions Dockerfile.marketplace
@@ -1,10 +1,10 @@
-FROM golang:1.16-alpine as golang
+FROM golang:alpine as golang
 RUN apk add -U --no-cache ca-certificates git make
 COPY . /src
 WORKDIR /src
 RUN FLAVOR=stable CGO_ENABLED=0 GOPROXY=direct make

-FROM alpine:3.14.1
+FROM alpine:latest
 COPY LICENSE BINARY_LICENSE THIRDPARTY /
 COPY --from=golang /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
 COPY --from=golang /src/AutoSpotting .
2 changes: 2 additions & 0 deletions Makefile
@@ -19,6 +19,8 @@ BUILD := $(DOCKER_IMAGE_VERSION)-$(FLAVOR)-$(SHA)
 EXPIRATION := $(shell go run ./scripts/expiration_date.go)
 SAVINGS_CUT ?= 5

+GOARCH ?= amd64
+
 ifneq ($(FLAVOR), custom)
 LICENSE_FILES += BINARY_LICENSE
 endif
41 changes: 24 additions & 17 deletions README.md
@@ -23,11 +23,22 @@ It is usually set up to monitor existing long-running AutoScaling groups,
 replacing their instances with Spot instances with minimal configuration
 changes.

-Often all it needs is just tagging them with `spot-enabled=true`, but
-even that can be avoided in some cases, yielding the usual 70%-90% Spot cost
+Often all it needs is just tagging them with `spot-enabled=true`, (in some cases
+even that can be avoided), yielding the usual 70%-90% Spot cost
 savings but in a better integrated and easier to adopt way
-than other alternative tools and solutions, especially if you run infrastructure
-that for whatever reasons you can't afford to update to Spot by other means.
+than other alternative tools and solutions.
+
+It is particularly useful if you have a large footprint that you want to migrate
+to Spot quickly due to management pressure but with minimal effort and configuration
+changes.
+
+## Guiding principles ##
+
+- Customer-focused, designed to maximize user benefits and reduce adoption friction
+- Safe and secure, hosted in your AWS account and with minimal required set of IAM permissions
+- Auditable OSS code base developed in the open
+- Inexpensive, easy to install and supported builds offered through the AWS Marketplace
+- Simple, minimalist implementation

 ## How does it work? ##

@@ -45,22 +56,20 @@ replaced with spot clones within seconds of being launched.

 If this fails temporarily due to insufficient spot capacity, AutoSpotting will
 continuously attempt to replace them every few minutes until successful after
-spot capacity becomes available again. When launching Spot instances, the
-compatible instance types are attempted in increasing order of their price,
-until one is successfully launched, lazily achieving diversification in case of
-temporary unavailability of certain instance types.
+spot capacity becomes available again.
+
+When launching Spot instances, the compatible instance types are chosen by
+default using the
+[capacity-optimized-prioritized](https://docs.amazonaws.cn/en_us/AWSEC2/latest/UserGuide/ec2-fleet-examples.html#ec2-fleet-config11)
+allocation strategy, which is given a list of instance types sorted by price. This
+configuration offers a good tradeoff between low cost and significantly reduced
+interruption rates. The lowest-price allocation strategy is still available as a
+configuration option.

 This process can partly be seen in action below, you can click to expand the animation:

 ![Workflow](https://autospotting.org/img/autospotting.gif)

-Additionally, it implements some advanced logic that is aware of spot and on
-demand prices, including for different spot products and configurable discounts
-for reserved instances or large volume customers. It also considers the specs of
-all instance types and automatically launches the cheapest available instance
-types based on flexible configuration set globally or overridden at the group
-level using additional tags, but these overrides are often not needed.
-
 A single installation can handle all enabled groups from an entire AWS account in
 parallel across all available AWS regions, but it can be restricted to fewer
 regions if desired in certain situations.
@@ -75,8 +84,6 @@ the traffic would automatically be drained on termination.
 The savings it generates are in the 60-90% range usually seen when using spot
 instances, but they may vary depending on region and instance type.

-![Savings](https://autospotting.org/img/savings.png)
-
 ## What's under the hood? ##

 The entire logic described above is implemented in a set of Lambda functions
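
The README change above says the capacity-optimized-prioritized strategy "is given a list of instance types sorted by price". A minimal Go sketch of that idea follows. It is not the code from this PR: the `candidate` and `override` types, the `prioritize` helper and the prices are hypothetical and exist only for illustration. The one fact borrowed from EC2 Fleet is that a lower priority number means higher precedence.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical pairing of an instance type with its Spot price.
type candidate struct {
	InstanceType string
	SpotPrice    float64 // USD per hour, illustrative values only
}

// override models only the two launch template override fields relevant
// here; the real EC2 API type carries more fields.
type override struct {
	InstanceType string
	Priority     float64
}

// prioritize sorts candidates by ascending price and numbers them from 0,
// so the cheapest type receives the lowest number (highest precedence).
func prioritize(cands []candidate) []override {
	sorted := append([]candidate(nil), cands...) // copy, leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].SpotPrice < sorted[j].SpotPrice
	})
	out := make([]override, 0, len(sorted))
	for i, c := range sorted {
		out = append(out, override{InstanceType: c.InstanceType, Priority: float64(i)})
	}
	return out
}

func main() {
	overrides := prioritize([]candidate{
		{InstanceType: "m5.large", SpotPrice: 0.035},
		{InstanceType: "m4.large", SpotPrice: 0.031},
		{InstanceType: "c5.large", SpotPrice: 0.040},
	})
	for _, o := range overrides {
		fmt.Printf("%s priority=%.0f\n", o.InstanceType, o.Priority)
	}
}
```

With this strategy the priorities are only a preference: EC2 still optimizes for capacity first, which is why the README describes it as a tradeoff between cost and interruption rate.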
2 changes: 2 additions & 0 deletions autospotting.go
@@ -19,6 +19,8 @@ var conf autospotting.Config

// Version represents the build version being used
var Version = "number missing"

// SavingsCut stores the saving percentage charged for the stable builds
var SavingsCut = "0"

// ExpirationDate represents the date at which the version will expire
18 changes: 18 additions & 0 deletions cloudformation/stacks/AutoSpotting/template.yaml
@@ -219,6 +219,19 @@
         bucket stored on another region, but it can process AutoScaling groups
         from any other regions. Example: 'us-east-1,eu-west-1'"
       Type: CommaDelimitedList
+    SpotAllocationStrategy:
+      Type: "String"
+      Description: >
+        "Controls the Spot allocation strategy for
+        launching Spot instances. Allowed options:
+        'capacity-optimized-prioritized' (default), 'capacity-optimized',
+        'lowest-price'. Further information on this is available at
+        https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-allocation-strategy.html"
+      AllowedValues:
+        - "capacity-optimized-prioritized"
+        - "capacity-optimized"
+        - "lowest-price"
+      Default: "capacity-optimized-prioritized"
     SpotPricePercentageBuffer:
       Default: "10.0"
       Description: >
@@ -382,6 +395,8 @@
               Fn::Join:
                 - ","
                 - Ref: "Regions"
+            SPOT_ALLOCATION_STRATEGY:
+              Ref: SpotAllocationStrategy
             SPOT_PRICE_BUFFER_PERCENTAGE:
               Ref: "SpotPricePercentageBuffer"
             SPOT_PRODUCT_DESCRIPTION:
@@ -435,6 +450,9 @@
                 - "aws-marketplace:RegisterUsage"
                 - "cloudformation:Describe*"
                 - "ec2:CreateTags"
+                - "ec2:CreateLaunchTemplate"
+                - "ec2:CreateFleet"
+                - "ec2:DeleteLaunchTemplate"
                 - "ec2:DeleteTags"
                 - "ec2:DescribeImages"
                 - "ec2:DescribeInstanceAttribute"
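
The template change above exposes the strategy to the Lambda function through the `SPOT_ALLOCATION_STRATEGY` environment variable. As a rough sketch of how such a value could be consumed on the Go side, the example below validates it against the three values the template allows and falls back to the template default when it is unset. The `resolveStrategy` helper and the fallback behaviour are assumptions made for illustration; this PR excerpt does not show how AutoSpotting actually parses the variable.

```go
package main

import (
	"fmt"
	"os"
)

// defaultStrategy matches the Default of the SpotAllocationStrategy parameter.
const defaultStrategy = "capacity-optimized-prioritized"

// allowedStrategies mirrors the AllowedValues list of the template parameter.
var allowedStrategies = map[string]bool{
	"capacity-optimized-prioritized": true,
	"capacity-optimized":             true,
	"lowest-price":                   true,
}

// resolveStrategy is a hypothetical helper: it returns the default for an
// empty value and rejects anything outside the allowed set.
func resolveStrategy(value string) (string, error) {
	if value == "" {
		return defaultStrategy, nil
	}
	if !allowedStrategies[value] {
		return "", fmt.Errorf("unsupported Spot allocation strategy %q", value)
	}
	return value, nil
}

func main() {
	strategy, err := resolveStrategy(os.Getenv("SPOT_ALLOCATION_STRATEGY"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using strategy:", strategy)
}
```

CloudFormation already enforces `AllowedValues` at deploy time, so a check like this mainly matters when the binary is run outside the stack, for example from the command line.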
2 changes: 1 addition & 1 deletion core/action.go
@@ -42,7 +42,7 @@ type launchSpotReplacement struct {
 func (lsr launchSpotReplacement) run() {
 	spotInstanceID, err := lsr.target.onDemandInstance.launchSpotReplacement()
 	if err != nil {
-		log.Printf("Could not launch cheapest spot instance: %s", err)
+		log.Printf("Could not launch replacement spot instance: %s", err)
 		return
 	}
 	log.Printf("Successfully launched spot instance %s, exiting...", *spotInstanceID)
9 changes: 4 additions & 5 deletions core/autoscaling.go
@@ -23,7 +23,6 @@ type autoScalingGroup struct {
 	launchConfiguration *launchConfiguration
 	launchTemplate      *launchTemplate
 	instances           instances
-	minOnDemand         int64
 	config              AutoScalingConfig
 }

@@ -121,20 +120,20 @@ func (a *autoScalingGroup) loadLaunchTemplate() (*launchTemplate, error) {
 func (a *autoScalingGroup) needReplaceOnDemandInstances() (bool, int64) {
 	onDemandRunning, totalRunning := a.alreadyRunningInstanceCount(false, nil)
 	debug.Printf("onDemandRunning=%v totalRunning=%v a.minOnDemand=%v",
-		onDemandRunning, totalRunning, a.minOnDemand)
+		onDemandRunning, totalRunning, a.config.MinOnDemand)

 	if totalRunning == 0 {
 		log.Printf("The group %s is currently empty or in the process of launching new instances",
 			a.name)
 		return true, totalRunning
 	}

-	if onDemandRunning > a.minOnDemand {
+	if onDemandRunning > a.config.MinOnDemand {
 		log.Println("Currently more than enough OnDemand instances running")
 		return true, totalRunning
 	}

-	if onDemandRunning == a.minOnDemand {
+	if onDemandRunning == a.config.MinOnDemand {
 		log.Println("Currently OnDemand running equals to the required number, skipping run")
 		return false, totalRunning
 	}
@@ -150,7 +149,7 @@ func (a *autoScalingGroup) terminateRandomSpotInstanceIfHavingEnough(totalRunnin
 	}

 	if allInstancesAreRunning, onDemandRunning := a.allInstancesRunning(); allInstancesAreRunning {
-		if a.instances.count64() == *a.DesiredCapacity && onDemandRunning == a.minOnDemand {
+		if a.instances.count64() == *a.DesiredCapacity && onDemandRunning == a.config.MinOnDemand {
 			log.Println("Currently Spot running equals to the required number, skipping termination")
 			return nil
 		}
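
The effect of moving `minOnDemand` from the group struct into its `config` field can be seen in isolation with a small sketch. The types below are simplified stand-ins, not the real `autoScalingGroup` and `AutoScalingConfig`, and `needsReplacement` condenses only the branches of `needReplaceOnDemandInstances` visible in the hunk above. What happens when fewer on-demand instances are running than the minimum is not shown in this diff, so returning false in that case is an assumption.

```go
package main

import "fmt"

// groupConfig stands in for the per-group configuration; only the field
// used below is modelled.
type groupConfig struct {
	MinOnDemand int64
}

// group stands in for the AutoScaling group, which now reads the minimum
// from its config instead of keeping a separate copy.
type group struct {
	config groupConfig
}

// needsReplacement reports whether on-demand instances should be replaced:
// always for an empty group, otherwise only when more on-demand instances
// run than the configured minimum.
func (g group) needsReplacement(onDemandRunning, totalRunning int64) bool {
	if totalRunning == 0 {
		return true
	}
	// Assumption: with fewer on-demand instances than the minimum,
	// nothing is replaced (that branch is not visible in the diff).
	return onDemandRunning > g.config.MinOnDemand
}

func main() {
	g := group{config: groupConfig{MinOnDemand: 1}}
	fmt.Println(g.needsReplacement(3, 5)) // true: 3 on-demand > minimum of 1
	fmt.Println(g.needsReplacement(1, 5)) // false: already at the minimum
	fmt.Println(g.needsReplacement(0, 0)) // true: empty group
}
```

Keeping a single source of truth in the config avoids the two values drifting apart when group-level tags override the global setting.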