cloudprovider: add DigitalOcean #2245

Merged: 2 commits, Aug 9, 2019

5 changes: 4 additions & 1 deletion cluster-autoscaler/cloudprovider/builder/builder_all.go
@@ -1,4 +1,4 @@
// +build !gce,!aws,!azure,!kubemark,!alicloud,!magnum
// +build !gce,!aws,!azure,!kubemark,!alicloud,!magnum,!digitalocean

/*
Copyright 2018 The Kubernetes Authors.
@@ -24,6 +24,7 @@ import (
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/aws"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/baiducloud"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/digitalocean"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/gce"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/magnum"
"k8s.io/autoscaler/cluster-autoscaler/config"
@@ -54,6 +55,8 @@ func buildCloudProvider(opts config.AutoscalingOptions, do cloudprovider.NodeGro
return alicloud.BuildAlicloud(opts, do, rl)
case cloudprovider.BaiducloudProviderName:
return baiducloud.BuildBaiducloud(opts, do, rl)
case cloudprovider.DigitalOceanProviderName:
return digitalocean.BuildDigitalOcean(opts, do, rl)
case cloudprovider.MagnumProviderName:
return magnum.BuildMagnum(opts, do, rl)
}
42 changes: 42 additions & 0 deletions cluster-autoscaler/cloudprovider/builder/builder_digitalocean.go
@@ -0,0 +1,42 @@
// +build digitalocean

/*
Copyright 2019 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package builder

import (
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/digitalocean"
"k8s.io/autoscaler/cluster-autoscaler/config"
)

// AvailableCloudProviders supported by the digitalocean cloud provider builder.
var AvailableCloudProviders = []string{
digitalocean.ProviderName,
}

// DefaultCloudProvider for do-only build is DigitalOcean.
const DefaultCloudProvider = digitalocean.ProviderName

func buildCloudProvider(opts config.AutoscalingOptions, do cloudprovider.NodeGroupDiscoveryOptions, rl *cloudprovider.ResourceLimiter) cloudprovider.CloudProvider {
switch opts.CloudProviderName {
case digitalocean.ProviderName:
return digitalocean.BuildDigitalOcean(opts, do, rl)
}

return nil
}
2 changes: 2 additions & 0 deletions cluster-autoscaler/cloudprovider/cloud_provider.go
@@ -35,6 +35,8 @@ const (
AwsProviderName = "aws"
// BaiducloudProviderName gets the provider name of baiducloud
BaiducloudProviderName = "baiducloud"
// DigitalOceanProviderName gets the provider name of digitalocean
DigitalOceanProviderName = "digitalocean"
// GceProviderName gets the provider name of gce
GceProviderName = "gce"
// MagnumProviderName gets the provider name of magnum
6 changes: 6 additions & 0 deletions cluster-autoscaler/cloudprovider/digitalocean/OWNERS
@@ -0,0 +1,6 @@
approvers:
- andrewsykim
reviewers:
- andrewsykim


58 changes: 58 additions & 0 deletions cluster-autoscaler/cloudprovider/digitalocean/README.md
@@ -0,0 +1,58 @@
# Cluster Autoscaler for DigitalOcean

The cluster autoscaler for DigitalOcean scales worker nodes within any
specified DigitalOcean Kubernetes cluster's node pool. It is part of the DOKS
offering and can be enabled or disabled dynamically for an existing cluster.

# Configuration

The `cluster-autoscaler` runs dynamically based on tags associated with node
pools. These are the currently valid tags:

Contributor

This is completely different from how you run CA on different providers. Any particular reason for this design?

Contributor Author

This lets us dynamically enable/disable the cluster-autoscaler and makes it opt-out by default, so to opt in you need to add those tags. Because our UI is not ready yet, this will allow some customers to enable it by adding the tags themselves. It also doesn't require us to fiddle with CA flag arguments. It's easier to maintain and ship the cluster-autoscaler, and it requires zero to minimal interaction once it's deployed. Is there something you believe is wrong with this approach?

Contributor

Oh, I didn't realize you're planning to host the CA pod (or at least start it automatically). I don't think there is anything inherently wrong with your approach; it's just inconsistent with other providers.
Given that autodiscovery already works differently for each provider, I don't think it's a blocker for merging this either. It's just a suboptimal experience for someone moving between clouds, running multicloud, etc.


```
k8s-cluster-autoscaler-enabled:true
k8s-cluster-autoscaler-min:3
k8s-cluster-autoscaler-max:10
```

The syntax is in the form of `key:value`.

* If the `k8s-cluster-autoscaler-enabled:true` tag is absent, or
  `k8s-cluster-autoscaler-enabled` is **not** set to `true`, the
  `cluster-autoscaler` will not process the node pool by default.
* To set the minimum number of nodes, use `k8s-cluster-autoscaler-min`.
* To set the maximum number of nodes, use `k8s-cluster-autoscaler-max`.


If you don't set the minimum and maximum tags, node pools will have the
following default limits:

```
minimum number of nodes: 1
maximum number of nodes: 200
```

Contributor

That is significantly different UX from other providers - I think all (?) existing providers require explicitly setting at least max limit.

Contributor Author

This is the default limit for a given node group right now. It's our upper limit. I can change it to 50 or something else if that makes more sense to you. In the UI (cloud.digitalocean.com) we're going to expose sensible default values.

Contributor

I don't have any specific recommendation for the value. I was referring to the fact that there is a default at all, rather than an explicit requirement to specify a value.

Contributor Author

I see what you mean. This is just a way of limiting the hard maximum we allow. The user experience will be different and our API will validate requests.

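The tag parsing itself happens in the provider's manager code, which is not shown in this diff. Purely as an illustration of the tag scheme described above (the helper and field names below are hypothetical, not taken from this PR), parsing `key:value` tags into per-pool limits could look roughly like this:

```go
package digitalocean

import (
	"strconv"
	"strings"
)

// Defaults mirroring the README: 1 node minimum, 200 nodes maximum.
const (
	defaultMinNodes = 1
	defaultMaxNodes = 200
)

// poolLimits is an illustrative struct; the real manager code may differ.
type poolLimits struct {
	enabled  bool
	minNodes int
	maxNodes int
}

// parsePoolTags scans a node pool's "key:value" tags and extracts the
// autoscaler settings described in the README, falling back to the defaults
// when the min/max tags are absent.
func parsePoolTags(tags []string) poolLimits {
	limits := poolLimits{minNodes: defaultMinNodes, maxNodes: defaultMaxNodes}
	for _, tag := range tags {
		parts := strings.SplitN(tag, ":", 2)
		if len(parts) != 2 {
			continue
		}
		key, value := parts[0], parts[1]
		switch key {
		case "k8s-cluster-autoscaler-enabled":
			limits.enabled = value == "true"
		case "k8s-cluster-autoscaler-min":
			if n, err := strconv.Atoi(value); err == nil {
				limits.minNodes = n
			}
		case "k8s-cluster-autoscaler-max":
			if n, err := strconv.Atoi(value); err == nil {
				limits.maxNodes = n
			}
		}
	}
	return limits
}
```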

# Development

Make sure you're inside the root path of the [autoscaler
repository](https://github.com/kubernetes/autoscaler).

1.) Build the `cluster-autoscaler` binary:


```
make build-in-docker
```

2.) Build the docker image:

```
docker build -t digitalocean/cluster-autoscaler:dev .
```


3.) Push the docker image to Docker Hub:

```
docker push digitalocean/cluster-autoscaler:dev
```
@@ -0,0 +1,193 @@
/*
Copyright 2019 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package digitalocean

import (
"io"
"os"

apiv1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
"k8s.io/klog"
)

var _ cloudprovider.CloudProvider = (*digitaloceanCloudProvider)(nil)

const (
// GPULabel is the label added to nodes with GPU resource.
GPULabel = "cloud.digitalocean.com/gpu-node"
)

// digitaloceanCloudProvider implements CloudProvider interface.
type digitaloceanCloudProvider struct {
manager *Manager
resourceLimiter *cloudprovider.ResourceLimiter
}

func newDigitalOceanCloudProvider(manager *Manager, rl *cloudprovider.ResourceLimiter) (*digitaloceanCloudProvider, error) {
if err := manager.Refresh(); err != nil {
return nil, err
}

return &digitaloceanCloudProvider{
manager: manager,
resourceLimiter: rl,
}, nil
}

// Name returns name of the cloud provider.
func (d *digitaloceanCloudProvider) Name() string {
return cloudprovider.DigitalOceanProviderName
}

// NodeGroups returns all node groups configured for this cloud provider.
func (d *digitaloceanCloudProvider) NodeGroups() []cloudprovider.NodeGroup {
nodeGroups := make([]cloudprovider.NodeGroup, len(d.manager.nodeGroups))
for i, ng := range d.manager.nodeGroups {
nodeGroups[i] = ng
}
return nodeGroups
}

// NodeGroupForNode returns the node group for the given node, nil if the node
// should not be processed by cluster autoscaler, or non-nil error if such
// occurred. Must be implemented.
func (d *digitaloceanCloudProvider) NodeGroupForNode(node *apiv1.Node) (cloudprovider.NodeGroup, error) {
nodeID, ok := node.Labels[nodeIDLabel]
if !ok {
// CA creates fake node objects to represent upcoming VMs that haven't
// registered as nodes yet. They have node.Spec.ProviderID set. Use
// that as nodeID.
nodeID = node.Spec.ProviderID
}

klog.V(5).Infof("checking nodegroup for node ID: %q", nodeID)

// NOTE(arslan): the number of node groups per cluster is usually very
// small. So even though this looks like quadratic runtime, it's OK to
// proceed with this.
for _, group := range d.manager.nodeGroups {
klog.V(5).Infof("iterating over node group %q", group.Id())
nodes, err := group.Nodes()
if err != nil {
return nil, err
}

for _, node := range nodes {
klog.V(6).Infof("checking node have: %q want: %q", node.Id, nodeID)
if node.Id != nodeID {
continue
}

return group, nil
}
}

// there is no "ErrNotExist" error, so we have to return a nil error
return nil, nil
}

// Pricing returns pricing model for this cloud provider or error if not
// available. Implementation optional.
func (d *digitaloceanCloudProvider) Pricing() (cloudprovider.PricingModel, errors.AutoscalerError) {
return nil, cloudprovider.ErrNotImplemented
}

// GetAvailableMachineTypes get all machine types that can be requested from
// the cloud provider. Implementation optional.
func (d *digitaloceanCloudProvider) GetAvailableMachineTypes() ([]string, error) {
return []string{}, nil
}

// NewNodeGroup builds a theoretical node group based on the node definition
// provided. The node group is not automatically created on the cloud provider
// side. The node group is not returned by NodeGroups() until it is created.
// Implementation optional.
func (d *digitaloceanCloudProvider) NewNodeGroup(
machineType string,
labels map[string]string,
systemLabels map[string]string,
taints []apiv1.Taint,
extraResources map[string]resource.Quantity,
) (cloudprovider.NodeGroup, error) {
return nil, cloudprovider.ErrNotImplemented
}

// GetResourceLimiter returns struct containing limits (max, min) for
// resources (cores, memory etc.).
func (d *digitaloceanCloudProvider) GetResourceLimiter() (*cloudprovider.ResourceLimiter, error) {
return d.resourceLimiter, nil
}

// GPULabel returns the label added to nodes with GPU resource.
func (d *digitaloceanCloudProvider) GPULabel() string {
return GPULabel
}

// GetAvailableGPUTypes return all available GPU types cloud provider supports.
func (d *digitaloceanCloudProvider) GetAvailableGPUTypes() map[string]struct{} {
return nil
}

// Cleanup cleans up open resources before the cloud provider is destroyed,
// i.e. go routines etc.
func (d *digitaloceanCloudProvider) Cleanup() error {
return nil
}

// Refresh is called before every main loop and can be used to dynamically
// update cloud provider state. In particular the list of node groups returned
// by NodeGroups() can change as a result of CloudProvider.Refresh().
func (d *digitaloceanCloudProvider) Refresh() error {
klog.V(4).Info("Refreshing node group cache")
return d.manager.Refresh()
}

// BuildDigitalOcean builds the DigitalOcean cloud provider.
func BuildDigitalOcean(
opts config.AutoscalingOptions,
do cloudprovider.NodeGroupDiscoveryOptions,
rl *cloudprovider.ResourceLimiter,
) cloudprovider.CloudProvider {
var configFile io.ReadCloser
if opts.CloudConfig != "" {

Contributor

Shouldn't that error/fatal if CloudConfig == ""?

Contributor Author

This resembles the same layout as the other cloud providers. It doesn't have to error, because in the next step, newManager() will return an error because of the invalid configuration.

var err error
configFile, err = os.Open(opts.CloudConfig)
if err != nil {
klog.Fatalf("Couldn't open cloud provider configuration %s: %#v", opts.CloudConfig, err)
}
defer configFile.Close()
}

manager, err := newManager(configFile)
if err != nil {
klog.Fatalf("Failed to create DigitalOcean manager: %v", err)
}

// the cloud provider automatically uses all node pools in DigitalOcean.
// This means we don't use the cloudprovider.NodeGroupDiscoveryOptions
// flags (which can be set via '--node-group-auto-discovery' or '-nodes')
provider, err := newDigitalOceanCloudProvider(manager, rl)
if err != nil {
klog.Fatalf("Failed to create DigitalOcean cloud provider: %v", err)
}

return provider
}
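As the review thread above notes, `BuildDigitalOcean` deliberately lets `newManager` reject a missing or invalid configuration instead of failing fast on an empty `--cloud-config`. The manager code is not part of this diff; as a rough sketch only, with `validateCloudConfig` being a hypothetical helper rather than anything from this PR, that kind of guard could look like:

```go
package digitalocean

import (
	"errors"
	"fmt"
	"io"
	"io/ioutil"
)

// validateCloudConfig is a hypothetical helper showing the sort of check
// newManager could perform: a nil or empty reader is surfaced as an error,
// which BuildDigitalOcean then turns into a klog.Fatalf.
func validateCloudConfig(configReader io.Reader) ([]byte, error) {
	if configReader == nil {
		return nil, errors.New("cloud config is required for the digitalocean provider")
	}
	body, err := ioutil.ReadAll(configReader)
	if err != nil {
		return nil, fmt.Errorf("couldn't read cloud config: %v", err)
	}
	if len(body) == 0 {
		return nil, errors.New("cloud config is empty")
	}
	// The actual schema parsing lives in the manager code, outside this diff.
	return body, nil
}
```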