cloudprovider: add DigitalOcean #2245
@@ -0,0 +1,42 @@
// +build digitalocean

/*
Copyright 2019 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package builder

import (
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/digitalocean"
	"k8s.io/autoscaler/cluster-autoscaler/config"
)

// AvailableCloudProviders supported by the digitalocean cloud provider builder.
var AvailableCloudProviders = []string{
	digitalocean.ProviderName,
}

// DefaultCloudProvider for do-only build is DigitalOcean.
const DefaultCloudProvider = digitalocean.ProviderName

func buildCloudProvider(opts config.AutoscalingOptions, do cloudprovider.NodeGroupDiscoveryOptions, rl *cloudprovider.ResourceLimiter) cloudprovider.CloudProvider {
	switch opts.CloudProviderName {
	case digitalocean.ProviderName:
		return digitalocean.BuildDigitalOcean(opts, do, rl)
	}

	return nil
}
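A note on the build constraint above: the `// +build digitalocean` line means this builder variant is only compiled when the `digitalocean` build tag is enabled (for example, `go build -tags digitalocean`); which builder file is selected for the default, tag-less build is outside this diff.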
@@ -0,0 +1,6 @@
approvers:
- andrewsykim
reviewers:
- andrewsykim
@@ -0,0 +1,58 @@
# Cluster Autoscaler for DigitalOcean

The cluster autoscaler for DigitalOcean scales worker nodes within any
specified DigitalOcean Kubernetes cluster's node pool. It is part of the DOKS
offering and can be enabled or disabled dynamically for an existing cluster.

# Configuration

The `cluster-autoscaler` runs dynamically based on tags associated with node
pools. These are the currently valid tags:

```
k8s-cluster-autoscaler-enabled:true
k8s-cluster-autoscaler-min:3
k8s-cluster-autoscaler-max:10
```

The syntax is of the form `key:value`.
* If the `k8s-cluster-autoscaler-enabled:true` tag is absent, or
  `k8s-cluster-autoscaler-enabled` is **not** set to `true`, the
  `cluster-autoscaler` will not process the node pool by default.
* To set the minimum number of nodes, use `k8s-cluster-autoscaler-min`.
* To set the maximum number of nodes, use `k8s-cluster-autoscaler-max`.

If you don't set the minimum and maximum tags, node pools will have the
following default limits:

```
minimum number of nodes: 1
maximum number of nodes: 200
```

> **Review comment:** That is significantly different UX from other providers - I think all (?) existing providers require explicitly setting at least the max limit.
>
> **Response:** This is the default limit for a given node group right now. It's our upper limit. I can change it to 50 or something else if that makes more sense to you? In the UI (cloud.digitalocean.com) we're going to expose a UI that will have sensible default values pre-set.
>
> **Review comment:** I don't have any specific recommendation for the value. I was referring to the fact that there is a default at all, rather than an explicit requirement to specify a value.
>
> **Response:** I see what you mean. This is just a way of limiting the hard maximum we allow. The user experience will be different and our API will validate requests.
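To make the tag semantics concrete, here is a minimal, hypothetical sketch of how such `key:value` tags could be parsed into per-pool limits with the defaults above. The helper name and types are illustrative assumptions; the actual parsing code is not part of this README.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// poolLimits is a hypothetical holder for the autoscaling bounds of one node pool.
type poolLimits struct {
	enabled  bool
	minNodes int
	maxNodes int
}

// parseAutoscalerTags is an illustrative sketch (not the PR's actual code): it
// applies the documented defaults (min=1, max=200) and overrides them from
// "key:value" tags such as "k8s-cluster-autoscaler-min:3".
func parseAutoscalerTags(tags []string) poolLimits {
	limits := poolLimits{enabled: false, minNodes: 1, maxNodes: 200}
	for _, tag := range tags {
		parts := strings.SplitN(tag, ":", 2)
		if len(parts) != 2 {
			continue
		}
		key, value := parts[0], parts[1]
		switch key {
		case "k8s-cluster-autoscaler-enabled":
			limits.enabled = value == "true"
		case "k8s-cluster-autoscaler-min":
			if n, err := strconv.Atoi(value); err == nil {
				limits.minNodes = n
			}
		case "k8s-cluster-autoscaler-max":
			if n, err := strconv.Atoi(value); err == nil {
				limits.maxNodes = n
			}
		}
	}
	return limits
}

func main() {
	tags := []string{"k8s-cluster-autoscaler-enabled:true", "k8s-cluster-autoscaler-max:10"}
	fmt.Printf("%+v\n", parseAutoscalerTags(tags))
	// prints: {enabled:true minNodes:1 maxNodes:10}
}
```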
# Development

Make sure you're inside the root path of the [autoscaler
repository](https://github.com/kubernetes/autoscaler).

1.) Build the `cluster-autoscaler` binary:

```
make build-in-docker
```

2.) Build the docker image:

```
docker build -t digitalocean/cluster-autoscaler:dev .
```

3.) Push the docker image to Docker Hub:

```
docker push digitalocean/cluster-autoscaler:dev
```
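As a possible next step (not covered by this README), the pushed image would be referenced from a `cluster-autoscaler` deployment in the cluster, started with `--cloud-provider=digitalocean` and, if a configuration file is used, `--cloud-config=<path>`; the exact manifest and flag set are assumptions here, not part of this diff.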
@@ -0,0 +1,193 @@
/*
Copyright 2019 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package digitalocean

import (
	"io"
	"os"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
	"k8s.io/autoscaler/cluster-autoscaler/config"
	"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
	"k8s.io/klog"
)

var _ cloudprovider.CloudProvider = (*digitaloceanCloudProvider)(nil)

const (
	// GPULabel is the label added to nodes with GPU resource.
	GPULabel = "cloud.digitalocean.com/gpu-node"
)

// digitaloceanCloudProvider implements the CloudProvider interface.
type digitaloceanCloudProvider struct {
	manager         *Manager
	resourceLimiter *cloudprovider.ResourceLimiter
}

func newDigitalOceanCloudProvider(manager *Manager, rl *cloudprovider.ResourceLimiter) (*digitaloceanCloudProvider, error) {
	if err := manager.Refresh(); err != nil {
		return nil, err
	}

	return &digitaloceanCloudProvider{
		manager:         manager,
		resourceLimiter: rl,
	}, nil
}

// Name returns the name of the cloud provider.
func (d *digitaloceanCloudProvider) Name() string {
	return cloudprovider.DigitalOceanProviderName
}

// NodeGroups returns all node groups configured for this cloud provider.
func (d *digitaloceanCloudProvider) NodeGroups() []cloudprovider.NodeGroup {
	nodeGroups := make([]cloudprovider.NodeGroup, len(d.manager.nodeGroups))
	for i, ng := range d.manager.nodeGroups {
		nodeGroups[i] = ng
	}
	return nodeGroups
}

// NodeGroupForNode returns the node group for the given node, nil if the node
// should not be processed by cluster autoscaler, or a non-nil error if one
// occurred. Must be implemented.
func (d *digitaloceanCloudProvider) NodeGroupForNode(node *apiv1.Node) (cloudprovider.NodeGroup, error) {
	nodeID, ok := node.Labels[nodeIDLabel]
	if !ok {
		// CA creates fake node objects to represent upcoming VMs that haven't
		// registered as nodes yet. They have node.Spec.ProviderID set. Use
		// that as nodeID.
		nodeID = node.Spec.ProviderID
	}

	klog.V(5).Infof("checking nodegroup for node ID: %q", nodeID)

	// NOTE(arslan): the number of node groups per cluster is usually very
	// small. So even though this looks like quadratic runtime, it's OK to
	// proceed with this.
	for _, group := range d.manager.nodeGroups {
		klog.V(5).Infof("iterating over node group %q", group.Id())
		nodes, err := group.Nodes()
		if err != nil {
			return nil, err
		}

		for _, node := range nodes {
			klog.V(6).Infof("checking node, have: %q want: %q", node.Id, nodeID)
			if node.Id != nodeID {
				continue
			}

			return group, nil
		}
	}

	// there is no "ErrNotExist" error, so we have to return a nil error
	return nil, nil
}

// Pricing returns the pricing model for this cloud provider or an error if it
// is not available. Implementation optional.
func (d *digitaloceanCloudProvider) Pricing() (cloudprovider.PricingModel, errors.AutoscalerError) {
	return nil, cloudprovider.ErrNotImplemented
}

// GetAvailableMachineTypes gets all machine types that can be requested from
// the cloud provider. Implementation optional.
func (d *digitaloceanCloudProvider) GetAvailableMachineTypes() ([]string, error) {
	return []string{}, nil
}

// NewNodeGroup builds a theoretical node group based on the node definition
// provided. The node group is not automatically created on the cloud provider
// side. The node group is not returned by NodeGroups() until it is created.
// Implementation optional.
func (d *digitaloceanCloudProvider) NewNodeGroup(
	machineType string,
	labels map[string]string,
	systemLabels map[string]string,
	taints []apiv1.Taint,
	extraResources map[string]resource.Quantity,
) (cloudprovider.NodeGroup, error) {
	return nil, cloudprovider.ErrNotImplemented
}

// GetResourceLimiter returns a struct containing limits (max, min) for
// resources (cores, memory etc.).
func (d *digitaloceanCloudProvider) GetResourceLimiter() (*cloudprovider.ResourceLimiter, error) {
	return d.resourceLimiter, nil
}

// GPULabel returns the label added to nodes with GPU resource.
func (d *digitaloceanCloudProvider) GPULabel() string {
	return GPULabel
}

// GetAvailableGPUTypes returns all available GPU types the cloud provider supports.
func (d *digitaloceanCloudProvider) GetAvailableGPUTypes() map[string]struct{} {
	return nil
}

// Cleanup cleans up open resources before the cloud provider is destroyed,
// i.e. go routines etc.
func (d *digitaloceanCloudProvider) Cleanup() error {
	return nil
}

// Refresh is called before every main loop and can be used to dynamically
// update cloud provider state. In particular the list of node groups returned
// by NodeGroups() can change as a result of CloudProvider.Refresh().
func (d *digitaloceanCloudProvider) Refresh() error {
	klog.V(4).Info("Refreshing node group cache")
	return d.manager.Refresh()
}

// BuildDigitalOcean builds the DigitalOcean cloud provider.
func BuildDigitalOcean(
	opts config.AutoscalingOptions,
	do cloudprovider.NodeGroupDiscoveryOptions,
	rl *cloudprovider.ResourceLimiter,
) cloudprovider.CloudProvider {
	var configFile io.ReadCloser
	if opts.CloudConfig != "" {
		var err error
		configFile, err = os.Open(opts.CloudConfig)
		if err != nil {
			klog.Fatalf("Couldn't open cloud provider configuration %s: %#v", opts.CloudConfig, err)
		}
		defer configFile.Close()
	}

	manager, err := newManager(configFile)
	if err != nil {
		klog.Fatalf("Failed to create DigitalOcean manager: %v", err)
	}

	// the cloud provider automatically uses all node pools in DigitalOcean.
	// This means we don't use the cloudprovider.NodeGroupDiscoveryOptions
	// flags (which can be set via '--node-group-auto-discovery' or '-nodes')
	provider, err := newDigitalOceanCloudProvider(manager, rl)
	if err != nil {
		klog.Fatalf("Failed to create DigitalOcean cloud provider: %v", err)
	}

	return provider
}

> **Review comment** (on the `if opts.CloudConfig != ""` check): Shouldn't that error/fatal if CloudConfig == ""?
>
> **Response:** This resembles the same layout as the other cloud providers. It doesn't have to error, because in the next step, …
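For context on the exchange above: the manager and its configuration format are not part of this excerpt, so the following is only a guessed sketch of how a `newManager`-style constructor might consume the optional config reader. Every name and field here (`newManagerSketch`, `managerConfig`, `cluster_id`, `token`) is an assumption, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"io/ioutil"
	"strings"
)

// managerConfig is a hypothetical cloud-config shape; the PR's real manager
// and its field names are not shown in this diff.
type managerConfig struct {
	ClusterID string `json:"cluster_id"`
	Token     string `json:"token"`
}

// newManagerSketch mirrors the calling convention seen in BuildDigitalOcean:
// it accepts a possibly-nil reader and returns an error when no usable config
// is provided, which would explain why BuildDigitalOcean itself doesn't need
// to fatal on an empty --cloud-config path.
func newManagerSketch(configReader io.Reader) (*managerConfig, error) {
	if configReader == nil {
		return nil, errors.New("no cloud config given")
	}
	body, err := ioutil.ReadAll(configReader)
	if err != nil {
		return nil, err
	}
	cfg := &managerConfig{}
	if err := json.Unmarshal(body, cfg); err != nil {
		return nil, err
	}
	if cfg.ClusterID == "" || cfg.Token == "" {
		return nil, errors.New("cloud config must set cluster_id and token")
	}
	return cfg, nil
}

func main() {
	cfg, err := newManagerSketch(strings.NewReader(`{"cluster_id": "c-123", "token": "secret"}`))
	fmt.Println(cfg, err)
}
```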
> **Review comment:** This is completely different from how you run CA on different providers. Any particular reason for this design?
>
> **Response:** This lets us dynamically enable/disable the cluster-autoscaler and makes it disabled by default: to opt in, you need to add those tags. Because our UI is not ready yet, this will allow some customers to enable it by adding the tags themselves. Also, this doesn't require us to fiddle with CA flag arguments. It's easier to maintain and ship the cluster-autoscaler, and it requires zero to minimal interaction once it's deployed. Is there something you believe is wrong with this approach?
>
> **Review comment:** Oh, I didn't realize you're planning to host the CA pod (or at least start it automatically). I don't think there is anything inherently wrong with your approach, it's just inconsistent with other providers. Given that autodiscovery already works differently for each provider, I don't think it's a blocker for merging this either. Just a suboptimal experience for someone moving between clouds / running multicloud / etc.