Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automated resource request scaling #628

Closed

Conversation

dhellmann
Copy link
Contributor

This enhancement describes an approach to allow us to scale the
resource requests for the control plane services to reduce consumption
for constrained environments. This will be especially useful for
single-node production deployments, where the user wants to reserve
most of the CPU resources for their own workloads and needs to
configure OpenShift to run on a fixed number of CPUs within the host.

/cc @smarterclayton @derekwaynecarr @markmc @browsell

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2021
@openshift-ci-robot
Copy link

@dhellmann: GitHub didn't allow me to request PR reviews from the following users: browsell.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

This enhancement describes an approach to allow us to scale the
resource requests for the control plane services to reduce consumption
for constrained environments. This will be especially useful for
single-node production deployments, where the user wants to reserve
most of the CPU resources for their own workloads and needs to
configure OpenShift to run on a fixed number of CPUs within the host.

/cc @smarterclayton @derekwaynecarr @markmc @browsell

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch from 005b965 to 290b755 Compare February 4, 2021 17:10
@russellb
Copy link
Member

russellb commented Feb 5, 2021

/priority important-soon

@openshift-ci-robot openshift-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 5, 2021
@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch 4 times, most recently from e9cca10 to 684e6c5 Compare February 11, 2021 16:04
@dhellmann
Copy link
Contributor Author

/retest

@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch from 684e6c5 to e61d7e7 Compare February 15, 2021 19:09
@dhellmann dhellmann changed the title WIP: automated resource request scaling automated resource request scaling Feb 15, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2021
@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch from e61d7e7 to 151c4a0 Compare February 15, 2021 19:13
@dhellmann
Copy link
Contributor Author

I have updated the enhancement with more details in the proposal section. Please take another look.

@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch from 151c4a0 to 2fdf015 Compare February 16, 2021 18:41
3. Have a new operator in the release payload perform the scaling for
control plane operators

It would use the new API field to determine how to scale (see above).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have some operators for it - https://github.com/openshift/cluster-resource-override-admission-operator, we can improve it and add new API fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we can modify this section to say it doesn't need to be in the release payload but that it would need to be installed separately and link to the existing operator.

We'll also need to say that each openshift operator will need to make the appropriate adjustments to ignore changes to the resource requests to avoid contention with the cluster-resource-override-admission web hook. It looks like the way to enable the admission web hook is to add a label to the namespace and then update the code to ignore resource requests on operands when comparing for equality. Is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cynepco3hahue The latest draft elevates the cluster-resource-override-admission-operator to the primary approach being proposed. Let me know what you think.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch from df0a664 to 251add0 Compare March 8, 2021 14:42
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from dhellmann after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dhellmann
Copy link
Contributor Author

The latest draft of this enhancement reflects the changes to the approach that we came up with today, using https://github.com/openshift/cluster-resource-override-admission. There are still some details to work out, but I think the basics are all here now. All of the old approaches have been folded into the alternatives section at the end of the document.

@derekwaynecarr @browsell @markmc please take a look and let me know if I've missed anything.

@dhellmann
Copy link
Contributor Author

I forgot to mention that I left the changes in a separate patch in case that made reviewing easier. I will squash the commits before we merge this enhancement.

@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch 3 times, most recently from 56a59a3 to c969eba Compare March 12, 2021 21:56
Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
@dhellmann dhellmann force-pushed the automated-resource-request-scaling branch from c969eba to ceeca77 Compare March 12, 2021 22:09
a tight timeline for delivering the work described in this
enhancement, that may help with scheduling.

1. Add a new field to `ClusterResourceOverride` API with a name like

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a general case where a pod has a limit set, would need to decide the precendence if both cpuRequestPercent and cpuRequestToLimitPercent are set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It seems like the limit-based scaling (the existing behavior) should take precedence, and the new field should be ignored if the old field is present. What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@dhellmann
Copy link
Contributor Author

/cc @rphillips @sjenning

@deads2k
Copy link
Contributor

deads2k commented Mar 16, 2021

This enhancement doesn't appear to mention the runlevel restrictions created in the clusterresourceoverride's MutatingWebhookConfiguration which allows us to ensure re-bootstrapping a cluster that has been shut down. We label namespaces in openshift with openshift.io/run-level=0 and openshift.io/run-level=1 to indicate namespaces that are required to have pods before admission webhooks can be called.

This includes 36 out of 61 openshift-* namespaces on our 4.7 CI cluster. Removing this restriction to shrink the pod resource requests/limits would be fraught.

One example flow that will work in a demo and be at risk in a production cluster is

  1. install cluster
  2. stop skipping runlevel=0 namespaces. this will include the SDN and kube-apiserver namespaces, among others
  3. delete pods in these namespaces and they will be restarted successfully
  4. shutdown the cluster
  5. start the cluster
  6. there is no SDN running
  7. etcd starts from a static pod
  8. the kube-apiserver starts from a static pod
  9. the kube-controller-manager starts from a static pod
  10. the kube-controller-manager tries to create a daemonset pod for the SDN
  11. the kube-apiserver enforces admission webhook configuration, which tries to call the cluster-resource-overrides admission plugin
  12. there is no pod running for the cluster-resource-overrides admission plugin and even if there was, it would unreachable because the SDN pod doesn't exist
  13. the cluster is live-locked without the ability to run pods

We created the runlevels and skips to ensure that we did not wedge in such a situation. While it may be possible to tune the labeling slightly tighter for a single-node use case (I don't know one way or the other), having an accident at this level can result in situations where clusters cannot be safely restarted.

Knowing that more than half of the namespaces we have today will not be restricted and knowing that we need a way to scale resource requests and limits on pods in these 36 namespaces and on static pods, having a single solution for the additional 25 namespaces doesn't seem so bad.

dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 18, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Mar 19, 2021
This enhancement presents an alternative approach to replace openshift#628.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
@dhellmann
Copy link
Contributor Author

Closing this in favor of #703

@dhellmann dhellmann closed this Mar 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants