
All containers in a TaskRun's Pod are combined to calculate required resources #598

Closed
abayer opened this issue Mar 8, 2019 · 25 comments · Fixed by #723
Labels
• design: This task is about creating and discussing a design
• help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
• kind/bug: Categorizes issue or PR as related to a bug.

Comments

@abayer
Contributor

abayer commented Mar 8, 2019

Expected Behavior

We're only running one container in a TaskRun's pod at a time, so the required resources for that pod should be the maximum needed among all the containers we're running, not the total. When we were using init containers, that's how things worked.

Actual Behavior

With containers instead of init containers, the pod's required resources are the sum of the required resources for each container. See https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources for more info on how pod resource requirements are calculated. The result is that a TaskRun of 10 steps produces a pod that thinks it needs enough resources to run all 10 step containers concurrently, so the required resources jump significantly from what they had been with init containers.

Steps to Reproduce the Problem

  1. Run a TaskRun with 10 steps on a cluster that has enough resources available for a pod running fewer than 10 concurrent containers, each with the resource requirements of one of those steps.

Additional Info

@abayer
Contributor Author

abayer commented Mar 8, 2019

fyi @pivotal-nader-ziada @imjasonh @bobcatfish

@bbrowning

bbrowning commented Mar 8, 2019

Given this (same for both Memory and CPU requests/limits) from https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#specify-a-memory-request-that-is-too-big-for-your-nodes

Memory requests and limits are associated with Containers, but it is useful to think of a Pod as having a memory request and limit. The memory request for the Pod is the sum of the memory requests for all the Containers in the Pod. Likewise, the memory limit for the Pod is the sum of the limits of all the Containers in the Pod.

Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough available memory to satisfy the Pod's memory request.

I believe we could choose the largest of the memory/cpu requests, stuff that on a single Container, and leave the requests unset on all the other Containers in the Pod. Containers could keep their individual memory/CPU limits as scheduling is not based on limits.

An example:

Container A requests 512MB, Container B 2048MB, Container C 1024MB.

Today:
Pod gets scheduled with a request of 3584MB, as the requests of each Container get summed up.

Proposal:
Pod gets scheduled with a request of 2048MB, as the requests for all containers are set to 0 except the largest (or first? does it matter?) container, Container B.
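
To make the proposal above concrete, here is a minimal sketch (not Tekton's actual code; keepOnlyLargestMemoryRequest is a made-up helper), assuming Steps are plain corev1.Container values and using Mi instead of MB for simplicity:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// keepOnlyLargestMemoryRequest finds the step with the largest memory request
// and drops the memory request from every other step, so the Pod-level request
// (the sum over containers) equals the single largest request instead of the
// total of all of them.
func keepOnlyLargestMemoryRequest(steps []corev1.Container) {
	largest, max := -1, resource.Quantity{}
	for i := range steps {
		if req, ok := steps[i].Resources.Requests[corev1.ResourceMemory]; ok && req.Cmp(max) > 0 {
			largest, max = i, req
		}
	}
	for i := range steps {
		if i != largest {
			delete(steps[i].Resources.Requests, corev1.ResourceMemory)
		}
	}
}

func main() {
	// Container A requests 512Mi, B 2048Mi, C 1024Mi, as in the example above.
	steps := []corev1.Container{
		{Name: "a", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("512Mi")}}},
		{Name: "b", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("2048Mi")}}},
		{Name: "c", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("1024Mi")}}},
	}
	keepOnlyLargestMemoryRequest(steps)
	for _, s := range steps {
		req := s.Resources.Requests[corev1.ResourceMemory]
		fmt.Printf("%s: memory request %s\n", s.Name, req.String())
	}
}

As it turns out later in this thread, deleting the requests like this interacts badly with limits, so the eventual fix sets the non-max requests to an explicit 0 instead.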

@abayer
Contributor Author

abayer commented Mar 8, 2019

Aaaaaah, that sounds reasonable - if I understand correctly, does that mean we'd effectively use the full resources (assuming no limits specified on steps) of the whole pod for each step?

@bbrowning

To answer my own question above about leaving the resource request on the largest or first container - you leave it on the largest container. That's the only container guaranteed to have a limit greater than or equal to the largest request.

@bbrowning

bbrowning commented Mar 8, 2019

@abayer If you don't specify limits then yes, you can use as much CPU/memory as the Node has available. But I believe that behavior is unchanged from InitContainers or the current Containers implementation.

You can still set CPU/memory limits per Container and those limits will get enforced. The requests value is just to help the scheduler decide which Node your Pod gets placed on.

@abayer
Contributor Author

abayer commented Mar 8, 2019

@bbrowning Gotcha - that's actually a nice result anyway. =)

@abayer
Contributor Author

abayer commented Mar 8, 2019

@dwnusbaum has a memory of this coming up at the working group meeting on Wednesday, but I wasn't paying enough attention to remember myself. =) But regardless, yeah, I endorse this approach.

@nader-ziada
Member

@bbrowning I thought the containers without any limits set would get the max or default limit - would setting a high limit on one container override that for all containers? or do we set this limit on the pod?

If you do not specify a memory limit for a Container, one of the following situations applies:
• The Container has no upper bound on the amount of memory it uses. The Container could use all of the memory available on the Node where it is running.
• The Container is running in a namespace that has a default memory limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the memory limit.

There isn't anything in the API currently to set limits on a TaskRun, so @abayer doesn't that mean it's not set and you are still getting the problem? Or are you setting them somehow?

@abayer
Contributor Author

abayer commented Mar 8, 2019

I think we're seeing it with no explicit limits set - @jstrachan, can you verify?

@bbrowning

@pivotal-nader-ziada The scheduling problem stems from a Container's requests - not its limits. And Task Steps are just Containers -

Steps []corev1.Container `json:"steps,omitempty"`

Containers can set memory and/or CPU requests and/or limits. Unless you're explicitly forbidding the setting of those somewhere in a webhook or reconcile loop that I missed.

Limits are enforced per Container as far as I know. Requests, on the other hand, are summed together for all Containers in a Pod and used by the scheduler when choosing node placement.

@dwnusbaum
Contributor

has a memory of this coming up at the working group meeting on Wednesday

Yeah, as I recall the discussion was that this was a known issue that didn't make it into the PR to switch from init container to containers, but that we definitely want to do something to fix it.

Relevant discussion starts around 9:20 in the 3/6/2019 meeting recording.

@nader-ziada
Member

Containers can set memory and/or CPU requests and/or limits. Unless you're explicitly forbidding the setting of those somewhere in a webhook or reconcile loop that I missed.

@bbrowning you are correct, we don't forbid that anywhere

@bobcatfish
Collaborator

@pivotal-nader-ziada The scheduling problem stems from a Container's requests - not its limits. And Task Steps are just Containers -

@bbrowning @abayer trying to catch up here - as far as I can tell (and I think you're saying this above), the taskrun controller isn't setting any of these values; this would only come into play if a user set the request and/or limit values themselves in the Container spec for the step, is that right?

In steps to repro right now we've got:

Run a TaskRun with 10 steps, on a cluster with sufficient resources available for a pod with <10 concurrent containers with the resource requirements of one of those steps.

but you'd have to actually set the request values for the step in order for this to be a problem, is that right?

If I'm understanding correctly it seems like we'd have a few options:

  1. @bbrowning's suggestion, which I think is to take the values the user specified for requests and actually change them?
  2. Warn the user but not change anything
  3. Document this behavior but not actually change any code (probably not a great solution 😅 )

(Not sure I'm understanding the problem entirely tho - maybe we can see some example container specs that would have this problem?)

@abayer
Contributor Author

abayer commented Mar 20, 2019

I don't think you do need to manually set the request values - @jstrachan hit this in Jenkins X, and we don't explicitly set request values anywhere when generating our Tasks.

Anyway, lemme create a concrete reproduction.

@abayer
Contributor Author

abayer commented Mar 20, 2019

got it - https://gist.github.com/abayer/41cc9db17600dfe845f06002fb0492ed - when you get somewhere between 30 and 40 steps in a single Task with no explicit request, with a cluster of 3 n1-standard-4s, autoscaling up to 10, you get

message: 'pod status "PodScheduled":"False"; message: "0/3 nodes are available:
        3 Insufficient cpu."'

@abayer
Contributor Author

abayer commented Mar 20, 2019

And fwiw, https://gist.github.com/abayer/1fe81f3813d777167590b2523a3fff6e is the same Task but with an explicit request on the first step, and zeroed-out requests on the others. This one actually works.

@abayer
Contributor Author

abayer commented Mar 20, 2019

And it also works with no explicit request on the first step and the rest zeroed out.

@bobcatfish
Collaborator

Awesome, thanks for adding the repro case @abayer !

when you get somewhere between 30 and 40 steps in a single Task with no explicit request

Quick question, are you creating Tasks that have 30+ steps in them?

@bobcatfish added the kind/bug, help wanted, and design labels Mar 20, 2019
@abayer
Contributor Author

abayer commented Mar 20, 2019

Apparently? =) More likely is that the cases where this showed up were on a cluster running other pipelines etc, so that the nodes weren't as empty as they were in my test case.

@bbrowning

Whether you have to explicitly define requests in your steps or not depends on how your underlying Kubernetes cluster is configured. Some clusters are configured to provide default request values for all containers if one isn't otherwise specified. See https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/

In the cluster @abayer is using it sounds like perhaps there is a LimitRange providing default request values here.
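
For illustration only (the object name and values below are hypothetical, not inspected from @abayer's cluster), a LimitRange like this would give every step container a default CPU request, which the scheduler then sums across all the step containers in the Pod:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical namespace default: every container without an explicit CPU
	// request gets 100m injected at admission time.
	limits := corev1.LimitRange{
		ObjectMeta: metav1.ObjectMeta{Name: "limits", Namespace: "default"},
		Spec: corev1.LimitRangeSpec{
			Limits: []corev1.LimitRangeItem{{
				Type:           corev1.LimitTypeContainer,
				DefaultRequest: corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("100m")},
			}},
		},
	}

	// With such a default, a Task with 40 steps yields a Pod whose CPU request
	// is 40 * 100m = 4 CPUs, which no single n1-standard-4 node can satisfy
	// once system pods are accounted for.
	perStep := limits.Spec.Limits[0].DefaultRequest[corev1.ResourceCPU]
	total := resource.MustParse("0")
	for i := 0; i < 40; i++ {
		total.Add(perStep)
	}
	fmt.Printf("pod CPU request with 40 steps: %s\n", total.String())
}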

@abayer
Contributor Author

abayer commented Mar 21, 2019

It was GKE, fwiw

@dwnusbaum
Contributor

I'd like to pick this up. @bbrowning's proposal to unset all of the requests except for the largest one seems good to me. A few things that I'm not sure about are:

  • What should we do if there are no requests set at all? I guess we check if there is a LimitRange in the current namespace and use its defaultRequest setting?

  • What happens if a container specifies a cpu/memory limit but not a request? This page says that for CPU, the request will be automatically set to the limit. My understanding is that all containers in a Pod for a Task start simultaneously (please correct me if that is wrong!), but their entrypoints wait and cause the containers to be executed in the order specified in the Task. If that is the case, then the requests could easily go over the limit, but I'm not sure what would actually happen (i.e. if everyone requested the maximum amount of memory up front, will the node run out of memory, or will it be ok as long as the containers are just waiting in the entrypoint?). Here are the most relevant docs I could find:

    If a Container exceeds its memory request, it is likely that its Pod will be evicted whenever the node runs out of memory.
    A Container might or might not be allowed to exceed its CPU limit for extended periods of time. However, it will not be killed for excessive CPU usage.

/assign @dwnusbaum

@dwnusbaum
Contributor

Here's an example from some local testing I've done showing that we can't just unset requests other than the largest one. Say we have a cluster whose Nodes all have 5Gi of memory, and we want to run a Task with 3 steps with the following resource settings:

steps:
  - name: step1
    ...
    resources:
      limits:
        memory: "3Gi"
      requests:
        memory: "1Gi"
  - name: step2
    ...
    resources:
      limits:
        memory: "3Gi"
      requests:
        memory: "1Gi"
  - name: step3
    ...
    resources:
      limits:
        memory: "3Gi"
      requests:
        memory: "2Gi"

The resulting Pod requires a Node with 4Gi of memory, and each container has a 3Gi limit. If we delete the requests on step1 and step2 so that we only request 2Gi initially, then the limits for step1 and step2 get used as their requests and we end up requesting 8Gi of memory instead, oops!

However, based on my testing it looks like we can set the requested memory to 0 (or maybe better to use some small value like 10-50m for the entrypoint command?) for the containers that aren't requesting the max value. From what I can tell each container is able to use memory as needed up to its limit without subsequent containers being affected.
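
A hedged sketch of that observation (effectivePodMemoryRequest is a made-up helper that simply applies the documented defaulting rule that a container with a limit but no request is treated as requesting its limit):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// effectivePodMemoryRequest sums what the scheduler effectively sees: a
// container with a memory limit but no memory request is treated as
// requesting its limit.
func effectivePodMemoryRequest(steps []corev1.Container) resource.Quantity {
	total := resource.MustParse("0")
	for _, s := range steps {
		if req, ok := s.Resources.Requests[corev1.ResourceMemory]; ok {
			total.Add(req)
		} else if lim, ok := s.Resources.Limits[corev1.ResourceMemory]; ok {
			total.Add(lim)
		}
	}
	return total
}

// step builds a container with a 3Gi memory limit and the given memory
// request ("" means no request at all).
func step(request string) corev1.Container {
	c := corev1.Container{Resources: corev1.ResourceRequirements{
		Limits: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("3Gi")},
	}}
	if request != "" {
		c.Resources.Requests = corev1.ResourceList{corev1.ResourceMemory: resource.MustParse(request)}
	}
	return c
}

func main() {
	// Deleting the requests on step1 and step2: their 3Gi limits become the
	// requests, so the Pod asks for 3 + 3 + 2 = 8Gi.
	deleted := []corev1.Container{step(""), step(""), step("2Gi")}
	// Setting the requests on step1 and step2 to an explicit 0 instead: the
	// Pod only asks for 0 + 0 + 2 = 2Gi.
	zeroed := []corev1.Container{step("0"), step("0"), step("2Gi")}

	d, z := effectivePodMemoryRequest(deleted), effectivePodMemoryRequest(zeroed)
	fmt.Printf("deleted requests: %s, zeroed requests: %s\n", d.String(), z.String())
}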

@bbrowning

@dwnusbaum Great catch, and you're right we can't just clear most of the requests but instead need to explicitly request something very small or perhaps 0. I don't actually know if 0 has any special meaning compared to a very small request value.

dwnusbaum added a commit to dwnusbaum/pipeline that referenced this issue Apr 3, 2019
Before this change, if memory requests were set in a Task's Steps
(which are Containers), the generated Pod would require the sum of all
of the Steps' requests. However, because Tekton overwrites Container
entrypoints in Tasks to effectively make the Containers execute one
at a time, we want to make Pods generated by the TaskRun only request
the memory of the Container with the largest memory request rather
than the sum of all requests.

To make this happen, when generating a Pod for a Task, we find the
Step with largest memory request among all Steps, and set the memory
request for all other steps to 0. If no Step has an explicit memory
request, all requests are set to 0. If we unset memory requests
instead of setting them to 0 explicitly, then the memory limit would
be used as the request, which would defeat the purpose of unsetting
the request (and could end up making the Pod request more memory than
it did in the first place).

Fixes tektoncd#598
dwnusbaum added a commit to dwnusbaum/pipeline that referenced this issue Apr 4, 2019
Before this change, if memory or CPU requests were set in a Task's
Steps (which are Containers), the generated Pod would require the sum
of all of the Steps' requests to be scheduled on a Node. However,
because Tekton overwrites Container entrypoints in Tasks to
effectively make the Containers execute one at a time, we want to
make Pods generated by the TaskRun only request the maximum resources
that will be necessary for any single Container rather than the sum
of all resource requests.

To make this happen, when generating a Pod for a Task, we find the
Step with largest cpu/memory requests among all Steps, and set
the cpu/memory requests for all other steps to 0, respectively. If
no Step has an explicit cpu/memory request, all requests are set to
0. If we unset resource requests instead of setting them to 0
explicitly, then the limits would be used for the requests, which
would defeat the purpose of unsetting the requested values (and
could end up making the Pod request more memory than it did in the
first place).

Fixes tektoncd#598
tekton-robot pushed a commit that referenced this issue Apr 12, 2019
Before this change, if CPU, memory, or ephemeral storage resource
requests were set in a Task's steps (which are Containers), the
generated Pod would require the sum of all of the steps' requests
to be scheduled on a Node. However, because Tekton overwrites
Container entrypoints in Tasks to make the Containers logically
execute one at a time, we want to make Pods generated by the TaskRun
only request the maximum resources that will be necessary for any
single Container rather than the sum of all resource requests.

To make this happen, when generating a Pod for a Task, we find the
Step with largest resource requests among all Steps, and set the
resource requests for all other steps to 0 for the respective
resource. If no Step has an explicit resource request, all requests
are set to 0. If we unset resource requests instead of setting them
to 0 explicitly, then the limits would be used for the requests,
which would defeat the purpose of unsetting the requested values
(and could end up making the Pod request more memory than it did in
the first place).

Fixes #598
@jorgemoralespou

As reported in #1045, if there's a LimitRange in the namespace, setting the container's requested resources to 0 makes the deployment not work.

pradeepitm12 pushed a commit to pradeepitm12/pipeline that referenced this issue Jan 28, 2021
This commit includes two fixes:

1. Replaces `CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE` with an explicit call to
`gcloud auth activate-service-account`. This is the same fix as
tektoncd#2847.

2. Pass images with tags to ko_parse.py to validate the images. This bug was
introduced in tektoncd#598

Signed-off-by: Dibyo Mukherjee <dibyo@google.com>