
All containers in a TaskRun's Pod are combined to calculate required resources #598

Closed
abayer opened this issue Mar 8, 2019 · 25 comments · Fixed by #723
Labels
• design: This task is about creating and discussing a design
• help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
• kind/bug: Categorizes issue or PR as related to a bug.

Comments

@abayer
Contributor

abayer commented Mar 8, 2019

Expected Behavior

We're only running one container in a TaskRun's pod at a time, so the required resources for that pod should be the maximum needed among all the containers we're running, not the total. When we were using init containers, that's how things worked.

Actual Behavior

With containers instead of init containers, the pod's required resources are the sum of the required resources for each container. See https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources for more info on how pod resource requirements are calculated. The result is that a TaskRun of 10 steps produces a pod that thinks it needs enough resources to run all 10 step containers concurrently, so the required resources jump significantly from what they had been with init containers.

Steps to Reproduce the Problem

  1. Run a TaskRun with 10 steps on a cluster that has enough resources available for a pod running fewer than 10 concurrent containers, each with the resource requirements of one of those steps.

Additional Info

@abayer
Contributor Author

abayer commented Mar 8, 2019

fyi @pivotal-nader-ziada @imjasonh @bobcatfish

@bbrowning

bbrowning commented Mar 8, 2019

Given this (same for both Memory and CPU requests/limits) from https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#specify-a-memory-request-that-is-too-big-for-your-nodes

Memory requests and limits are associated with Containers, but it is useful to think of a Pod as having a memory request and limit. The memory request for the Pod is the sum of the memory requests for all the Containers in the Pod. Likewise, the memory limit for the Pod is the sum of the limits of all the Containers in the Pod.

Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough available memory to satisfy the Pod's memory request.

I believe we could choose the largest of the memory/cpu requests, stuff that on a single Container, and leave the requests unset on all the other Containers in the Pod. Containers could keep their individual memory/CPU limits as scheduling is not based on limits.

An example:

Container A requests 512MB, Container B 2048MB, Container C 1024MB.

Today:
Pod gets scheduled with a request of 3584MB, as the requests of each Container get summed up.

Proposal:
Pod gets scheduled with a request of 2048MB, as the requests for all containers are set to 0 except the largest (or first? does it matter?) container, Container B.
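
To make the proposal above concrete, here is a minimal sketch (not Tekton's actual code; keepOnlyLargestMemoryRequest is a made-up helper), assuming Steps are plain corev1.Container values and using Mi instead of MB for simplicity:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// keepOnlyLargestMemoryRequest finds the step with the largest memory request
// and drops the memory request from every other step, so the Pod-level request
// (the sum over containers) equals the single largest request instead of the
// total of all of them.
func keepOnlyLargestMemoryRequest(steps []corev1.Container) {
	largest, max := -1, resource.Quantity{}
	for i := range steps {
		if req, ok := steps[i].Resources.Requests[corev1.ResourceMemory]; ok && req.Cmp(max) > 0 {
			largest, max = i, req
		}
	}
	for i := range steps {
		if i != largest {
			delete(steps[i].Resources.Requests, corev1.ResourceMemory)
		}
	}
}

func main() {
	// Container A requests 512Mi, B 2048Mi, C 1024Mi, as in the example above.
	steps := []corev1.Container{
		{Name: "a", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("512Mi")}}},
		{Name: "b", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("2048Mi")}}},
		{Name: "c", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("1024Mi")}}},
	}
	keepOnlyLargestMemoryRequest(steps)
	for _, s := range steps {
		req := s.Resources.Requests[corev1.ResourceMemory]
		fmt.Printf("%s: memory request %s\n", s.Name, req.String())
	}
}

As it turns out later in this thread, deleting the requests like this interacts badly with limits, so the eventual fix sets the non-max requests to an explicit 0 instead.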

@abayer
Contributor Author

abayer commented Mar 8, 2019

Aaaaaah, that sounds reasonable - if I understand correctly, does that mean we'd effectively use the full resources (assuming no limits specified on steps) of the whole pod for each step?

@bbrowning

To answer my own question above about leaving the resource request on the largest or first container - you leave it on the largest container. That's the only container guaranteed to have a limit greater than or equal to the largest request.

@bbrowning

bbrowning commented Mar 8, 2019

@abayer If you don't specify limits then yes, you can use as much CPU/memory as the Node has available. But I believe that behavior is unchanged from InitContainers or the current Containers implementation.

You can still set CPU/memory limits per Container and those limits will get enforced. The requests value is just to help the scheduler decide which Node your Pod gets placed on.

@abayer
Contributor Author

abayer commented Mar 8, 2019

@bbrowning Gotcha - that's actually a nice result anyway. =)

@abayer
Contributor Author

abayer commented Mar 8, 2019

@dwnusbaum has a memory of this coming up at the working group meeting on Wednesday, but I wasn't paying enough attention to remember myself. =) But regardless, yeah, I endorse this approach.

@nader-ziada
Member

@bbrowning I thought the containers without any limits set would get the max or default limit - would setting a high limit on one container override that for all containers? or do we set this limit on the pod?

If you do not specify a memory limit for a Container, one of the following situations applies:
• The Container has no upper bound on the amount of memory it uses. The Container could use all of the memory available on the Node where it is running.
• The Container is running in a namespace that has a default memory limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the memory limit.

There isn't anything in the API currently to set limits on a TaskRun, so @abayer doesn't that mean it's not set and you are still getting the problem? Or are you setting them somehow?

@abayer
Contributor Author

abayer commented Mar 8, 2019

I think we're seeing it with no explicit limits set - @jstrachan, can you verify?

@bbrowning

@pivotal-nader-ziada The scheduling problem stems from a Container's requests - not its limits. And Task Steps are just Containers -

Steps []corev1.Container `json:"steps,omitempty"`

Containers can set memory and/or CPU requests and/or limits. Unless you're explicitly forbidding the setting of those somewhere in a webhook or reconcile loop that I missed.

Limits are enforced per Container as far as I know. Requests, on the other hand, are summed together for all Containers in a Pod and used by the scheduler when choosing node placement.

@dwnusbaum
Contributor

has a memory of this coming up at the working group meeting on Wednesday

Yeah, as I recall the discussion was that this was a known issue that didn't make it into the PR to switch from init container to containers, but that we definitely want to do something to fix it.

Relevant discussion starts around 9:20 in the 3/6/2019 meeting recording.

@nader-ziada
Member

Containers can set memory and/or CPU requests and/or limits. Unless you're explicitly forbidding the setting of those somewhere in a webhook or reconcile loop that I missed.

@bbrowning you are correct, we don't forbid that anywhere

@bobcatfish
Collaborator

@pivotal-nader-ziada The scheduling problem stems from a Container's requests - not its limits. And Task Steps are just Containers -

@bbrowning @abayer trying to catch up here - as far as I can tell (and I think you're saying this above), the taskrun controller isn't setting any of these values; this would only come into play if a user set the request and/or limit values themselves in the Container spec for the step, is that right?

In steps to repro right now we've got:

Run a TaskRun with 10 steps, on a cluster with sufficient resources available for a pod with <10 concurrent containers with the resource requirements of one of those steps.

but you'd have to actually set the request values for the step in order for this to be a problem, is that right?

If I'm understanding correctly it seems like we'd have a few options:

  1. @bbrowning's suggestion, which I think is to take the values the user specified for requests and actually change them?
  2. Warn the user but not change anything
  3. Document this behavior but not actually change any code (probably not a great solution 😅 )

(Not sure I'm understanding the problem entirely tho - maybe we can see some example container specs that would have this problem?)

@abayer
Contributor Author

abayer commented Mar 20, 2019

I don't think you do need to manually set the request values - @jstrachan hit this in Jenkins X, and we don't explicitly set request values anywhere when generating our Tasks.

Anyway, lemme create a concrete reproduction.

@abayer
Contributor Author

abayer commented Mar 20, 2019

got it - https://gist.github.com/abayer/41cc9db17600dfe845f06002fb0492ed - when you get somewhere between 30 and 40 steps in a single Task with no explicit request, with a cluster of 3 n1-standard-4s, autoscaling up to 10, you get

message: 'pod status "PodScheduled":"False"; message: "0/3 nodes are available:
        3 Insufficient cpu."'

@abayer
Contributor Author

abayer commented Mar 20, 2019

And fwiw, https://gist.github.com/abayer/1fe81f3813d777167590b2523a3fff6e is the same Task but with an explicit request on the first step, and zeroed-out requests on the others. This one actually works.

@abayer
Contributor Author

abayer commented Mar 20, 2019

And it also works with no explicit request on the first step and the rest zeroed out.

@bobcatfish
Collaborator

Awesome, thanks for adding the repro case @abayer !

when you get somewhere between 30 and 40 steps in a single Task with no explicit request

Quick question, are you creating Tasks that have 30+ steps in them?

@bobcatfish added the kind/bug, help wanted, and design labels Mar 20, 2019
@abayer
Contributor Author

abayer commented Mar 20, 2019

Apparently? =) More likely is that the cases where this showed up were on a cluster running other pipelines etc, so that the nodes weren't as empty as they were in my test case.

@bbrowning

Whether you have to explicitly define requests in your steps or not depends on how your underlying Kubernetes cluster is configured. Some clusters are configured to provide default request values for all containers if one isn't otherwise specified. See https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/

In the cluster @abayer is using it sounds like perhaps there is a LimitRange providing default request values here.
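
For illustration only (the object name and values below are hypothetical, not inspected from @abayer's cluster), a LimitRange like this would give every step container a default CPU request, which the scheduler then sums across all the step containers in the Pod:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical namespace default: every container without an explicit CPU
	// request gets 100m injected at admission time.
	limits := corev1.LimitRange{
		ObjectMeta: metav1.ObjectMeta{Name: "limits", Namespace: "default"},
		Spec: corev1.LimitRangeSpec{
			Limits: []corev1.LimitRangeItem{{
				Type:           corev1.LimitTypeContainer,
				DefaultRequest: corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("100m")},
			}},
		},
	}

	// With such a default, a Task with 40 steps yields a Pod whose CPU request
	// is 40 * 100m = 4 CPUs, which no single n1-standard-4 node can satisfy
	// once system pods are accounted for.
	perStep := limits.Spec.Limits[0].DefaultRequest[corev1.ResourceCPU]
	total := resource.MustParse("0")
	for i := 0; i < 40; i++ {
		total.Add(perStep)
	}
	fmt.Printf("pod CPU request with 40 steps: %s\n", total.String())
}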

@abayer
Contributor Author

abayer commented Mar 21, 2019

It was GKE, fwiw

@dwnusbaum
Contributor

I'd like to pick this up. @bbrowning's proposal to unset all of the requests except for the largest one seems good to me. A few things that I'm not sure about are:

  • What should we do if there are no requests set at all? I guess we check if there is a LimitRange in the current namespace and use its defaultRequest setting?

  • What happens if a container specifies a cpu/memory limit but not a request? This page says that for CPU, the request will be automatically set to the limit. My understanding is that all containers in a Pod for a Task start simultaneously (please correct me if that is wrong!), but their entrypoints wait and cause the containers to be executed in the order specified in the Task. If that is the case, then the requests could easily go over the limit, but I'm not sure what would actually happen (i.e. if everyone requested the maximum amount of memory up front, will the node run out of memory, or will it be ok as long as the containers are just waiting in the entrypoint?). Here are the most relevant docs I could find:

    If a Container exceeds its memory request, it is likely that its Pod will be evicted whenever the node runs out of memory.
    A Container might or might not be allowed to exceed its CPU limit for extended periods of time. However, it will not be killed for excessive CPU usage.

/assign @dwnusbaum

@dwnusbaum
Contributor

Here's an example from some local testing I've done showing that we can't just unset requests other than the largest one. Say we have a cluster whose Nodes all have 5Gi of memory, and we want to run a Task with 3 steps with the following resource settings:

steps:
  - name: step1
    ...
    resources:
      limits:
        memory: "3Gi"
      requests:
        memory: "1Gi"
  - name: step2
    ...
    resources:
      limits:
        memory: "3Gi"
      requests:
        memory: "1Gi"
  - name: step3
    ...
    resources:
      limits:
        memory: "3Gi"
      requests:
        memory: "2Gi"

The resulting Pod requires a Node with 4Gi of memory, and each container has a 3Gi limit. If we delete the requests on step1 and step2 so that we only request 2Gi initially, then the limits for step1 and step2 get used as their requests and we end up requesting 8Gi of memory instead, oops!

However, based on my testing it looks like we can set the requested memory to 0 (or maybe better to use some small value like 10-50m for the entrypoint command?) for the containers that aren't requesting the max value. From what I can tell each container is able to use memory as needed up to its limit without subsequent containers being affected.
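
A hedged sketch of that observation (effectivePodMemoryRequest is a made-up helper that simply applies the documented defaulting rule that a container with a limit but no request is treated as requesting its limit):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// effectivePodMemoryRequest sums what the scheduler effectively sees: a
// container with a memory limit but no memory request is treated as
// requesting its limit.
func effectivePodMemoryRequest(steps []corev1.Container) resource.Quantity {
	total := resource.MustParse("0")
	for _, s := range steps {
		if req, ok := s.Resources.Requests[corev1.ResourceMemory]; ok {
			total.Add(req)
		} else if lim, ok := s.Resources.Limits[corev1.ResourceMemory]; ok {
			total.Add(lim)
		}
	}
	return total
}

// step builds a container with a 3Gi memory limit and the given memory
// request ("" means no request at all).
func step(request string) corev1.Container {
	c := corev1.Container{Resources: corev1.ResourceRequirements{
		Limits: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("3Gi")},
	}}
	if request != "" {
		c.Resources.Requests = corev1.ResourceList{corev1.ResourceMemory: resource.MustParse(request)}
	}
	return c
}

func main() {
	// Deleting the requests on step1 and step2: their 3Gi limits become the
	// requests, so the Pod asks for 3 + 3 + 2 = 8Gi.
	deleted := []corev1.Container{step(""), step(""), step("2Gi")}
	// Setting the requests on step1 and step2 to an explicit 0 instead: the
	// Pod only asks for 0 + 0 + 2 = 2Gi.
	zeroed := []corev1.Container{step("0"), step("0"), step("2Gi")}

	d, z := effectivePodMemoryRequest(deleted), effectivePodMemoryRequest(zeroed)
	fmt.Printf("deleted requests: %s, zeroed requests: %s\n", d.String(), z.String())
}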

@bbrowning

@dwnusbaum Great catch, and you're right we can't just clear most of the requests but instead need to explicitly request something very small or perhaps 0. I don't actually know if 0 has any special meaning compared to a very small request value.

dwnusbaum added a commit to dwnusbaum/pipeline that referenced this issue Apr 3, 2019
Before this change, if memory requests were set in a Task's Steps
(which are Containers), the generated Pod would require the sum of all
of the Steps' requests. However, because Tekton overwrites Container
entrypoints in Tasks to effectively make the Containers execute one
at a time, we want to make Pods generated by the TaskRun only request
the memory of the Container with the largest memory request rather
than the sum of all requests.

To make this happen, when generating a Pod for a Task, we find the
Step with largest memory request among all Steps, and set the memory
request for all other steps to 0. If no Step has an explicit memory
request, all requests are set to 0. If we unset memory requests
instead of setting them to 0 explicitly, then the memory limit would
be used as the request, which would defeat the purpose of unsetting
the request (and could end up making the Pod request more memory than
it did in the first place).

Fixes tektoncd#598
dwnusbaum added a commit to dwnusbaum/pipeline that referenced this issue Apr 4, 2019
Before this change, if memory or CPU requests were set in a Task's
Steps (which are Containers), the generated Pod would require the sum
of all of the Steps' requests to be scheduled on a Node. However,
because Tekton overwrites Container entrypoints in Tasks to
effectively make the Containers execute one at a time, we want to
make Pods generated by the TaskRun only request the maximum resources
that will be necessary for any single Container rather than the sum
of all resource requests.

To make this happen, when generating a Pod for a Task, we find the
Step with largest cpu/memory requests among all Steps, and set
the cpu/memory requests for all other steps to 0, respectively. If
no Step has an explicit cpu/memory request, all requests are set to
0. If we unset resource requests instead of setting them to 0
explicitly, then the limits would be used for the requests, which
would defeat the purpose of unsetting the requested values (and
could end up making the Pod request more memory than it did in the
first place).

Fixes tektoncd#598
tekton-robot pushed a commit that referenced this issue Apr 12, 2019
Before this change, if CPU, memory, or ephemeral storage resource
requests were set in a Task's steps (which are Containers), the
generated Pod would require the sum of all of the steps' requests
to be scheduled on a Node. However, because Tekton overwrites
Container entrypoints in Tasks to make the Containers logically
execute one at a time, we want to make Pods generated by the TaskRun
only request the maximum resources that will be necessary for any
single Container rather than the sum of all resource requests.

To make this happen, when generating a Pod for a Task, we find the
Step with largest resource requests among all Steps, and set the
resource requests for all other steps to 0 for the respective
resource. If no Step has an explicit resource request, all requests
are set to 0. If we unset resource requests instead of setting them
to 0 explicitly, then the limits would be used for the requests,
which would defeat the purpose of unsetting the requested values
(and could end up making the Pod request more memory than it did in
the first place).

Fixes #598
@jorgemoralespou

As reported in #1045, if there's a LimitRange in the namespace, setting the container's requested resources to 0 makes the deployment not work.

pradeepitm12 pushed a commit to pradeepitm12/pipeline that referenced this issue Jan 28, 2021
This commit includes two fixes:

1. Replaces `CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE` with an explicit call to
`gcloud auth activate-service-account`. This is the same fix as
tektoncd#2847.

2. Pass images with tags to ko_parse.py to validate the images. This bug was
introduced in tektoncd#598

Signed-off-by: Dibyo Mukherjee <dibyo@google.com>