We need the pods that are executing a taskRun to be in qosClass Guaranteed #4046
Thank you @dfuessl for the detailed bug report. The current behaviour in Tekton is indeed related to the fact that steps are executed sequentially, and we do not want to ask the k8s scheduler for the sum of the resources needed by all steps (and init containers). See the docs for reference too. We attempted to use init containers for the steps in the past (see #224). Would it be possible to expand a bit more on your use case? Some ideas: tell us more about the pipelines you run, and whether spreading resources across steps and init containers could work for you.
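To make the behaviour above concrete, here is a rough sketch (step names, images and figures are invented for illustration) of how per-step requests translate into the pod that actually gets scheduled:

```yaml
# Hypothetical Task with two steps that both request memory.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: example-task
spec:
  steps:
    - name: build
      image: maven:3.8-openjdk-11
      resources:
        requests:
          memory: 4Gi
    - name: lint
      image: maven:3.8-openjdk-11
      resources:
        requests:
          memory: 1Gi
# Because the steps run one after another, Tekton keeps only the
# largest request (4Gi) on a single step container and zeroes out
# the rest, so the scheduler reserves 4Gi instead of the 5Gi sum.
```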
/cc @bobcatfish @imjasonh I'm going to convert this into a feature request, as the current behaviour is as designed - even though this design does not currently meet your needs 🙏
Thank you very much, @afrittoli, for your answer. I have read through #224 and I understand now that it is not a good idea to use initContainers for all the steps in a task.

Your use case: Some pushes to (and merges on) the git repository that contains the code base of our Java application trigger a pipelineRun. We clone the git repo, build the system, do static code analysis, execute several test suites (using web application servers, ephemeral databases and mock web services) and run other quality checks. We push test results to dedicated git repositories. When everything is ok, we push the produced Java artifacts to an artifact repository. Then we deploy the application to a server and run some additional tests. Finally an e-mail is sent to the developers. Unfortunately, some of these tasks are not idempotent. Re-executing a taskRun that has been evicted due to high workload is not a good option, because that would make the workload even higher, and the non-idempotent tasks cannot safely be repeated. In particular, the merges in the git repository are not repeatable, yet our Java developers would like to rely on our pipelines, so we have a problem.

Spreading resources across steps and init containers? I am not quite sure I understand this correctly, but I think the limits apply to each step individually. The resulting limit of the pod would then be the sum of the limits, and since we need the requests to be equal to the limits, the request for the pod would also be equal to the sum of the requests. Hence we would be back to pre-#598 behaviour. The key point, in my opinion, is that k8s expects all containers to execute in parallel, so it reserves the sum of the resources of the containers (besides initContainers), whereas Tekton executes all containers (except sidecars) sequentially.

In our pipelines it would currently not be a big problem to go back to pre-#598 behaviour (concerning memory usage): we usually have big sidecars and one big step that consume lots of memory, while the other steps are auxiliary and need few resources. But in general it will, of course, be a real problem. Maybe we must redesign our pipelines so that each task has only one step, but that would produce a lot of ugly boilerplate code, and we would have to rethink the modularity and the design of our container images. Even then we would still have the problem with the init containers (place-tools and place-scripts). Is there a way (e.g. via podTemplates) to give them resource requests and limits?
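To spell out the arithmetic in the paragraph above (figures are assumed, purely for illustration): if requests were kept equal to limits on every step container, the pod fragment would look roughly like this, and the scheduler would reserve the sum:

```yaml
# Hypothetical pod fragment with requests == limits on every step:
spec:
  containers:
    - name: step-build
      resources:
        requests: {memory: 2Gi}
        limits: {memory: 2Gi}
    - name: step-test
      resources:
        requests: {memory: 2Gi}
        limits: {memory: 2Gi}
    - name: step-push
      resources:
        requests: {memory: 2Gi}
        limits: {memory: 2Gi}
# The scheduler reserves 2Gi + 2Gi + 2Gi = 6Gi for the pod, even
# though the steps run sequentially and at most 2Gi is in use at
# any moment -- exactly the over-reservation #598 set out to avoid.
```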
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Expected Behavior
It should be possible to get the pods that execute a taskRun into Quality of Service class "Guaranteed".
Actual Behavior
Currently we see no way to get the Quality of Service class of taskRun pods higher than "Burstable".
Steps to Reproduce the Problem
Additional Info
In k8s, when we make the resource limits equal to the requests, we get a pod with qosClass Guaranteed. We would like the same to be possible for a Tekton taskRun. We run Continuous Integration/Continuous Delivery with Tekton and we do not want the taskRun pods to be evicted when there is heavy load on the system, but apparently this is currently not possible.
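For reference, this is the plain-Kubernetes rule we are relying on (a minimal, self-contained example, not Tekton-specific): a pod is classed as Guaranteed only when every container has CPU and memory limits with requests equal to those limits.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 256Mi
```

Checking the class afterwards with `kubectl get pod guaranteed-example -o jsonpath='{.status.qosClass}'` prints `Guaranteed`.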
One possible cause may be that the initContainers (place-tools and place-scripts) have no resources specification at all. We do not see a way to make them run with resource requests and resource limits.
Another problem may be that all step containers but the largest run with zero resource requests. We understand that this may be there to address issue #598, but it means that there is currently no way to make requests and limits equal for every container in the pod. Hence we cannot get qosClass Guaranteed.
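A hedged sketch (container names and sizes invented) of what a taskRun pod looks like under the current behaviour, and why it can be Burstable at best:

```yaml
# Illustrative fragment of a generated taskRun pod:
spec:
  containers:
    - name: step-build   # the largest step keeps its request
      resources:
        requests: {memory: 4Gi}
        limits: {memory: 4Gi}
    - name: step-lint    # every other step gets a zeroed request
      resources:
        requests: {memory: "0"}
        limits: {memory: 1Gi}
# step-lint has requests != limits, and the init containers carry
# no resources at all, so the pod cannot qualify as Guaranteed.
```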
As all steps in the taskRun are executed sequentially, would it be possible to run them all as initContainers? Currently only place-tools and place-scripts run as initContainers. This is only an idea; I cannot estimate the implications of a change like this.
Kubernetes version:
Output of `kubectl version`:
Tekton Pipeline version: