Consolidate cancel and timeout logic #2365
@@ -168,74 +191,15 @@ func (c *Reconciler) Reconcile(ctx context.Context, key string) error {
	return multierror.Append(merr, c.updateStatusLabelsAndAnnotations(tr, original)).ErrorOrNil()
}

func (c *Reconciler) updateStatusLabelsAndAnnotations(tr, original *v1alpha1.TaskRun) error {
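The `Reconcile` hunk ends by folding the status-update error into the accumulated reconcile error, returning `nil` only when nothing failed. The original uses hashicorp/go-multierror (`multierror.Append(...).ErrorOrNil()`); the minimal sketch below swaps in the standard library's `errors.Join` (Go 1.20+), which also returns `nil` when every argument is `nil`. `reconcileStep`, `updateStatus` and `reconcileOnce` are hypothetical stand-ins, not Tekton functions.

```go
package main

import (
	"errors"
	"fmt"
)

// reconcileStep is a hypothetical stand-in for the main reconcile logic.
func reconcileStep(fail bool) error {
	if fail {
		return errors.New("reconcile failed")
	}
	return nil
}

// updateStatus is a hypothetical stand-in for updateStatusLabelsAndAnnotations.
func updateStatus(fail bool) error {
	if fail {
		return errors.New("status update failed")
	}
	return nil
}

// reconcileOnce always attempts the status update, even when the reconcile
// step failed, and returns the combined error (nil when both succeed).
func reconcileOnce(failReconcile, failUpdate bool) error {
	merr := reconcileStep(failReconcile)
	return errors.Join(merr, updateStatus(failUpdate))
}

func main() {
	fmt.Println(reconcileOnce(false, false)) // <nil>
	fmt.Println(reconcileOnce(true, true))
}
```

Because the status update runs regardless of the earlier error, a reconcile failure does not prevent the status from being written.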
This just moved down; no changes.
@@ -465,67 +498,38 @@ func (c *Reconciler) handlePodCreationError(tr *v1alpha1.TaskRun, err error) {
	c.Logger.Errorf("Failed to create build pod for task %q: %v", tr.Name, err)
}

func updateTaskRunResourceResult(taskRun *v1alpha1.TaskRun, podStatus corev1.PodStatus) error {
This just moved down; no changes.
I grouped all the func (c *Reconciler) methods together.
pkg/reconciler/taskrun/taskrun.go (outdated)
if tr.IsCancelled() {
	before := tr.Status.GetCondition(apis.ConditionSucceeded)
	message := fmt.Sprintf("TaskRun %q was cancelled", tr.Name)
	err := c.failTaskRun(tr, "TaskRunCancelled", message)
nit: suggest using the constant TaskRunSpecStatusCancelled here instead of the literal string "TaskRunCancelled". Or putting it into a named constant if you want to keep task.spec.status conceptually separate from task.status.
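A minimal sketch of the reviewer's suggestion, assuming the spec value is the string `"TaskRunCancelled"` (as the literal in the hunk suggests). `TaskRunReasonCancelled` and `cancelMessage` are hypothetical names illustrating the second option: a separately named constant for the condition reason, so `spec.status` and `status` can diverge later without touching call sites.

```go
package main

import "fmt"

const (
	// TaskRunSpecStatusCancelled is the value a user sets in spec.status to
	// request cancellation (name from the review; value assumed).
	TaskRunSpecStatusCancelled = "TaskRunCancelled"

	// TaskRunReasonCancelled is a hypothetical, separately named constant for
	// the Succeeded condition's Reason. It shares the string today but is
	// declared independently so the two concepts can diverge.
	TaskRunReasonCancelled = "TaskRunCancelled"
)

// cancelMessage builds the termination message recorded with the reason.
func cancelMessage(taskRunName string) string {
	return fmt.Sprintf("TaskRun %q was cancelled", taskRunName)
}

func main() {
	// Call sites use named constants instead of bare string literals.
	fmt.Println(TaskRunReasonCancelled, cancelMessage("build-1"))
}
```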
Good idea, I'll fix that. Thank you!
I'm hitting two data races in the unit tests (not in tests that I've touched), but perhaps the new logic exposes or creates a new issue.
@@ -120,6 +121,17 @@ func (trs *TaskRunStatus) MarkResourceNotConvertible(err *CannotConvertError) {
	})
}

// MarkResourceFailed sets the ConditionSucceeded condition to ConditionFalse
// based on an error that occurred and a reason
func (trs *TaskRunStatus) MarkResourceFailed(reason string, err error) {
Not sure if this should be in alpha too...
// TaskRunStatus defines the observed state of TaskRun
type TaskRunStatus = v1beta1.TaskRunStatus
It doesn't need to be, as it's just an alias in v1alpha1, so the method defined on the v1beta1 type is available there too 😉
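The reply relies on how Go type aliases work: `type TaskRunStatus = v1beta1.TaskRunStatus` declares a second name for the same type, so a method declared once on the v1beta1 type is callable through the v1alpha1 name. The sketch below demonstrates this in a single package, with hypothetical `betaStatus` and `alphaStatus` standing in for the two API versions.

```go
package main

import "fmt"

// betaStatus plays the role of v1beta1.TaskRunStatus (simplified).
type betaStatus struct {
	Reason string
}

// MarkResourceFailed is declared once, on the "v1beta1" type.
func (s *betaStatus) MarkResourceFailed(reason string) {
	s.Reason = reason
}

// alphaStatus plays the role of v1alpha1.TaskRunStatus. The "=" makes it an
// alias for the same type, so it shares betaStatus's method set.
type alphaStatus = betaStatus

func main() {
	var s alphaStatus
	s.MarkResourceFailed("TaskRunCancelled")
	fmt.Println(s.Reason) // TaskRunCancelled
}
```

Without the `=`, `type alphaStatus betaStatus` would define a distinct type that does not inherit `betaStatus`'s methods, and the method would then need to be declared in both versions.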
Cancel and timeout do very similar things when they happen: they update the status of the taskrun, set the completion time and try to delete the pod. Today this is done for the two cases in different places, with the code structured differently and the behaviour slightly different:
- log levels of the messages are different
- cancel does not set the completion time
- cancel does not check if the error on pod deletion is a NotFound

This commit introduces "HasTimedOut" to taskrun_types, which matches what "IsCancelled" does. It introduces a "killTaskRun" function that can be used by both cancel and timeout, with the only difference being the "Reason" and termination message.

The timeout_check module is not necessary anymore.

The checks for IsCancelled and HasTimedOut are moved out of "reconcile" into "Reconcile", so that now "Reconcile" checks:
- HasStarted
- isDone
- IsCancelled
- HasTimedOut

and finally, if applicable, it invokes "reconcile".
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sbwsg, vdemeester.
/test pull-tekton-pipeline-integration-tests
The failure in the integration tests is on pipeline-task retry.
/test pull-tekton-pipeline-integration-tests
	return runtime > timeout
}

func (tr *TaskRun) GetTimeout() time.Duration {
hey @afrittoli, are you sure we need this function? Looking at the taskrun defaulting logic, it looks like we already default the value of the timeout.
Also, we don't have the same function for pipelineruns. And finally, apisconfig.DefaultTimeoutMinutes
is a default value afaik, but the user can actually override it with a config map, so we probably want to use that instead (there is some similar logic in the timeout logic that I think is being overly cautious; it's always hard to know when you can rely on the defaulting to be called).
// tr.Status.PodName will be empty if the pod was never successfully created. This condition
// can be reached, for example, by the pod never being schedulable due to limits imposed by
// a namespace's ResourceQuota.
err := c.KubeClientSet.CoreV1().Pods(tr.Namespace).Delete(tr.Status.PodName, &metav1.DeleteOptions{})
I am wondering if some kind of if statement should have been placed here to guard against scenarios where a TaskRun's reason is v1beta1.TaskRunReasonTimedOut or v1beta1.TaskRunCancelled. As of now, this deletes pods for TaskRuns that have timed out or been cancelled, which, amongst other things, deletes the TaskRun logs.
Changes
Submitter Checklist
These are the criteria that every PR should meet; please check them off as you review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
If you've added a new binary to the cmd dir, please update the release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes