Skip to content

Commit

Permalink
Switch PipelineRun timeout -> TaskRun logic to instead signal the Tas…
Browse files Browse the repository at this point in the history
…kRuns to stop

Fixes #5127

As noted in #5127, the logic around calculating a timeout for a `PipelineRun`'s `TaskRun` to make sure that the `TaskRun`'s timeout is always going to end before the `PipelineRun`'s timeout ends is problematic. It can result in race conditions where a `TaskRun` gets timed out, immediately followed by the `PipelineRun` being reconciled while not yet having hit the end of its own timeout. This change gets rid of that behavior, and instead sets the `TaskRun.Spec.Status` to a new value, `TaskRunTimedOut`, with the `TaskRun` reconciler handling that in the same way it does setting `TaskRun.Spec.Status` to `TaskRunCancelled`.

By doing this, we can unify the behavior for both `TaskRun`s and `Run`s upon `PipelineRun`s timing out, given that we already cancel `Run`s upon `PipelineRun` timeout, and we can kill off a bunch of flaky outcomes for `PipelineRun`s.

Also, rather than setting a 1s timeout on tasks which are created after the `.timeouts.pipeline`, `.timeouts.tasks`, or `.timeouts.finally` timeouts have been passed, this change will skip creation of those `TaskRun`s or `Run`s completely. However, it will still report those "tasks" as failed, not skipped, since attempted creation of tasks after the appropriate timeout has been reached is a failure case, and that's the existing expected behavior for `PipelineTask`s which try to start after timeouts have elapsed.

Huge thanks to @jerop for suggesting this approach!

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
  • Loading branch information
abayer authored and tekton-robot committed Aug 19, 2022
1 parent 593d92e commit 58fd07c
Show file tree
Hide file tree
Showing 21 changed files with 1,678 additions and 648 deletions.
27 changes: 14 additions & 13 deletions docs/taskruns.md
Original file line number Diff line number Diff line change
Expand Up @@ -582,19 +582,20 @@ steps:

The following tables shows how to read the overall status of a `TaskRun`:

`status`|`reason`|`completionTime` is set|Description
:-------|:-------|:---------------------:|--------------:
Unknown|Started|No|The TaskRun has just been picked up by the controller.
Unknown|Pending|No|The TaskRun is waiting on a Pod in status Pending.
Unknown|Running|No|The TaskRun has been validated and started to perform its work.
Unknown|TaskRunCancelled|No|The user requested the TaskRun to be cancelled. Cancellation has not been done yet.
True|Succeeded|Yes|The TaskRun completed successfully.
False|Failed|Yes|The TaskRun failed because one of the steps failed.
False|\[Error message\]|No|The TaskRun encountered a non-permanent error, and it's still running. It may ultimately succeed.
False|\[Error message\]|Yes|The TaskRun failed with a permanent error (usually validation).
False|TaskRunCancelled|Yes|The TaskRun was cancelled successfully.
False|TaskRunTimeout|Yes|The TaskRun timed out.
False|TaskRunImagePullFailed|Yes|The TaskRun failed due to one of its steps not being able to pull the image.
`status`|`reason`|`message`|`completionTime` is set|Description
:-------|:-------|:--|:---------------------:|--------------:
Unknown|Started|n/a|No|The TaskRun has just been picked up by the controller.
Unknown|Pending|n/a|No|The TaskRun is waiting on a Pod in status Pending.
Unknown|Running|n/a|No|The TaskRun has been validated and started to perform its work.
Unknown|TaskRunCancelled|n/a|No|The user requested the TaskRun to be cancelled. Cancellation has not been done yet.
True|Succeeded|n/a|Yes|The TaskRun completed successfully.
False|Failed|n/a|Yes|The TaskRun failed because one of the steps failed.
False|\[Error message\]|n/a|No|The TaskRun encountered a non-permanent error, and it's still running. It may ultimately succeed.
False|\[Error message\]|n/a|Yes|The TaskRun failed with a permanent error (usually validation).
False|TaskRunCancelled|n/a|Yes|The TaskRun was cancelled successfully.
False|TaskRunCancelled|TaskRun cancelled as the PipelineRun it belongs to has timed out.|Yes|The TaskRun was cancelled because the PipelineRun timed out.
False|TaskRunTimeout|n/a|Yes|The TaskRun timed out.
False|TaskRunImagePullFailed|n/a|Yes|The TaskRun failed due to one of its steps not being able to pull the image.

When a `TaskRun` changes status, [events](events.md#taskruns) are triggered accordingly.

Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/pipeline/v1alpha1/run_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,8 @@ const (
// RunCancelledByPipelineMsg indicates that the PipelineRun of which part this Run was
// has been cancelled.
RunCancelledByPipelineMsg RunSpecStatusMessage = "Run cancelled as the PipelineRun it belongs to has been cancelled."
// RunCancelledByPipelineTimeoutMsg indicates that the Run was cancelled because the PipelineRun running it timed out.
RunCancelledByPipelineTimeoutMsg RunSpecStatusMessage = "Run cancelled as the PipelineRun it belongs to has timed out."
)

// GetParam gets the Param from the RunSpec with the given name
Expand Down
12 changes: 12 additions & 0 deletions pkg/apis/pipeline/v1beta1/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

44 changes: 44 additions & 0 deletions pkg/apis/pipeline/v1beta1/pipelinerun_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,40 @@ func (pr *PipelineRun) HasTimedOut(ctx context.Context, c clock.PassiveClock) bo
return false
}

// HaveTasksTimedOut returns true if a pipelinerun has exceeded its spec.Timeouts.Tasks
func (pr *PipelineRun) HaveTasksTimedOut(ctx context.Context, c clock.PassiveClock) bool {
timeout := pr.TasksTimeout()
startTime := pr.Status.StartTime

if !startTime.IsZero() && timeout != nil {
if timeout.Duration == config.NoTimeoutDuration {
return false
}
runtime := c.Since(startTime.Time)
if runtime > timeout.Duration {
return true
}
}
return false
}

// HasFinallyTimedOut returns true if a pipelinerun has exceeded its spec.Timeouts.Finally, based on status.FinallyStartTime
func (pr *PipelineRun) HasFinallyTimedOut(ctx context.Context, c clock.PassiveClock) bool {
timeout := pr.FinallyTimeout()
startTime := pr.Status.FinallyStartTime

if startTime != nil && !startTime.IsZero() && timeout != nil {
if timeout.Duration == config.NoTimeoutDuration {
return false
}
runtime := c.Since(startTime.Time)
if runtime > timeout.Duration {
return true
}
}
return false
}

// HasVolumeClaimTemplate returns true if PipelineRun contains volumeClaimTemplates that is
// used for creating PersistentVolumeClaims with an OwnerReference for each run
func (pr *PipelineRun) HasVolumeClaimTemplate() bool {
Expand Down Expand Up @@ -418,6 +452,10 @@ type PipelineRunStatusFields struct {
// +optional
// +listType=atomic
ChildReferences []ChildStatusReference `json:"childReferences,omitempty"`

// FinallyStartTime is when all non-finally tasks have been completed and only finally tasks are being executed.
// +optional
FinallyStartTime *metav1.Time `json:"finallyStartTime,omitempty"`
}

// SkippedTask is used to describe the Tasks that were skipped due to their When Expressions
Expand Down Expand Up @@ -450,6 +488,12 @@ const (
GracefullyStoppedSkip SkippingReason = "PipelineRun was gracefully stopped"
// MissingResultsSkip means the task was skipped because it's missing necessary results
MissingResultsSkip SkippingReason = "Results were missing"
// PipelineTimedOutSkip means the task was skipped because the PipelineRun has passed its overall timeout.
PipelineTimedOutSkip SkippingReason = "PipelineRun timeout has been reached"
// TasksTimedOutSkip means the task was skipped because the PipelineRun has passed its Timeouts.Tasks.
TasksTimedOutSkip SkippingReason = "PipelineRun Tasks timeout has been reached"
// FinallyTimedOutSkip means the task was skipped because the PipelineRun has passed its Timeouts.Finally.
FinallyTimedOutSkip SkippingReason = "PipelineRun Finally timeout has been reached"
// None means the task was not skipped
None SkippingReason = "None"
)
Expand Down
31 changes: 31 additions & 0 deletions pkg/apis/pipeline/v1beta1/pipelinerun_types_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,37 @@ func TestPipelineRunHasTimedOut(t *testing.T) {
t.Errorf("Expected HasTimedOut to be %t when using pipeline.timeouts.pipeline", tc.expected)
}
})
t.Run("pipeline.timeouts.tasks "+tc.name, func(t *testing.T) {
pr := &v1beta1.PipelineRun{
ObjectMeta: metav1.ObjectMeta{Name: "foo"},
Spec: v1beta1.PipelineRunSpec{
Timeouts: &v1beta1.TimeoutFields{Tasks: &metav1.Duration{Duration: tc.timeout}},
},
Status: v1beta1.PipelineRunStatus{PipelineRunStatusFields: v1beta1.PipelineRunStatusFields{
StartTime: &metav1.Time{Time: tc.starttime},
}},
}

if pr.HaveTasksTimedOut(context.Background(), testClock) != tc.expected {
t.Errorf("Expected HasTimedOut to be %t when using pipeline.timeouts.pipeline", tc.expected)
}
})
t.Run("pipeline.timeouts.finally "+tc.name, func(t *testing.T) {
pr := &v1beta1.PipelineRun{
ObjectMeta: metav1.ObjectMeta{Name: "foo"},
Spec: v1beta1.PipelineRunSpec{
Timeouts: &v1beta1.TimeoutFields{Finally: &metav1.Duration{Duration: tc.timeout}},
},
Status: v1beta1.PipelineRunStatus{PipelineRunStatusFields: v1beta1.PipelineRunStatusFields{
StartTime: &metav1.Time{Time: tc.starttime},
FinallyStartTime: &metav1.Time{Time: tc.starttime},
}},
}

if pr.HasFinallyTimedOut(context.Background(), testClock) != tc.expected {
t.Errorf("Expected HasTimedOut to be %t when using pipeline.timeouts.pipeline", tc.expected)
}
})
}
}

Expand Down
8 changes: 8 additions & 0 deletions pkg/apis/pipeline/v1beta1/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -1015,6 +1015,10 @@
"x-kubernetes-patch-merge-key": "type",
"x-kubernetes-patch-strategy": "merge"
},
"finallyStartTime": {
"description": "FinallyStartTime is when all non-finally tasks have been completed and only finally tasks are being executed.",
"$ref": "#/definitions/v1.Time"
},
"observedGeneration": {
"description": "ObservedGeneration is the 'Generation' of the Service that was last processed by the controller.",
"type": "integer",
Expand Down Expand Up @@ -1079,6 +1083,10 @@
"description": "CompletionTime is the time the PipelineRun completed.",
"$ref": "#/definitions/v1.Time"
},
"finallyStartTime": {
"description": "FinallyStartTime is when all non-finally tasks have been completed and only finally tasks are being executed.",
"$ref": "#/definitions/v1.Time"
},
"pipelineResults": {
"description": "PipelineResults are the list of results written out by the pipeline task's containers",
"type": "array",
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/pipeline/v1beta1/taskrun_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,8 @@ const (
// TaskRunCancelledByPipelineMsg indicates that the PipelineRun of which this
// TaskRun was a part of has been cancelled.
TaskRunCancelledByPipelineMsg TaskRunSpecStatusMessage = "TaskRun cancelled as the PipelineRun it belongs to has been cancelled."
// TaskRunCancelledByPipelineTimeoutMsg indicates that the TaskRun was cancelled because the PipelineRun running it timed out.
TaskRunCancelledByPipelineTimeoutMsg TaskRunSpecStatusMessage = "TaskRun cancelled as the PipelineRun it belongs to has timed out."
)

// TaskRunDebug defines the breakpoint config for a particular TaskRun
Expand Down
8 changes: 2 additions & 6 deletions pkg/apis/pipeline/v1beta1/taskrun_types_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ func TestTaskRunIsDone(t *testing.T) {
},
}
if !tr.IsDone() {
t.Fatal("Expected pipelinerun status to be done")
t.Fatal("Expected taskrun status to be done")
}
}

Expand All @@ -131,11 +131,7 @@ func TestTaskRunIsCancelled(t *testing.T) {
},
}
if !tr.IsCancelled() {
t.Fatal("Expected pipelinerun status to be cancelled")
}
expected := ""
if string(tr.Spec.StatusMessage) != expected {
t.Fatalf("Expected StatusMessage is %s but got %s", expected, tr.Spec.StatusMessage)
t.Fatal("Expected taskrun status to be cancelled")
}
}

Expand Down
4 changes: 4 additions & 0 deletions pkg/apis/pipeline/v1beta1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

43 changes: 27 additions & 16 deletions pkg/reconciler/pipelinerun/cancel.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ import (
"time"

"github.com/tektoncd/pipeline/pkg/apis/config"

"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1alpha1"
"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
clientset "github.com/tektoncd/pipeline/pkg/client/clientset/versioned"
Expand All @@ -35,6 +34,7 @@ import (
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/sets"
"knative.dev/pkg/apis"
)

Expand Down Expand Up @@ -124,9 +124,14 @@ func cancelPipelineRun(ctx context.Context, logger *zap.SugaredLogger, pr *v1bet

// cancelPipelineTaskRuns patches `TaskRun` and `Run` with canceled status
func cancelPipelineTaskRuns(ctx context.Context, logger *zap.SugaredLogger, pr *v1beta1.PipelineRun, clientSet clientset.Interface) []string {
return cancelPipelineTaskRunsForTaskNames(ctx, logger, pr, clientSet, sets.NewString())
}

// cancelPipelineTaskRunsForTaskNames patches `TaskRun`s and `Run`s for the given task names, or all if no task names are given, with canceled status
func cancelPipelineTaskRunsForTaskNames(ctx context.Context, logger *zap.SugaredLogger, pr *v1beta1.PipelineRun, clientSet clientset.Interface, taskNames sets.String) []string {
errs := []string{}

trNames, runNames, err := getChildObjectsFromPRStatus(ctx, pr.Status)
trNames, runNames, err := getChildObjectsFromPRStatusForTaskNames(ctx, pr.Status, taskNames)
if err != nil {
errs = append(errs, err.Error())
}
Expand All @@ -152,9 +157,9 @@ func cancelPipelineTaskRuns(ctx context.Context, logger *zap.SugaredLogger, pr *
return errs
}

// getChildObjectsFromPRStatus returns taskruns and runs in the PipelineRunStatus's ChildReferences or TaskRuns/Runs,
// based on the value of the embedded status flag.
func getChildObjectsFromPRStatus(ctx context.Context, prs v1beta1.PipelineRunStatus) ([]string, []string, error) {
// getChildObjectsFromPRStatusForTaskNames returns taskruns and runs in the PipelineRunStatus's ChildReferences or TaskRuns/Runs,
// based on the value of the embedded status flag and the given set of PipelineTask names. If that set is empty, all are returned.
func getChildObjectsFromPRStatusForTaskNames(ctx context.Context, prs v1beta1.PipelineRunStatus, taskNames sets.String) ([]string, []string, error) {
cfg := config.FromContextOrDefaults(ctx)

var trNames []string
Expand All @@ -163,21 +168,27 @@ func getChildObjectsFromPRStatus(ctx context.Context, prs v1beta1.PipelineRunSta

if cfg.FeatureFlags.EmbeddedStatus != config.FullEmbeddedStatus {
for _, cr := range prs.ChildReferences {
switch cr.Kind {
case "TaskRun":
trNames = append(trNames, cr.Name)
case "Run":
runNames = append(runNames, cr.Name)
default:
unknownChildKinds[cr.Name] = cr.Kind
if taskNames.Len() == 0 || taskNames.Has(cr.PipelineTaskName) {
switch cr.Kind {
case "TaskRun":
trNames = append(trNames, cr.Name)
case "Run":
runNames = append(runNames, cr.Name)
default:
unknownChildKinds[cr.Name] = cr.Kind
}
}
}
} else {
for trName := range prs.TaskRuns {
trNames = append(trNames, trName)
for trName, trs := range prs.TaskRuns {
if taskNames.Len() == 0 || taskNames.Has(trs.PipelineTaskName) {
trNames = append(trNames, trName)
}
}
for runName := range prs.Runs {
runNames = append(runNames, runName)
for runName, runStatus := range prs.Runs {
if taskNames.Len() == 0 || taskNames.Has(runStatus.PipelineTaskName) {
runNames = append(runNames, runName)
}
}
}

Expand Down
22 changes: 20 additions & 2 deletions pkg/reconciler/pipelinerun/cancel_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ import (
"context"
"testing"

"k8s.io/apimachinery/pkg/util/sets"

"github.com/google/go-cmp/cmp"
"github.com/tektoncd/pipeline/pkg/apis/config"
"github.com/tektoncd/pipeline/test/diff"
Expand Down Expand Up @@ -388,11 +390,12 @@ func TestCancelPipelineRun(t *testing.T) {
}
}

func TestGetChildObjectsFromPRStatus(t *testing.T) {
func TestGetChildObjectsFromPRStatusForTaskNames(t *testing.T) {
testCases := []struct {
name string
embeddedStatus string
prStatus v1beta1.PipelineRunStatus
taskNames sets.String
expectedTRNames []string
expectedRunNames []string
hasError bool
Expand Down Expand Up @@ -433,6 +436,21 @@ func TestGetChildObjectsFromPRStatus(t *testing.T) {
expectedTRNames: []string{"t1"},
expectedRunNames: []string{"r1"},
hasError: false,
}, {
name: "taskrun and run, default embedded, just want taskrun",
embeddedStatus: config.DefaultEmbeddedStatus,
prStatus: v1beta1.PipelineRunStatus{PipelineRunStatusFields: v1beta1.PipelineRunStatusFields{
TaskRuns: map[string]*v1beta1.PipelineRunTaskRunStatus{
"t1": {PipelineTaskName: "task-1"},
},
Runs: map[string]*v1beta1.PipelineRunRunStatus{
"r1": {PipelineTaskName: "run-1"},
},
}},
taskNames: sets.NewString("task-1"),
expectedTRNames: []string{"t1"},
expectedRunNames: nil,
hasError: false,
}, {
name: "full embedded",
embeddedStatus: config.FullEmbeddedStatus,
Expand Down Expand Up @@ -516,7 +534,7 @@ func TestGetChildObjectsFromPRStatus(t *testing.T) {
cfg.OnConfigChanged(withCustomTasks(withEmbeddedStatus(newFeatureFlagsConfigMap(), tc.embeddedStatus)))
ctx = cfg.ToContext(ctx)

trNames, runNames, err := getChildObjectsFromPRStatus(ctx, tc.prStatus)
trNames, runNames, err := getChildObjectsFromPRStatusForTaskNames(ctx, tc.prStatus, tc.taskNames)

if tc.hasError {
if err == nil {
Expand Down
Loading

0 comments on commit 58fd07c

Please sign in to comment.