Use a helper for setting the Succeeded condition on PipelineRun. #2749

mattmoor · 2020-06-04T02:27:19Z

These helpers reduce a lot of the boilerplate and give us hooks
where we can eagerly set the CompletionTime field rather than waiting
for updateStatus.

Fixes: #2741

tekton-robot · 2020-06-04T02:30:30Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/apis/pipeline/v1beta1/pipelinerun_types.go	90.5%	77.6%	-12.9
pkg/reconciler/pipelinerun/pipelinerun.go	80.5%	80.2%	-0.2

tekton-robot · 2020-06-04T02:48:48Z

This PR cannot be merged: expecting exactly one kind/ label

Available kind/ labels are:

kind/bug: Categorizes issue or PR as related to a bug.
kind/flake: Categorizes issue or PR as related to a flakey test
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/design: Categorizes issue or PR as related to design.
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
kind/misc: Categorizes issue or PR as a miscellaneous one.

mattmoor · 2020-06-04T03:44:59Z

/kind bug

vdemeester

/lgtm
/cc @afrittoli @sbwsg @bobcatfish

vdemeester · 2020-06-04T08:48:18Z

pkg/apis/pipeline/v1beta1/pipelinerun_types.go

+// MarkSucceeded changes the Succeeded condition to True with the provided reason and message.
+func (pr *PipelineRunStatus) MarkSucceeded(reason, messageFormat string, messageA ...interface{}) {
+	pipelineRunCondSet.Manage(pr).MarkTrueWithReason(apis.ConditionSucceeded, reason, messageFormat, messageA...)
+	succeeded := pr.GetCondition(apis.ConditionSucceeded)
+	pr.CompletionTime = &succeeded.LastTransitionTime.Inner
+}
+
+// MarkFailed changes the Succeeded condition to False with the provided reason and message.
+func (pr *PipelineRunStatus) MarkFailed(reason, messageFormat string, messageA ...interface{}) {
+	pipelineRunCondSet.Manage(pr).MarkFalse(apis.ConditionSucceeded, reason, messageFormat, messageA...)
+	succeeded := pr.GetCondition(apis.ConditionSucceeded)
+	pr.CompletionTime = &succeeded.LastTransitionTime.Inner
+}
+
+// MarkRunning changes the Succeeded condition to Unknown with the provided reason and message.
+func (pr *PipelineRunStatus) MarkRunning(reason, messageFormat string, messageA ...interface{}) {
+	pipelineRunCondSet.Manage(pr).MarkUnknown(apis.ConditionSucceeded, reason, messageFormat, messageA...)
+}
+


afrittoli

Thanks for this!

I was under the impression that in some cases we set the status to "ConditionFalse" even in case of transient errors, but I cannot find an example anymore - perhaps it was in the taskrun controller - so I think this is good!

If we find a case of transient error in the future we'll need to make sure we don't use the MarkFailed helper because we would not want to set the completion time in that case.

The only concern left that I have is that we have an issue with setting the completion time today. We want to mark the pipeline failed as early as possible, however taskruns may still be running when that happens, and we need to keep them running until they finish, which means that the completion time does not really reflect the time when the last sub-resource completed.

To fix that issue with this change in place, it means we won't be able to set the pipeline as failed until all TaskRuns completed. Is this an acceptable approach?

@vdemeester @bobcatfish @pritidesai

/approve

tekton-robot · 2020-06-04T09:30:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [afrittoli]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pritidesai · 2020-06-04T17:43:18Z

> The only concern left that I have is that we have an issue with setting the completion time today. We want to mark the pipeline failed as early as possible, however taskruns may still be running when that happens, and we need to keep them running until they finish, which means that the completion time does not really reflect the time when the last sub-resource completed.
>
> To fix that issue with this change in place, it means we won't be able to set the pipeline as failed until all TaskRuns completed. Is this an acceptable approach?

Yes, it's a valid concern. I haven't looked into this yet (will look into it ASAP), but are we now setting the pipeline completion time when the first TaskRun failure is noted? 🤔

	after := resources.GetPipelineConditionStatus(pr, pipelineState, c.Logger, d)
	pr.Status.SetCondition(after)
	switch after.Status {
	case corev1.ConditionTrue:
		pr.Status.MarkSucceeded(after.Reason, after.Message)
	case corev1.ConditionFalse:
		pr.Status.MarkFailed(after.Reason, after.Message)

In that case, we need a separate helper that does not set the completion time on failure. The issue is that GetPipelineConditionStatus returns corev1.ConditionFalse as soon as a failure is discovered, without checking whether any other TaskRuns are still running. Ideally, GetPipelineConditionStatus should wait before returning failure, but at the same time the reconciler should not pick up any new tasks during this waiting phase.

A    B     C
|    |     |
X    Y     Z

Let's say we have three tasks executing in parallel: A, B, and C. Task A fails while Tasks B and C are still running. Then Task B finishes successfully, but Task C is still running. While we wait for Task C to finish, the reconciler should not schedule Task Y (the node that follows Task B); otherwise we might break backward compatibility with failing the pipeline on the first task failure.

Trying to address this particular scenario in issue #1680
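
The A/B/C scenario above could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the reconciler's actual code: `State`, `Task`, and `Decide` are hypothetical names, and each follow-up task is assumed to have a single parent, as in the diagram.

```go
package main

import (
	"fmt"
	"sort"
)

// State is a simplified, hypothetical stand-in for a task's run state.
type State int

const (
	NotStarted State = iota
	Running
	Succeeded
	Failed
)

// Task is a hypothetical node in the pipeline graph.
type Task struct {
	State State
	Next  []string // names of the tasks that follow this one
}

// Decide returns the tasks that may be started now and whether the run may be
// reported as failed. Once any task has failed, nothing new is scheduled, and
// failure is reported only when no task is still running.
func Decide(tasks map[string]*Task) (schedule []string, reportFailed bool) {
	anyFailed, anyRunning := false, false
	for _, t := range tasks {
		switch t.State {
		case Failed:
			anyFailed = true
		case Running:
			anyRunning = true
		}
	}
	if anyFailed {
		return nil, !anyRunning
	}
	for _, t := range tasks {
		if t.State != Succeeded {
			continue
		}
		for _, name := range t.Next {
			if next, ok := tasks[name]; ok && next.State == NotStarted {
				schedule = append(schedule, name)
			}
		}
	}
	sort.Strings(schedule) // map iteration order is random; sort for stable output
	return schedule, false
}

func main() {
	tasks := map[string]*Task{
		"A": {State: Failed, Next: []string{"X"}},
		"B": {State: Succeeded, Next: []string{"Y"}},
		"C": {State: Running, Next: []string{"Z"}},
		"X": {}, "Y": {}, "Z": {},
	}
	schedule, failed := Decide(tasks)
	fmt.Println(schedule, failed) // [] false: Y is held back, C is still running

	tasks["C"].State = Succeeded
	schedule, failed = Decide(tasks)
	fmt.Println(schedule, failed) // [] true: nothing is running, failure can be reported
}
```

In this sketch Task Y is never started after A fails, and the run is only reported as failed once C has finished.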

afrittoli · 2020-06-04T19:51:49Z

TBH I'm more inclined now towards changing the logic to setting both status and completion time of the PipelineRun only once all subresources have finished their job.
@pritidesai @vdemeester @bobcatfish @imjasonh wdyt?

pritidesai · 2020-06-05T20:11:32Z

I tested master with a pipeline having three tasks:

A(failure)    B(successful with sleep 10)     C(successful with sleep 20)

The PipelineRun completion time is set to the task A completion time while task B and task C are still running 😢:

kubectl get pr -o json | jq .items[].status.completionTime
"2020-06-05T19:52:03Z"

kubectl get pr -o json | jq '[ .items[].status.taskRuns[] | (.pipelineTaskName + " " + .status.completionTime) ]'
[
  "task-a 2020-06-05T19:52:03Z",
  "task-b 2020-06-05T19:52:16Z",
  "task-c 2020-06-05T19:52:27Z"
]

The tkn CLI reports the pipeline duration as the duration of the failed TaskRun:

tkn pr describe pipelinerun-one-failure-two-success
Name:           pipelinerun-one-failure-two-success
Namespace:      default
Pipeline Ref:   pipeline-one-failure-two-success
Timeout:        1h0m0s
Labels:
 tekton.dev/pipeline=pipeline-one-failure-two-success

🌡️  Status

STARTED          DURATION     STATUS
18 minutes ago   10 seconds   Failed

💌 Message

TaskRun pipelinerun-one-failure-two-success-task-a-sgwh2 has failed ("step-fail" exited with code 1 (image: "docker-pullable://ubuntu@sha256:747d2dbbaaee995098c9792d99bd333c6783ce56150d1b11e333bbceed5c54d7"); for logs run: kubectl -n default logs pipelinerun-one-failure-two-success-task-a-sgwh2-pod-2xldc -c step-fail
)

📦 Resources

 No resources

⚓ Params

 No params

🗂  Taskruns

 NAME                                                 TASK NAME   STARTED          DURATION     STATUS
 ∙ pipelinerun-one-failure-two-success-task-a-sgwh2   task-a      18 minutes ago   10 seconds   Failed
 ∙ pipelinerun-one-failure-two-success-task-b-82xxr   task-b      18 minutes ago   23 seconds   Succeeded
 ∙ pipelinerun-one-failure-two-success-task-c-ztv7k   task-c      18 minutes ago   34 seconds   Succeeded

afrittoli · 2020-06-05T21:20:43Z

Yeah, the completion time is now set along with setting success or failure, which is a change in behaviour. I think the next step on this should be to do so only once all resources spawned by the pipelinerun have completed their work.

@bobcatfish @vdemeester @pritidesai @mattmoor @imjasonh thoughts?
I would be happy to submit a PR for that if we agree on it.
Personally I think it's ok to have v0.13.0 behaving like this.
Alternatively we could roll this back in a v0.13.1 branch (but not on master?) and continue towards the genreconciler for v0.14

mattmoor · 2020-06-06T19:57:34Z

I'm a bit puzzled by why my change affected this unless the Succeeded condition is being set to failed and then unset before the updateStatus? Let me take another look, but a workaround would be to have the MarkUnknown variant of this clear CompletionTime 🤔

mattmoor · 2020-06-06T20:00:48Z

e.g. are we expecting this line to undo a MarkFailed earlier in the reconciliation?

pipeline/pkg/reconciler/pipelinerun/pipelinerun.go

Line 566 in b51405a

pr.Status.MarkRunning(after.Reason, after.Message)

afrittoli · 2020-06-06T20:00:50Z

TBH I would prefer to stop setting the status to failed with no completion time, and instead wait until the DAG is completed and then report how many tasks failed / passed / skipped / were cancelled.
I started working on a patch for that, and I'll raise this question at our Monday API WG.
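
As a rough illustration of that idea (not the patch itself), a summary could be produced only once nothing is still running. `Outcome`, `Summarize`, and the message wording below are all hypothetical.

```go
package main

import "fmt"

// Outcome is a hypothetical, simplified per-task result.
type Outcome int

const (
	StillRunning Outcome = iota
	Passed
	Failed
	Skipped
	Cancelled
)

// Summarize reports done=false while any task is still running. Once every
// task has reached a final outcome it returns a message with the counts.
func Summarize(outcomes []Outcome) (done bool, message string) {
	counts := map[Outcome]int{}
	for _, o := range outcomes {
		counts[o]++
	}
	if counts[StillRunning] > 0 {
		return false, ""
	}
	return true, fmt.Sprintf("Tasks completed: %d failed, %d passed, %d skipped, %d cancelled",
		counts[Failed], counts[Passed], counts[Skipped], counts[Cancelled])
}

func main() {
	// Task A failed, B passed, C is still running: no final status yet.
	fmt.Println(Summarize([]Outcome{Failed, Passed, StillRunning}))
	// All three tasks are done: the failure is reported together with the counts.
	fmt.Println(Summarize([]Outcome{Failed, Passed, Passed}))
}
```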

afrittoli · 2020-06-07T11:30:16Z

#2774

…un." PR #2749 introduces helpers to set the completion time along with setting the Succeeded condition to Unknown, True or False. This is fine, however in combination with a previous issue, whereby we update the Succeeded condition to False in case of failure as soon as the first failure is encountered, this results in having the completion time set as soon as the first failure is encountered, which may not match the actual completion time of the pipeline run, in case other tasks were already running when the initial failure occurred. For v0.13.x we shall keep completion time and condition update separated. Next release will include this plus a fix to the original issue. This reverts commit 8ff3169.

Use a helper for setting the Succeeded condition on PipelineRun.

b51405a

tekton-robot requested review from bobcatfish and dlorenc June 4, 2020 02:27

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 4, 2020

tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 4, 2020

vdemeester reviewed Jun 4, 2020

tekton-robot requested review from afrittoli and a user June 4, 2020 08:48

tekton-robot assigned vdemeester Jun 4, 2020

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2020

afrittoli reviewed Jun 4, 2020

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2020

tekton-robot merged commit 8ff3169 into tektoncd:master Jun 4, 2020

mattmoor deleted the completion-time branch June 4, 2020 15:04

afrittoli mentioned this pull request Jun 6, 2020

Emit events from the PipelineRun controller #2545

Closed

3 tasks

afrittoli mentioned this pull request Jun 8, 2020

Revert "Use a helper for setting the Succeeded condition on PipelineRun." #2783

Merged

3 tasks
