
Refactor metrics observe calls to report metrics as soon as possible #456

Merged
merged 4 commits into from
Nov 3, 2020
Conversation

HeavyWombat
Contributor

Based on the suggestion by @SaschaSchwarze0, there is a gap in the metrics reporting. At the moment, the metrics are observed when the BuildRun is done. However, a metric like the BuildRun Established Duration could be reported as soon as the BuildRun is available in the system. Furthermore, in case the BuildRun fails, the metrics for this execution are not reported at all.

Idea for refactoring: move the metrics for build-run-count, build-run-established, and build-run-rampup to a section in Reconcile where the details are already available and where, in theory, the metrics should only be reported once.

Comment on lines 344 to 349
// Report the buildrun established duration (time between the creation of the buildrun and the start of the buildrun)
buildmetrics.BuildRunEstablishObserve(
buildRun.Status.BuildSpec.StrategyRef.Name,
buildRun.Namespace,
buildRun.Status.StartTime.Time.Sub(buildRun.CreationTimestamp.Time),
)
Member

I do not think that this one can work here, because the buildRun does not yet have a StartTime. This is set later, when the TaskRun has started.

Contributor Author

Let me double check.

Contributor Author

I have no idea what I did in my local test with minikube that this did not show up there. Today, I was able to reproduce that this leads to the expected nil pointer de-reference. Therefore, I moved the code section to the place in the code where the required start time is actually set for the first time.

Member

@SaschaSchwarze0 left a comment

Looks good. I think we need to handle nil in one if clause.

@@ -383,35 +393,29 @@ func (r *ReconcileBuildRun) Reconcile(request reconcile.Request) (reconcile.Resu
}

buildRun.Status.LatestTaskRunRef = &lastTaskRun.Name
buildRun.Status.StartTime = lastTaskRun.Status.StartTime

if buildRun.Status.StartTime == nil {
Member

In Tekton's TaskRun, Status.StartTime is defined as *metav1.Time and therefore can be nil. We might only be hitting this case here if a TaskRun fails right away and is never started. If my understanding of the IsZero function is correct, I think we could do:

if buildRun.Status.StartTime == nil && !lastTaskRun.Status.StartTime.IsZero() {

As I do not think that Tekton will ever set a zero timestamp, this will probably work the same way:

if buildRun.Status.StartTime == nil && lastTaskRun.Status.StartTime != nil {

Contributor Author

Good point. I will add the second check.

Order imports using `goimports -w`.
There are a couple of typos in the metrics test package, one even in the
variable name for the known histogram metrics.

Fix typos in the `metrics_test.go` source file.
There are metrics that are observed when the build run is finished, even
though the actual details were available earlier. Therefore, it could
happen that some metrics are *not* reported due to a failing build run
and would be lost. If the metrics were observed right after the creation
of the `TaskRun`, they would be reported in any case and also only
reported once.

Move the `BuildRunCountInc` and `BuildRunRampUpDurationObserve` calls to
right after the `TaskRun` is created, because at that point in time the
required information is already available.

Move `BuildRunEstablishObserve` to the section of the code where the
required time fields are set for the first time.
Since some metrics need to be reported as early as possible, the
respective mocks used in the test cases need to carry start time and
build spec details.

Add a start time to the task run mocks.

Add a build spec to the build run mocks.
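The mock change can be sketched with local stand-in types (the real mocks use the Tekton/Shipwright API types in the controller test suite; names here are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// Local stand-ins for the Tekton types the real test mocks are built from.
type taskRunStatus struct{ StartTime *time.Time }
type taskRun struct{ Status taskRunStatus }

// newMockTaskRun reflects the commit message above: because metrics are now
// observed early, a mocked TaskRun must already carry a start time, or the
// early observation would hit a nil field.
func newMockTaskRun() *taskRun {
	start := time.Date(2020, 11, 3, 10, 0, 0, 0, time.UTC)
	return &taskRun{Status: taskRunStatus{StartTime: &start}}
}

func main() {
	tr := newMockTaskRun()
	fmt.Println(tr.Status.StartTime != nil) // true: ready for early observation
}
```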
Member

@SaschaSchwarze0 left a comment

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 3, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 3, 2020
@openshift-merge-robot openshift-merge-robot merged commit d34f948 into shipwright-io:master Nov 3, 2020
@HeavyWombat HeavyWombat deleted the refactor/send-metrics branch December 8, 2020 10:17
4 participants