-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor metrics observe calls to report metrics as soon as possible #456
Refactor metrics observe calls to report metrics as soon as possible #456
Conversation
// Report the buildrun established duration (time between the creation of the buildrun and the start of the buildrun) | ||
buildmetrics.BuildRunEstablishObserve( | ||
buildRun.Status.BuildSpec.StrategyRef.Name, | ||
buildRun.Namespace, | ||
buildRun.Status.StartTime.Time.Sub(buildRun.CreationTimestamp.Time), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that this one can work here because the buildRun does not yet have a StartTime. This is set later when the TaskRun started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what I did in my local test with minikube
that did not show up. Today, I was able to reproduce that this leads to an expected nil pointer de-reference scenario. Therefore, I moved the code section to the part of the code where the required start time is actually set for the first time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think we need to handle nil
in one if clause.
@@ -383,35 +393,29 @@ func (r *ReconcileBuildRun) Reconcile(request reconcile.Request) (reconcile.Resu | |||
} | |||
|
|||
buildRun.Status.LatestTaskRunRef = &lastTaskRun.Name | |||
buildRun.Status.StartTime = lastTaskRun.Status.StartTime | |||
|
|||
if buildRun.Status.StartTime == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Tekton's TaskRun, Status.StartTime
is defined as *metav1.Time
and therefore can be nil. We might only be hitting this case here if a TaskRun fails right away and is never started. If my understanding of the IsZero
function is correct, I think we could do:
if buildRun.Status.StartTime == nil && !lastTaskRun.Status.StartTime.IsZero() {
As I do not think that Tekton will ever set a zero timestamp, this will probably work the same way:
if buildRun.Status.StartTime == nil && lastTaskRun.Status.StartTime != nil {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will add the second check.
Order imports based on import mode using `goimports -w`.
There are a couple of TYPOs in the metrics test package, one even in the variable name for known histogram metrics. Fix TYPOs in `metrics_test.go` source file.
There are metrics that are observed when the build run is finished even though the actual details were available already. Therefore, it could happen that some metrics are *not* reported due to a failing build run and would be lost. If the metrics observation would be right after the creation of the `TaskRun`, then the metrics should be reported in any case and should also only be reported once. Move `BuildRunCountInc` and `BuildRunRampUpDurationObserve` calls to after when the `TaskRun` is created, because at that point in time the required information are already available. Move `BuildRunEstablishObserve` to the section of the code, where the required time fields are set for the first time.
Since some metrics need to be reported as early as possible, the respective mocks used in the test cases require to have start time and build spec details. Add start time to task run mocks. Add build spec to build run mocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: SaschaSchwarze0 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Based on the suggestion by @SaschaSchwarze0, there is a gap in the metrics reporting. At the moment, the metrics are observed when the BuildRun is done. However, a metric like the BuildRun Establised Duration could be reported as soon as the BuildRun is available in the system. Furthermore, in case the BuildRun fails, the metrics for this execution are not reported at all.
Idea for refactoring: Move metrics for
build-run-count
,build-run-established
, andbuild-run-rampup
to a section inReconcile
where the details are already available and where in theory the metrics should only be reported once.