-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: refactor metrics endpoint #216
chore: refactor metrics endpoint #216
Conversation
refactoring is needed to make the metrics package usable from within the runner package for further metrics. This change also makes the metric-collector independent from requests to the /metrics endpoint Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
@gabriel-samfira by refactoring all the metrics, i've seen that we sometimes add the labels Will do another round in this PR and align all the labels to have at least the |
This is great! As part of the next release, I was thinking of extending the external provider interface to allow an RPC API of some sort that external providers can call into for logging, telemetry, etc. We can also do metrics there once that happens, like how long executing an operation against a provider took and things of that sort. Will do an in-depth review soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the separation of concerns and how much cleaner the code is. I have just a couple of comments, but overall this looks much nicer!
by adding the context from main and make auth.GetAdminContext accepting a context we are now able to stop the metrics collection loop once the context is canceled Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of really tiny suggestions to align with the rest of the code base. Overall this looks great! Thanks for all the work!
fail if metric registration panics Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
if err := metrics.RegisterMetrics(); err != nil { | ||
log.Fatal(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one downside of this implementation (register the metrics here instead of doing it in the init
):
for any further metric we want to change, we have to do a e.g.
if cfg.Metrics.Enable {
metrics.InstanceCreatedCount.WithLabelValues(
bootstrapParams.PoolID, // label: pool_id
e.cfg.Name, // label: provider
).Inc()
}
i'm either thinking of registering the metrics, no matter if cfg.Metrics.Enable
is set or directly drop cfg.Metrics.Enable
and always run garm
with /metrics
.
WDYT @gabriel-samfira ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we just add it to the RegisterMetrics()
function and it gets registered here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In essence, RegisterMetrics()
would act like the init()
function, but would be capable of returning an error we can potentially handle. So instead of using the init()
function, we define our own and call it here to register the metrics.
If metrics are enabled, we register them and start the collector loop.
fix linter warnings Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
refactoring is needed to make the metrics package usable from within the runner package for further metrics.
This change also makes the metric-collector independent from requests to the /metrics endpoint
This refactoring is needed to continue on #181
I will raise a couple of additional PRs to at least introduce some counters, like e.g.
instance_created_total
andinstance_created_error_total
.I will follow this pattern -> https://promlabs.com/blog/2023/09/19/errors-successes-totals-which-metrics-should-i-expose-to-prometheus/#recommended-for-binary-outcomes-exposing-errors-and-totals to introduce some more metrics.
The future code might look like e.g.