Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus metrics & mixin #375

Merged
merged 8 commits into from
Mar 16, 2020
Merged

Prometheus metrics & mixin #375

merged 8 commits into from
Mar 16, 2020

Conversation

karlskewes
Copy link
Contributor

@karlskewes karlskewes commented Mar 12, 2020

Rebased Prometheus PR on master.

Closes: #177

  • add http handler on /metrics.
    There aren't any sensitive endpoints on this port, but if metrics are considered sensitive for some then I can move to a custom port.
  • add standard build_info and go_build_info metrics
  • add custom unseal error & request metrics - RE(D)
  • add mixin with:
    • Grafana dashboard
    • Alerts
  • add PodMonitor jsonnet for those using Prometheus Operator (us)

Additional thoughts:

  • make test is currently failing for me on master as well as the prometheus branch:
    --- FAIL: TestHttpCert (1.37s)
  • I did the bulk of this before key rotation was done and didn't experience any errors due to prometheus integration.
  • Key rotation metrics/dashboard/alert could be easily added in a subsequent PR.

Testing done:

  • with kind
  • create sealed secret - observe request counter increment
  • submit SS with new secret - observe request counter increment
  • submit SS with modified namespace (incorrect) - observe error increment
  • submit SS with modified encrypted value (incorrect) - observe error increment
  • delete secret - observe request counter increment
  • delete SS - observe request counter increment

- `_unseal_requests_total` Counter is incremented every time the
  `unseal()` function is called. Note that some Kubernetes events like
  deleting a `SealedSecret` results in this function being called and
  the counter incremented. Not bad though.
  This Counter gives the best indiciation of controller activity.

- `_unseal_errors_total{reason="<reason>"}` Counter with labels for each
  of the unseal failure cases; fetch, status, unmanaged, unseal, update
  The default retry behaviour of 5 times in quick succession tends to
  result in the Counter being incremented 5 times in event of a failure
  and user action is required to rectify, eg: reseal, RBAC, etc.

- `build_info` with `revision` set to `VERSION` from git/tag

All Counters are initialized to 0 during init.
- Basic alerts that will fire if there are any unseal error counter
  increments. In my experience these all require user action.

- Tests for alerts

- Simple dashboard plotting total requests and errors. Could be
  extended with `build_info` panel/etc.
@karlskewes karlskewes changed the title Prometheus metrics & mixin WIP: Prometheus metrics & mixin Mar 12, 2020
@@ -0,0 +1,56 @@
# Prometheus Mixin Makefile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put prometheus-mixin in a contrib top level directory

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also please add a README.md file in this directory explaining what it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated.

@mkmik
Copy link
Collaborator

mkmik commented Mar 13, 2020

unfortunately the tests fail because of the way multiarch docker images are currently built.

the official docker ecosystem is often very automation unfriendly; there are alternative tools that will allow us to build images and manifests offline without having to push them to a registry; I'll try to play with that and regain the ability to run integration tests in unprivileged PRs

@mkmik
Copy link
Collaborator

mkmik commented Mar 13, 2020

@kskewes, I'm happy to merge this and apply further cleanups myself. Please clarify why you marked this PR as "WIP", do you plan pushing more changes later or it's just to signal that it's not "done" done?

@karlskewes
Copy link
Contributor Author

@kskewes, I'm happy to merge this and apply further cleanups myself. Please clarify why you marked this PR as "WIP", do you plan pushing more changes later or it's just to signal that it's not "done" done?

Just to signal it's not done done. Thanks for the quick feedback.
I've started tending to it and will submit this weekend.

@karlskewes karlskewes changed the title WIP: Prometheus metrics & mixin Prometheus metrics & mixin Mar 14, 2020
Copy link
Collaborator

@mkmik mkmik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mkmik mkmik merged commit 1ba3992 into bitnami-labs:master Mar 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Add Prometheus Metrics
2 participants