[READY] - monitoring: grafana and prometheus service enabled #642

Merged
merged 5 commits into from
Jan 14, 2024

Conversation

sarcasticadmin
Member

@sarcasticadmin sarcasticadmin commented Nov 25, 2023

Description of PR

Relates to: #641 #567

This lays the foundation for the grafana and prometheus services on the monitoring VM. I originally wanted to use collectd instead of prometheus, but that ended up being more trouble than it was worth with influxdbv2 or graphite. Landing on prometheus should give us a good base, and we can restrict access to the webserver for scraping in a few ways. Grafana's admin password is currently set on first login when creating a VM.

Currently we still need to:

  • Enable TLS
  • Generate static configs for scrapers
  • Enable prometheus exporters on Servers
  • Enable prometheus exporters on APs
  • Enable prometheus exporters on Pis

This will be done in follow up PRs if this initial approach is approved.

Prometheus is also added as a common nixos module so that it can be consumed from other machines.
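
As a rough illustration of the follow-up items listed above, a static scrape config plus a node exporter might look like the sketch below. The option names are the stock NixOS `services.prometheus` options; the hostnames, the job name, and the split into `monitor` and `scraped` modules are assumptions for illustration, not what this PR ships.

```nix
# Sketch only: hostnames, the job name, and the module names are hypothetical.
{
  # On the monitoring VM: run Prometheus and scrape a static list of targets.
  monitor = { ... }: {
    services.prometheus = {
      enable = true;
      scrapeConfigs = [
        {
          job_name = "node";
          static_configs = [
            {
              targets = [
                "server1.example.internal:9100"
                "pi1.example.internal:9100"
              ];
            }
          ];
        }
      ];
    };
  };

  # On each scraped machine: expose host metrics via the node exporter.
  scraped = { ... }: {
    services.prometheus.exporters.node = {
      enable = true;
      port = 9100;
      # Opens the exporter port to every source; tighter firewall rules
      # would be needed to restrict scraping to the monitoring VM.
      openFirewall = true;
    };
  };
}
```

Generating the `targets` list from an inventory rather than hardcoding it would be a natural next step, but that is outside what this sketch shows.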

[Screenshot: ss-202311251700943674]

Previous Behavior

  • Monitoring server was missing from scale-network automation

New Behavior

  • Monitoring server provisioned via a nixos config

Tests

From another nixos machine:

nix build .#nixosConfigurations.monitor.config.system.build.vm -L
export QEMU_NET_OPTS="hostfwd=tcp::2222-:22"
./result/bin/run-nixos-vm

Setting up port forwarding to expose on localhost:8000:

ssh -o StrictHostKeyChecking=no -p 2222 rherna@127.0.0.1 -L 8000:127.0.0.1:80

@nixinator
Collaborator

nixinator commented Nov 27, 2023

Nice work on this @sarcasticadmin , i'm currently testing your changes...

ON TLS

We're going to require certs, and I presume ACME is out because it's on an internal IPv4 subnet.

Hmmm....

We could get everything running on IPv6, and then we could use ACME, if ACME supports IPv6. This would require our internal servers to have public IPv6 DNS entries. I don't know how I feel about that... maybe I should feel good.

Failing that, we're going to need our own CA and start shunting certs about, which might be a massive PITA. However, it would be a good opportunity to build a nix internal ACME system along the lines of https://smallstep.com/blog/private-acme-server/

however, this may not be an insignificant amount of work.
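
To make the internal ACME idea concrete, here is a minimal sketch of the client side only, assuming a private ACME server (for example step-ca) is already running. The CA URL, the provisioner name `acme`, the email address, the virtual host name, and the use of nginx in front of Grafana are all assumptions, not part of this PR.

```nix
# Sketch only: the CA URL, provisioner name, email, and hostname are hypothetical.
{ ... }: {
  security.acme = {
    acceptTerms = true;
    defaults = {
      email = "noc@example.internal";
      # Point the NixOS ACME client at an internal ACME directory
      # instead of the default public one.
      server = "https://ca.example.internal/acme/acme/directory";
    };
  };

  services.nginx = {
    enable = true;
    virtualHosts."monitor.example.internal" = {
      enableACME = true;
      forceSSL = true;
      # 3000 is Grafana's default HTTP port.
      locations."/".proxyPass = "http://127.0.0.1:3000";
    };
  };
}
```

Each machine would most likely also need to trust the internal root certificate (for instance via `security.pki.certificateFiles`), and running the CA itself is the part that "may not be an insignificant amount of work".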

@davidelang
Collaborator

davidelang commented Nov 27, 2023 via email

@owendelong
Collaborator

owendelong commented Nov 27, 2023 via email

@nixinator
Collaborator

I have to remember that @owendelong doesn't use the internet, he is the internet. ;-)

@owendelong
Collaborator

owendelong commented Nov 27, 2023 via email

@sarcasticadmin
Member Author

Nice work on this @sarcasticadmin , i'm currently testing your changes...

@nixinator thanks for testing.

however, this may not be an insignificant amount of work.

At this point my plan is to do one of the following:

  1. Issue a self-signed certificate at runtime. This will require anyone connecting to the web UI to "trust on first use", but that's good enough in my book for the small number of users of this service.
  2. Make the grafana web service listen only on loopback and require users to port forward via SSH to get to the web UI. The plus side here is that this doesn't require any self-signed certs and we can leave it on HTTP.

Sounds like in general we're good with this approach though, so I'll mark it as READY.
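
For reference, the two options above could be sketched roughly as follows. This is an illustration under stated assumptions, not the configuration in this PR: the module names `loopbackOnly` and `selfSigned`, the systemd unit name, the certificate directory, and the certificate CN are all hypothetical. The Grafana settings keys (`http_addr`, `protocol`, `cert_file`, `cert_key`) are the standard `[server]` settings.

```nix
# Sketch only: pick one of the two modules; all names and paths are hypothetical.
{
  # Option 2: Grafana listens on loopback only; users tunnel in over SSH.
  loopbackOnly = { ... }: {
    services.grafana = {
      enable = true;
      settings.server = {
        http_addr = "127.0.0.1";
        http_port = 3000;
      };
    };
  };

  # Option 1: generate a self-signed cert on first boot, then serve HTTPS.
  selfSigned = { pkgs, ... }:
    let
      certDir = "/var/lib/grafana-selfsigned";
    in {
      systemd.services.grafana-selfsigned-cert = {
        description = "Generate a self-signed certificate for Grafana";
        wantedBy = [ "multi-user.target" ];
        before = [ "grafana.service" ];
        requiredBy = [ "grafana.service" ];
        path = [ pkgs.openssl ];
        serviceConfig = {
          Type = "oneshot";
          RemainAfterExit = true;
          StateDirectory = "grafana-selfsigned";
        };
        script = ''
          # Generate only once so the fingerprint users trusted stays stable.
          if [ ! -f ${certDir}/cert.pem ]; then
            openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
              -subj "/CN=monitor" \
              -keyout ${certDir}/key.pem -out ${certDir}/cert.pem
            chown grafana:grafana ${certDir}/key.pem ${certDir}/cert.pem
            chmod 0400 ${certDir}/key.pem
          fi
        '';
      };

      services.grafana = {
        enable = true;
        settings.server = {
          protocol = "https";
          cert_file = "${certDir}/cert.pem";
          cert_key = "${certDir}/key.pem";
        };
      };
    };
}
```

With the loopback variant, access works through an SSH local forward like the one in the Tests section, pointed at whichever port the web service listens on. The self-signed variant avoids the tunnel but leaves every user to accept the certificate warning once.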

@sarcasticadmin sarcasticadmin changed the title [REVIEW] - monitoring: grafana and prometheus service enabled [READY] - monitoring: grafana and prometheus service enabled Nov 27, 2023
@owendelong
Collaborator

owendelong commented Nov 28, 2023 via email

@sarcasticadmin
Member Author

Why not use my CA? Is there a problem with, perhaps using the same wildcard cert on every box?

It's mainly to avoid the bootstrap problem of getting the private key onto the VM in the first place. There are many ways to do this, but I'm trying to keep it to a low level of effort.

Collaborator

@owendelong owendelong left a comment

Most of this looks very "rusty". Ugh.

I guess we have to use the tools that are available.

@owendelong
Collaborator

owendelong commented Nov 28, 2023 via email

@nixinator
Collaborator

We may be able to deploy some keys with secrix, so that keys can be bootstrapped from the flake.nix.

However, i'll have to look at it.

@owendelong
Collaborator

owendelong commented Dec 3, 2023 via email

@nixinator
Collaborator

Sure, my current flavour of secret management is secrix.

It's rather nice...

https://journal.platonic.systems/introducing-secrix/

This enables the foundations for grafana and prometheus services on the
monitoring vm. Currently we still need to:

  - Enable TLS
  - Generate static configs for scrapers
  - Enable prometheus exporters on Servers
  - Enable prometheus exporters on APs
  - Enable prometheus exporters on Pis

This will be done in follow up PRs.

Prometheus is also added as a common nixos module so that it can be
consumed from other machines.
[READY] Generate prometheus config from inventory
@owendelong owendelong merged commit f8243d9 into master Jan 14, 2024
1 check passed
@owendelong owendelong deleted the rh/1699428793monitor branch January 14, 2024 05:06
5 participants