Fix auto-discovery for latest versions on Kubernetes #9574

Merged (3 commits) on Jul 1, 2021

Conversation

L3n41c
Member

@L3n41c L3n41c commented Jun 22, 2021

What does this PR do?

Leverages the possible_prometheus_urls parameter introduced by #9573 in the auto-discovery configuration file.

Motivation

Make the agent able to automatically monitor the Kube scheduler on most recent versions of Kubernetes without requiring too much user setup.
The goal is to simplify the Kubernetes control plane monitoring setup described in DataDog/documentation#10828.

Additional Notes

On older versions of Kubernetes, the Kube scheduler metrics were available at http://%%host%%:10251/metrics where %%host%% is the node IP.
On the most recent versions of Kubernetes:

  • only the secure https endpoint is available by default; the plain http endpoint is disabled.
  • the port is bound to the loopback interface (127.0.0.1) instead of being remotely reachable.

The new auto-discovery configuration tries several candidate settings in turn until it finds a usable Kube scheduler metrics endpoint.
To get this check working, the end user still needs to do one of the following:

  • enable hostNetwork: true on the agent so that it can reach the 127.0.0.1 IP, or
  • configure the control plane components to expose their metrics endpoints on 0.0.0.0, as already described in DataDog/documentation#10828 (K8s control plane docs).
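
As an illustration of the two options above, here is a minimal sketch. The resource names, labels, and image tag are hypothetical, and the second document assumes a cluster managed by kubeadm with the v1beta2 configuration API. Other distributions configure the scheduler's bind address differently, so treat this as an example rather than the setup documented in DataDog/documentation#10828.

```yaml
# Option 1: run the agent in the host network namespace so that
# 127.0.0.1 inside the agent pod is the node's loopback interface.
# Hypothetical DaemonSet excerpt; names and image tag are illustrative.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
    spec:
      hostNetwork: true
      # Keeps cluster DNS resolution working when hostNetwork is enabled.
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: agent
          image: gcr.io/datadoghq/agent:7
---
# Option 2 (assumes kubeadm, v1beta2 API): make the scheduler listen on
# all interfaces instead of the loopback interface only.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
```

Option 1 keeps the control plane untouched but gives the agent pod access to the node's network. Option 2 leaves the agent as-is but makes the scheduler port reachable from other hosts, which may not be acceptable in every environment.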

Kubernetes only exposes to pods the certificates needed to talk to the API server.
There is no reliable way to get the CA certificate used by the Kube scheduler that would work on every distribution. In particular, at least minikube and kubeadm use self-signed certificates.
That’s why ssl_verify: false is required.
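
To make the description above concrete, here is a sketch of what an auto-discovery configuration using possible_prometheus_urls could look like. It is not a copy of the file changed by this PR: the ad_identifiers value, the order of the candidate URLs, and the use of bearer_token_auth are assumptions made for illustration. Port 10259 is the Kube scheduler's default secure port, and 10251 is the legacy plain http port mentioned above.

```yaml
# Hypothetical auto_conf.yaml sketch for the kube_scheduler check.
ad_identifiers:
  - kube-scheduler            # assumed container identifier
init_config:
instances:
  - possible_prometheus_urls:
      # Candidates are listed in the order they should be tried (assumed).
      - https://%%host%%:10259/metrics   # secure endpoint on recent versions
      - http://%%host%%:10251/metrics    # plain http endpoint on older versions
    # The scheduler's CA certificate is not available in pods and is often
    # self-signed (minikube, kubeadm), so verification is disabled.
    ssl_verify: false
    # Assumption: the secure endpoint expects the pod's service account token.
    bearer_token_auth: true
```

Because ssl_verify: false disables certificate validation, the connection is encrypted but the endpoint's identity is not checked. That is the trade-off accepted here in exchange for working across distributions.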

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

@ghost ghost added the documentation label Jun 22, 2021
@codecov

codecov bot commented Jun 22, 2021

Codecov Report

Merging #9574 (7c4d6ea) into master (e97eb15) will not change coverage.
The diff coverage is n/a.

Flag Coverage Δ
kube_scheduler 98.07% <ø> (ø)

Flags with carried forward coverage won't be shown.

vboulineau previously approved these changes Jun 30, 2021
@L3n41c L3n41c merged commit 7e4e0a9 into master Jul 1, 2021
@L3n41c L3n41c deleted the lenaic/kube_scheduler branch July 1, 2021 07:25
Development

Successfully merging this pull request may close these issues.

[kube_controller_manager] [kube_scheduler] Support https endpoint for fetching metrics
2 participants