Introduce a Helm chart for Kubernetes monitoring with Elastic Agent #3847

Closed
1 task done
joshdover opened this issue Nov 30, 2023 · 7 comments · Fixed by #5331
Labels: enhancement, in progress

Comments

@joshdover
Contributor

joshdover commented Nov 30, 2023

Description

Extracted from an internal discussion document:

We can solve many of the challenges with deploying Agent on Kubernetes through a Helm chart for Elastic Agent. This allows us to improve the current experience by (mapped to priorities above that are not already met and easy):

  • Deploying separate cluster-level and node-level Agents
    • This allows us to eliminate the leader-election mechanism and the associated K8s API calls. These API calls currently scale horizontally with the number of Kubernetes nodes in the cluster.
    • This also allows separate resource planning and scaling for cluster-level and node-level Agents, which need different amounts of resources depending on the horizontal and vertical scale of the cluster and its nodes, respectively.
  • Providing “templates” or “presets” for common integrations
    • This moves us away from our current “one-size-fits-all” k8s manifest where users have to comment/uncomment certain sections based on which integrations they want to use
    • Along with this, we can configure the minimum security privileges needed for each integration’s container and deploy separate containers for integrations that require higher-privileges
    • Examples of presets we would provide:
      • P1 A containers.logs: true option would create a DaemonSet agent for container log collection
      • P1 A deployment.metrics: true option would create a ReplicaSet agent deployed with KSM bundled in the same pod. We’d also have similar options for Services and other workload types, all of which would enable inputs on the same ReplicaSet
      • P1 An autodiscovery.metrics: true option would mount the required templates for hints-based autodiscovery
      • P2 A cloudDefend.enabled: true option would add the necessary container privileges and inputs to the DaemonSet agents for Cloud Defend
      • P2 A profiling.enabled: true option would add the necessary container privileges and inputs to the DaemonSet agents for Universal Profiling
      • P2 An autoInstrumentation.enabled: true option would deploy the apm-attacher hook as a Deployment for auto-instrumenting application pods with APM Agents.
  • P3 Simpler sharding of kube-state-metrics (KSM)
    • For any workload metrics that depend on kube-state-metrics, we could pseudo-automate sharding by adding a simple config like ksm.shards: 2, which would shard the ReplicaSet used for the agent collecting this data, along with the KSM container in the same pod.
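
To make the proposed presets concrete, here is a minimal sketch of how they might look in a values file. The key names come from the bullets above; the nesting, the defaults, and the idea that all of them live in one file are assumptions for illustration, not the schema of any released chart.

```yaml
# Hypothetical values.yaml sketch. Key names are taken from the proposal;
# structure and defaults are assumptions, not a released chart schema.
containers:
  logs: true          # P1: DaemonSet agent for container log collection
deployment:
  metrics: true       # P1: ReplicaSet agent with KSM bundled in the same pod
autodiscovery:
  metrics: true       # P1: mount the templates for hints-based autodiscovery
cloudDefend:
  enabled: false      # P2: extra privileges and inputs on the DaemonSet agents
profiling:
  enabled: false      # P2: Universal Profiling on the DaemonSet agents
autoInstrumentation:
  enabled: false      # P2: deploy the apm-attacher hook as a Deployment
ksm:
  shards: 2           # P3: shard the agent ReplicaSet together with its KSM container
```

With this shape, a user would toggle integrations by flipping booleans rather than commenting and uncommenting sections of a single manifest, which is the main pain point the proposal targets.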

Tasks

  1. pkoutsovasilis
@pkoutsovasilis
Contributor

Hello @joshdover in this draft PR elastic/cloud-on-k8s#7356 you can find an initial effort that satisfies some of the bullets mentioned in the description.

In a high-level overview:

  • The Helm chart in the PR incorporates the ECK-operator to facilitate the deployment of Elastic Agent on Kubernetes.
    • Regrettably, there is a limitation regarding kube-state-metrics (KSM) sharding. The current ECK-operator does not support StatefulSet deployments for the agent, rendering KSM sharding unfeasible in the current implementation.
  • The values.yaml file within the Helm chart captures integrations with their respective configuration parameters. Boolean flags are used to enable or disable specific capabilities of an integration. Note that, at this stage, additional keys can be passed to override default values for each capability.
  • The current PoC does not cover events, universal profiling, auto-discovery metrics, or auto-instrumentation. I propose that we first evaluate the direction to follow based on this PoC; these aspects can be addressed in subsequent iterations.

cc @norrietaylor

@joshdover
Contributor Author

joshdover commented Dec 4, 2023

From my perspective, the progress on elastic/cloud-on-k8s#7356 is already very solid. I think we have a few things to tighten up, but I'd like to discuss what we need to merge in an initial version of this and start iterating towards a beta. Here's what I'm thinking:

Blockers to merge initial PR:

  • Alignment on ECK as the basis @strawgate
  • Alignment on the installation experience
  • A few more config knobs need to be exposed (output settings, etc.)
  • Support for hints-based autodiscovery
  • Need high-level agreement on the values.yaml names. We can change these later, but it'd be best to avoid breaking changes on them if at all possible
  • Basic e2e smoke test w/ installing the chart -> data available in ES @pkoutsovasilis
  • Test integration with ECK diagnostics (ideally we could also grab the user's values.yaml (sanitized) from the chart)
    • Basics already work
    • Integration with elastic-agent diagnostics would be great

Blockers to beta release:

  • Documentation comments in values.yaml
  • Documentation in ECK and/or Agent docs
  • Onboarding steps in the Fleet UI
  • Available hints annotations in the Integrations UI
    • This can be worked on even before we ship this @nimarezainia for scoping and prioritization

Optional for GA, can be done later:

  • Support apm-attacher as a sub-chart for autoinstrumentation
  • Support for universal profiling
  • Support for cloud security posture

@cmacknz
Member

cmacknz commented Dec 7, 2023

We are going to need some automation to keep the integrations configurations in the helm charts up to date as well.

@pkoutsovasilis
Contributor

pkoutsovasilis commented Feb 5, 2024

@joshdover I just pushed the first basic e2e test for the Helm chart PoC (elastic/cloud-on-k8s@a7f9aa7). In this e2e test, the integrations Helm chart is deployed as a user would deploy it through the CLI, and checks verify that events exist in the expected data streams. However, this investigation of e2e testing, together with the remaining debate over whether we should go with ECK or bare k8s, led me to reach out to certain individuals from the ECK and MKI teams. My findings so far are:

  • MKI does not use ECK-operator but instead it utilises specialised controllers
  • ECK has attempted in the past to incorporate more targeted configs of the elastic-agent, but it was decided not to implement them
  • not sure exactly about the user-footprint of ECK

Do the above findings help to make a final decision on the ECK vs bare-k8s debate?

cc @norrietaylor

@pkoutsovasilis
Contributor

With the latest iteration of this PR that captures the effort of this ticket, I consider multiple "features" as supported, namely:

  • support for Kubernetes integration
    • kubelet metrics
    • kube-state metrics
    • hints-based autodiscovery
    • container logs
    • etc.
  • support for Cloud-Defend integration
    • note: cloud-defend itself requires some attention
  • support for custom user-defined integrations
  • support for defining and utilising different elasticsearch outputs per integration
  • support for defining multiple elastic-agent presets; allowing different integrations to run under different agent instances

Examples of the above can be found here

Although there are still some minor rough edges to polish, I think it would be tremendously helpful if we had an early run of feedback from key individuals trying out the helm chart.

cc @norrietaylor

@jlind23
Contributor

jlind23 commented Aug 20, 2024

@pkoutsovasilis according to the demo you gave us, should we consider this issue and its subtask as done?

@pkoutsovasilis
Contributor

> @pkoutsovasilis according to the demo you gave us, should we consider this issue and its subtask as done?

sorry for the delay @jlind23 another issue came up and I had to switch to it, however, today I am gonna open up the Helm chart PR on elastic-agent repo that will close this one when it gets merged 🙂
