
Add metrics-monitoring beats to resource monitoring #4326

Merged

Conversation

fearful-symmetry (Contributor)

What does this PR do?

Closes #4082

This PR is currently a draft, as the test borrows the integration setup code from #4150. Once that PR is merged, we can refactor this to use the same code to set up and install integrations.

Why is it important?

This PR changes the behavior of the monitoring beats so that they also monitor and report metrics on themselves. This fixes an issue where the CPU and memory usage that agent reports to fleet can be deceptive, as the reported values don't include all the beats running under agent.

I also did a light refactor of the monitoring setup so we use constants for the monitoring beat IDs.
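The constants refactor described above can be sketched as follows. Only `monitoringFilesUnitsID` is a name actually cited in the PR discussion; the other two identifiers are illustrative guesses, not necessarily the names used in v1_monitor.go:

```go
package main

import "fmt"

// Unit IDs for the agent self-monitoring components.
// monitoringFilesUnitsID is the constant name mentioned in review;
// the other two names are assumptions for illustration.
const (
	monitoringFilesUnitsID = "filestream-monitoring"
	monitoringBeatsUnitID  = "beat/metrics-monitoring" // assumed name
	monitoringHTTPUnitID   = "http/metrics-monitoring" // assumed name
)

func main() {
	// Centralizing the IDs means the monitoring config and the tests
	// can reference the same values instead of repeating string literals.
	fmt.Println(monitoringFilesUnitsID, monitoringBeatsUnitID, monitoringHTTPUnitID)
}
```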

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test


mergify bot commented Feb 22, 2024

This pull request does not have a backport label. Could you fix it @fearful-symmetry? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch, where \d is a digit.

NOTE: backport-skip has been added to this pull request.

@fearful-symmetry fearful-symmetry added the Team:Elastic-Agent Label for the Agent team label Feb 22, 2024
cmacknz (Member) commented Feb 23, 2024

Do the beat/metrics and http/metrics inputs report the resource usage and stats of the Beat they themselves are running in? Not just the other monitoring beats, as in beat/metrics is reporting the CPU usage of beat/metrics-monitoring, http/metrics-monitoring and filestream-monitoring and not just the latter two?

fearful-symmetry (Contributor, Author) replied:

> Do the beat/metrics and http/metrics inputs report the resource usage and stats of the Beat they themselves are running in? Not just the other monitoring beats, as in beat/metrics is reporting the CPU usage of beat/metrics-monitoring, http/metrics-monitoring and filestream-monitoring and not just the latter two?

That's the goal of this PR, yeah. If you want, you can verify the negative: comment out the mapping lines in v1_monitor.go, and the resulting integration test will fail.

@fearful-symmetry fearful-symmetry marked this pull request as ready for review March 7, 2024 14:53
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner March 7, 2024 14:53
elasticmachine (Contributor)

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

cmacknz (Member) left a comment:

How can we verify that the Fleet memory and CPU usage calculations are actually using the fields from the new documents?

If I install an agent built from this branch, the CPU and memory in Fleet are exactly the same as one built from main. Possibly this is because they aren't using enough resources for there to be a detectable difference, but I would have expected memory to be higher at a minimum.

[Screenshot: Fleet agent CPU and memory usage, 2024-03-08]

if binary != "filebeat" && binary != "metricbeat" {
t.Errorf("expected monitoring component to be metricbeat or filebeat, got %s", binary)
}
if componentID != "filestream-monitoring" && componentID != "beat/metrics-monitoring" && componentID != "http/metrics-monitoring" {
Reviewer (Contributor):

Optional: I'm wondering if using the const monitoringFilesUnitsID might be better than the string "filestream-monitoring".
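The suggested change could look like this minimal sketch; only the constant name `monitoringFilesUnitsID` is confirmed by the discussion, and the helper function is an invention for illustration:

```go
package main

import "fmt"

// monitoringFilesUnitsID is the constant name cited in the review; using it
// avoids scattering the raw "filestream-monitoring" string through the tests.
const monitoringFilesUnitsID = "filestream-monitoring"

// isFilestreamMonitoring reports whether componentID names the filestream
// self-monitoring component, comparing against the constant rather than
// a string literal. Hypothetical helper, not from the PR.
func isFilestreamMonitoring(componentID string) bool {
	return componentID == monitoringFilesUnitsID
}

func main() {
	fmt.Println(isFilestreamMonitoring("filestream-monitoring"))
	fmt.Println(isFilestreamMonitoring("beat/metrics-monitoring"))
}
```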

res, err := estools.PerformQueryForRawQuery(ctx, query, "metrics-elastic_agent*", runner.info.ESClient)
require.NoError(runner.T(), err)
runner.T().Logf("Fetched metrics for %s, got %d hits", cid, res.Hits.Total.Value)
if res.Hits.Total.Value < 5 {
Reviewer (Member):

Why < 5? Doesn't any amount of hits mean there was at least one document matching the query?

fearful-symmetry (Author):

Yeah, my concern was that we could end up in some freak accident where a test misconfiguration causes us to reuse an agent install, and thus an agent ID, leading to some overlap of results. Not sure if this is realistic, though.

Reviewer (Member):

The agent ID you get on enrollment is unique within Fleet, so shouldn't be possible unless Fleet is completely broken.

This isn't protecting against that properly anyway IMO, it is just a magic number.

A way that should actually work would be:

  1. Install and enroll an agent with monitoring disabled.
  2. Wait one full metrics collection cycle.
  3. Ensure there are no hits in the metrics-* indices for that agent.
  4. Turn monitoring on.
  5. Ensure there are metrics now.

I'm not sure if all of that is actually worth it though.
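The five-step plan above can be sketched as a runnable simulation. Here the agent and Fleet are replaced by an in-memory stub; `fakeAgent` and `runScenario` are inventions for illustration, not real integration-test harness APIs:

```go
package main

import "fmt"

// fakeAgent stands in for a real enrolled agent: it emits one metrics
// document per collection cycle only while self-monitoring is enabled.
type fakeAgent struct {
	monitoring bool
	docs       int
}

func (a *fakeAgent) collectCycle() {
	if a.monitoring {
		a.docs++
	}
}

// runScenario walks the five steps and returns the document counts
// observed with monitoring off and then on.
func runScenario() (offDocs, onDocs int) {
	agent := &fakeAgent{monitoring: false} // 1. enroll with monitoring disabled
	agent.collectCycle()                   // 2. wait one full collection cycle
	offDocs = agent.docs                   // 3. expect zero hits for this agent
	agent.monitoring = true                // 4. turn monitoring on
	agent.collectCycle()                   // 5. expect metrics now
	onDocs = agent.docs
	return offDocs, onDocs
}

func main() {
	off, on := runScenario()
	fmt.Printf("monitoring off: %d docs, monitoring on: %d docs\n", off, on)
}
```

The point of the stub is that any nonzero hit count with monitoring off would indicate leakage from another agent, which is the failure mode the magic number `5` cannot catch.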

},
{
"exists": map[string]interface{}{
"field": "system.process.cpu.total.value", // make sure we fetch documents that have the metric field used by fleet monitoring
Reviewer (Member):

You could also check for the memory metric.
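Extending the query fragment above per that suggestion might look like the sketch below. The CPU field name comes from the snippet; the memory field name `system.process.memory.size` is an assumption based on Metricbeat's system/process fields and is not confirmed by this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildMonitoringQuery returns a bool query requiring both metric fields
// that Fleet's resource-usage calculations read. The memory field name is
// an assumption, not taken from the PR.
func buildMonitoringQuery() map[string]interface{} {
	return map[string]interface{}{
		"bool": map[string]interface{}{
			"must": []interface{}{
				map[string]interface{}{
					"exists": map[string]interface{}{"field": "system.process.cpu.total.value"},
				},
				map[string]interface{}{
					"exists": map[string]interface{}{"field": "system.process.memory.size"},
				},
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(buildMonitoringQuery(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```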


Quality Gate passed

The SonarQube Quality Gate passed, but some issues were introduced.

3 New issues
0 Security Hotspots
68.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

@fearful-symmetry fearful-symmetry merged commit 4594088 into elastic:main Mar 20, 2024
9 checks passed
rdner added a commit that referenced this pull request Mar 21, 2024
fearful-symmetry added a commit to fearful-symmetry/elastic-agent that referenced this pull request Mar 21, 2024
fearful-symmetry added a commit that referenced this pull request Mar 27, 2024
… fleet naming changes (#4462)

* Reapply "Add metrics-monitoring beats to resource monitoring (#4326)" (#4451)

This reverts commit 7f83ddd.

* revert removal, make it easier to adjust unit ID output name

* update unit names
Labels
backport-skip Team:Elastic-Agent Label for the Agent team
Development

Successfully merging this pull request may close these issues.

Agent should collect and report CPU and memory usage of monitoring components
4 participants