[Fleet]: Linux agents gets unhealthy with system integration on 7.17.27 #6519

harshitgupta-qasource · 2025-01-13T05:31:15Z

Kibana Build details:

VERSION: 7.17.27
BUILD: 47755
COMMIT: 828e49db669c29d8cc4f3a30f6abe5e8f69a4290
Artifact: https://staging.elastic.co/7.17.27-b47ca93f/summary-7.17.27.html#elastic-agent-package

Host OS and Browser version: [Ubuntu 22], [Ubuntu 18], [SLES]

Preconditions:

7.17.27 BC1 Kibana Cloud environment should be available.

Steps to reproduce:

Navigate to the Agents Tab
Wait for a while till the agent becomes unhealthy.
Observe that the Ubuntu agent becomes unhealthy with the System integration.
Now add the Endpoint Security integration and go to the Endpoint tab.
Observe that the Ubuntu agent becomes unhealthy.

Expected:

Linux agents should be healthy with system integration on 7.17.27

Screenshot:

Note: Reproducible on Ubuntu agents only.

Agents Logs:

elastic-agent-diagnostics-2025-01-13T05-29-48Z-00.zip


elasticmachine · 2025-01-13T05:31:17Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

harshitgupta-qasource · 2025-01-13T05:31:28Z

@amolnater-qasource Kindly review

amolnater-qasource · 2025-01-13T06:00:09Z

Secondary review for this ticket is Done.

jlind23 · 2025-01-13T07:46:50Z

@nfritts @norrietaylor according to the Elastic Agent diagnostics, it seems Endpoint is degraded. Can someone on your end take a look, please?

jlind23 · 2025-01-13T07:51:34Z

Looks like Endpoint reports this error:
error: 'Get "http://unix/": dial unix /opt/Elastic/Agent/data/tmp/default/endpoint-security/endpoint-security.sock: connect: no such file or directory'
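When triaging an error like this on the affected host, it can help to tell a socket file that does not exist apart from one that exists but has no listener. The following is a minimal Python sketch, not Agent's actual code: `probe_unix_socket` is a hypothetical helper, and the only value taken from this thread is the socket path in the error above. It likely needs to run as root, because the Agent data directory is normally not readable by other users.

```python
import errno
import os
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Classify the state of a Unix domain socket path.

    Returns "missing" if nothing can be found at the path (this also
    happens when the caller lacks permission to stat it), "refused" if
    the path exists but no process accepts connections on it, "ok" if
    a connection succeeds, and "error: ..." for anything else.
    """
    if not os.path.exists(path):
        return "missing"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except ConnectionRefusedError:
            return "refused"
        except OSError as exc:
            # Map the errno to its symbolic name when one is available.
            return f"error: {errno.errorcode.get(exc.errno, str(exc))}"
    return "ok"


if __name__ == "__main__":
    # Path copied from the error message reported above.
    print(probe_unix_socket(
        "/opt/Elastic/Agent/data/tmp/default/endpoint-security/endpoint-security.sock"
    ))
```

A result of "missing" would match the `no such file or directory` error quoted above. It would not by itself say whether Endpoint was ever expected to create that socket.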

cmacknz · 2025-01-13T21:27:29Z

Hmm, I'm not sure that's the root cause. For some reason we are trying to connect to Endpoint to get monitoring data the same way we do for Beats, but AFAIK Endpoint has never exposed a monitoring socket like that. I suspect that log is a symptom of something else.

nfritts · 2025-01-14T11:02:45Z

My initial thought is that we may have ended up out of sync on pipe/named socket bootstrapping?

Endpoint was hoping to merge (but hasn't merged to 7.17 yet) what is effectively a backport of the change we made for 8.15 to the bootstrap process, moving it off of a localhost socket.

The endpoint PR isn't merged yet: https://github.com/elastic/endpoint-dev/pull/15344

Has Agent made changes in anticipation of changing the bootstrap? (I did a quick search but didn't see anything that stood out.) If so, then we're out of sync, and either the Agent change will have to be reverted or we'll have to get the Endpoint change merged before things will work.

jlind23 · 2025-01-14T11:26:38Z

These are the changes merged between 7.17.26 and 7.17.27, not sure what could have caused this.

@harshitgupta-qasource this problem was not there in 7.17.26, right?

jlind23 · 2025-01-14T14:56:49Z

@harshitgupta-qasource what was the system integration version you were using?

pchila · 2025-01-14T15:12:10Z

@harshitgupta-qasource I tried reproducing this issue using a 7.17.27 deployment and a 7.17.27 BC1 Elastic Agent on Ubuntu 22.04, but I cannot reproduce the agent being unhealthy.

I created a new empty policy and enrolled an elastic agent

After the agent was healthy I added System Integration v. 1.11.1 as shipped by 7.17.27 cloud stack

I waited a few minutes, but the agent did not become unhealthy, so I added the Defend integration to the same policy

Agent is still healthy after ~20 mins from the start of my test.

How long does it take for the agent to become unhealthy in your test?
If I understood correctly, you saw the agent unhealthy with just the System integration, correct?
Is there any difference between my test steps and yours that could lead to a different result?
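To answer the timing question in a repeatable way, a small poller can record how long the agent stays healthy. This is only a sketch under stated assumptions: `time_until_unhealthy` and `get_status` are hypothetical names, `get_status` must be supplied by the tester (wrapping whatever they already use to read the agent's status), and the string "healthy" is assumed to mirror the label shown in the Fleet UI.

```python
import time
from typing import Callable, Optional


def time_until_unhealthy(
    get_status: Callable[[], str],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll get_status() until it reports anything other than "healthy".

    Returns the seconds elapsed at the first non-healthy reading, or
    None if the agent was still healthy once timeout_s had passed.
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = clock() - start
        if status.lower() != "healthy":
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with a fake clock. In a real run the defaults are enough, and the returned value can be compared against the roughly 20 minutes of healthy runtime reported above.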

marc-gr · 2025-01-14T15:29:21Z

Just adding my 2 cents here: it seems the latest System version with support for 7.17 was 1.15.1 (https://github.com/elastic/integrations/pull/3509/files#diff-d4cd9d386b49496970c932d312ae09b5a2acc2c3f85f75a7819064d67634248b), so it could be worth trying an update if necessary.
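As a quick check of the two System integration versions mentioned in this thread (1.11.1 in the earlier test and 1.15.1 here), the versions should be compared numerically rather than as strings. This is a minimal sketch; `parse_version` is a hypothetical helper that assumes plain `major.minor.patch` versions with no pre-release suffix.

```python
def parse_version(text: str) -> tuple:
    """Turn "1.11.1" into (1, 11, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


tested = parse_version("1.11.1")   # version used in the reproduction attempt
latest = parse_version("1.15.1")   # latest version said to support 7.17

# True: the tested version is older than the latest 7.17-compatible one.
print(tested < latest)
```

Comparing the raw strings happens to work for this pair, but it would order "1.9.0" after "1.11.1", which is why the sketch converts to integer tuples first.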

nicholasberlin · 2025-01-14T20:18:53Z

Please gather an endpoint diagnostic package from the Ubuntu host.

$ sudo /opt/Elastic/Endpoint/elastic-endpoint diagnostics

Then upload it here. Thanks.

I suspect that the kernel of the Ubuntu system has moved beyond the support within 7.17 and it's failing to install event sources.

harshitgupta-qasource added the labels bug, impact:high, and Team:Elastic-Agent-Control-Plane on Jan 13, 2025.
