[Fleet]: Linux agents become unhealthy with System integration on 7.17.27 #6519
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
@amolnater-qasource Kindly review.
Secondary review for this ticket is done.
@nfritts @norrietaylor According to the Elastic Agent diagnostics, it seems that Endpoint is degraded. Can someone on your end take a look, please?
It looks like Endpoint reports this error:
Hmm, I'm not sure that's the root cause. For some reason we are trying to connect to Endpoint to get monitoring data the same way we do for Beats; AFAIK, Endpoint has never exposed a monitoring socket like that. I suspect that log is a symptom of something else.
My initial thought is that we may have ended up out of sync on pipe/named socket bootstrapping. Endpoint was hoping to merge (but hasn't yet merged to 7.17) what is effectively a backport of the change we made for 8.15, which moved the bootstrap process off of a localhost socket. The Endpoint PR isn't merged yet: https://github.com/elastic/endpoint-dev/pull/15344. Has Agent made changes in anticipation of the bootstrap change? (I did a quick search but didn't see anything that stood out.) If so, then we're out of sync, and either the Agent change will have to be reverted or the Endpoint change will have to be merged before things work.
These are the changes merged between 7.17.26 and 7.17.27; I'm not sure what could have caused this. @harshitgupta-qasource This problem was not present in 7.17.26, right?
@harshitgupta-qasource Which System integration version were you using?
@harshitgupta-qasource I tried reproducing this issue using a 7.17.27 deployment and a 7.17.27 BC1 Elastic Agent on Ubuntu 22.04, but I cannot reproduce the agent becoming unhealthy. I created a new empty policy and enrolled an Elastic Agent. After the agent was healthy, I added System integration v1.11.1, as shipped with the 7.17.27 cloud stack. I waited a few minutes for the agent to become unhealthy, but it didn't, so I added the Defend integration to the same policy. The agent is still healthy about 20 minutes after the start of my test. How long does it take for the agent to become unhealthy in your test?
Just adding my 2 cents here: it seems the latest System integration version with support for 7.17 was 1.15.1 (https://github.com/elastic/integrations/pull/3509/files#diff-d4cd9d386b49496970c932d312ae09b5a2acc2c3f85f75a7819064d67634248b), so it could be worth trying an update if necessary.
Please gather an Endpoint diagnostics package from the Ubuntu host and upload it here. Thanks. I suspect that the kernel of the Ubuntu system has moved beyond what 7.17 supports and that Endpoint is failing to install its event sources.
Kibana Build details:
Host OS and Browser version: Ubuntu 22, Ubuntu 18, SLES
Preconditions:
Steps to reproduce:
Expected:
Screenshot:
Note: Reproducible on Ubuntu agents only.
Agents Logs:
elastic-agent-diagnostics-2025-01-13T05-29-48Z-00.zip