[Deployment]: Hosted fleet server gets unhealthy on 8.4 Snapshot. #1574

amolnater-qasource · 2022-06-21T11:01:12Z

Deployment Links:

Description:
The hosted Fleet Server becomes unhealthy on the 8.4 Snapshot, and APM is shown as disabled on the Deployment page.

Screenshots:

amolnater-qasource · 2022-06-21T11:01:37Z

@manishgupta-qasource Please review.

amolnater-qasource · 2022-06-21T11:01:54Z

FYI @jlind23 @joshdover

manishgupta-qasource · 2022-06-21T11:22:53Z

Secondary review for this ticket is done.

jlind23 · 2022-06-21T11:28:31Z

@amolnater-qasource Don't we need APM integration to enable this? Do you have access to fleet-server logs?

amolnater-qasource · 2022-06-21T13:52:53Z

Hi @jlind23
Thanks for looking into this.
The APM integration is already added to the hosted Fleet Server.

Screenshot:

As we are testing on a cloud build, we are not sure how to get the hosted fleet-server logs.
Furthermore, no logs are available under the Logs tab for the hosted agent, because log collection is disabled for the managed policy.

Could you please share any steps for this?

Thanks

jlind23 · 2022-06-22T07:00:10Z

As stated yesterday by @cmacknz no 8.4 snapshots were built for more than 15 days. Could you please try again with a fresh snapshot?

amolnater-qasource · 2022-06-22T08:30:34Z

Hi @jlind23
We set up a fresh Kibana cloud environment on the latest 8.4 Snapshot and found that this issue is still reproducible.

Hosted fleet server gets unhealthy on 8.4 Snapshot.

Deployment Links:

Build details:
VERSION: 8.4.0
BUILD: 53825
COMMIT: e0446dac822f55f75c1d97b6d9c3f4647c445973
(June 21, 2022 04:04 PM GMT+5:30)

We have re-validated this issue on the build with the above commit.
Please let us know if anything else is required from our end.
Thanks

jlind23 · 2022-06-22T14:13:42Z

@amolnater-qasource I tried on my end. No logs were available, but after restarting the integration server it worked. Could you confirm?
@cmacknz @ph @pierrehilbert do you know how I can access more logs? There is nothing available in the Logs UI, and Agent monitoring is disabled by default.

juliaElastic · 2022-06-22T15:18:38Z

Have you tried checking the logs in https://admin.found.no ? It should work for staging instances. More info here: https://docs.google.com/presentation/d/1lIEQsQGgUR0H3MRhMqyZFP3wdofmZmiqmhYX--xT6FE/edit#slide=id.g12bd3a98d22_1_0

EDIT: this is the admin for staging: https://admin.staging.foundit.no/

jlind23 · 2022-06-22T16:00:45Z

Thanks @juliaElastic.
@amolnater-qasource I created a new deployment and could not reproduce it.
When you try again, could you please share your deployment ID so that we can take a look at the logs?

jlind23 · 2022-06-22T16:30:59Z

@elastic/apm-server on this particular deployment I see a lot of APM errors like:
precondition failed: dial tcp [::1]:9200: connect: cannot assign requested address

Does it ring a bell on your end?

axw · 2022-06-23T02:01:09Z

@jlind23 that error message means APM Server is trying to connect to Elasticsearch on localhost, which is obviously not going to work. This implies that Elastic Agent is not sending the Elasticsearch output config to APM Server, or otherwise there's a bug in APM Server related to handling the config.
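
To make the reading of that error concrete, here is a minimal sketch that extracts the dial target from a message shaped like the one quoted above and checks whether it is a loopback address. The function names and the regular expression are illustrative assumptions, not part of APM Server or Elastic Agent.

```python
import ipaddress
import re

# Matches the "dial tcp <host>:<port>" fragment; IPv6 hosts appear in brackets.
DIAL_RE = re.compile(
    r"dial tcp (?:\[(?P<v6>[^\]]+)\]|(?P<host>[^:\s]+)):(?P<port>\d+)"
)


def dial_target(message):
    """Return (host, port) from a dial error message, or None if not found."""
    match = DIAL_RE.search(message)
    if match is None:
        return None
    return (match.group("v6") or match.group("host"), int(match.group("port")))


def targets_loopback(message):
    """True when the dial target is a loopback IP or the name 'localhost'."""
    target = dial_target(message)
    if target is None:
        return False
    host = target[0]
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # some other hostname, not an IP literal


error = ("precondition failed: dial tcp [::1]:9200: "
         "connect: cannot assign requested address")
print(dial_target(error), targets_loopback(error))
```

A loopback target on port 9200 is what you would expect if the process never received an Elasticsearch output and fell back to a local default, which matches the explanation above.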

We also have an issue open to investigate 8.4 failing here: elastic/apm-server#8426

amolnater-qasource · 2022-06-23T06:07:52Z

Hi @jlind23
Thanks for the update.

We restarted the integration server, and the hosted Fleet Server then showed as Healthy under the Agents tab.
However, the hosted Fleet Server becomes Unhealthy again after some time.

Also, APM is still disabled on the deployment page.

Deployment id for this build is: 159a7d04412248a9a5ad9d9bd9a0e365

Thanks

jlind23 · 2022-06-23T08:23:56Z

Hi @amolnater-qasource , thanks for the deployment id.
Indeed I do observe the same connection problem coming from apm-server.
@amolnater-qasource can you also check the policy content to check what is the ES output value?

@pierrehilbert @ph can we quickly have someone from the control plane team look at it?

amolnater-qasource · 2022-06-24T08:17:32Z

Hi @jlind23

> can you also check the policy content to check what is the ES output value?

Host value for ES is: http://fa60f7f1004648e78c8d6b853e89569a.containerhost:9244

For more detail, the Elastic Cloud agent policy is attached below:
elastic-agent.zip

Please let us know if anything else is required from our end.
Thanks

michel-laterman · 2022-06-24T16:45:42Z

The agent logs indicate that the APM server is degraded - this looks like it's caused by the APM server not being able to connect to ES (as noted above)

2022-06-23T18:27:40Z - message: Application: apm-server--8.4.0-SNAPSHOT[e0138b78-b40c-4839-8f77-8afa6d420554]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'
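
For anyone scanning many of these agent log lines, a minimal sketch of pulling the application name and new state out of a line shaped like the one above. The regular expression is an assumption based only on this single example, not a documented Elastic Agent log format.

```python
import re

# Pattern inferred from the single log line quoted in this thread.
STATE_RE = re.compile(
    r"Application: (?P<app>\S+?)\[(?P<id>[0-9a-f-]+)\]: "
    r"State changed to (?P<state>[A-Z]+): (?P<reason>.*?) - type:"
)


def parse_state_change(line):
    """Return a dict with app, id, state and reason, or None if no match."""
    match = STATE_RE.search(line)
    return match.groupdict() if match else None


line = (
    "2022-06-23T18:27:40Z - message: Application: "
    "apm-server--8.4.0-SNAPSHOT[e0138b78-b40c-4839-8f77-8afa6d420554]: "
    "State changed to DEGRADED: Missed last check-in - type: 'STATE' "
    "- sub_type: 'RUNNING'"
)
parsed = parse_state_change(line)
print(parsed["app"], parsed["state"], parsed["reason"])
```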

michel-laterman · 2022-06-24T16:56:20Z

One thing I've noted from the policy posted in #1574 (comment) is that the fleet-server input has

    server:
      port: 8220
      host: 0.0.0.0

and the APM server input also has

      host: '0.0.0.0:8200'

I don't think this would affect the ES output, but it may be another issue
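
As a rough illustration of why these input settings look unrelated to the failing connection: `0.0.0.0` is the unspecified address, normally used as a listen address meaning "all interfaces", whereas the error earlier in the thread is an outbound dial to a loopback address. A minimal sketch of telling the two apart; the helper names are hypothetical and the host values are copied from this thread.

```python
import ipaddress


def split_host_port(value):
    """Split 'host:port' (IPv4 or hostname only) into (host, port or None)."""
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, None
    return host, int(port)


def describe_address(host):
    """Classify a host string as wildcard, loopback, specific, or hostname."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"
    if ip.is_unspecified:
        return "wildcard"  # e.g. 0.0.0.0: listen on all interfaces
    if ip.is_loopback:
        return "loopback"  # e.g. ::1 or 127.0.0.1
    return "specific"


fleet_host = "0.0.0.0"              # fleet-server input, port 8220
apm_host, apm_port = split_host_port("0.0.0.0:8200")  # APM server input
es_dial_host = "::1"                # from the dial error in this thread

for name, host in [("fleet-server", fleet_host),
                   ("apm-server", apm_host),
                   ("es dial target", es_dial_host)]:
    print(name, describe_address(host))
```

Both inputs classify as wildcard listen addresses, while the failing dial target is loopback, so the two observations concern different sides of the connection.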

axw · 2022-06-25T00:55:53Z

I'm pretty sure this is related to some recent APM Server build changes I made, which inadvertently have us building without the Fleet management code. I'll merge a fix ASAP.

amolnater-qasource · 2022-06-28T11:30:28Z

Hi @jlind23
We have revalidated this by setting up an 8.4 Snapshot Kibana cloud-staging environment and found that the issue is now fixed.

Hosted fleet server remains Healthy on 8.4 Snapshot.
APM is enabled under Deployment settings

Screenshots:

Build details:
8.4 Snapshot
BUILD: 53965
COMMIT: 7c8b8f8cf32d752fd405ddf680175299fbd8cd32

Hence marking this as QA:Validated.
Thanks

amolnater-qasource added the bug (Something isn't working) and impact:high (Short-term priority; add to current release, or definitely next.) labels Jun 21, 2022

juliaElastic mentioned this issue Jun 22, 2022

Fleet Server unhealthy on pr cloud deployment elastic/elastic-agent#575

Closed

juliaElastic mentioned this issue Jun 23, 2022

[Fleet] Improving bulk actions for more than 10k agents elastic/kibana#134565

Merged

4 tasks

axw mentioned this issue Jun 25, 2022

magefile: build x-pack/apm-server elastic/apm-server#8478

Merged

axw closed this as completed in elastic/apm-server#8478 Jun 25, 2022

amolnater-qasource added the QA:Validated (Validated by the QA Team) label Jun 28, 2022
