
[Fleet]: Status not updated for Remote Elasticsearch cluster when set to default and agent is connected. #177927

Closed
amolnater-qasource opened this issue Mar 4, 2024 · 9 comments · Fixed by #178857
Assignees
Labels
bug Fixes for quality problems that affect the customer experience impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. QA:Validated Issue has been validated by QA Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@amolnater-qasource

Kibana Build details:

VERSION: 8.13.0 BC3
BUILD: 71857
COMMIT: 82f46148c91eec93ac7382147936028db2eb8883

Host OS: All

Preconditions:

  1. 8.13.0-BC3 Kibana cloud environment should be available.
  2. An agent should be installed.

Steps to reproduce:

  1. Create a Remote Elasticsearch output and set it as default.
  2. Install an agent with the agent policy outputs set to default.
  3. Observe (inconsistently) that the agent never sends data to the Remote Elasticsearch cluster.
  4. Observe that the status is not updated for the default remote cluster.
  5. Navigate to Settings and set the default Elasticsearch output as default again.
  6. Under Agent policy settings, set the output to Remote Elasticsearch.
  7. Observe that the agent now sends data to the remote Elasticsearch cluster and the status is updated.

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2024-03-04.17-33-15.mp4

Expected Result:
The status should be updated for the Remote Elasticsearch cluster when it is set as default, and the agent should send data to the remote cluster.

@amolnater-qasource amolnater-qasource added bug Fixes for quality problems that affect the customer experience impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. Team:Fleet Team label for Observability Data Collection Fleet team labels Mar 4, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@amolnater-qasource
Author

@manishgupta-qasource Please review.

@manishgupta-qasource

Secondary review for this ticket is Done

@juliaElastic juliaElastic self-assigned this Mar 14, 2024
@juliaElastic
Contributor

juliaElastic commented Mar 18, 2024

I looked into this, and tried to reproduce locally:

  • enrolled an agent into an agent policy with a default remote ES output
  • the data is being sent to the remote ES correctly (step 3 not reproduced)
  • I can reproduce that the output health is not showing up in the UI. This is because the remote ES output health is reported under the name "default" instead of the output id; the full agent policy refers to the remote ES output as "default".

One way to solve this would be to change the UI to look up the output health under the name "default" when the default output is a remote ES. Alternatively, fleet-server could look up the output id and report health under the real id instead of "default".

I'm going to decrease the impact to medium, as this is an edge case of using default remote output, and there is an easy workaround (to set an elasticsearch output as default).
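The UI-side option described above could look something like the sketch below. The types and names here are hypothetical, not Kibana's actual Fleet code: when no health document matches the output id, fall back to the "default" key if the output is the default remote ES output.

```typescript
// Hypothetical sketch of the UI-side fallback lookup. Shapes and names
// are illustrative only, not Kibana's actual Fleet types.

interface Output {
  id: string;
  type: 'elasticsearch' | 'remote_elasticsearch' | 'logstash' | 'kafka';
  isDefault: boolean;
}

interface OutputHealth {
  output: string; // output id as reported by fleet-server; may be "default"
  state: 'HEALTHY' | 'DEGRADED';
}

function findHealthForOutput(
  output: Output,
  healthDocs: OutputHealth[]
): OutputHealth | undefined {
  // Preferred path: health reported under the real output id.
  const direct = healthDocs.find((h) => h.output === output.id);
  if (direct) return direct;
  // Fallback: fleet-server reported the default output under "default".
  if (output.isDefault && output.type === 'remote_elasticsearch') {
    return healthDocs.find((h) => h.output === 'default');
  }
  return undefined;
}
```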

(screenshots)

@juliaElastic juliaElastic added impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Mar 18, 2024
juliaElastic added a commit that referenced this issue Mar 19, 2024
…ault (#178857)

## Summary

Closes #177927

Replaced "default" with the real output id in the full agent policy. This fixes
incorrect remote ES health reporting when the remote output was set as default.

More explanation of the bug:
#177927 (comment)
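Conceptually, the change can be sketched like this (illustrative names only, not the actual Kibana implementation):

```typescript
// Sketch of the #178857 approach: when generating the full agent policy,
// key the default output by its real id rather than the literal "default",
// so fleet-server reports output health under an id the UI can match.
// The Output shape and function name here are hypothetical.

interface Output {
  id: string;
  isDefault: boolean;
}

// Before (conceptually): output.isDefault ? 'default' : output.id
// After: always the real id.
function outputKeyForPolicy(output: Output): string {
  return output.id;
}
```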

To verify:
- create a remote es output and set as default (both data and
monitoring)
- create an agent policy that uses default output 
- enroll an agent
- expect that the agent sends system and elastic-agent metrics/logs to
remote es
- verify that the remote es health badge shows up on UI

<img width="1283" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/348406d4-69e6-4eda-b396-635771d1edf3">
<img width="695" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/cd03310c-d50d-42ea-8f28-136bf068c52d">



### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
@amolnater-qasource amolnater-qasource added the QA:Ready for Testing Code is merged and ready for QA to validate label Mar 20, 2024
juliaElastic added a commit that referenced this issue Mar 22, 2024
…arch type (#179218)

## Summary

Related to #178857 and
#177927

It seems that using the output id instead of "default" in the full agent policy
had a higher impact than expected: there are a few places where the agent
relies on the name "default"
(see [this](elastic/elastic-agent#4454) and
[this](elastic/elastic-agent#4453) PR).
Because of this, this PR does a partial revert: it keeps using "default" for
the elasticsearch output type to avoid a breaking change, but uses the output
id for all other types, which fixes the original issue of remote output health
reporting.
Using a non-elasticsearch output as default is a rarely used feature, so not
using the "default" output name for those types should have little impact.
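The resulting behavior can be sketched as follows (hypothetical names, not the actual Kibana code): the literal "default" key survives only for default outputs of the elasticsearch type.

```typescript
// Sketch of the partial revert in #179218: keep the literal "default" key
// only for default outputs of type "elasticsearch" (which elastic-agent
// still depends on), and use the real output id for every other type.
// Types and names are illustrative only.

interface Output {
  id: string;
  type: 'elasticsearch' | 'remote_elasticsearch' | 'logstash' | 'kafka';
  isDefault: boolean;
}

function outputKeyForPolicy(output: Output): string {
  if (output.isDefault && output.type === 'elasticsearch') {
    return 'default'; // preserved for backward compatibility with elastic-agent
  }
  return output.id; // fixes remote output health reporting
}
```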

To verify:
- create a remote es output and set as default (both data and
monitoring)
- create an agent policy that uses default output 
- enroll an agent
- expect that the agent sends system and elastic-agent metrics/logs to
remote es
- verify that the remote es health badge shows up on UI
- set elasticsearch output back as default
- verify that the agent policy has it as "default" in outputs section

<img width="704" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/ab46b00d-efc2-49e1-ad7f-9acd44b2a9e5">
<img width="1251" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/a07c0d78-9126-43d9-bd0e-a4df193d7e78">
<img width="1791" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/868a054b-2cae-42f3-8f60-f2bff3b29efd">

<img width="715" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/721cd809-5f97-47e5-bf99-19f542d8ff83">



### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
@amolnater-qasource amolnater-qasource removed the QA:Ready for Testing Code is merged and ready for QA to validate label Mar 26, 2024
@amolnater-qasource
Author

Hi @juliaElastic

We have revalidated this issue on the latest 8.13.0 BC7 Kibana cloud environment and made the following observations:

Observations:

  • The status is not updated and the agent sends no data to the Remote Elasticsearch cluster when it is set as default.

Steps:

  1. Create two fresh environments.
  2. Install an agent with a new agent policy whose default output is set to elasticsearch (the default one available).
  3. Create a new Remote Elasticsearch output and set it as default.
  4. Observe that under the agent policy, the default output is now Remote Elasticsearch.
  5. Observe no data under the Discover tab of the Remote Elasticsearch cluster.

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2024-03-26.13-12-33.mp4

Logs:

elastic-agent-diagnostics-2024-03-26T07-53-36Z-00.zip

Build details:
VERSION: 8.13.0 BC7
BUILD: 72069
COMMIT: 2e3a5cd

Hence, we are reopening this issue.

Further, we have shared the build details with you over Slack.

Thanks!!

@juliaElastic
Contributor

Sorry, the fix was not backported to 8.13, which is why it is still reproducible. I'll backport it now, but it won't make 8.13.0 since the last BC has already been built. We can include it in 8.13.1.

juliaElastic added a commit to juliaElastic/kibana that referenced this issue Mar 26, 2024
…arch type (elastic#179218)

@juliaElastic
Contributor

juliaElastic commented Mar 26, 2024

Looked at the clusters as well, and noticed that the issue is only partially due to the missing backport. The other issue is that the data is not being sent to the remote Elasticsearch.
I found in the agent logs that the agent can't access the remote ES because the API key is invalid. Did you create the service token in the remote ES? I'm not seeing any fleet-server-remote API keys in the second cluster.

```
{"log.level":"error","@timestamp":"2024-03-26T07:53:36.741Z","message":"Failed to connect to backoff(elasticsearch(https://f2f004010bde45539e6a68e6bacaef4f.us-west2.gcp.elastic-cloud.com:443)): 401 Unauthorized: {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: unable to find apikey with id P4ioeY4BR0MsIeJOyZ8w\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: unable to find apikey with id P4ioeY4BR0MsIeJOyZ8w\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":148,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
```
(screenshot)

The remote service token should have been created manually with the API request in the remote output instructions:
(screenshot of the instructions)
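For reference, the service token lives on the remote cluster and is created with a request against the Elasticsearch service accounts API; the remote output UI shows the exact request to run. A hedged sketch in TypeScript, where the host, credentials, and token name are placeholders:

```typescript
// Sketch of creating the fleet-server-remote service token on the REMOTE
// cluster, equivalent to the Dev Tools request shown in the remote output
// instructions. Host, credentials, and token name are placeholders; the
// _security/service path follows the Elasticsearch service accounts API.

function serviceTokenUrl(remoteEsHost: string, tokenName: string): string {
  return `${remoteEsHost}/_security/service/elastic/fleet-server-remote/credential/token/${encodeURIComponent(tokenName)}`;
}

// Example (not executed here): POST the URL with suitable credentials on
// the remote cluster, then paste the returned token value into the remote
// output's service token field in Fleet settings.
async function createRemoteServiceToken(
  host: string,
  name: string,
  authHeader: string
): Promise<string> {
  const res = await fetch(serviceTokenUrl(host, name), {
    method: 'POST',
    headers: { Authorization: authHeader },
  });
  if (!res.ok) throw new Error(`token creation failed: ${res.status}`);
  const body = await res.json();
  return body.token.value;
}
```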

@amolnater-qasource
Author

amolnater-qasource commented Mar 26, 2024

Thank you for looking into this @juliaElastic

I found in agent logs that the agent can't access the remote ES because the API key is invalid. Did you create the service token in the remote ES? I'm not seeing any fleet-server-remote API keys in the second cluster.

We had created the token in the remote cluster only:
(screenshot)

However, the issue is only reproducible the first time (inconsistently); whenever we retry after that, the agent always sends the data.

Screenshot:
(screenshot)

Further, thanks for backporting, we will retest the status issue once 8.13.1 is available.

Thanks!

@amolnater-qasource amolnater-qasource added the QA:Ready for Testing Code is merged and ready for QA to validate label Mar 26, 2024
juliaElastic added a commit that referenced this issue Mar 26, 2024
…"default" for non-elasticsearch outputs) (#179406)

## Summary

Backport #178857 and
#179218 to 8.13
Closes #177927
@amolnater-qasource
Author

Hi Team,

We have revalidated this issue on the latest 8.14.0 SNAPSHOT and found it fixed.

Observations:

  • Status is updated for Remote Elasticsearch cluster when set to default and agent is connected.

Screenshot:
(screenshot)

Build details:
VERSION: 8.14.0 SNAPSHOT
BUILD: 73332
COMMIT: 3f75f6a

Hence, we are closing and marking this issue as QA:Validated.
Thanks!

@amolnater-qasource amolnater-qasource removed the QA:Ready for Testing Code is merged and ready for QA to validate label Apr 17, 2024
@amolnater-qasource amolnater-qasource added the QA:Validated Issue has been validated by QA label Apr 17, 2024