
[Self managed]: elastic_agent.metricbeat/filebeat datastreams generated on installing fleet-server agent. #376

Closed · amolnater-qasource opened this issue May 19, 2021 · 34 comments · Fixed by elastic/beats#27222
Labels: bug (Something isn't working), Team:Elastic-Agent (Label for the Agent team)

@amolnater-qasource (Collaborator):

Kibana version: 7.13.0 BC-7, self-managed Kibana environment

Host OS and Browser version: Windows 10 x64, All

Build Details:

 Artifact link used: https://staging.elastic.co/7.13.0-8eb98cbf/summary-7.13.0.html
 BUILD: 40864
 COMMIT: 6ce6847436ff9bef0ad91268b6585e0f9339c9fd

Preconditions:

  1. A 7.13.0 BC-7 self-managed Kibana environment should be available.
  2. A Fleet Server agent must be installed on Windows 10 x64 using the Default Fleet Server policy, which contains only the Fleet Server integration.

Steps to reproduce:

  1. Log in to the Kibana environment.
  2. Restart the Elastic Agent service from Windows Services (a command-line equivalent is sketched after this list).
  3. Navigate to the Data Streams tab.
  4. Observe that data for a few datasets stops generating after the agent restart.
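
For reference, a minimal command-line sketch of the restart in step 2, run from an elevated prompt; "Elastic Agent" is the default Windows service name and is an assumption here, adjust if yours differs:

```sh
# Restart the agent via the Windows service control tool (run elevated).
sc.exe stop "Elastic Agent"
sc.exe query "Elastic Agent"   # wait until STATE reports STOPPED
sc.exe start "Elastic Agent"
```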

Expected Result:
Data streams should resume for all datasets after the Fleet Server agent restarts.

Logs:
restart issue logs.zip

Note:

  • This issue is observed on self-managed Kibana when the first Fleet Server agent is installed with the Default Fleet Server policy.
  • Before restarting elastic-agent, data streams for all datasets were generating at regular intervals.

Screenshot: [image omitted]

@amolnater-qasource (Collaborator, Author):

@dikshachauhan-qasource Please review.

@dikshachauhan-qasource:

Reviewed and assigned to @EricDavisX

@dikshachauhan-qasource dikshachauhan-qasource removed their assignment May 19, 2021
@dikshachauhan-qasource dikshachauhan-qasource added bug Something isn't working Team:Fleet Label for the Fleet team labels May 19, 2021
@ruflin ruflin added the Team:Elastic-Agent Label for the Agent team label May 20, 2021
@EricDavisX EricDavisX removed their assignment May 20, 2021
@EricDavisX EricDavisX removed the Team:Fleet Label for the Fleet team label May 20, 2021
@EricDavisX (Contributor):

@ph @ruflin I'd like to understand this before putting it in the backlog, and I'd want to know a workaround too. It actually sounds severe, doesn't it? That is, if it is a real issue and not otherwise accounted for in other tickets.

The first step is to confirm whether, given more time (more than the 6 minutes in the screenshot), the other data streams would eventually send data. I don't see what would be different for elastic_agent.elastic_agent metrics vs. logs; there is no immediate pattern to the problem. We should test and confirm that all the same beats (system and monitoring) were restarted on the host.
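
One quick check (a sketch; run on the affected host after the restart) is the agent's own status command, which lists the supervised processes and their states:

```sh
# All supervised applications (fleet-server, filebeat, metricbeat) should
# report HEALTHY/Running after the restart; anything else points at a
# crashed or stuck beat.
elastic-agent status
```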

@EricDavisX (Contributor):

OK, digging into more git/email updates, this seems directly related: metricbeat crashes on a restart.
elastic/beats#25785

We can close this as a duplicate if everyone feels confident.

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX
We have revalidated this issue on the 7.13.0 BC-9 self-managed Kibana environment.

> The first step to confirm is, with more time (more than 6 minutes, per the screenshot) would the other datastreams not eventually send data?

We observed for more than 30 minutes, and still no new data was generated for a few datasets.
Please refer to the screenshot below: [image omitted]

Logs:
logs.zip

Build details:
Build: 40865
Commit: 9863e88bd63ad546b9d36e6b0c0c55cb65dd9081

Please let us know if anything else is required.
Thanks
QAS

@EricDavisX EricDavisX added impact:high Short-term priority; add to current release, or definitely next. v7.13.1 labels May 24, 2021
@EricDavisX (Contributor):

Adding the 7.13.1 label to help prioritize this until we are confident it is a duplicate and can close it, or we otherwise fix/resolve it.

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX

Thanks for the feedback on Slack, @michalpristas.
We have revalidated this issue and observed that after changing the logging level to "debug", we get data for all the expected datasets after an agent reboot.

Hence closing this out.

Thanks
QAS

@EricDavisX (Contributor):

I will work with Amol to retest based on my understanding, and we'll report back if anything further can be confirmed as a bug.

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX
We have revalidated this issue on the 7.13.0 self-managed Kibana environment with the "info" logging level.
We observed for 15-20 minutes before and after the reboot.

Please find below the observation table for the same.

| Dataset | Type | Before reboot | After reboot |
| --- | --- | --- | --- |
| elastic_agent.fleet_server | metrics | Generating normally | Generating normally |
| elastic_agent.elastic_agent | metrics | Generating normally | Generating normally |
| elastic_agent.metricbeat | metrics | Generating normally | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent.filebeat | metrics | Generating normally | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent.fleet_server | logs | Generated only once, at the beginning of the Fleet Server agent installation | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent | logs | Generated only once, at the beginning of the Fleet Server agent installation | Generated only once, at the beginning of the Fleet Server agent reboot |

Screenshots: [DataStreams view omitted]

Please let us know if anything else is required.

Thanks
QAS

@EricDavisX (Contributor):

@michalpristas I am re-opening this for a re-review. It looks to me that after a reboot, the Agent stops collecting elastic_agent.metricbeat and elastic_agent.filebeat data. Can we discuss/review, please?

@EricDavisX (Contributor):

@amolnater-qasource can you re-test this, please? We believe this and
elastic/beats#25829 are duplicates (or share the same root cause) and are both fixed now. Thank you.

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX
We have revalidated this issue on 7.14.0 self-managed Kibana and found the same observations as shared in comment #376 (comment).

Build details:

Build: 41559
Commit: 9838db392e7fcfc12f004b68fb1b09739f131148
Artifact Link: https://snapshots.elastic.co/7.14.0-28665d9b/downloads/beats/elastic-agent/elastic-agent-7.14.0-SNAPSHOT-windows-x86_64.zip

Please let us know if we are missing anything.

Thanks
QAS

@EricDavisX (Contributor):

As of today the 7.14 snapshot is 7 days old, and I'm not sure it has the latest fixes we need. I requested the re-testing hoping that the build was new enough; that is my fault. Ideally, we'll re-test when the 7.x build is confirmed to be new. Let's wait.

@EricDavisX (Contributor) commented Jun 24, 2021:

All of the builds are green, so we have new artifacts; please retest this and report back. I am wondering if we still have the issue seen here and in elastic/beats#26034. @amolnater-qasource, thank you.

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX
We have revalidated this issue on the 7.14.0 self-managed Kibana environment with the "info" logging level.
We observed for 15-20 minutes before and after the reboot.

Observations are similar to the ones shared earlier at #376 (comment)
Observation table:

| Dataset | Type | Before reboot | After reboot |
| --- | --- | --- | --- |
| elastic_agent.elastic_agent | metrics | Generating regularly | Generating regularly |
| elastic_agent.fleet_server | metrics | Generating regularly | Generating regularly |
| elastic_agent.filebeat | metrics | Generating regularly | Generated only once after reboot |
| elastic_agent.metricbeat | metrics | Generating regularly | Generated only once after reboot |
| elastic_agent.fleet_server | logs | Generated only once | Generated only once after reboot |
| elastic_agent | logs | Missing | Missing |

Build details:

Build: 42089
Commit: 67a71c75d2da40e49fba2620f488c9b4ce2467d2
Artifact Link: 
https://snapshots.elastic.co/7.14.0-15b00b37/downloads/elasticsearch/elasticsearch-7.14.0-SNAPSHOT-windows-x86_64.zip
https://snapshots.elastic.co/7.14.0-15b00b37/downloads/kibana/kibana-7.14.0-SNAPSHOT-windows-x86_64.zip
https://snapshots.elastic.co/7.14.0-15b00b37/downloads/beats/elastic-agent/elastic-agent-7.14.0-SNAPSHOT-windows-x86_64.zip

Note:
We have reported an issue for the missing "elastic_agent" dataset at elastic/beats#26518

Please let us know if anything else is required from our end.
Thanks

@ph (Contributor) commented Jun 30, 2021:

@michalpristas is looking into this one.

@michalpristas (Contributor):

The missing dataset is most likely a duplicate of elastic/beats#26518.
I will check the missing data on the other ones.

@michalpristas (Contributor) commented Jul 6, 2021:

Do we have any logs from the experiment above?
I'm using the same version and cannot reproduce. Sometimes I needed to hit refresh on the UI multiple times for the timestamps to get updated, though; sometimes they got updated to a stale value and then, after a few more refreshes, to the latest ones. Maybe some kind of cache (I'm using a cloud instance).
Do we see events in these datasets using Discover and filters? (A query sketch follows.)
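
One way to rule out UI caching is to query the backing data stream directly; a minimal sketch, assuming the default data stream naming (`<type>-<dataset>-<namespace>`), Elasticsearch on localhost:9200, and placeholder credentials:

```sh
# Fetch the newest document from an affected data stream; if the UI is merely
# showing a stale cache, this @timestamp will still be advancing.
curl -s -u elastic:changeme \
  -H 'Content-Type: application/json' \
  "http://localhost:9200/metrics-elastic_agent.metricbeat-default/_search" \
  -d '{"size":1,"sort":[{"@timestamp":"desc"}],"_source":["@timestamp"]}'
```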

@EricDavisX (Contributor):

@amolnater-qasource can you re-run the test and capture the relevant logs (Agent, Fleet Server, Metricbeat, Filebeat)? The last time we posted logs was May 24; I am hoping things may be working better with the recent fixes (the one Michal noted was closed out, so that is good).

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX
We have revalidated this on the 7.14.0 BC-2 self-managed Kibana environment.

Please find the required logs below:
logs.zip

Build details:

Build: 42401
Commit: 9826a943dc2e47f26ec6de94816e7d297b752994
Artifact Link: https://staging.elastic.co/7.14.0-e99135ef/summary-7.14.0.html

Screenshot: [image omitted]

Please let us know if anything else is required from our end.

Thanks
QAS

@EricDavisX (Contributor):

Please do re-test on BC-3 or a newer snapshot (BC-2 is a week old at this point, and we think fixes were just merged).

@amolnater-qasource (Collaborator, Author):

Hi @EricDavisX
We have revalidated this issue on the 7.14.0 BC-3 self-managed Kibana environment with the "info" logging level.

Observations are still the same; please find them in the table below:

| Dataset | Type | Before reboot | After reboot |
| --- | --- | --- | --- |
| elastic_agent.fleet_server | metrics | Generating normally | Generating normally |
| elastic_agent.elastic_agent | metrics | Generating normally | Generating normally |
| elastic_agent.metricbeat | metrics | Generating normally | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent.filebeat | metrics | Generating normally | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent.fleet_server | logs | Generated only once, at the beginning of the Fleet Server agent installation | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent | logs | Generated only once, at the beginning of the Fleet Server agent installation | Generated only once, at the beginning of the Fleet Server agent reboot |

Build details:

BC-3 Artifact Link: https://staging.elastic.co/7.14.0-682a8012/summary-7.14.0.html
Build: 42545
Commit: c314921a9893e0b46d9a3958f5520e3d6b1ce7d5

Screenshots: [images omitted]

Thanks
QAS

@michalpristas (Contributor):

So you don't see metrics coming into the dashboards either.

@michel-laterman, were you able to find time to see if you can repro locally?

@michel-laterman (Contributor):

I'm having trouble recreating this locally; I'm having issues setting up my environment.

@michel-laterman (Contributor):

@amolnater-qasource, I'm unable to reproduce this on QA cloud with a locally running agent/fleet-server. There have been occasional issues where Kibana fails to refresh the data stream displays, but clicking the "reload" button on the upper right of the UI resolves this.

@amolnater-qasource (Collaborator, Author):

Hi @michel-laterman

> QA cloud with a locally running agent/fleet server.

This issue is not reported for cloud builds. It is only reproducible on a self-managed/on-prem setup.

> There have been occasional issues where Kibana fails to refresh the data stream displays, but clicking the "reload" button on the upper right of the UI resolves this.

You can check that we have shared our observations based on a nearly 30-minute test at #376 (comment). We were getting new logs for two of the datasets, but not for the others.

Thanks
QAS

@michel-laterman (Contributor):

I think I have recreated this using `elastic-package stack up` on the latest snapshot (on macOS). I can see the same behaviour if I restart the Docker container. The log messages look the same; below are the errors in metricbeat_monitoring-json.log:

{"log.level":"info","@timestamp":"2021-07-19T23:07:01.765Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.state: failure to apply state schema: 4 errors: key `management` not found; key `module` not found; key `output` not found; key `queue` not found","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-19T23:07:01.771Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.stats: failure to apply stats schema: 1 error: key `libbeat` not found","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-19T23:07:11.765Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.state: failure to apply state schema: 4 errors: key `queue` not found; key `management` not found; key `module` not found; key `output` not found","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-19T23:07:11.771Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.stats: failure to apply stats schema: 1 error: key `libbeat` not found","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-19T23:07:21.766Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.state: failure to apply state schema: 4 errors: key `output` not found; key `queue` not found; key `management` not found; key `module` not found","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-19T23:07:21.771Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.stats: failure to apply stats schema: 1 error: key `libbeat` not found","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}

`elastic_agent.elastic_agent` and `elastic_agent.fleet_server` metrics appear up to date in the data stream; `elastic_agent.fleet_server` and `elastic_agent` log timestamps correspond to a restart (I believe because they do not emit log entries when everything is running as expected); and the filebeat and metricbeat metricsets are not transmitted. This is probably due to the above errors with metricbeat.
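
For anyone else trying to reproduce, the rough sequence is sketched below (assumes elastic-package and Docker are installed; the container name filter and placeholder ID are assumptions):

```sh
# Bring up a local stack from the snapshot, then bounce the fleet-server container.
elastic-package stack up -d --version 7.14.0-SNAPSHOT
docker ps --filter "name=fleet-server"   # locate the fleet-server container
docker restart <fleet-server-container-id>
```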

@michalpristas (Contributor):

This looks like what metricbeat receives during the call to stats does not conform to the schema defined in metricbeat's beat module (hence the missing fields).
Maybe it's returning an empty body or some error message.
Does it go live after the restart is performed? Which container are you trying to restart?

@michel-laterman (Contributor):

@michalpristas, I was trying to restart the fleet-server container

@michel-laterman (Contributor):

The error logs above are from metricbeat trying to collect state/stats from fleet-server.
The state endpoint returns null, and the stats endpoint is missing the libbeat attribute.
However, this is also the case when fleet-server initially starts (and all streams work as expected). A probe sketch follows.
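
To see this directly, one can query the monitoring socket that metricbeat_monitor scrapes; a minimal sketch, assuming a fleet-server socket path analogous to the filebeat/metricbeat paths that appear in the logs later in this thread:

```sh
# Query fleet-server's monitoring endpoints over its unix socket. On a fresh
# start /state returns null and /stats lacks the `libbeat` key, which is what
# trips metricbeat's schema above.
SOCK=/usr/share/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock  # assumed path
curl -s --unix-socket "$SOCK" http://unix/state
curl -s --unix-socket "$SOCK" http://unix/stats
```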

@amolnater-qasource (Collaborator, Author) commented Jul 21, 2021:

Hi @michel-laterman
We have revalidated this issue on the 7.14.0 BC-3 self-managed Kibana environment with the "debug" logging level.

Please find our observations below:

| Dataset | Type | Behaviour after reboot |
| --- | --- | --- |
| elastic_agent.elastic_agent | metrics | Generating normally |
| elastic_agent.fleet_server | metrics | Generating normally |
| elastic_agent.fleet_server | logs | Generating normally |
| elastic_agent | logs | Generating consistently (after a longer period of time) |
| elastic_agent.metricbeat | metrics | Generated only once, at the beginning of the Fleet Server agent reboot |
| elastic_agent.filebeat | metrics | Generated only once, at the beginning of the Fleet Server agent reboot |

Screenshot: [image omitted]

Logs:
logs.zip

cc: @michalpristas

Please let us know if we are missing anything.
Thanks
QAS

@michel-laterman (Contributor):

Alright, after some discussion I think the description of this bug is wrong.

The elastic_agent.metricbeat and elastic_agent.filebeat streams should not be generated by the fleet-server default policy. However, currently, before the server is restarted, the metricbeat_monitor instance attempts to gather this data.

If you take a look at the entries from these datasets, they show errors:
[Screenshot from Jul 27, 2021 omitted]

These socket errors are expected, as metricbeat/filebeat should not be running integrations (just the _monitor versions, to gather agent monitoring info).

So it appears the bug is that on startup the metricbeat instance attempts to gather this data; a restart corrects this.

On startup (before a restart) the elastic-agent shows that the fleet-server is re-configuring:

bash-4.2$ elastic-agent status
Status: HEALTHY
Message: (no message)
Applications:
  * fleet-server	(CONFIGURING)
    Re-configuring
  * filebeat	(HEALTHY)
    Running
  * metricbeat	(HEALTHY)
    Running

The metricbeat_monitor log shows errors connecting to metricbeat and filebeat (the same as in the metrics stream):

{"log.level":"info","@timestamp":"2021-07-27T15:00:13.219Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset http.json: error making http request: Get \"http://unix/stats\": dial unix /usr/share/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-27T15:00:13.219Z","log.origin":{"file.name":"module/wrapper.go","file.line":259},"message":"Error fetching data for metricset beat.stats: error making http request: Get \"http://unix/stats\": dial unix /usr/share/elastic-agent/data/tmp/default/metricbeat/metricbeat.sock: connect: no such file or directory","service.name":"metricbeat","event.dataset":"metricbeat_monitor-json.log","ecs.version":"1.6.0"}

After a restart, fleet-server appears healthy (elastic-agent status shows running).
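
A quick way to validate this explanation on an affected host is to check whether those sockets exist at all; a minimal sketch using the paths from the log lines above:

```sh
# Under the Fleet Server default policy no standalone filebeat/metricbeat
# processes serve these sockets, so both checks should fail, matching the
# dial errors in the metricbeat_monitor log.
ls -l /usr/share/elastic-agent/data/tmp/default/filebeat/filebeat.sock
ls -l /usr/share/elastic-agent/data/tmp/default/metricbeat/metricbeat.sock
```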

@EricDavisX EricDavisX removed impact:high Short-term priority; add to current release, or definitely next. v7.14.0 labels Jul 28, 2021
@EricDavisX (Contributor):

Thanks Michael!

@amolnater-qasource @dikshachauhan-qasource if you agree and understand the reasoning, I would like to know. In that case, you can also update the short description to align with the 'inverse' issue that was identified. We can also update our test suite steps to confirm what is configured and note the current bug until such time as it may be fixed. Thanks. I'll mark it as 'done' from the urgent-review side; it is nice to have this off the list even if we still have a lesser-priority bug we can evaluate fixing.

It remains on the 7.15 candidate list, and we can evaluate it against other issues/features we want to work on.

@amolnater-qasource amolnater-qasource changed the title [Self managed]: Data streams stops for some datasets on restarting Fleet Server agent. [Self managed]: elastic_agent.metricbeat/filebeat datastreams generated on installing fleet-server agent. Jul 30, 2021
@amolnater-qasource (Collaborator, Author):

Thanks @michel-laterman for looking into this issue.
@EricDavisX, yes, we agree with Michel, as on cloud we also observed that only 4 datasets are generated for the Fleet Server agent.
No elastic_agent.metricbeat/filebeat data is generated for Fleet Server on the 7.14.0 BC-5 cloud build.

Screenshot: [image omitted]

Further, we had a query regarding this: is there any action due to which elastic_agent.metricbeat/filebeat data could be generated in the future?

We have updated our expected result for the self-managed Fleet Server test case at C77071.

We will re-test this on self-managed 7.15 once it is fixed.

Please let us know if anything else is required from our end.
Thanks
QAS
