[metricbeat] Stack monitoring modules may ignore `xpack` configuration on error #30809

klacabane · 2022-03-14T16:45:32Z

Summary

Stack monitoring related modules have a specific configuration (xpack.enabled: true) that allows them to write events to .monitoring-{module}-* indices instead of the usual metricbeat-*.

In these modules, failing to generate metricsets in some code paths sends an error event to the metricbeat-* index regardless of the xpack.enabled configuration. This is consistent with the way metricbeat reports errors in all its modules, but stack monitoring modules are a special case because they allow the destination index of regular events to be overridden. Given that, shipping regular events and error events to different indices can be counter-intuitive UX and could make the errors harder for users to discover.

7.x versions also handle errors inconsistently: in some cases the errors are only logged and not returned to the metricbeat error reporting (example).

Questions:

Is the metricbeat-* dependency future-proof, or should we think about a dedicated index or log-only error reporting?

Next steps

Ideally we would build a mechanism that ingests all errors generated when xpack.enabled: true. The first step would be to stop reporting them to metricbeat-* and route them to the logger. As a next step we could use this mechanism to standardize these errors and store them in a dedicated place (e.g. a new .monitoring-errors-mb data stream?). The idea is to enable easy consumption of these values:

when troubleshooting - a lookup in that place, for example with dataset filters, could provide insightful data for support or SDH
in the UI - with standardized errors, Stack Monitoring can consume and surface underlying collection errors so that customers are aware of the issues and, assuming enough context is provided, can solve them
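A minimal sketch of the two proposed steps, to make the idea concrete. Everything here is hypothetical: `CollectionError`, `routeError` and the `"logger"` destination label are illustrative names, and `.monitoring-errors-mb` is only the candidate data stream name floated above, not something that exists today.

```go
package main

import "fmt"

// CollectionError is a hypothetical standardized error record that a
// troubleshooting lookup or the Stack Monitoring UI could consume.
type CollectionError struct {
	Dataset string // e.g. "elasticsearch.shard", usable as a filter
	Message string
}

// routeError picks a destination for a collection error.
// Assumption: an empty dedicatedStream means the dedicated data stream
// has not been introduced yet (step 1 of the proposal).
func routeError(xpackEnabled bool, dedicatedStream string) string {
	switch {
	case !xpackEnabled:
		// Regular metricbeat error reporting stays unchanged.
		return "metricbeat-*"
	case dedicatedStream == "":
		// Step 1: stop writing to metricbeat-* and only log the error.
		return "logger"
	default:
		// Step 2: store standardized errors in a dedicated place.
		return dedicatedStream
	}
}

func main() {
	e := CollectionError{Dataset: "elasticsearch.shard", Message: "request failed"}

	fmt.Printf("step 1: %s -> %s\n", e.Dataset, routeError(true, ""))
	fmt.Printf("step 2: %s -> %s\n", e.Dataset, routeError(true, ".monitoring-errors-mb"))
}
```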


matschaffer · 2022-03-14T22:27:57Z

Pretty sure I've seen metricbeat-* docs when errors occur in 8. Should be easy to confirm by monitoring kibana with an incorrect basepath setting I think.

klacabane · 2022-04-20T11:23:05Z

In 8.x the code path that logs errors when xpack.enabled is set was removed, and all errors are routed to the metricbeat-* index, so we're already able to query that index for relevant data (e.g. error.message : * and event.dataset : "elasticsearch.shard").

This makes me think that the status quo is acceptable for 8.x since all errors are available and queryable, and it is an improvement over the inconsistent 7.x behavior that only logs the error in some cases. The remaining questions are whether a dedicated index like .monitoring-errors would be easier to discover given that the metricbeat error handling is well documented and known to users, and whether the metricbeat-* dependency is problematic considering it is currently guaranteed to exist.
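For reference, the KQL example above could be expressed as an Elasticsearch query DSL body along these lines. This is a sketch under assumptions: `errorQuery` is a hypothetical helper, and the `term` filter assumes `event.dataset` is mapped as a keyword field in the metricbeat-* indices.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorQuery builds a search body matching documents that have an
// error.message field and belong to the given dataset.
func errorQuery(dataset string) map[string]interface{} {
	return map[string]interface{}{
		"query": map[string]interface{}{
			"bool": map[string]interface{}{
				"filter": []interface{}{
					// Equivalent of `error.message : *`
					map[string]interface{}{
						"exists": map[string]interface{}{"field": "error.message"},
					},
					// Equivalent of `event.dataset : "<dataset>"`
					map[string]interface{}{
						"term": map[string]interface{}{"event.dataset": dataset},
					},
				},
			},
		},
	}
}

func main() {
	body, err := json.MarshalIndent(errorQuery("elasticsearch.shard"), "", "  ")
	if err != nil {
		panic(err)
	}
	// The printed JSON can be sent as the body of a search against metricbeat-*.
	fmt.Println(string(body))
}
```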

jasonrhodes · 2022-04-20T15:24:53Z

If we change the location at all, it'd be interesting to consider something like logs-monitoring.errors-default data stream or similar (if these are in fact "log" like?)...

botelastic · 2023-04-20T15:29:40Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

klacabane added v8.3.0, v7.17.0, Team:Infra Monitoring UI - DEPRECATED labels Mar 14, 2022

klacabane mentioned this issue Apr 13, 2022

Standalone cluster showing up in Monitoring when monitoring docs with beats.state are present elastic/kibana#130029

Open

smith added bug Feature:Stack Monitoring labels Apr 14, 2022

smith mentioned this issue Apr 14, 2022

Stack Monitoring Tech Debt Plan elastic/kibana#127224

Closed

39 tasks

This was referenced Jun 7, 2022

[Stack Monitoring] Kibana should not report healthy when recent data is missing elastic/kibana#126386

Closed

[Stack Monitoring] Create a Stack Monitoring health endpoint elastic/kibana#127235

Open

klacabane mentioned this issue Aug 29, 2022

[Stack Monitoring][logstash] Possible float/long coercion issue with pipeline queue_size_in_bytes elastic/kibana#139607

Closed

botelastic bot added the Stalled label Apr 20, 2023

botelastic bot closed this as completed Oct 17, 2023
