[Logs UI] "Missing indices" log threshold alert error not descriptive #119777

Closed
psanz-estc opened this issue Nov 26, 2021 · 12 comments
Labels
bug (Fixes for quality problems that affect the customer experience), enhancement (New value added to drive a business result), Feature:Alerting, Team:obs-ux-management (Observability Management User Experience Team)

Comments

@psanz-estc

psanz-estc commented Nov 26, 2021

Kibana version:

Elasticsearch version:

Server OS version:

Browser version:

Browser OS version:

Original install method (e.g. download page, yum, from source, etc.):

Describe the bug:

Creating alerts that don't match any of the indices defined in the Logs UI settings results in an error like this:

Error: Failed to validate: in aggregations: undefined does not match expected type { groups: ({ buckets: Array<{ key: { [K in string]: string }, doc_count: number, filtered_results: ({ doc_count: number } & Partial<{ histogramBuckets: { buckets: Array<{ key: number, doc_count: number }> } }>) }> } & Partial<{ after_key: { [K in string]: string } }>) }

Steps to reproduce:

  1. Create a Log Threshold alert
  2. Change the Logs UI settings to reference non-existent indices
  3. Alert should fail and produce the error message shown above

Expected behavior:
The error should provide some information about the cause of the problem

AC:

  • This error should result in a message that says "No indices matching ${indices} could be found during the execution of this rule."
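
The message requested in the AC could be produced by a small formatter like the one below. This is only a sketch: `missingIndicesMessage` is a hypothetical helper name, not an existing Kibana function, and joining the index patterns with commas is an assumption based on how the patterns appear elsewhere in this thread.

```typescript
// Hypothetical helper (not part of Kibana): builds the message from the AC.
// Assumption: the configured index patterns are available as a string array
// and are rendered comma-separated, e.g. "logs-*,filebeat-*".
function missingIndicesMessage(indices: string[]): string {
  return `No indices matching ${indices.join(',')} could be found during the execution of this rule.`;
}
```

For example, calling it with `['logs-*', 'filebeat-*']` yields a message that names both patterns, which is the information the current validation error lacks.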
@psanz-estc psanz-estc added bug Fixes for quality problems that affect the customer experience Feature:Logs UI Logs UI feature labels Nov 26, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Nov 26, 2021
@weltenwort weltenwort added the Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services label Nov 29, 2021
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Nov 29, 2021
@weltenwort weltenwort changed the title Kibana Alert error not giving a descriptive information [Logs UI] Log threshold alert error not giving a descriptive information Nov 29, 2021
@jasonrhodes
Member

  1. Create an alert on a random index

@psanz-estc

Question for clarification: how do you create a Log threshold alert on a random index? We don't allow you to choose the index for this rule type, I don't think? It just uses whatever indices are specified in the Logs UI settings as far as I remember, which makes reproducing this seem impossible. Can you help me understand how to reproduce?

@weltenwort
Member

@jasonrhodes I think one would have to create the rule using valid indices and then change the source config, delete indices or change aliases so the index name pattern doesn't match any index anymore.

@weltenwort
Member

I'm uncertain what the best way to handle this error is. I see two options:

  1. Translate the error into a clearer message akin to "No indices matching logs-*,filebeat-* could be found during the execution of this rule.".
  2. Swallow the error and just treat the case as if we had indices but the thresholds are not crossed.

The former would cause more noise, but would make a usually undesirable configuration visible. The latter would be more tolerant of transient states in which this might be the case, at the risk of nobody noticing the rule doesn't work as intended.
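
The two options can be sketched side by side as follows. Everything here is hypothetical and simplified: `GroupedSearchResponse`, `evaluateLogThreshold` and the `policy` parameter are illustrative names rather than the actual rule executor, the response type models only the `doc_count` field, and the "more than" comparison stands in for whatever comparator the rule is configured with. The sketch also assumes, as this issue reports, that a missing `aggregations` field means no index matched.

```typescript
// Option 1 ("report") surfaces a descriptive error; option 2 ("ignore")
// treats the run as if indices existed but no threshold was crossed.
type MissingIndicesPolicy = 'report' | 'ignore';

// Hypothetical, heavily simplified response shape.
interface GroupedSearchResponse {
  aggregations?: { groups: { buckets: Array<{ doc_count: number }> } };
}

interface Evaluation {
  thresholdCrossed: boolean;
}

function evaluateLogThreshold(
  response: GroupedSearchResponse,
  indices: string,
  threshold: number,
  policy: MissingIndicesPolicy
): Evaluation {
  if (response.aggregations === undefined) {
    if (policy === 'report') {
      // Option 1: translate the validation failure into a clearer message.
      throw new Error(
        `No indices matching ${indices} could be found during the execution of this rule.`
      );
    }
    // Option 2: swallow the condition and report "not crossed".
    return { thresholdCrossed: false };
  }
  // Assumed comparator: any group with more documents than the threshold.
  const thresholdCrossed = response.aggregations.groups.buckets.some(
    (bucket) => bucket.doc_count > threshold
  );
  return { thresholdCrossed };
}
```

With `'report'` the misconfiguration shows up on every execution; with `'ignore'` the rule stays quiet, which is exactly the trade-off described above.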

@miltonhultgren
Contributor

Is there some way in the alerts framework to do some kind of "alert once and stop" so we could notify of the config and then stop running that rule until something changes?
That way we'd notify that whatever change is happening (accidental or intentional) is causing this rule to stop, and once the situation is resolved the user can restart the rule execution?
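
As a rough illustration of the "alert once and stop" idea, the sketch below keeps a small piece of per-rule state. It is not based on any existing alerting framework API: `RuleState`, `runRule`, `resumeRule` and the `indicesExist` callback are all hypothetical names, and the sketch assumes the rule can remember which index configuration caused it to pause.

```typescript
// Hypothetical per-rule state: remembers the index configuration that
// caused the rule to pause, if any.
interface RuleState {
  pausedForIndices?: string;
}

type Outcome = 'executed' | 'notified-and-paused' | 'skipped';

interface RunResult {
  state: RuleState;
  outcome: Outcome;
}

function runRule(
  state: RuleState,
  indices: string,
  indicesExist: (pattern: string) => boolean
): RunResult {
  // Already notified for this exact configuration: do nothing until it changes.
  if (state.pausedForIndices === indices) {
    return { state, outcome: 'skipped' };
  }
  // First failure for this configuration: notify once, then pause.
  if (!indicesExist(indices)) {
    return { state: { pausedForIndices: indices }, outcome: 'notified-and-paused' };
  }
  // Indices are present: run normally and clear any earlier pause.
  return { state: {}, outcome: 'executed' };
}

// Manual restart by the user once the situation is resolved.
function resumeRule(): RuleState {
  return {};
}
```

In this sketch a paused rule resumes either when the configured indices change or when the user calls `resumeRule`; detecting that a matching index was created later without a configuration change would need an extra check that is not shown here.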

@weltenwort
Member

Interesting thought. But what would be the advantage for the user over continuing to retry (and possibly fail again) as we do now?

@miltonhultgren
Contributor

I thought it would reduce the noise, and possibly it could free up some resources in the cluster. If we can determine on our side that this configuration won't work then we don't need to execute it anymore until it changes.

@weltenwort
Member

True, but since it requires special alerting framework support we probably want to pick a solution that we can apply in the meantime.

@jasonrhodes
Member

Doing some old issue notification clean-up 😬

I vote for wrapping this error in a clearer error message for now. If we get SDHs that reference harmless transient states that are triggering this error, at least it will be clearer to track down why it's happening, and we can use that real user info to determine if we should potentially find a way to silence that error in those cases (or always).

@jasonrhodes jasonrhodes changed the title [Logs UI] Log threshold alert error not giving a descriptive information [Logs UI] og threshold alert error not giving descriptive information Apr 19, 2022
@jasonrhodes jasonrhodes changed the title [Logs UI] og threshold alert error not giving descriptive information [Logs UI] "Missing indices" log threshold alert error not descriptive Apr 19, 2022
@pmeresanu85 pmeresanu85 added enhancement New value added to drive a business result and removed bug Fixes for quality problems that affect the customer experience labels Sep 8, 2022
@gbamparop gbamparop added the bug Fixes for quality problems that affect the customer experience label Nov 1, 2023
@gbamparop gbamparop added Team:obs-ux-logs Observability Logs User Experience Team and removed Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services labels Nov 9, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@botelastic botelastic bot added needs-team Issues missing a team label and removed needs-team Issues missing a team label labels Nov 9, 2023
@gbamparop gbamparop added Team:obs-ux-management Observability Management User Experience Team and removed Team:obs-ux-logs Observability Logs User Experience Team labels Feb 6, 2025
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@gbamparop gbamparop removed the Feature:Logs UI Logs UI feature label Feb 6, 2025
@jasonrhodes
Member

We don't intend to fix this at this time. We're working on a plan to help users migrate uses of this kind of rule to use the custom threshold rule instead.

@jasonrhodes jasonrhodes closed this as not planned Feb 6, 2025
7 participants