Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Fix that Trigger page might freeze for high cardinality detectors #230

Merged
merged 2 commits into from
Dec 24, 2020

Conversation

kaituo
Copy link
Member

@kaituo kaituo commented Dec 24, 2020

Issue #, if available:
#220

Description of changes:
We can define triggers for an AD monitor using AD preview results. The trigger page runs fine for a single-stream detector where the preview results are few. For a high-cardinality detector, the trigger page is likely freezing simply because React needs to draw too many preview results on the page. This PR fixes the issue by holding a worst-case bound on the preview results to show. Specifically, when the number of preview results exceeded the bound, we split the preview time range into small chunks and only kept the maximum anomaly grade results within each chunk. The reduction can keep important results (i.e., the anomalies) intact while speeding up the trigger page rendering.

We have also seen null pointer exceptions during trigger evaluation when the anomaly result index does not exist. The exception can arise when anomaly result indices are deleted by index rollover, and there is no new result index generated. Monitors will send out alerts for the exception. This PR fixes the issue by installing extra null checks.

Testing done:

  1. Added/modified unit tests for the above 2 fixes.
  2. Manually verified the above 2 issues are fixed.

null pointer exception before the change:

null_pointer_before

after the change:

null_pointer_after

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

We can define triggers for an AD monitor using AD preview results. The trigger page runs fine for a single-stream detector where the preview results are few. For a high-cardinality detector, the trigger page is likely freezing simply because React needs to draw too many preview results on the page. This PR fixes the issue by holding a worst-case bound on the preview results to show. Specifically, when the number of preview results exceeded the bound, we split the preview time range into small chunks and only kept the maximum anomaly grade results within each chunk. The reduction can keep important results (i.e., the anomalies) intact while speeding up the trigger page rendering.

We have also seen null pointer exceptions during trigger evaluation when the anomaly result index does not exist. The exception can arise when anomaly result indices are deleted by index rollover, and there is no new result index generated. Monitors will send out alerts for the exception. This PR fixes the issue by installing extra null checks.

Testing done:
1. Added/modified unit tests for the above 2 fixes.
2. Manually verified the above 2 issues are fixed.
@kaituo kaituo added the bug Something isn't working label Dec 24, 2020
ftianli-amzn
ftianli-amzn previously approved these changes Dec 24, 2020
Copy link

@yizheliu-amazon yizheliu-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

});

let mid = startTime + (endTime - startTime) / 2;
console.log(mid);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: you may remove it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

Comment on lines 37 to 42
if (
!!anomalyData &&
!!anomalyData.anomalyResult &&
!!anomalyData.anomalyResult.anomalies &&
anomalyData.anomalyResult.anomalies.length > 0
) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can simplify it to use get(anomalyData, 'anomalyData.anomalyResult.anomalies', []).length > 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, changed to "get(anomalyData, 'anomalyResult.anomalies', []).length > 0"

@kaituo kaituo dismissed stale reviews from yizheliu-amazon and ftianli-amzn via c9d4914 December 24, 2020 20:35
@kaituo kaituo merged commit 1666076 into opendistro-for-elasticsearch:master Dec 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants