
[RAM] Add alert summary API #146709

Merged
merged 19 commits into from
Dec 16, 2022


@XavierM (Contributor) commented Nov 30, 2022

Summary

Resolves: #141487

Checklist

@XavierM added the labels release_note:skip (Skip the PR/issue when compiling release notes), Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams) and v8.7.0 on Nov 30, 2022
@XavierM marked this pull request as ready for review November 30, 2022 15:48
@XavierM requested review from a team as code owners November 30, 2022 15:48
@elasticmachine (Contributor)

Pinging @elastic/response-ops (Team:ResponseOps)

@simianhacker (Member)

When I include featureIds, I get an empty response:

curl --location --request POST 'http://localhost:5601/internal/rac/alerts/_alert_summary' \
--header 'Content-Type: application/json' \
--header 'kbn-xsrf: true' \
--header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
--data-raw '{
    "fixed_interval": "1m",
    "gte": "2022-11-30T16:38:22.710Z",
    "lte": "2022-11-30T17:38:22.710Z",
    "featureIds": [
        "infrastructure"
    ]
}'

But when I omit featureIds, I get the response I'm expecting:

curl --location --request POST 'http://localhost:5601/internal/rac/alerts/_alert_summary' \
--header 'Content-Type: application/json' \
--header 'kbn-xsrf: true' \
--header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
--data-raw '{
    "fixed_interval": "1m",
    "gte": "2022-11-30T16:38:22.710Z",
    "lte": "2022-11-30T17:38:22.710Z"
}'
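For reference, the two requests above differ only in the featureIds field. The TypeScript sketch below assembles the same POST without sending it. The interface is a hypothetical typing of the body: field names come from the handler's destructuring of request.body (filter and index are left out because their shapes are not shown in this thread), the value types are assumptions, and buildAlertSummaryRequest is an illustrative name, not a Kibana API.

```typescript
// Hypothetical typing of the body for POST /internal/rac/alerts/_alert_summary.
// Field names mirror the handler's destructuring of request.body; types are assumed.
interface AlertSummaryRequestBody {
  gte: string; // UTC timestamp, e.g. "2022-11-30T16:38:22.710Z"
  lte: string; // UTC timestamp
  fixed_interval?: string; // bucket size, e.g. "1m"
  featureIds?: string[]; // e.g. ["infrastructure"]
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assembles the same request as the curl examples, without sending it.
function buildAlertSummaryRequest(
  kibanaBaseUrl: string,
  payload: AlertSummaryRequestBody,
  basicAuthToken: string
): PreparedRequest {
  return {
    url: `${kibanaBaseUrl}/internal/rac/alerts/_alert_summary`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Kibana's XSRF protection rejects non-GET API calls without this header.
      "kbn-xsrf": "true",
      Authorization: `Basic ${basicAuthToken}`,
    },
    body: JSON.stringify(payload),
  };
}

// The first curl request above, with featureIds.
const withFeatureIds = buildAlertSummaryRequest(
  "http://localhost:5601",
  {
    fixed_interval: "1m",
    gte: "2022-11-30T16:38:22.710Z",
    lte: "2022-11-30T17:38:22.710Z",
    featureIds: ["infrastructure"],
  },
  "ZWxhc3RpYzpjaGFuZ2VtZQ=="
);
```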

I did a little digging, and it appears that the index it's querying for the infrastructure feature ID is .alerts-observability.metrics.alerts, but the actual index name (per Dev Console) is .alerts-observability.metrics.alerts-default.
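One way to make a per-feature base name match a suffixed concrete index like this is a wildcard pattern. The sketch below only illustrates that idea, assuming a -* suffix is acceptable; it is not necessarily how the PR resolved the issue, and both helper names are hypothetical. The matcher is deliberately simplified: only * is treated as special.

```typescript
// Hypothetical helper: turn a base alerts index name into a pattern that also
// matches suffixed concrete indices such as ".alerts-observability.metrics.alerts-default".
function toAlertsIndexPattern(baseIndex: string): string {
  // Leave names that are already wildcard patterns untouched.
  return baseIndex.endsWith("*") ? baseIndex : `${baseIndex}-*`;
}

// Simplified wildcard matcher (only "*" is special), used to check the pattern.
function matchesPattern(pattern: string, indexName: string): boolean {
  const source = pattern
    .split("*")
    // Escape regex metacharacters in the literal parts of the pattern.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${source}$`).test(indexName);
}

const baseIndex = ".alerts-observability.metrics.alerts";
const indexPattern = toAlertsIndexPattern(baseIndex);
```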

@simianhacker (Member)

I think the way we are querying the "recovered alerts" is wrong. When an alert recovers, the recovery should show up only in the bucket in which it recovered, based on kibana.alert.end, instead of across kibana.alert.time_range. Currently, we are drawing the recovered alerts from when the alert started:

[Screenshot of the current chart]

I think we want something like this:

[Screenshot of the proposed chart]

Here, 200 alerts were active from 2:12 pm to 2:16 pm, and then at 2:16 pm all 200 alerts recovered. Here is how I modified the recovered alerts query for the second chart:

POST .alerts-*/_search
{
  "aggs": {
    "recovered_alerts_bucket": {
      "date_histogram": {
        "field": "kibana.alert.end",
        "fixed_interval": "1m",
        "extended_bounds": {
          "min": "2022-11-30T20:46:52.631Z",
          "max": "2022-11-30T21:46:52.631Z"
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "kibana.alert.end": {
              "gt": "2022-11-30T20:46:52.631Z",
              "lt": "2022-11-30T21:46:52.631Z"
            }
          }
        },
        {
          "term": {
            "kibana.alert.status": "recovered"
          }
        }
      ]
    }
  },
  "size": 0
}

@vinaychandrasekhar @maciejforcone Am I interpreting the design correctly?
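The Dev Tools request above can also be built programmatically so that gte, lte and fixed_interval come from the route's input. A minimal sketch, assuming the same field names (kibana.alert.end, kibana.alert.status) and aggregation name as that request; buildRecoveredAlertsQuery and the RecoveredAlertsQuery type are hypothetical, and the returned object is meant to be used as the search body.

```typescript
// Hypothetical shape of the search body used for the recovered-alerts series.
interface RecoveredAlertsQuery {
  size: number;
  query: { bool: { filter: object[] } };
  aggs: {
    recovered_alerts_bucket: {
      date_histogram: {
        field: string;
        fixed_interval: string;
        extended_bounds: { min: string; max: string };
      };
    };
  };
}

function buildRecoveredAlertsQuery(
  gte: string,
  lte: string,
  fixedInterval: string
): RecoveredAlertsQuery {
  return {
    size: 0, // only the aggregation is needed, not the hits
    query: {
      bool: {
        filter: [
          // Alerts whose recovery time falls inside the window...
          { range: { "kibana.alert.end": { gt: gte, lt: lte } } },
          // ...and that are actually recovered.
          { term: { "kibana.alert.status": "recovered" } },
        ],
      },
    },
    aggs: {
      recovered_alerts_bucket: {
        date_histogram: {
          // Bucket on the recovery timestamp, not the alert's time range.
          field: "kibana.alert.end",
          fixed_interval: fixedInterval,
          // Emit empty buckets across the whole window so the x-axis is complete.
          extended_bounds: { min: gte, max: lte },
        },
      },
    },
  };
}
```

Usage: buildRecoveredAlertsQuery("2022-11-30T20:46:52.631Z", "2022-11-30T21:46:52.631Z", "1m") yields the same body as the request above.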

@maciejforcone commented Dec 1, 2022

I also think the second one makes more sense. The user needs to see the duration of active alerts (so it should be a time range they can use for further investigation), whereas for recovered alerts we should use only the moment of recovery (a timestamp marking when everything went back to normal). As for the chart type, are we moving from line to area? I wanted us to use the same visual pattern we have in APM for the comparison use case: line = recent time range, area = previous time range.
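The two rules in this thread (active alerts cover a duration, recovered alerts are a single timestamp) can be modeled in a few lines. The sketch below is a simplified, hypothetical model rather than the plugin's aggregation code: it treats each alert's range as half-open [start, end), counts an alert as active in every bucket that range overlaps, and counts a recovery only in the bucket containing kibana.alert.end. With the 200-alert example from the earlier comment, the active series covers the 2:12 through 2:15 buckets and all 200 recoveries land in the 2:16 bucket.

```typescript
// Simplified model, not the plugin's code.
interface AlertRecord {
  start: number; // epoch ms (kibana.alert.start)
  end?: number; // epoch ms (kibana.alert.end); undefined while still active
}

interface SummaryBucket {
  key: number; // bucket start, epoch ms
  active: number;
  recovered: number;
}

function summarize(alerts: AlertRecord[], min: number, max: number, intervalMs: number): SummaryBucket[] {
  const buckets: SummaryBucket[] = [];
  for (let key = min; key <= max; key += intervalMs) {
    const bucketEnd = key + intervalMs;
    let active = 0;
    let recovered = 0;
    for (const alert of alerts) {
      const end = alert.end ?? Number.POSITIVE_INFINITY;
      // Active: [start, end) overlaps [key, bucketEnd).
      if (alert.start < bucketEnd && end > key) active++;
      // Recovered: counted once, in the bucket containing the recovery time.
      if (alert.end !== undefined && alert.end >= key && alert.end < bucketEnd) recovered++;
    }
    buckets.push({ key, active, recovered });
  }
  return buckets;
}

// Worked example: 200 alerts active from 2:12 pm to 2:16 pm (UTC assumed).
const minute = 60_000;
const t = (hh: number, mm: number) => Date.UTC(2022, 10, 30, hh, mm);
const alerts: AlertRecord[] = Array.from({ length: 200 }, () => ({ start: t(14, 12), end: t(14, 16) }));
const result = summarize(alerts, t(14, 10), t(14, 18), minute);
```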

@XavierM (Contributor, Author) commented Dec 2, 2022

@simianhacker, we don't need hard_bounds for the recovered alerts, correct? Take a look at what I did and let me know if I need to change anything. I also fixed your first observation :)
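Some background on the question: in Elasticsearch's date_histogram, extended_bounds widens the histogram to at least [min, max] and, with min_doc_count 0, returns the empty buckets at the edges, while hard_bounds caps buckets to a range. Because the recovered query already filters kibana.alert.end to the window and kibana.alert.end is a single date, every bucket it produces falls inside the window, so hard_bounds appears to add nothing there; it matters more for range fields such as kibana.alert.time_range, whose values can extend past the window. The sketch below is a simplified model of the extended_bounds behavior for single-valued dates, not Elasticsearch code; bucketKeys is a hypothetical name, and it assumes UTC buckets with no offset.

```typescript
// Simplified model of which bucket keys a date_histogram returns for
// single-valued dates with min_doc_count 0 (UTC, no offset assumed).
function bucketKeys(
  timestamps: number[],
  intervalMs: number,
  extendedBounds?: { min: number; max: number }
): number[] {
  const floor = (ts: number) => Math.floor(ts / intervalMs) * intervalMs;
  const keys = timestamps.map(floor);
  if (extendedBounds) {
    // extended_bounds can only widen the span, never narrow it.
    keys.push(floor(extendedBounds.min), floor(extendedBounds.max));
  }
  if (keys.length === 0) return []; // no documents and no bounds: no buckets
  const first = Math.min(...keys);
  const last = Math.max(...keys);
  const result: number[] = [];
  // Every bucket between first and last is emitted, empty ones included.
  for (let k = first; k <= last; k += intervalMs) result.push(k);
  return result;
}
```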

@watson (Contributor) left a comment

Kibana Platform Security changes LGTM 👍

@maryam-saeidi (Member) left a comment

When I checked the result, key_as_string was not the same for the active and recovered buckets:

[Screenshot]

Update: after a discussion with @XavierM, I learned that this format comes from the underlying query and that we can use key in our charts instead, so this difference will be kept as is.
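Since key is the bucket start in epoch milliseconds and key_as_string is only a formatted view of it, joining on key lets the active and recovered series line up regardless of format. A minimal sketch; joinOnKey and the row shape are hypothetical, and the example buckets are illustrative values, not data from the screenshot.

```typescript
// The date_histogram bucket fields discussed in this thread.
interface HistogramBucket {
  key: number; // bucket start, epoch ms
  key_as_string: string; // formatted key; may differ between aggregations
  doc_count: number;
}

interface SummaryRow {
  x: number; // bucket start, epoch ms
  active: number;
  recovered: number;
}

// Joins the two series on the numeric key, ignoring key_as_string entirely.
function joinOnKey(active: HistogramBucket[], recovered: HistogramBucket[]): SummaryRow[] {
  const rows = new Map<number, SummaryRow>();
  const rowFor = (key: number): SummaryRow => {
    let row = rows.get(key);
    if (!row) {
      row = { x: key, active: 0, recovered: 0 };
      rows.set(key, row);
    }
    return row;
  };
  for (const b of active) rowFor(b.key).active = b.doc_count;
  for (const b of recovered) rowFor(b.key).recovered = b.doc_count;
  return Array.from(rows.values()).sort((a, b) => a.x - b.x);
}

// Illustrative buckets: the key_as_string formats differ, the keys do not.
const bucketStart = Date.UTC(2022, 10, 30, 20, 46);
const rows = joinOnKey(
  [{ key: bucketStart, key_as_string: "2022-11-30T20:46:00.000Z", doc_count: 200 }],
  [
    { key: bucketStart, key_as_string: "2022-11-30 20:46", doc_count: 0 },
    { key: bucketStart + 60_000, key_as_string: "2022-11-30 20:47", doc_count: 200 },
  ]
);
```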

),
},
options: {
tags: ['access:rac'],
Member

What does this option do?

@XavierM (Contributor, Author) Dec 8, 2022

This is for Kibana security; I think we created a special rac access.
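As far as I understand, Kibana's security plugin treats route tags prefixed with access: as required API privileges, so access:rac limits the route to users whose role grants the rac API privilege. The sketch below is a simplified, hypothetical model of that check (both function names are made up), not the plugin's implementation.

```typescript
const ACCESS_TAG_PREFIX = "access:";

// Hypothetical: extract the API privileges a route requires from its tags.
function requiredApiPrivileges(tags: string[]): string[] {
  return tags
    .filter((tag) => tag.startsWith(ACCESS_TAG_PREFIX))
    .map((tag) => tag.slice(ACCESS_TAG_PREFIX.length));
}

// Hypothetical: a request passes only if the user holds every required privilege.
function isAuthorized(tags: string[], userApiPrivileges: Set<string>): boolean {
  return requiredApiPrivileges(tags).every((privilege) => userApiPrivileges.has(privilege));
}
```

Usage: isAuthorized(["access:rac"], new Set(["rac"])) is true, while the same tags with an empty privilege set are rejected. Tags without the prefix do not affect authorization in this model.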

const { gte, lte, featureIds, filter, fixed_interval: fixedInterval, index } = request.body;
if (
!(
moment(gte, 'YYYY-MM-DDTHH:mm:ss.SSSZ', true).isValid() &&
Member

Can we use DateFromString instead to validate it at the body validation level?

Contributor (Author)

I only expected this UTC format to be used, not a bare date. However, if you feel we should also allow a date without a time, I can change it back to moment.ISO_8601.

Member

No, it is fine, I also used the same time format in my PR for the related component.
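For comparison with the strict moment format 'YYYY-MM-DDTHH:mm:ss.SSSZ', here is a dependency-free sketch that accepts only timestamps shaped like 2022-11-30T16:38:22.710Z (millisecond precision, literal Z) and rejects impossible dates by round-tripping through toISOString. It is likely stricter than moment's Z token, which as far as I know also accepts numeric offsets such as +01:00. isStrictUtcTimestamp and validateRange are hypothetical names; only the error message is copied from the handler.

```typescript
// Shape check: millisecond precision and a literal "Z" (UTC) suffix.
const UTC_MILLIS_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;

function isStrictUtcTimestamp(value: string): boolean {
  if (!UTC_MILLIS_PATTERN.test(value)) return false;
  const parsed = new Date(value);
  // Invalid dates give NaN; lenient rollovers (e.g. Feb 30) fail the round-trip.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString() === value;
}

// Returns the handler's error message, or null if both bounds are valid.
function validateRange(gte: string, lte: string): string | null {
  if (!isStrictUtcTimestamp(gte) || !isStrictUtcTimestamp(lte)) {
    return "gte and/or lte are not following the UTC format";
  }
  return null;
}
```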

@maryam-saeidi (Member) left a comment

I wasn't able to check the graphs the way @simianhacker showed them, so I'll wait for his approval as the reviewer from our team.

@maryam-saeidi maryam-saeidi dismissed their stale review December 12, 2022 16:24


@simianhacker (Member) left a comment

LGTM

[Screenshot]

throw Boom.badRequest('gte and/or lte are not following the UTC format');
}

if (fixedInterval && fixedInterval.match(/^\d{1,2}[mhdw]$/) == null) {
Contributor

Should we validate this with a custom validator on the body params?

Contributor (Author)

I did not find an easy way to do it with io-ts, so I will keep it this way for now.
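If the check is ever moved into body validation, it could start as a plain predicate like the hypothetical isValidFixedInterval below, which a custom io-ts codec or other validator could wrap (that wiring is not shown). Two details are worth double-checking against the handler's regex: inside a character class, quotes and commas are literal characters, so the class should list only the unit letters; and Elasticsearch documents ms, s, m, h and d as the fixed_interval units, with weeks available only through calendar_interval, so this sketch assumes w should be rejected. It keeps the handler's 1 to 2 digit limit.

```typescript
// Minutes, hours or days with a 1-2 digit count; "w" deliberately excluded
// because fixed_interval does not accept weeks (an assumption to verify).
const FIXED_INTERVAL_PATTERN = /^\d{1,2}[mhd]$/;

// Hypothetical predicate a custom body validator could call.
function isValidFixedInterval(value: unknown): value is string {
  return typeof value === "string" && FIXED_INTERVAL_PATTERN.test(value);
}

// Returns an error message, or null when the optional interval is acceptable.
function checkFixedInterval(value: unknown): string | null {
  if (value === undefined) return null; // fixed_interval is optional
  return isValidFixedInterval(value) ? null : `invalid fixed_interval: ${String(value)}`;
}
```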

@ymao1 (Contributor) left a comment

LGTM

@XavierM enabled auto-merge (squash) December 16, 2022 15:46
@kibana-ci (Collaborator)

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #6 / endpoint endpoint list when there is data, when the hostname is clicked on, for the kql filtering for united.endpoint.host.hostname, table shows 1 item
  • [job] [logs] FTR Configs #6 / endpoint endpoint list when there is data, when the hostname is clicked on, for the kql query: na, table shows an empty list

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
ruleRegistry 211 213 +2
Unknown metric groups

API count

id before after diff
ruleRegistry 239 241 +2

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 61 67 +6
osquery 109 115 +6
securitySolution 440 446 +6
total +20

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 70 76 +6
osquery 110 117 +7
securitySolution 517 523 +6
total +21

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@XavierM merged commit 3917bf5 into elastic:main Dec 16, 2022
@kibanamachine added the backport:skip label (This commit does not require backporting) Dec 16, 2022
Development

Successfully merging this pull request may close these issues.

[RAM] Create API to draw timelines of alerts on charts
10 participants