Fix issue with incorrect policy response attachment. #113430

Merged (6 commits) on Oct 4, 2021

@pjhampton pjhampton commented Sep 29, 2021

Summary

This is a tricky PR to explain: for 7.14.0 and 7.15.0 Endpoints, an incorrect policy response has sometimes been attached to the telemetry document.

Background

An Endpoint (EP) telemetry document is built from 4 parts:

  • Top-level information, e.g. EP ID, EP version, package version
  • Metrics, e.g. CPU, memory
  • Policy configuration
  • Policy response (if not 100% successful and a policy config exists)

To get all this information we have to:

1. Read EP metrics for last 24h
--> 2. Find all active Fleet agents in the last 24 hours
----> 3. Load *Agent policies*
------> 4. Find and load *EP Package Policies* from agent policies
--------> 5. Join Policy Configs with Endpoints via Elastic Agent (Step 1)
----------> 6. Load Policy Responses in last 24 hours
------------> 7. Join Policy Responses to Endpoint (if policy config exists)
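
The join in steps 5 to 7 can be sketched in memory as follows. This is a minimal illustration, not the code in this PR: the interfaces, field names and the `buildTelemetry` function are all hypothetical stand-ins for the real documents, which carry many more fields.

```typescript
// Hypothetical, simplified shapes (assumptions for illustration only).
interface EndpointMetric { agentId: string; cpu: number }
interface FleetAgent { agentId: string; policyId: string }
interface PolicyConfig { policyId: string; name: string }
interface PolicyResponse { agentId: string; status: string }

interface TelemetryDoc {
  agentId: string;
  cpu: number;
  policyConfig?: PolicyConfig;
  policyResponse?: PolicyResponse;
}

function buildTelemetry(
  metrics: EndpointMetric[],
  agents: FleetAgent[],
  configs: PolicyConfig[],
  responses: PolicyResponse[]
): TelemetryDoc[] {
  // Lookup tables: agent -> policy id, policy id -> config, agent -> response.
  const agentPolicy = new Map(agents.map((a) => [a.agentId, a.policyId] as const));
  const configById = new Map(configs.map((c) => [c.policyId, c] as const));
  const responseByAgent = new Map(responses.map((r) => [r.agentId, r] as const));

  return metrics.map((m) => {
    const doc: TelemetryDoc = { agentId: m.agentId, cpu: m.cpu };
    // Step 5: the metrics document has no policy reference, so go via the agent.
    const policyId = agentPolicy.get(m.agentId);
    const config = policyId === undefined ? undefined : configById.get(policyId);
    if (config) {
      doc.policyConfig = config;
      // Step 7: only attach a response when a policy config exists.
      const response = responseByAgent.get(m.agentId);
      if (response) doc.policyResponse = response;
    }
    return doc;
  });
}
```

Keying the response lookup by agent rather than by policy is the design choice that matters here: each endpoint gets its own response, or none at all.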

The reason for this complexity is that the EP metrics document contains no reference to the policy config. To work around this, I look up all the agent policies and then the EP package policies within them. The original implementation assumed that policy responses could be aggregated from the .ds-metrics-endpoint.policy* data stream by policy ID, using "terms": { "field": "Endpoint.policy.applied.id" }:

GET /.ds-metrics-endpoint.policy*/_search?expand_wildcards=open,hidden
{
  "size": 0,
  "query": {
    "match": {
      "Endpoint.policy.applied.status": "failure"
    }
  },
  "aggs": {
    "policy_responses": {
      "terms": {
        "field": "Endpoint.policy.applied.id",
        "size": 10
      },
      "aggs": {
        "latest_response": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

However, this was a poor assumption: a policy config is applied to a fleet of endpoints rather than to a specific endpoint, so aggregating on the policy ID returns one latest response per policy instead of one per endpoint. The aggregation now uses the Endpoint ID, "terms": { "field": "agent.id" }, which is conveniently present on the policy response document:

GET /.ds-metrics-endpoint.policy*/_search?expand_wildcards=open,hidden
{
  "size": 0,
  "query": {
    "match": {
      "Endpoint.policy.applied.status": "failure"
    }
  },
  "aggs": {
    "policy_responses": {
      "terms": {
        "field": "agent.id",
        "size": 10
      },
      "aggs": {
        "latest_response": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
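
To make the difference between the two aggregations concrete, here is a small in-memory analogy of a terms bucket with a single latest top hit. It is a sketch under assumptions, not the Kibana implementation: `PolicyResponseDoc`, `latestBy` and the sample data are hypothetical, and the three fields only stand in for `agent.id`, `Endpoint.policy.applied.id` and `@timestamp`.

```typescript
// Hypothetical, flattened view of a policy response document.
interface PolicyResponseDoc {
  agentId: string;   // stands in for agent.id
  policyId: string;  // stands in for Endpoint.policy.applied.id
  timestamp: number; // stands in for @timestamp
}

// Keep only the most recent document per key, like terms + top_hits(size: 1).
function latestBy(
  docs: PolicyResponseDoc[],
  key: (doc: PolicyResponseDoc) => string
): Map<string, PolicyResponseDoc> {
  const latest = new Map<string, PolicyResponseDoc>();
  for (const doc of docs) {
    const k = key(doc);
    const current = latest.get(k);
    if (!current || doc.timestamp > current.timestamp) latest.set(k, doc);
  }
  return latest;
}

// Two endpoints share one policy and both report a failure.
const failedResponses: PolicyResponseDoc[] = [
  { agentId: "a1", policyId: "p1", timestamp: 100 },
  { agentId: "a2", policyId: "p1", timestamp: 200 },
];

// Old behaviour: one bucket for p1, holding a2's response only.
const byPolicy = latestBy(failedResponses, (d) => d.policyId);
// New behaviour: one bucket per endpoint.
const byAgent = latestBy(failedResponses, (d) => d.agentId);

console.log(byPolicy.size, byAgent.size); // 1 2
```

With the policy ID as the key, endpoint a1 would be joined to the response reported by a2, which is the incorrect attachment this PR addresses.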

Checklist

  • Documentation was added for features that require explanation or tutorials

For maintainers

@pjhampton pjhampton requested a review from a team as a code owner September 29, 2021 15:07
@pjhampton pjhampton marked this pull request as draft September 29, 2021 15:08
@pjhampton pjhampton self-assigned this Sep 29, 2021
@pjhampton pjhampton added the auto-backport, release_note:skip, v7.15.1, v7.16.0, v8.0.0, bug, and Team: SecuritySolution labels Sep 29, 2021
@pjhampton pjhampton marked this pull request as ready for review September 29, 2021 15:32
@pjhampton
Contributor Author

@elasticmachine merge upstream


@donaherc donaherc left a comment

This is great thanks. The context writeup on this issue is superlative as well, great for context about how this particularly complicated telemetry document is assembled.

@pjhampton pjhampton enabled auto-merge (squash) October 4, 2021 16:37
@kibanamachine
Contributor

💛 Build succeeded, but was flaky


Test Failures

Kibana Pipeline / general / Allows the rule to be duplicated from the edit screen.indicator match Detection rules, Indicator Match Duplicates the indicator rule Allows the rule to be duplicated from the edit screen

Link to Jenkins

Stack Trace

Failed Tests Reporter:
  - Test has not failed recently on tracked branches

AssertionError: Timed out retrying after 60000ms: expected '<button.euiContextMenuItem>' to be 'visible'

This element `<button.euiContextMenuItem>` is not visible because its parent `<div>` has CSS property: `visibility: hidden`
    at Context.eval (http://localhost:6121/__cypress/tests?p=cypress/integration/detection_rules/indicator_match_rule.spec.ts:31238:58)

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @pjhampton

@pjhampton pjhampton merged commit a565fa0 into master Oct 4, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Oct 4, 2021
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Contributor

💚 Backport successful

Branches: 7.x, 7.15

The backport PRs will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Oct 4, 2021
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Pete Hampton <pjhampton@users.noreply.github.com>
kibanamachine added a commit that referenced this pull request Oct 5, 2021
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Pete Hampton <pjhampton@users.noreply.github.com>
@pjhampton pjhampton deleted the pjhampton/bug-policy-responses branch October 5, 2021 07:28
Labels: auto-backport, bug, release_note:skip, Team: SecuritySolution, v7.15.1, v7.16.0, v8.0.0