[Security Solution] Implement telemetry for the Protections/Detections Coverage Overview functionality #158250

Open
7 of 8 tasks
maximpn opened this issue May 23, 2023 · 8 comments
Assignees
Labels
Feature:Rule Management Security Solution Detection Rule Management area Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

maximpn commented May 23, 2023

Epic: https://github.com/elastic/security-team/issues/2905 (internal)
Depends on: #158243, #158238, #158249

Summary

Implement telemetry for the Protections/Detections Coverage Overview dashboard so that we can answer the following questions:

Feature adoption:

  • How many and which users use coverage overview - coverage page visits

Coverage page usage - via Fullstory

  • Filters usage
  • Cell expand/collapse buttons usage
  • Search bar usage
  • Techniques cells clicks
  • Enable all rules button usage
  • % of users enabling rules after visiting Coverage page
  • retention metrics
@maximpn maximpn added Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team 8.9 candidate labels May 23, 2023
@elasticmachine

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

@maximpn maximpn added the Feature:Rule Management Security Solution Detection Rule Management area label May 23, 2023
@banderror

@maximpn @dplumlee When we start working on this, we should ask @approksiu to connect us with folks from Threat Hunting who have experience in working with telemetry.

@banderror

@approksiu We'll need to chat about the telemetry requirements after On Week. I'll schedule a meeting. Before the meeting, @dplumlee and I will post some thoughts and concerns in this ticket.

@approksiu

Great, thanks @banderror

@dplumlee

In advance of the meeting about telemetry requirements, posting my initial thoughts:

What rules did users view and enable via coverage overview?

I'm not sure how useful the telemetry data would be in this case. The cardinality would be very high, even if we limited the collected data to our own prebuilt rules package (900+ rules). A possible alternative would be to track which tactics and techniques rules are enabled for, but even then there's a very high number of techniques, so the cardinality wouldn't be small either. It would be good to discuss further what we're trying to get out of the telemetry data so we can choose the best way to collect it.

What filters are being used?

Since we have the initial filters of Enabled, Elastic Rules, and Custom Rules, do we want to add some logic to filter this initial state out? Or just deal with that state having way more occurrences than others?

cc @banderror @approksiu

@banderror banderror self-assigned this Sep 13, 2023
@banderror

banderror commented Sep 17, 2023

@approksiu I agree with @dplumlee's comment above. From my side:

  • Feature adoption: how many and which users use coverage overview - coverage page visits

Any page visit counter should be available in FullStory, right?

What about "how many and which users"? What do we mean by "user" in this case: a Kibana user (e.g. email), a customer, a deployment id, or a combination of these?

  • User visits to MITRE site via tactic/technique links on the coverage page

Tracking the number of clicks on anything should be doable in FullStory, if we can unambiguously locate the element being clicked. In this case, we'd need to assign an id or a classname to the link.

  • What techniques did users view using coverage page?

This should be doable via "event-based telemetry" aka EBT.

Can you please define:

  • "techniques": Do you want to track technique ids or names? What about tactics?
  • "view": When do you want to track this?

  • Number of rules enabled via coverage overview

This should be doable via basic telemetry counters (one of the existing telemetry mechanisms) or EBT 👍

Do you want to track how many rules the user attempted to enable, or how many were actually enabled (attempted - failed - skipped)?
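To make the distinction concrete, here is a minimal sketch of the "actually enabled" arithmetic. The `BulkEnableOutcome` shape and its field names are assumptions made up for illustration; they are not the response format of Kibana's bulk actions API.

```typescript
// Hypothetical outcome of a bulk "enable rules" action (field names are assumptions).
interface BulkEnableOutcome {
  attempted: number; // rules the user asked to enable
  failed: number; // rules that could not be enabled because of an error
  skipped: number; // rules left untouched, e.g. already enabled
}

// Number of rules that were actually enabled: attempted - failed - skipped.
function countActuallyEnabled(outcome: BulkEnableOutcome): number {
  const enabled = outcome.attempted - outcome.failed - outcome.skipped;
  if (enabled < 0) {
    throw new RangeError('failed + skipped cannot exceed attempted');
  }
  return enabled;
}

const enabledCount = countActuallyEnabled({ attempted: 10, failed: 2, skipped: 3 });
```

Reporting both `attempted` and the derived count in the same event would let us answer either question later without changing the schema.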

  • What filters are being used?

This should be doable via EBT. But let's define

  • "filters": In the telemetry event we could create, would you want a separate field for each filter, where the field's value would be an array of selected filter options? Or a diff between the next and the previously selected options?
  • "used": Do you want to track every single change of the filters? A user selecting filters will normally click many times. Would you want all of those clicks to be tracked, some of them, or only the "final" state of the filters?
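To illustrate the two candidate event shapes, here is a rough sketch of a full snapshot (one field per filter, each holding an array of selected options) next to a diff against the previous selection. The filter names `activity` and `source`, their option values, and the helper functions are all hypothetical; the sketch also assumes option arrays contain no duplicates.

```typescript
// Snapshot shape: one field per filter, value = selected options.
type FilterState = Record<string, string[]>;

// Assumed initial state of the page (Enabled, Elastic Rules, Custom Rules).
const DEFAULT_FILTERS: FilterState = {
  activity: ['enabled'],
  source: ['prebuilt', 'custom'],
};

// Diff shape: only the options added or removed since the previous state.
function diffFilters(prev: FilterState, next: FilterState) {
  const added: FilterState = {};
  const removed: FilterState = {};
  const keys = Array.from(new Set(Object.keys(prev).concat(Object.keys(next))));
  for (const key of keys) {
    const before = prev[key] ?? [];
    const after = next[key] ?? [];
    const plus = after.filter((option) => !before.includes(option));
    const minus = before.filter((option) => !after.includes(option));
    if (plus.length > 0) added[key] = plus;
    if (minus.length > 0) removed[key] = minus;
  }
  return { added, removed };
}

// True when nothing changed relative to the default state, so the
// event could be dropped instead of dominating the dataset.
function isDefaultState(state: FilterState): boolean {
  const { added, removed } = diffFilters(DEFAULT_FILTERS, state);
  return Object.keys(added).length === 0 && Object.keys(removed).length === 0;
}

const nextFilters: FilterState = { activity: ['enabled', 'disabled'], source: ['prebuilt'] };
const filterDiff = diffFilters(DEFAULT_FILTERS, nextFilters);
```

The snapshot is easier to aggregate ("how often is each option selected?"), while the diff is smaller and shows intent per click. Tracking only the "final" state would additionally need some form of debouncing, which is not shown here.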
  • What rules did users view and enable via coverage overview?
  • ?Search bar usage - what terms did users search for?

As Davis already noted above, tracking such things would result in very high cardinality datasets. Can you explain how exactly you would like this data to be represented, and how you'd use it?

Also, tracking all names/ids of enabled rules doesn't scale well. If a user creates 10000 rules mapped to a single technique, we'd need to send either a single telemetry event with 10000 rule names in it (huge object), or 10000 telemetry events with a single rule name in each. We would need to add limitations to such tracking, e.g. limit the number of rules, and the length of rule names. Because of complexity and performance/scaling considerations, I'd suggest we track something "finite" and low cardinality instead of rule names or ids, such as what rule types are enabled, are these prebuilt or custom rules, from what technique the user enables rules, etc.
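As a sketch of what a "finite", low-cardinality event could look like: instead of rule names or ids, it carries only counts grouped by rule type and by prebuilt vs. custom, plus the technique the rules were enabled from. All type names, field names, and the example rule types below are assumptions for illustration, not an existing Kibana telemetry schema.

```typescript
// Minimal view of an enabled rule (hypothetical shape).
interface EnabledRule {
  type: string; // e.g. 'query', 'eql'
  isPrebuilt: boolean;
}

// Event payload whose size does not grow with the number of rules.
interface RulesEnabledSummary {
  techniqueId: string;
  totalEnabled: number;
  prebuiltCount: number;
  customCount: number;
  countByRuleType: Record<string, number>;
}

function summarizeEnabledRules(techniqueId: string, rules: EnabledRule[]): RulesEnabledSummary {
  const countByRuleType: Record<string, number> = {};
  let prebuiltCount = 0;
  for (const rule of rules) {
    countByRuleType[rule.type] = (countByRuleType[rule.type] ?? 0) + 1;
    if (rule.isPrebuilt) prebuiltCount += 1;
  }
  return {
    techniqueId,
    totalEnabled: rules.length,
    prebuiltCount,
    customCount: rules.length - prebuiltCount,
    countByRuleType,
  };
}

// 10000 rules mapped to a single technique still produce one small event.
const manyRules: EnabledRule[] = Array.from({ length: 10000 }, (_, i) => ({
  type: i % 2 === 0 ? 'query' : 'eql',
  isPrebuilt: i < 900,
}));
const rulesSummary = summarizeEnabledRules('T1059', manyRules);
```

The number of keys in `countByRuleType` is bounded by the number of rule types, so the payload stays small regardless of how many rules a user enables at once.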

@approksiu

Having explored this topic more, and after a discussion, this is how we are going to approach it:

  1. Collect information about daily coverage page visits per cluster with product telemetry (security solution telemetry). We will be able to further aggregate and analyse this data together with other signals.
  2. Telemetry regarding page usage will be collected with FullStory (FS).
  3. As the feature matures we will revisit.
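For item 1, a minimal in-memory sketch of a per-day visit counter reported per cluster might look like the following. The class, its method names, and the report shape are hypothetical, and days are bucketed in UTC as an assumption; this is not how Kibana's telemetry plugins persist or ship data.

```typescript
// One report row per (cluster, UTC day); shape is an assumption.
interface DailyVisitReport {
  clusterId: string;
  day: string; // YYYY-MM-DD in UTC
  visits: number;
}

class DailyVisitCounter {
  private readonly visitsByDay = new Map<string, number>();

  // Record one coverage page visit at the given time (defaults to now).
  recordVisit(at: Date = new Date()): void {
    const day = at.toISOString().slice(0, 10);
    this.visitsByDay.set(day, (this.visitsByDay.get(day) ?? 0) + 1);
  }

  // Produce the rows to send for this cluster, sorted by day.
  report(clusterId: string): DailyVisitReport[] {
    return Array.from(this.visitsByDay.entries())
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([day, visits]) => ({ clusterId, day, visits }));
  }
}

const visitCounter = new DailyVisitCounter();
visitCounter.recordVisit(new Date('2023-09-18T08:30:00Z'));
visitCounter.recordVisit(new Date('2023-09-17T10:00:00Z'));
visitCounter.recordVisit(new Date('2023-09-17T23:59:59Z'));
const visitReport = visitCounter.report('example-cluster-id');
```

Each row holds only a cluster id, a date, and a count, which keeps cardinality low and leaves finer-grained page usage to FullStory.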

5 participants