Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Separate rule versions and revisions #136213

Open
2 tasks done
Tracked by #174166
xcrzx opened this issue Jul 12, 2022 · 19 comments
Open
2 tasks done
Tracked by #174166

[Security Solution] Separate rule versions and revisions #136213

xcrzx opened this issue Jul 12, 2022 · 19 comments
Labels
Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.8.0

Comments

@xcrzx
Copy link
Contributor

xcrzx commented Jul 12, 2022

Summary

Currently, we use the version field of the rule object for two purposes:

  • For prebuilt rules, that field corresponds to a rule version distributed by Elastic. It is controlled by Elastic and gets bumped every time a new version of the rule is released.
  • For custom rules, the version corresponds to the rule revision, i.e., it increases every time user modifies the rule.

As we plan to make prebuilt Elastic rules editable, we won't be able to bump the version field on user edit anymore as that would break rule update logic that relies on the version to not be mutable on the user's side. So the idea is to separate the two use cases above by introducing a new revision rule field.

Version vs revision

Screenshot 2022-07-18 at 11 46 20

Version - corresponds to a remote rule version Revision - reflects any local rule modifications
We bump versions only on rule upgrades; All other local modifications do not affect the version number. We bump revision on every rule modification (no matter rule upgrade or user modifications).
The version number could go down if the user rolls back the rule. The revision number could only increase.
We use versions to determine whether a rule needs to be upgraded. Users could use revisions to determine which copy of a rule is the latest (e.g., when comparing exported rules). Later, we could use revisions to retain rule history.
Exportable and importable. Exportable, but ignored during import.
Can be modified via HTTP requests. Ignored in all HTTP requests that mutate rules.

Rule edit workflows

Screenshot 2022-07-12 at 18 20 19

Other notes

  • Remote rules don't need revisions as they will be ignored during rule creation/upgrade.
  • We don't plan to expose version editing in UI for custom rules. But users could update custom rule versions via Export/Import or HTTP API.
  • Alerts generated by rules should contain both rule version and revision numbers.
  • Not all rule changes should lead to revision bump, e.g., rule activation or deactivation
  • Changes to rule exceptions/lists wouldn't bump rule revision.

Subtasks

@botelastic botelastic bot added the needs-team Issues missing a team label label Jul 12, 2022
@xcrzx xcrzx added Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team labels Jul 12, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jul 12, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@xcrzx xcrzx added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label Jul 12, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@xcrzx xcrzx self-assigned this Jul 12, 2022
@rylnd
Copy link
Contributor

rylnd commented Jul 14, 2022

This sounds great to me, thanks for the writeup and the helpful diagram! 👍

For existing users, were you imagining migrating custom rules' version over to revision?

Until such time as revisions contain diff history (and the point is mostly moot), prebuilt upgrades remain semantically ambiguous; I could see them being "revisions" just as easily as not.

@xcrzx
Copy link
Contributor Author

xcrzx commented Jul 18, 2022

For existing users, were you imagining migrating custom rules' version over to revision?

@rylnd It could be tricky to migrate existing user rules as they could be exported and saved elsewhere. So if we migrate only the local version of their rules, it will become problematic to compare them with exported rules. So, if it is possible to avoid migrations, I would do that.

We can mention in the documentation that what was version is revision now for custom rules. And that users will be free to set versions to whatever value they want using HTTP API.

The other option is to swap the proposed semantic meaning of version and revision. That way, there won't be any potentially breaking changes for our users. In that case, we'll only need to migrate our prebuilt rules to use revision instead of version.

@joshdover
Copy link
Contributor

The separation of revision and version makes sense to me, but we'll need to think about how the package upgrade logic is intended to be implemented. Today, integration packages install their SOs simply by doing a bulk import with overwrite: true set:

savedObjectsImporter.import({
overwrite: true,
readStream: createListStream(toBeSavedObjects),
createNewCopies: false,
refresh: false,
})

This isn't going to work with this change, since it will overwrite the entire object, including user edits and revision numbers. The UX designs for this feature also include a review and confirm workflow which also isn't supported by this install logic. I wonder if we need to separate the objects installed by Fleet from the rules actually used by detection engine. This would work like:

  • Fleet installs a security-rule-template in the same way it does now, overwriting any previous object
  • Security creates rules from these templates, and tracks a revision number. The rule upgrade UX would compare the templates with the configured rules to allow updating the rules from the new templates with the diff/approval workflow.

This would closely mirror how the CSP team structured their csp-rule-template objects in their package, separate from each concrete csp-rule for each benchmark policy.

@xcrzx
Copy link
Contributor Author

xcrzx commented Jul 18, 2022

I wonder if we need to separate the objects installed by Fleet from the rules actually used by detection engine.

@joshdover This separation is already implemented in Security Solution. Fleet installs rules as security-rule SO type, and then, during the upgrade process, we copy rule fields to saved objects actually used by the detection engine.

@joshdover
Copy link
Contributor

This separation is already implemented in Security Solution. Fleet installs rules as security-rule SO type, and then, during the upgrade process, we copy rule fields to saved objects actually used by the detection engine.

Great, thanks for clarifying that!

@kobelb
Copy link
Contributor

kobelb commented Jul 18, 2022

Separating the concepts of "revision" and "version" makes total sense to me and I completely agree with the differences in behavior that you've outlined @xcrzx.

However, I think we should use the name prebuilt_version, epm_version or some other more precise name as opposed to just version for the field on the alert saved-object that maintains a reference to the security-rule saved-object. This will make it explicitly clear what "version" we're referring to in this context as there are a ton of versions this field might be referring to:

  • The version of the HTTP API that created the security detection rule
  • The version of the Elasticsearch document that persists the alerting rule definition
  • The "version" of the alerting rule definition, which we're now calling "revision"
  • The version of the Stack
  • The version of the alerting rule according to an external source control system used for infrastructure as code

Using the name version alone isn't specific enough and will create confusion for our users.

@mikecote
Copy link
Contributor

However, I think we should use the name prebuilt_version, epm_version or some other more precise name as opposed to just version for the field on the alert saved-object that maintains a reference to the security-rule saved-object.

I agree, the use of prebuilt_version and epm_version or along those lines makes sense to me as well. Given the rule already tracks different "versions" already like the migraiton version and the version the API key was last modified, keeping the naming more precise will remove confusion 👍

@pgayvallet
Copy link
Contributor

The proposal makes total sense ihmo, and the workflow / scenarios examples seems fine.

However, I think we should use the name prebuilt_version, epm_version or some other more precise name as opposed to just version

++. We already have 99 things called version (or context, or client, or...). A more specialized naming seems preferable.

Just for my information, what is the current version / future revision functionally used for? Concurrency write protection, or more than just that?

@xcrzx
Copy link
Contributor Author

xcrzx commented Jul 19, 2022

However, I think we should use the name prebuilt_version, epm_version or some other more precise name as opposed to just version for the field on the alert saved-object that maintains a reference to the security-rule saved-object.

I agree. We already have many things named version, so having a more specific name seems reasonable. The problem is that the version field already exists in rule parameters, and renaming it could be a breaking change. So we need to consider how that change could affect our users and whether it can be introduced in a minor release. Probably we'll need to introduce a new version field and deprecate the current one before we can remove it altogether.

@xcrzx
Copy link
Contributor Author

xcrzx commented Jul 19, 2022

Just for my information, what is the current version / future revision functionally used for? Concurrency write protection, or more than just that?

Currently, the revision could be used to refer to a specific historical version of a rule. For example, I have an alert generated by a rule that was edited/deleted. Using revision, I could find that specific historical version of the rule when needed (if I have backups). And in the future, we're also planning to implement features like rollback to a previous revision.

@kobelb
Copy link
Contributor

kobelb commented Jul 19, 2022

I agree. We already have many things named version, so having a more specific name seems reasonable. The problem is that the version field already exists in rule parameters, and renaming it could be a breaking change. So we need to consider how that change could affect our users and whether it can be introduced in a minor release. Probably we'll need to introduce a new version field and deprecate the current one before we can remove it altogether.

I completely agree that we should be cautious about removing fields that our users rely on. However, we also need to show some caution in changing the semantics of existing fields. For non-prebuilt rules, the version field is incremented every time that the user modifies the security detection rule, so changing the situations when the version is incremented would also be a breaking change.

@xcrzx
Copy link
Contributor Author

xcrzx commented Jul 19, 2022

For non-prebuilt rules, the version field is incremented every time that the user modifies the security detection rule, so changing the situations when the version is incremented would also be a breaking change.

Yes, unfortunately, the current field usage is ambiguous. So its behavior will inevitably change either for prebuilt rules or for custom.

If we want to play very safe, we could keep the semantics of version for custom rules, as I mentioned before:

The other option is to swap the proposed semantic meaning of version and revision. That way, there won't be any potentially breaking changes for our users. In that case, we'll only need to migrate our prebuilt rules to use revision instead of version.

@xcrzx
Copy link
Contributor Author

xcrzx commented Jul 25, 2022

We have discussed renaming the version field to something more specific and decided to keep the name as is for now. In the case of renaming, we would need to write migrations for all existing rules, which is not trivial as users could have exported rules out of reach for us. So that outweighs the benefits of renaming the field. But probably, we'll rename the version field when we decide to move it to the alerting framework.

@banderror
Copy link
Contributor

banderror commented Jul 25, 2022

Just to keep it clear what our suggested implementation plan is (and @xcrzx please correct me here if I'm wrong):

Stage 1 (we're only discussing this right now):

1.1. Implement revision at the Alerting Framework level. Keep version in the Security rule parameters as is.
1.2. Migrate Security Solution rules to the new revision field. For custom rules: revision = version; version = version;. For prebuilt rules: revision = 0; version = version;.
1.3. Add revision to rules returned from the Security Solution's API endpoints.

Stage 2 (this is out of the scope and will require a separate ticket):

2.1. Move version to the Alerting Framework. Rename it to better reflect the semantics (prebuilt_version or content_version, etc).
2.2. Migrate Security Solution rules to this Framework field. Remove version from rule params. Add backwards compatibility to rule import/export and add_prepackaged_rules endpoints.

@kobelb
Copy link
Contributor

kobelb commented Jul 25, 2022

But probably, we'll rename the version field when we decide to move it to the alerting framework.

We will 100% be renaming the version field when it is moved to the alerting framework.

@banderror banderror changed the title Separate rule versions and revisions [Security Solution] Separate rule versions and revisions Jul 28, 2022
@simianhacker
Copy link
Member

This looks fine for Observability, we don't have any "out of the box" rules that we create on behalf of the customer so we are mostly unaffected by this change.

@banderror banderror assigned spong and unassigned xcrzx Dec 6, 2022
@banderror banderror added 8.7 candidate and removed Feature:Detection Rules Security Solution rules and Detection Engine labels Dec 29, 2022
spong added a commit that referenced this issue Mar 10, 2023
## Summary

Resolves #137164. 

This PR adds a new `revision` field (number) to Alerting Rules that
auto-increments when relevant content changes have been made via the
`Rules Client`. _Relevant content changes_ are defined as any content
change that the user may either want to be notified about, or have the
option to later revert to. This will include most all changes, except
the enabling/disabling of the rule, or updates to metadata fields like
`executionStatus` or `monitoring`. This `revision` field is not intended
to be user editable, and should be ignored if ever provided via the API.

See #136213 for additional
details. To be followed-up by
#137168, which will remove the
version bump logic from security solution.

## Details

### Migrations

Includes SO migration to default `revision` to `0` when upgrading a
cluster, or when importing `pre-8.8.0` rules via the SO Management UI.
For consistency, security rule import follows the same logic as SO
Management and will not reset the `revision` to `0` when overriding or
creating a new rule.

### Downstream Index Updates

* EventLog _has not_ been updated to include `revision` along with basic
rule fields currently being written. Should we?
* AAD Schema will be updated in
#151388 (as this one is getting
pretty big) to include `revision` so alerts written will include which
specific revision of the rule created the alert.

### Reference Fields

Any creation of or modification to `actions` will result in a revision
increment. More typical reference fields like `exception lists` on the
security side will only result in a revision increment when the list is
initially associated/deleted from the rule (as subsequent updates will
be done directly against the list).

### RuleClient Updates

The following methods within the RuleClient have been updated to support
incrementing revision when relevant field changes have been detected:

* `clone()` - resets to 0 currently (see open question)
* `update()` - increments `revision` so long a change has been made to
relevant fields (fields not in [ignore
list](https://github.com/elastic/kibana/pull/147398/files#diff-6736e143ede2dc06e825bddcdc23b4d088a6620805751db4eddc5900d586c4dfR69-R85))
* `bulkEdit()` - increments `revision` for relevant fields (all current
bulk edit fields minus api key/snooze/mute)

Mutation methods not updated to include revision log:
* `snooze()`
* `unsnooze()`
*  `clearExpiredSnoozes()`
*  `muteAll()`
*  `unmuteAll()`
*  `muteInstance()`
*  `unmuteInstance()`
* `updateApiKey()` - increments revision as rule functionality could be
impacted


## Open questions:
- [X] Should `clone()` in RulesClient reset revision to 0 as if it's a
new rule, or keep the current value? (see
[comment](https://github.com/elastic/kibana/pull/147398/files#r1106484105))
- [X] What about snooze/un-snooze, and mute/unmute? Should we update
revision on these field changes as well? (see
[comment](https://github.com/elastic/kibana/pull/147398/files#r1106431966))
- Discussed with @XavierM and determined to not update on
snooze/mute/API key changes as this actions could be plentiful and don't
necessarily represent a version of the rule a user would want to revert
to, thus polluting the revision history.
- [ ] Should we write `revision` to EventLog?


---

### Checklist

Delete any items that are not applicable to this PR.

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
  - [ ] To work with docs team 
- [X] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### For maintainers

- [X] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
bmorelli25 pushed a commit to bmorelli25/kibana that referenced this issue Mar 10, 2023
## Summary

Resolves elastic#137164. 

This PR adds a new `revision` field (number) to Alerting Rules that
auto-increments when relevant content changes have been made via the
`Rules Client`. _Relevant content changes_ are defined as any content
change that the user may either want to be notified about, or have the
option to later revert to. This will include most all changes, except
the enabling/disabling of the rule, or updates to metadata fields like
`executionStatus` or `monitoring`. This `revision` field is not intended
to be user editable, and should be ignored if ever provided via the API.

See elastic#136213 for additional
details. To be followed-up by
elastic#137168, which will remove the
version bump logic from security solution.

## Details

### Migrations

Includes SO migration to default `revision` to `0` when upgrading a
cluster, or when importing `pre-8.8.0` rules via the SO Management UI.
For consistency, security rule import follows the same logic as SO
Management and will not reset the `revision` to `0` when overriding or
creating a new rule.

### Downstream Index Updates

* EventLog _has not_ been updated to include `revision` along with basic
rule fields currently being written. Should we?
* AAD Schema will be updated in
elastic#151388 (as this one is getting
pretty big) to include `revision` so alerts written will include which
specific revision of the rule created the alert.

### Reference Fields

Any creation of or modification to `actions` will result in a revision
increment. More typical reference fields like `exception lists` on the
security side will only result in a revision increment when the list is
initially associated/deleted from the rule (as subsequent updates will
be done directly against the list).

### RuleClient Updates

The following methods within the RuleClient have been updated to support
incrementing revision when relevant field changes have been detected:

* `clone()` - resets to 0 currently (see open question)
* `update()` - increments `revision` so long a change has been made to
relevant fields (fields not in [ignore
list](https://github.com/elastic/kibana/pull/147398/files#diff-6736e143ede2dc06e825bddcdc23b4d088a6620805751db4eddc5900d586c4dfR69-R85))
* `bulkEdit()` - increments `revision` for relevant fields (all current
bulk edit fields minus api key/snooze/mute)

Mutation methods not updated to include revision log:
* `snooze()`
* `unsnooze()`
*  `clearExpiredSnoozes()`
*  `muteAll()`
*  `unmuteAll()`
*  `muteInstance()`
*  `unmuteInstance()`
* `updateApiKey()` - increments revision as rule functionality could be
impacted


## Open questions:
- [X] Should `clone()` in RulesClient reset revision to 0 as if it's a
new rule, or keep the current value? (see
[comment](https://github.com/elastic/kibana/pull/147398/files#r1106484105))
- [X] What about snooze/un-snooze, and mute/unmute? Should we update
revision on these field changes as well? (see
[comment](https://github.com/elastic/kibana/pull/147398/files#r1106431966))
- Discussed with @XavierM and determined to not update on
snooze/mute/API key changes as this actions could be plentiful and don't
necessarily represent a version of the rule a user would want to revert
to, thus polluting the revision history.
- [ ] Should we write `revision` to EventLog?


---

### Checklist

Delete any items that are not applicable to this PR.

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
  - [ ] To work with docs team 
- [X] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### For maintainers

- [X] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.8.0
Projects
None yet
Development

No branches or pull requests

10 participants