Restoring snoozeEndTime to AlertAttributesExcludedFromAAD #135663
Co-authored-by: Gidi Meir Morris <github@gidi.io>
Pinging @elastic/response-ops (Team:ResponseOps)
LGTM
LGTM
💔 All backports failed
Manual backport
To create the backport manually run:
Questions? Please refer to the Backport tool documentation.
…5663)
* Restoring snoozeEndTime to AlertAttributesExcludedFromAAD
* Apply suggestions from code review
Co-authored-by: Gidi Meir Morris <github@gidi.io>
* Temp fix migration
* New way to do migrations
* Add two scenarios
* Skip functional test
* Revert some archive changes
Co-authored-by: Ying Mao <ying.mao@elastic.co>
Co-authored-by: Gidi Meir Morris <github@gidi.io>
(cherry picked from commit 2d29934)
# Conflicts:
# x-pack/plugins/alerting/server/task_runner/task_runner.ts
@elasticmachine merge upstream
ignoring request to update branch, pull request is closed
) (#135714)
* Restoring snoozeEndTime to AlertAttributesExcludedFromAAD (#135663)
* Restoring snoozeEndTime to AlertAttributesExcludedFromAAD
* Apply suggestions from code review
Co-authored-by: Gidi Meir Morris <github@gidi.io>
* Temp fix migration
* New way to do migrations
* Add two scenarios
* Skip functional test
* Revert some archive changes
Co-authored-by: Ying Mao <ying.mao@elastic.co>
Co-authored-by: Gidi Meir Morris <github@gidi.io>
(cherry picked from commit 2d29934)
# Conflicts:
# x-pack/plugins/alerting/server/task_runner/task_runner.ts
* Fix backport
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Re-submit of #135602.
Resolves #135740
Summary
This PR fixes a bug introduced in 8.3.0 which causes any rule created or updated by an 8.2.x cluster to fail on migration on upgrade to 8.3.0 or 8.3.1.
The PR also fixes an issue where Kibana could crash while running snoozed rules. This was caused by the task runner updating the rule twice after a run when snoozed: one of the updates is partial, while the other takes OCC into consideration. If the OCC update failed, Kibana would crash due to a missing await.
The root cause of the bug is that the snoozeEndTime field, which was added in 8.2.0 and subsequently removed in 8.3.0, was omitted from the AlertAttributesExcludedFromAAD list, which meant that affected rules could no longer be decrypted.
As a result, any rule Saved Object which was created or updated in 8.2.x would try to use the snoozeEndTime field to decrypt the rule as part of the 8.3.0 migration, which consistently fails as the SO uses the field (even if empty) as part of AAD.
When this migration fails for such a rule the following error message is visible in the server log:
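As a rough illustration of the root cause described in the summary, the sketch below uses Node's built-in crypto module with AES-256-GCM. It is not Kibana's encrypted saved objects implementation: computeAad, encrypt, decrypt and the Sealed type are hypothetical helpers, and it assumes the field was excluded from AAD when the rule was encrypted in 8.2.x. It only shows that decryption fails once the set of attributes feeding the AAD changes, even when the attribute value is empty.

```typescript
import * as crypto from "node:crypto";

type Attributes = Record<string, unknown>;

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// Hypothetical helper: build the AAD from every attribute that is NOT in the
// excluded list, using a stable key order so the bytes are reproducible.
function computeAad(attrs: Attributes, excluded: ReadonlySet<string>): Buffer {
  const included = Object.keys(attrs)
    .filter((key) => !excluded.has(key))
    .sort()
    .map((key) => [key, attrs[key]]);
  return Buffer.from(JSON.stringify(included), "utf8");
}

// Encrypt a secret and bind it to the given AAD.
function encrypt(secret: string, key: Buffer, aad: Buffer): Sealed {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(aad);
  const data = Buffer.concat([cipher.update(secret, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Decrypt; final() throws if the AAD differs from the one used to encrypt.
function decrypt(sealed: Sealed, key: Buffer, aad: Buffer): string {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAAD(aad);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]).toString("utf8");
}

// Assumed scenario: the rule still carries an empty snoozeEndTime attribute.
const demoKey = crypto.randomBytes(32);
const demoRule: Attributes = { name: "cpu-usage", snoozeEndTime: null };
const excludedWithField = new Set(["snoozeEndTime"]); // field excluded from AAD
const excludedWithoutField = new Set<string>();       // field dropped from the list

const sealedRule = encrypt("api-key", demoKey, computeAad(demoRule, excludedWithField));

try {
  decrypt(sealedRule, demoKey, computeAad(demoRule, excludedWithoutField));
} catch (err) {
  console.log("decryption failed with the changed exclusion list:", (err as Error).message);
}

// Restoring the field to the exclusion list makes the AAD match again.
console.log(decrypt(sealedRule, demoKey, computeAad(demoRule, excludedWithField)));
```

The exclusion list matters because the AAD is authenticated but never stored alongside the ciphertext: both sides must derive the same bytes from the same set of attributes.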
Once Kibana runs it might still try to run these failed rules, which should result in an error such as this:
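The crash on a failed OCC update mentioned in the summary can be sketched in isolation. This is not the task runner code: updateRuleWithOcc, afterRunWithoutAwait and afterRunWithAwait are hypothetical names, and the update is hard-coded to fail. The sketch only shows why a missing await lets a rejection bypass a surrounding try/catch.

```typescript
// Hypothetical update that always loses the optimistic concurrency check.
async function updateRuleWithOcc(): Promise<void> {
  throw new Error("version conflict");
}

// Buggy shape: the promise is not awaited, so the function returns "done"
// and the rejection surfaces later as an unhandled rejection, outside the
// try/catch. This function is defined for comparison only and is not called.
async function afterRunWithoutAwait(): Promise<string> {
  try {
    updateRuleWithOcc(); // missing await
    return "done";
  } catch {
    return "conflict handled";
  }
}

// Fixed shape: awaiting the promise routes the rejection into the catch block.
async function afterRunWithAwait(): Promise<string> {
  try {
    await updateRuleWithOcc();
    return "done";
  } catch {
    return "conflict handled";
  }
}

afterRunWithAwait().then((result) => console.log(result));
```

Recent Node.js versions treat an unhandled promise rejection as fatal by default, which is how a single missing await can take down the whole process rather than just one rule run.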
Remediation for customers who have already upgraded to 8.3.0 or 8.3.1
If a customer upgrades to 8.3.0 or 8.3.1 and their alerting rules have stopped running with an error similar to the example above, they will need to use the Update API key operation in Stack Management > Rules and Connectors to restore the alerting rule to a functional state. Note that the Update API key operation causes the alerting rule to run as the user that performed the operation.
Checklist
Delete any items that are not applicable to this PR.
For maintainers
To verify
You will notice the tests are skipped for the time being. They are currently flaky and will remain so until further enhancements are made to make them part of CI. In the meantime, it is best to test manually: reproduce the issue with main, then confirm that this branch fixes it.
Scenario 1
Scenario 2
Testing scenarios
Release Note
Fixes a bug where alerting rules that were created or edited in 8.2 stop running on upgrade to 8.3.0 or 8.3.1. Users upgrading directly to 8.3.2 will not experience this bug. For additional details and remediation steps, see https://www.elastic.co/guide/en/kibana/current/release-notes-8.3.0.html#known-issues-8.3.0