Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Defend Workflows][8.11] Unblock fleet setup when cannot decrypt uninstall tokens #171998

Conversation

gergoabraham
Copy link
Contributor

@gergoabraham gergoabraham commented Nov 27, 2023

Summary

This PR should help in case of an improper encryption key rotation. Prior to this PR, the user was blocked on Fleet setup:
image

This PR does the following:

  • don't decrypt tokens when trying to generate missing tokens on Fleet setup -> hence no error is thrown
  • but, checks if existing tokens can be decrypted, and returns with a non-fatal error during Fleet setup if not
    image
image image
  • prohibits user to enable tamper protection when the given policy's uninstall token cannot be decrypted
    image

  • no change: still throws error when trying to decrypt
    image

image

Reproducing the encryption key issue

  • setup a Kibana, add at least one policy (probably Fleet policy is enough)
  • modify (or add a new) encryption key in your kibana.dev.yml:
xpack.encryptedSavedObjects.encryptionKey: "some-random-encryption-key-min-32-bytes"

Checklist

Delete any items that are not applicable to this PR.

@gergoabraham gergoabraham added release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting v8.11.2 labels Nov 27, 2023
@gergoabraham gergoabraham self-assigned this Nov 27, 2023
@gergoabraham gergoabraham marked this pull request as ready for review November 27, 2023 16:35
Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still try to add logic here that checks for the .error property set on saved objects returned from createPointInTimeFinderDecryptedAsInternalUser.

The above method will "strip" encrypted attributes (which is how we get into a state with token = undefined in the first place, generally) here:

} catch (error) {
// catch error and enrich SO with it, return stripped attributes. Then consumer of API can decide either proceed
// with only unsecured properties or stop when error happens
const { attributes: strippedAttrs } = await service.stripOrDecryptAttributes(
descriptor,
savedObject.attributes as Record<string, unknown>
);
return { ...savedObject, attributes: strippedAttrs as T, error };
}
.

We need to surface this error property that's appended to the ESO object somewhere in Fleet setup.

it('rejects if any of the tokens is missing', async () => {
await expect(
uninstallTokenService.checkTokenValidityForAllPolicies()
).rejects.toThrowError('Uninstall Token is missing the token.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try to improve this error message? It's not particularly helpful to users in its current state. We could provide something like:

Invalid uninstall token for policy ID ${policyId}. Saved object is missing the "token" attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 27, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Copy link
Member

We need to surface this error property that's appended to the ESO object somewhere in Fleet setup.

We should be able to get into this state by encrypting values once -> changing the encryption key in kibana.dev.yml -> observe errors in Fleet setup following. Errors would be expected, and we probably want to do the following in that case:

  1. Disable agent tamper protection
  2. Show the decryption errors to the user

@gergoabraham
Copy link
Contributor Author

We need to surface this error property that's appended to the ESO object somewhere in Fleet setup.

We should be able to get into this state by encrypting values once -> changing the encryption key in kibana.dev.yml -> observe errors in Fleet setup following. Errors would be expected, and we probably want to do the following in that case:

  1. Disable agent tamper protection
  2. Show the decryption errors to the user

I'm not sure we can disable agent tamper protection at this point, if encryption key and therefore message signing key is changed, so maybe all we can do is inform the user about the issue, and unblock accessing Fleet pages. I think we should continue investigation, but I'd propose to add this quick fix to tomorrow's 8.11.2 FF. @kpollich, what do you think about this approach?

@kpollich
Copy link
Member

what do you think about this approach?

SGTM in interest of getting this in for feature freeze 👍

@joeypoon
Copy link
Member

joeypoon commented Nov 27, 2023

We should still try to add logic here that checks for the .error property set on saved objects returned from createPointInTimeFinderDecryptedAsInternalUser.

The above method will "strip" encrypted attributes (which is how we get into a state with token = undefined in the first place, generally) here:

} catch (error) {
// catch error and enrich SO with it, return stripped attributes. Then consumer of API can decide either proceed
// with only unsecured properties or stop when error happens
const { attributes: strippedAttrs } = await service.stripOrDecryptAttributes(
descriptor,
savedObject.attributes as Record<string, unknown>
);
return { ...savedObject, attributes: strippedAttrs as T, error };
}

.
We need to surface this error property that's appended to the ESO object somewhere in Fleet setup.

Agree with this one. Right now we're ignoring the fact that ESO is returning an error (in the body) so we'll need to add a check for this in getDecryptedTokens. In the event that ESO returns an error, I think the expected behavior from UninstallTokenService should be to throw but from fleet setup we catch and log the error without blocking the rest of fleet setup (at least for now, since defend is the only integration using the feature right now).

@kpollich
Copy link
Member

Unblocking Fleet setup with explicit errors messages available in logs is the way to go here. Even if we don't have any real logic, if we can detect the error from the ESO plugin and surface it in Fleet's setup logs that will be good enough here.

@gergoabraham
Copy link
Contributor Author

@joeypoon @kpollich
34500d7
added a modification to surface the error returning from the encrypted SO client, now we got this error message:
image

👉 please approve the PR if it looks good, approval is needed for the merge

👉 @joeypoon I'd appreciate if you could merge the PR if it looks good, so it can make it before FF - I'm signing off for today

Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM - thanks for addressing the feedback! 🚀

Copy link
Member

@joeypoon joeypoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for handling this one. Left a comment but I think it can be done as a follow up. Will watch and merge.

({ id: _id, attributes, created_at: createdAt }) => {
({ id: _id, attributes, created_at: createdAt, error }) => {
if (error) {
throw new UninstallTokenError(`Error when reading Uninstall Token: ${error.message}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be safer to not have error.message within the error here since we don't own what's in the error and it may potentially contain sensitive data. I don't think it's blocking since from what I can tell looking at ESO code, they never expose the sensitive data in the error messages. But given that this is something that can change at anytime and ESO already logs these errors on their end, I think we should update this in a follow up to not contain error.message.

@joeypoon joeypoon enabled auto-merge (squash) November 27, 2023 19:59
@kibana-ci
Copy link
Collaborator

kibana-ci commented Nov 27, 2023

💔 Build Failed

Failed CI Steps

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @gergoabraham

@joeypoon joeypoon merged commit 75cdadf into elastic:8.11 Nov 27, 2023
@gergoabraham gergoabraham deleted the unblock-fleet-setup-when-cannot-decrypt-uninstall-tokens branch November 28, 2023 12:21
@gergoabraham gergoabraham restored the unblock-fleet-setup-when-cannot-decrypt-uninstall-tokens branch November 28, 2023 12:21
vitaliidm pushed a commit to vitaliidm/kibana that referenced this pull request Nov 28, 2023
… uninstall tokens (elastic#172058)

## Summary

This PR is the `8.12` port of:
- elastic#171998

The original PR was opened to `8.11` to make it faster to include it in
`8.12.2`. Now this PR is meant to port the changes to `main`, so:
- we can build upon it,
- and can easily backport any further changes to `8.11.x`

> [!Important]
> The changes cannot be tested on `main` because they are hidden by
other behaviours (namely the retry logic for reading Message SIgning
key) that weren't part of `8.11`. Those behaviours will be also adapted
in follow up PRs.
gergoabraham added a commit that referenced this pull request Dec 1, 2023
…eet setup (#172072)

## Summary

When encryption key is rotated improperly, Message Signing key retrieval
is retried infinitely, and because this is done in Fleet setup, none of
the Fleet pages are loaded when the user tries to visit any of them.

This PR re-configures the retry logic so it now attempts significantly
less times than infinite.
Also, it changes the errors to non-fatal from the Fleet setup point of
view, similarly to these PRs:
- #171998
- #172058

## Reproducing the encryption key issue
- setup a Kibana, add at least one policy (probably Fleet policy is
enough)
- modify (or add a new) encryption key in your `kibana.dev.yml`:
```yml
xpack.encryptedSavedObjects.encryptionKey: "some-random-encryption-key-min-32-bytes"
```

## Screenshots

After ~15 sec of loading spinner, this is what the user sees:

![image](https://github.com/elastic/kibana/assets/39014407/2a29d0d9-4975-46b5-b662-bfbb6e888b0f)



### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
@gergoabraham gergoabraham deleted the unblock-fleet-setup-when-cannot-decrypt-uninstall-tokens branch December 5, 2023 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.11.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants