-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Endpoint] Policy creation callback fixes + Improved error handling in user manifest loop #71269
Conversation
Pinging @elastic/endpoint-app-team (Feature:Endpoint) |
Pinging @elastic/endpoint-response (Team:Endpoint Response) |
try { | ||
if (snapshot && snapshot.diffs.length > 0) { | ||
// create new artifacts | ||
const errors = await manifestManager.syncArtifacts(snapshot, 'add'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does syncArtifacts relates getPackageConfigCreateCallback. That appears to update something, perhaps a side effect?
Sorry I see the method is handlePackageConfigCreate. There is a few condition blocks that may need to be tested to make sure all branches functions as expected. Is it possible to break this up so it may be more testable and probably maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnamdifrankie If new exception list entries were created, we may need to create new artifacts before sending the updated manifest. That's what syncArtifacts
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XavierM and I have been talking about how to improve the way this callback happens... I think we will need some changes to make this more robust. Happy to discuss it @nnamdifrankie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test around what branches are supposed to exist together will suffice for now. Not sure what is the relationship between the snapshot and new package config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnamdifrankie Just pushed some of the tests, working on more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnamdifrankie @paul-tavares I've added several unit tests and have run through some manual tests. I think this fix will ensure that policy creation is not interrupted by any of the manifest processing code.
Pinging @elastic/ingest-management (Team:Ingest Management) |
@elasticmachine merge upstream |
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
All of my comments are minor in nature and optional. Also, I assume you (during Dev.) added a few throw
's in the code for testing purposes and that he callback always returned successfully 😁
My only concern overall is that we seem to be doing lots of sync call (await
) and that can extend overall time it takes for a Package config to be created in Ingest (as it is, it already take a little bit of time due to the work it does with Packages).
FYI:
One of the things I thought about implementing in the Ingest code around these callbacks for Create Package Config was a way to "timeout" the returned Promise after x
amount of time. so that no one callback can hold up the entire operation. (cc/ @jfsiii , @jen-huang , @nchaulet )
version: '0.9.0', | ||
}, | ||
inputs: [], | ||
} as NewPackageConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: add the return type to the function definition instead of casting. Same below
return updatedPackageConfig; | ||
// Until we get the Default Policy Configuration in the Endpoint package, | ||
// we will add it here manually at creation time. | ||
if (newPackageConfig.inputs.length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally - I would remove this if()
and always insert our input into the Package Config.
History:
The if()
is only here because at one point we thought that we would get data included in the Package Config at creation time by defining it in the Endpoint Package, thus why we were doing a conditional check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paul-tavares Will inputs ever have length > 0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madirey - For endpoint Package Configs, no. At the moment we only expect 1 input at most that includes all of our data.
let snapshot: ManifestSnapshot | null = null; | ||
let success = true; | ||
try { | ||
// Try to get most up-to-date manifest data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is what's returned here different from what you retrieved above in manifestManager.getLastDispatchedManifest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paul-tavares Possibly. Perhaps could look at renaming this function. The above builds a Manifest from the SO record of the last manifest dispatched. This function builds a manifest dynamically from the latest exception lists... there could be new items, so we try to get the most up-to-date data possible to avoid having to immediately re-dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madirey thanks for explaining that. I'm not too familiar with the design around Lists, thus the questions 😃 .
I think there is a process in place that periodically (on a timer) goes through and picks up new versions of lists and re-generates the Manifest (which pushes updates to the Package configs), right? If thats the case, I would suggest that we avoid the additional delays here in explicitly triggering that process, so that the overall Package Config creation is not incur a longer delay.
|
||
try { | ||
return updatedPackageConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be just be a personal choice 😄
I find this return
here as well as the one inside of catch()
confusing because one (a future "us" ) add another return to finally {}
and that would be the value that is returned out of the function.
I would propose (for clarify) that the return
be consolidated to one under the finally {}
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paul-tavares It was an attempt to be sure that we return the data before doing other things... but we still don't have a good way to verify that the policy was actually created. We probably need to work together to figure out a better way to do this. Can we address in a follow-up to this PR? There are some important fixes here that will help us in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XavierM ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed @madirey
My point really was that the finally {}
block could actually change what the function returns (ex. it could overwrite the return
from either the catch()
or the regular try {}
.
Also - I'm tempted to suggest that if we refactor this, that we split up the concerns into 2 callbacks: one for handling only adding the policy data and a second to handle only adding the manifest information, an we register both callbacks with Ingest. We can work on this post 7.9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paul-tavares As written, that should not happen, but I agree with the concerns. It's ugly.
I can remove the section that pulls the snapshot and tries to get the most up-to-date information. That comes with a few caveats though:
- There's no guarantee that we will have already created the manifest when the callback runs, so we may have to send an empty artifact list... the policy returned will not be deterministic on the first policy creation (artifacts may or may not exist), so that will have unit testing implications as well,
- If there have been new exception items added within the last minute, we won't pick it up.
In both of the cases above, we'll have to re-send the policy a minute later when the updates are detected. This is probably fine for now, but it may not be ideal when we have large deployments (could result in 2 back-to-back large rollouts if I'm not mistaken).
@@ -6,6 +6,8 @@ | |||
|
|||
import { ILegacyScopedClusterClient, SavedObjectsClientContract } from 'kibana/server'; | |||
import { loggingSystemMock, savedObjectsServiceMock } from 'src/core/server/mocks'; | |||
// eslint-disable-next-line @kbn/eslint/no-restricted-paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to ask the Kibana Platform team to include this in src/core/server/mocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paul-tavares I see you approved, thank you! But let's follow up and figure out a better way to do this. +++
…ed error handling in user manifest loop (#71269) (#71376) * Clean up matcher types * Rework promise and error-handling in ManifestManager * Write tests for ingest callback and ensure policy is returned when errors occur * More tests for ingest callback * Update tests * Fix tests Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* master: (78 commits) Bump lodash package version (elastic#71392) refactor: 💡 use allow-list in AppArch codebase (elastic#71400) improve bugfix 7198 test stability (elastic#71250) [Security Solution][Ingest Manager][Endpoint] Optional ingest manager (elastic#71198) [Metrics UI] Round metric threshold time buckets to nearest unit (elastic#71172) [Security Solution][Endpoint] Policy creation callback fixes + Improved error handling in user manifest loop (elastic#71269) [Security Solution] Allow to configure Event Renderers settings (elastic#69693) Fix a11y keyboard overlay (elastic#71214) [APM] UI text updates (elastic#71333) [Logs UI] Limit `extendDatemath` to valid ranges (elastic#71113) [SIEM] fix tooltip of notes (elastic#71342) address index templates feedback (elastic#71353) Upgrade EUI to v26.3.1 (elastic#70243) [build] Creates Linux aarch64 archive (elastic#69165) [SIEM][Detection Engine] Fixes skipped tests (elastic#71347) [SIEM][Detection Engine][Lists] Adds read_privileges route for lists and list items [kbn/optimizer] implement "requiredBundles" property of KP plugins (elastic#70911) [Security Solution][Exceptions] - Exceptions modal pt 2 (elastic#70886) [ML] DF Analytics: stop status polling when job stopped (elastic#71159) [SIEM][CASE] IBM Resilient Connector (elastic#66385) ...
Summary
Checklist
For maintainers