Increase the maximum number of saved objects that could be installed with a Fleet package #148441

Merged
merged 1 commit into from
Jan 13, 2023

xcrzx
Contributor

@xcrzx xcrzx commented Jan 5, 2023

Resolves: #148175

Summary

If we try to install a package that contains more than 10000 saved objects, the installation fails with the following error:

Error installing security_detection_engine 8.4.1: Can't import more than 10000 objects

The maximum number of objects per import is controlled by the savedObjects.maxImportExportSize config option in kibana.yml. However, we cannot control that option when installing packages, so we need another way to lift this limit for package installation.

Implemented solution

I've created a setting for the saved object importer that allows you to override the default limit of 10,000 saved objects per import. This setting can be used when initializing the importer if it is determined that the limit can be increased safely.

Usage example:

const savedObjectsImporter = appContextService
  .getSavedObjects()
  .createImporter(savedObjectsClient, { importSizeLimit: 15_000 });

I tested various limit values using the security_detection_engine package and found that a safe limit for our use case is around 30,000 saved objects. To test that, I limited Kibana's available memory to 1 GB using the NODE_OPTIONS=--max-old-space-size=1024 flag and tried importing packages of different sizes to determine the upper limit. I found that packages with 35,000 to 40,000 saved objects can cause Kibana to crash due to an out-of-memory error. However, this should be more than sufficient for the needs of the security solution, as we will likely never have more than 30,000 saved objects in a package.
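
The effect of the override can be pictured with a minimal sketch. This is not Kibana's importer: `createImporterSketch`, `importObjects`, `makeObjects`, and `DEFAULT_IMPORT_SIZE_LIMIT` are hypothetical names used only for illustration. Only the `importSizeLimit` option and the error wording come from the description above.

```typescript
// Illustrative stand-in for the saved objects importer, NOT the real Kibana API.
interface SavedObjectSketch {
  id: string;
  type: string;
}

interface ImporterOptionsSketch {
  // Optional per-importer override of the default object-count limit.
  importSizeLimit?: number;
}

// Assumption: mirrors the default limit of 10,000 objects mentioned above.
const DEFAULT_IMPORT_SIZE_LIMIT = 10000;

function createImporterSketch(options: ImporterOptionsSketch = {}) {
  const limit =
    options.importSizeLimit !== undefined
      ? options.importSizeLimit
      : DEFAULT_IMPORT_SIZE_LIMIT;

  return {
    limit,
    // Rejects the whole batch up front when it exceeds the limit.
    importObjects(objects: SavedObjectSketch[]): number {
      if (objects.length > limit) {
        throw new Error(`Can't import more than ${limit} objects`);
      }
      return objects.length;
    },
  };
}

// Builds `count` placeholder objects for the demonstration.
function makeObjects(count: number): SavedObjectSketch[] {
  const objects: SavedObjectSketch[] = [];
  for (let i = 0; i < count; i++) {
    objects.push({ id: `rule-${i}`, type: 'placeholder-type' });
  }
  return objects;
}
```

With these assumptions, a batch of 12,000 objects is rejected by an importer created with no options and accepted by one created with `{ importSizeLimit: 15000 }`.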

Alternative solutions

It may be possible to divide the installation of a package into smaller chunks, but this approach may only work for certain types of saved objects. One of the steps in the import process is reference validation, which requires all of the saved objects from a package to be loaded into memory at once. This may be fine for the security_detection_engine package, as it does not contain any references. However, creating separate import logic based on the contents of a package could result in additional maintenance costs on the Fleet side.
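
As a rough illustration of the chunking idea, the sketch below splits a list of objects into fixed-size batches. `chunkSketch` is a hypothetical helper, not part of Fleet or the importer, and it deliberately ignores reference validation, so it would only suit packages whose objects do not reference each other.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
// Hypothetical helper for illustration only.
function chunkSketch<T>(items: T[], size: number): T[][] {
  if (size <= 0 || Math.floor(size) !== size) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    // slice() copies the range, so the input array is left untouched.
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

Each batch could then be handed to a separate import call, at the cost of the package-specific logic mentioned above.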

An alternative solution would be to use the maxImportPayloadBytes setting instead of maxImportExportSize for the saved objects importer. This is because the size of saved objects in a package can vary significantly, and a package with a small number of large objects could potentially create more memory pressure than a package with a large number of small objects. Therefore, using the payload size as a reference point may be a more stable approach.
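
A byte-based check could look roughly like the following. This is a sketch under assumptions: `utf8ByteLength` and `assertWithinByteBudget` are hypothetical helpers, and measuring the serialized JSON size is only a proxy for the memory an import actually consumes.

```typescript
// Counts the UTF-8 bytes of a string without relying on Node or DOM typings.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (
      code >= 0xd800 &&
      code <= 0xdbff &&
      i + 1 < text.length &&
      text.charCodeAt(i + 1) >= 0xdc00 &&
      text.charCodeAt(i + 1) <= 0xdfff
    ) {
      // Surrogate pair: one code point outside the BMP, encoded as 4 bytes.
      bytes += 4;
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Hypothetical check: sums serialized sizes and fails as soon as the budget is exceeded.
function assertWithinByteBudget(objects: object[], maxBytes: number): number {
  let total = 0;
  for (let i = 0; i < objects.length; i++) {
    total += utf8ByteLength(JSON.stringify(objects[i]));
    if (total > maxBytes) {
      throw new Error(`Import payload exceeds ${maxBytes} bytes`);
    }
  }
  return total;
}
```

Unlike a count-based limit, this check treats a few large objects and many small objects the same way when their total size is equal.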

I welcome any thoughts or suggestions from the @elastic/kibana-core and @elastic/fleet teams regarding this issue and potential solutions.

@xcrzx xcrzx added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Fleet Team label for Observability Data Collection Fleet team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area v8.7.0 labels Jan 5, 2023
@xcrzx xcrzx self-assigned this Jan 5, 2023
@xcrzx xcrzx marked this pull request as ready for review January 5, 2023 16:32
@xcrzx xcrzx requested review from a team as code owners January 5, 2023 16:32
@elasticmachine
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@xcrzx xcrzx added the release_note:skip Skip the PR/issue when compiling release notes label Jan 5, 2023
Contributor

@juliaElastic juliaElastic left a comment

Fleet change LGTM

@xcrzx xcrzx force-pushed the fleet-objects-limit branch from c317562 to 8544b40 Compare January 9, 2023 16:43
@rudolf
Contributor

rudolf commented Jan 11, 2023

Thanks for taking a look at this.

What's our "heap budget"?

Ignoring how we implement the limit for now... To decide on a safe limit we need to know two things: how much heap is available to Kibana and how much heap the import consumes. The available heap differs a lot between dev and a Kibana build. On my laptop, an idle Kibana running in dev mode from the main branch currently uses ~620MB. An 8.6.0 build uses about 256MB just after startup.

Because Node.js consumes more memory than what's available for Kibana's JavaScript code, a 1GB cloud instance gets --max-old-space-size=800. So I'm assuming a 1GB cloud instance would have ~544MB of free heap (800MB - 256MB), but we should be able to double-check that by looking at the output of api/stats on a cloud instance and checking the "used_bytes" key under processes.memory.heap.

Assuming you used a dev Kibana instance to test, I'm guessing the import of 30k saved objects was getting close to consuming ~404MB (1024MB - ~620MB).

A production Kibana with some load would have less available heap and a higher risk of OOM. It's a bit arbitrary where we draw the line but it feels like if there's ~500MB free heap for an idle instance then consuming more than 256MB heap might start getting dangerous (we might be able to use metrics from our cloud clusters to make a bit more of an educated decision).
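
The heap arithmetic above can be written out as a small sketch. `freeHeapMb` and `fitsHeapBudget` are hypothetical helpers; the figures are the approximate ones quoted in this comment, and the 256MB budget is the tentative line suggested above, not a measured threshold.

```typescript
// Free heap = V8 old-space limit minus what an idle Kibana already uses.
function freeHeapMb(maxOldSpaceMb: number, idleHeapUsedMb: number): number {
  return maxOldSpaceMb - idleHeapUsedMb;
}

// 1GB cloud instance: --max-old-space-size=800, ~256MB used after startup.
const cloudFreeMb = freeHeapMb(800, 256); // 544

// Dev instance limited to 1024MB, ~620MB used while idle.
const devFreeMb = freeHeapMb(1024, 620); // 404

// Tentative budget from the discussion: imports above ~256MB start to look risky.
const IMPORT_HEAP_BUDGET_MB = 256;

function fitsHeapBudget(importHeapMb: number): boolean {
  return importHeapMb <= IMPORT_HEAP_BUDGET_MB;
}
```

Under these numbers, an import that consumes close to 404MB would fall well outside the tentative budget.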

Based on your projections:
Current rules: 6000
New rules per release: 600
Releases per year: 6
Number of rule SOs after 2 years = 6000 + 600 * 6 * 2 = 13200

So an override limit of 15k might be sufficient for the time being.
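
The projection can be reproduced with a short sketch. `projectedRuleCount` and `releasesUntilLimit` are hypothetical helpers, and the inputs are the estimates listed above rather than measured values.

```typescript
// Linear growth: current count plus a fixed number of new rules per release.
function projectedRuleCount(
  currentRules: number,
  newRulesPerRelease: number,
  releasesPerYear: number,
  years: number
): number {
  return currentRules + newRulesPerRelease * releasesPerYear * years;
}

// Number of whole releases that still fit under (or exactly at) the limit.
function releasesUntilLimit(
  currentRules: number,
  newRulesPerRelease: number,
  limit: number
): number {
  return Math.floor((limit - currentRules) / newRulesPerRelease);
}

const afterTwoYears = projectedRuleCount(6000, 600, 6, 2); // 13200
const releasesLeft = releasesUntilLimit(6000, 600, 15000); // 15
```

At the assumed growth rate, a 15k limit leaves room for about 15 releases, or roughly two and a half years at 6 releases per year.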

In the final implementation it would probably be a good idea to have an FTR test that installs the latest package against an 800MB (or even smaller) Kibana instance to ensure we don't hit an OOM.

How do we enforce the "heap budget"?

Something I just recently thought about is that rules might differ a lot in size based on the specific rule and even the specific version. I'm guessing one rule might be a simple query and another one a very elaborate query? This might be a good reason to prefer the maxImportPayloadBytes solution. The downside is that it would probably require a bit more refactoring: we currently enforce the payload size at the route level, so we would have to move that check into the importer.

@xcrzx
Contributor Author

xcrzx commented Jan 11, 2023

Thank you, @rudolf, for the detailed feedback.

So an override limit of 15k might be sufficient for the time being.

Yes, if we look at the number of rules we've been releasing recently (see docs), I can assume that the 15k limit might be enough for 10+ upcoming releases. So that would give us some time to implement an alternative approach.

In the final implementation it would probably be a good idea to have an FTR test that installs the latest package against an 800MB (or even smaller) Kibana instance to ensure we don't hit an OOM.

It's a good idea. We have a ticket for improving test coverage of detection rules package installation. Added a note to test for OOMs there: #148176.

Something I just recently thought about is that rules might differ a lot in size based on the specific rule and even specific version. I'm guessing one rule might be a simple query and another one a very elaborate query?

Rules do vary in size; the distribution looks like this:

mean   std    min    25%    50%    75%    max
3.1kB  1.9kB  1.0kB  1.9kB  2.3kB  3.8kB  12.0kB

I wouldn't say the difference is vast, but we still need to take it into consideration. Overall, relying on maxImportPayloadBytes instead of simply counting the number of objects looks more suitable for the import.
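
To put the distribution in perspective, here is a rough estimate of the serialized payload for a package at the 15k limit. It is a sketch under assumptions: 1 kB is taken as 1000 bytes, `estimatePayloadMb` is a hypothetical helper, and serialized size is not the same as the heap the import would consume.

```typescript
// Assumption: 1 kB = 1000 bytes; the table above does not state the unit convention.
const MEAN_RULE_BYTES = 3100; // 3.1kB mean rule size
const MAX_RULE_BYTES = 12000; // 12.0kB largest rule

// Serialized payload in MB (10^6 bytes) for `objectCount` objects of a given size.
function estimatePayloadMb(objectCount: number, bytesPerObject: number): number {
  return (objectCount * bytesPerObject) / 1000000;
}

// Typical case: every rule is of mean size.
const typicalPayloadMb = estimatePayloadMb(15000, MEAN_RULE_BYTES); // 46.5

// Pessimistic case: every rule is as large as the largest one.
const worstCasePayloadMb = estimatePayloadMb(15000, MAX_RULE_BYTES); // 180
```

The gap between the typical and pessimistic estimates is the kind of variation that a count-based limit cannot see and a byte-based limit can.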

@xcrzx xcrzx force-pushed the fleet-objects-limit branch from 8544b40 to 3b0be70 Compare January 13, 2023 11:34
@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/core-saved-objects-server 91 92 +1
core 1018 1019 +1
total +2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
@kbn/core-saved-objects-server 0 1 +1
Unknown metric groups

API count

id before after diff
@kbn/core-saved-objects-server 305 306 +1
core 2822 2823 +1
total +2

History

  • 💛 Build #98865 was flaky 8544b40e9e2149e6d147baf78b4beb7e5d8cb476
  • 💚 Build #98296 succeeded c317562e422caa9a14098d39c170af6c6a585ea5

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @xcrzx

Contributor

@banderror banderror left a comment

🚀

@xcrzx xcrzx merged commit 47d1a0e into elastic:main Jan 13, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jan 13, 2023
Labels
backport:skip This commit does not require backporting Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team:Fleet Team label for Observability Data Collection Fleet team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.7.0
Development

Successfully merging this pull request may close these issues.

[Fleet] Error when importing more than 10000 saved objects in a Fleet package
8 participants