Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reporting/Cloud] Soft disable reporting functionality on 1GB Kibanas #130651

Merged
merged 52 commits into from
Apr 28, 2022

Conversation

jloleysens
Copy link
Contributor

@jloleysens jloleysens commented Apr 20, 2022

Summary

Initial work for soft-disabling rather than crashing Kibana when generating reports on Kibana instances with low available RAM. This work ensures that Kibana server responds with the appropriate messages via our reporting HTTP APIs when generating a report.

How to test

  1. Start Kibana & ES
  2. Enable a trial license
  3. Either test on 1GB Cloud deployment (see this PR [Reporting] Soft disable cloud test - do not merge #130846) or, if testing locally replace if (this.systemHasInsufficientMemory()) { with if (true) { in screenshotting/screenshots/index.ts to trigger that code path.
  4. Generate a PDF or PNG report
  5. See the error message in the UI
  6. See the error message in the report info panel flyout that contains a link

Screenshots

Screenshot 2022-04-22 at 12 27 14

Screenshot 2022-04-22 at 12 28 04

Implementation details and notes

  • The InsufficientMemoryAvailableOnCloudError will be thrown on Cloud when we are running on a Kibana with <2GB RAM. Specifically, the error is thrown when we call getScreenshots exposed by the screenshotting plugin. In this way we block all screenshots from running.
  • Reporting now knows about this error type and sets an appropriate message explaining what is wrong. Additionally, the UI shows a message with a link, guiding users to how the can get around the issue
  • For ReportingError classes it is convenient to read this.code and ClassName.code I've opted to make code both an instance and static property name
  • In collaboration with Cloud stakeholders, we have decided that reading os.totalmem is sufficient for informing the soft disable decision
  • No new methods were added to screenhotting's API, although we may have to consider this if we want to pre-emptively disable the "generate report" button

Related to

#129989

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Does not stop from generating reports on Cloud Low High Must be tested manually on Cloud

For maintainers

jloleysens and others added 30 commits April 12, 2022 11:22
…disable-server-side

* 'main' of github.com:elastic/kibana: (35 commits)
  [Uptime] remove latency limit warnings when using monitor management (elastic#129597)
  [Security Solution] [ReponseOps] Executes Cases Cypress test when there is a change on cases plugin (elastic#129992)
  Paramaterized Discover tests (elastic#129684)
  [Security Solution][Investigations] - Minor bug fixes (elastic#130054)
  [DOCS} Adds technical preview to Lens annotations (elastic#130058)
  [Security solution] [Endpoint] Revisit blocklist wrong labels (elastic#128773)
  [Security Solutions] Adds API docs for value lists (elastic#129962)
  [CI] Move jest tests to spot instances, and fix spot retries in PRs (elastic#130045)
  chore(NA): upgrades rules_node_js to v5.4.0 (elastic#130051)
  [SecuritySolution] Remove the cell hovers actions for agent status (elastic#130042)
  Upgrade RxJS to 7 (elastic#129087)
  [SecuritySolution] Clean up CaseContext (elastic#130036)
  Revert "chore(NA): upgrades rules_node_js to v5.4.0 (elastic#130021)"
  Use RuleDataReader to query for threshold signal history (elastic#129763)
  Remove securityRulesCancelEnabled setting and set shorter default timeouts (elastic#129769)
  Upgrade EUI to v54.0.0 (elastic#129653)
  [Security Solution] More Ransomware exceptionable fields (elastic#130039)
  Add e2e for the apm integration policy form (elastic#129860)
  chore(NA): upgrades rules_node_js to v5.4.0 (elastic#130021)
  [ML] Fix Single Metric Viewer chart failing to load if no points during calendar event (elastic#130000)
  ...

# Conflicts:
#	x-pack/plugins/screenshotting/server/screenshots/index.test.ts
…disable-server-side

* 'main' of github.com:elastic/kibana: (103 commits)
  [Osquery] Update eslint config (elastic#129637)
  [Uptime] Update monitor saved object mappings (elastic#130433)
  Add links to metricbeat module docs (elastic#130519)
  Add link to troubleshooting guide in confirm data copy (elastic#130420)
  [Step 3] Cleanup charts plugin (elastic#130132)
  [Visualize] Adds a deprecation warning to the pie app (elastic#130447)
  [Maps] fix vector tile load errors not displayed in legend (elastic#130395)
  [CI] Split alerting-api-integration tests into separate cigroups (elastic#130414)
  [CI] Use spot instances for default cigroups in PR CI (elastic#130476)
  [functional-tests] TimePicker optimizations (elastic#130200)
  [kbn/pm] use stable module ids in dist (elastic#130497)
  [8.2.1][Security Solution][Session view] fix full screen session view margin (elastic#130496)
  Fix wrong config in comments (elastic#130378)
  Add deprecated telemetry (elastic#130458)
  Add eslint rule to support breaking up packages (elastic#130483)
  [Security Solution][Endpoint] Fix test stability and un-skip flaky tests (elastic#130176)
  Update object types for SharePoint Online external connector (elastic#130478)
  [Workplace Search] Fix broken feedback link (elastic#130475)
  Rename the term "execution" in config to "run" (elastic#130172)
  [Cloud Posture] use index with keyword mapping (elastic#130456)
  ...

# Conflicts:
#	docs/user/reporting/index.asciidoc
#	x-pack/plugins/reporting/public/types.ts
#	x-pack/plugins/screenshotting/server/screenshots/index.test.ts
#	x-pack/plugins/screenshotting/server/screenshots/index.ts
@jloleysens
Copy link
Contributor Author

@elasticmachine merge upstream

* should only occur on cloud.
*/
info.error_code === VisualReportingSoftDisabledError.code
? sharedI18nTexts.cloud.insufficientMemoryError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not make this message part of the Job.getError implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point... But I'd prefer to not raise this logic to a more general level yet. Because the i18n contains JSX we would no longer guarantee that Job.getError will only return a undefined | string.

I prefer having this code implemented in a more "dumb" way because it remains a bit easier to delete (something may still want to do one day).

Copy link
Member

@tsullivan tsullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I just left one comment/suggestion.

>
{errorText}
<EuiCallOut size="m" color="danger" data-test-errorText={errorText}>
{job.errorCode === errors.VisualReportingSoftDisabledError.code
Copy link
Member

@tsullivan tsullivan Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this logic to ReportingAPIClient.getError, where the passed-in errorText comes from?

Copy link
Contributor

@gchaps gchaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doclink LGTM

One small nit on the message. I think we can remove the word "minimum" in this sentence, so it reads:

Check the RAM requirements.

@jloleysens
Copy link
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
reporting 83 85 +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/doc-links 64 65 +1
cloud 23 24 +1
total +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
reporting 57.9KB 58.1KB +220.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
core 288.6KB 288.7KB +115.0B
reporting 38.8KB 42.0KB +3.3KB
screenshotting 7.7KB 7.7KB +6.0B
total +3.4KB
Unknown metric groups

API count

id before after diff
@kbn/doc-links 64 65 +1
cloud 28 29 +1
total +2

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jloleysens jloleysens added the auto-backport Deprecated - use backport:version if exact versions are needed label Apr 28, 2022
@jloleysens jloleysens merged commit 0c5dbca into elastic:main Apr 28, 2022
@kibanamachine
Copy link
Contributor

💔 All backports failed

Status Branch Result
8.2 Backport failed because of merge conflicts

You might need to backport the following PRs to 8.2:
- [Reporting] Fix copy issue in report notifications (#123607)
- [Reporting] Reporting info panel touch ups (#120617)

Manual backport

To create the backport manually run:

node scripts/backport --pr 130651

Questions ?

Please refer to the Backport tool documentation

@jloleysens jloleysens deleted the reportings/soft-disable branch April 28, 2022 15:24
@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label May 2, 2022
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports run node scripts/backport --pr 130651 or prevent reminders by adding the backport:skip label.

1 similar comment
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports run node scripts/backport --pr 130651 or prevent reminders by adding the backport:skip label.

@jloleysens jloleysens added backport:skip This commit does not require backporting and removed backport missing Added to PRs automatically when the are determined to be missing a backport. auto-backport Deprecated - use backport:version if exact versions are needed v8.2.1 labels May 4, 2022
kertal pushed a commit to kertal/kibana that referenced this pull request May 24, 2022
…elastic#130651)

* added callout about reporting on cloud

* slight rewording

* added functionality to disbale screenshotting on <2GB cloud instances

* added test and fixed public facing API to throw an observable error instead of sync

* update reporting integration with new screenshotting error

* fix typo

* added specific error code to telemetry

* added positive test case

* updated mappings

* update jest test snapshots

* update snapshot

* fix use of deprecated RxJS

* [revert this] added some logs for cloud

* also check for deploymentId

* remove logs

* removed logger being passed to system check function

* remove unused import

* slight update to test

* added cloud min requirements link

* add link to UI

* [CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

* added todo comment

* read mem limit from cgroup files

* update jest

* tidy up and simplify some of the logic

* [revert this] test with 1024

* [revert this] added some logging again

* actually pass the cloud plugin to Screenshots 🤦

* [revert this] try and read instance data file instead

* Revert "[revert this] try and read instance data file instead"

This reverts commit ebdea21.

* Revert "[revert this] test with 1024"

This reverts commit 7ae3e39.

* Revert "[revert this] added some logging again"

This reverts commit d21e808.

* use injected environment variable to read instance size

* implement copy feedback

* fix variable rename

* fix test

* fixed some copy

* fix test code

* remove unused i18n

* update snapshots

* minor tidying up

* tighten up the copy

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@tylersmalley tylersmalley added ci:cloud-deploy Create or update a Cloud deployment and removed ci:deploy-cloud labels Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting ci:cloud-deploy Create or update a Cloud deployment (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead release_note:skip Skip the PR/issue when compiling release notes v8.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants