From e47086af3621e1074134787d2df0600ba2ec043c Mon Sep 17 00:00:00 2001 From: Gerard Soldevila Date: Thu, 18 Jul 2024 14:40:41 +0200 Subject: [PATCH 1/3] Migrations: Don't auto-create temp index (#158182) Try to fix https://github.com/elastic/kibana/issues/156117#issuecomment-1557029863 Fixes a race condition that could cause intermittent upgrade migration failures when Kibana connects to a single node Elasticsearch cluster. Delete any items that are not applicable to this PR. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/)) - [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US)) - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server)) - [ ] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers) Delete this section if it is not applicable to this PR. Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release. When forming the risk matrix, consider some of the following examples and how they may potentially impact the change: | Risk | Probability | Severity | Mitigation/Notes | |---------------------------|-------------|----------|-------------------------| | Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. | | Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. | | Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. | | [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) | - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) (cherry picked from commit 8e7e2632bbb5f2dfffe8ab6c2563e176c5d7cf6b) --- ...grations_state_action_machine.test.ts.snap | 6 ++++ .../bulk_overwrite_transformed_documents.ts | 9 +++++- .../actions/integration_tests/actions.test.ts | 29 ++++++++++++++++++- .../migrationsv2/initial_state.test.ts | 1 + .../migrationsv2/initial_state.ts | 1 + .../batch_size_bytes.test.ts | 4 +-- .../migrationsv2/model/model.test.ts | 1 + .../saved_objects/migrationsv2/model/model.ts | 2 +- .../server/saved_objects/migrationsv2/next.ts | 9 +++++- .../saved_objects/migrationsv2/types.ts | 10 +++++-- 10 files changed, 64 insertions(+), 8 deletions(-) diff --git a/src/core/server/saved_objects/migrationsv2/__snapshots__/migrations_state_action_machine.test.ts.snap b/src/core/server/saved_objects/migrationsv2/__snapshots__/migrations_state_action_machine.test.ts.snap index 950eca922072c..f1073654a1e67 100644 --- a/src/core/server/saved_objects/migrationsv2/__snapshots__/migrations_state_action_machine.test.ts.snap +++ b/src/core/server/saved_objects/migrationsv2/__snapshots__/migrations_state_action_machine.test.ts.snap @@ -44,6 +44,7 @@ Object { "properties": Object {}, }, "tempIndex": ".my-so-index_7.11.0_reindex_temp", + "tempIndexAlias": ".my-so-index_7.11.0_reindex_temp_alias", "tempIndexMappings": Object { "dynamic": false, "properties": Object { @@ -181,6 +182,7 @@ Object { "properties": Object {}, }, "tempIndex": ".my-so-index_7.11.0_reindex_temp", + "tempIndexAlias": ".my-so-index_7.11.0_reindex_temp_alias", "tempIndexMappings": Object { "dynamic": false, "properties": Object { @@ -322,6 +324,7 @@ Object { "properties": Object {}, }, "tempIndex": ".my-so-index_7.11.0_reindex_temp", + "tempIndexAlias": ".my-so-index_7.11.0_reindex_temp_alias", "tempIndexMappings": Object { "dynamic": false, "properties": Object { @@ -467,6 +470,7 @@ Object { "properties": Object {}, }, "tempIndex": ".my-so-index_7.11.0_reindex_temp", + "tempIndexAlias": ".my-so-index_7.11.0_reindex_temp_alias", "tempIndexMappings": Object { "dynamic": false, "properties": Object { @@ -643,6 +647,7 @@ Object { "properties": Object {}, }, "tempIndex": ".my-so-index_7.11.0_reindex_temp", + "tempIndexAlias": ".my-so-index_7.11.0_reindex_temp_alias", "tempIndexMappings": Object { "dynamic": false, "properties": Object { @@ -791,6 +796,7 @@ Object { "properties": Object {}, }, "tempIndex": ".my-so-index_7.11.0_reindex_temp", + "tempIndexAlias": ".my-so-index_7.11.0_reindex_temp_alias", "tempIndexMappings": Object { "dynamic": false, "properties": Object { diff --git a/src/core/server/saved_objects/migrationsv2/actions/bulk_overwrite_transformed_documents.ts b/src/core/server/saved_objects/migrationsv2/actions/bulk_overwrite_transformed_documents.ts index 9353ede9be6ac..b5f28474c572c 100644 --- a/src/core/server/saved_objects/migrationsv2/actions/bulk_overwrite_transformed_documents.ts +++ b/src/core/server/saved_objects/migrationsv2/actions/bulk_overwrite_transformed_documents.ts @@ -50,6 +50,12 @@ export interface BulkOverwriteTransformedDocumentsParams { index: string; transformedDocs: SavedObjectsRawDoc[]; refresh?: estypes.Refresh; + /** + * If true, we prevent Elasticsearch from auto-creating the index if it + * doesn't exist. We use the ES paramater require_alias: true so `index` + * must be an alias, otherwise the bulk index will fail. + */ + useAliasToPreventAutoCreate?: boolean; } /** @@ -62,6 +68,7 @@ export const bulkOverwriteTransformedDocuments = index, transformedDocs, refresh = false, + useAliasToPreventAutoCreate = false, }: BulkOverwriteTransformedDocumentsParams): TaskEither.TaskEither< | RetryableEsClientError | TargetIndexHadWriteBlock @@ -83,7 +90,7 @@ export const bulkOverwriteTransformedDocuments = // mappings. Such tampering could lead to many other problems and is // probably unlikely so for now we'll accept this risk and wait till // system indices puts in place a hard control. - require_alias: false, + require_alias: useAliasToPreventAutoCreate, wait_for_active_shards: WAIT_FOR_ALL_SHARDS_TO_BE_ACTIVE, refresh, filter_path: ['items.*.error'], diff --git a/src/core/server/saved_objects/migrationsv2/actions/integration_tests/actions.test.ts b/src/core/server/saved_objects/migrationsv2/actions/integration_tests/actions.test.ts index a58ddc5c8eda1..9b2272efbd73b 100644 --- a/src/core/server/saved_objects/migrationsv2/actions/integration_tests/actions.test.ts +++ b/src/core/server/saved_objects/migrationsv2/actions/integration_tests/actions.test.ts @@ -70,6 +70,7 @@ describe('migration actions', () => { await createIndex({ client, indexName: 'existing_index_with_docs', + aliases: ['existing_index_with_docs_alias'], mappings: { dynamic: true, properties: {}, @@ -133,7 +134,9 @@ describe('migration actions', () => { expect(res.right).toEqual( expect.objectContaining({ existing_index_with_docs: { - aliases: {}, + aliases: { + existing_index_with_docs_alias: {}, + }, mappings: expect.anything(), settings: expect.anything(), }, @@ -1519,6 +1522,30 @@ describe('migration actions', () => { } `); }); + it('resolves left index_not_found_exception if the index does not exist and useAliasToPreventAutoCreate=true', async () => { + const newDocs = [ + { _source: { title: 'doc 5' } }, + { _source: { title: 'doc 6' } }, + { _source: { title: 'doc 7' } }, + ] as unknown as SavedObjectsRawDoc[]; + await expect( + bulkOverwriteTransformedDocuments({ + client, + index: 'existing_index_with_docs_alias_that_does_not_exist', + useAliasToPreventAutoCreate: true, + transformedDocs: newDocs, + refresh: 'wait_for', + })() + ).resolves.toMatchInlineSnapshot(` + Object { + "_tag": "Left", + "left": Object { + "index": "existing_index_with_docs_alias_that_does_not_exist", + "type": "index_not_found_exception", + }, + } + `); + }); it('resolves left target_index_had_write_block if there are write_block errors', async () => { const newDocs = [ { _source: { title: 'doc 5' } }, diff --git a/src/core/server/saved_objects/migrationsv2/initial_state.test.ts b/src/core/server/saved_objects/migrationsv2/initial_state.test.ts index 601eb7cf1ce36..17014866ba394 100644 --- a/src/core/server/saved_objects/migrationsv2/initial_state.test.ts +++ b/src/core/server/saved_objects/migrationsv2/initial_state.test.ts @@ -72,6 +72,7 @@ describe('createInitialState', () => { }, }, tempIndex: '.kibana_task_manager_8.1.0_reindex_temp', + tempIndexAlias: '.kibana_task_manager_8.1.0_reindex_temp_alias', tempIndexMappings: { dynamic: false, properties: { diff --git a/src/core/server/saved_objects/migrationsv2/initial_state.ts b/src/core/server/saved_objects/migrationsv2/initial_state.ts index a61967be9242c..578b2aafb8ce5 100644 --- a/src/core/server/saved_objects/migrationsv2/initial_state.ts +++ b/src/core/server/saved_objects/migrationsv2/initial_state.ts @@ -73,6 +73,7 @@ export const createInitialState = ({ versionAlias: `${indexPrefix}_${kibanaVersion}`, versionIndex: `${indexPrefix}_${kibanaVersion}_001`, tempIndex: `${indexPrefix}_${kibanaVersion}_reindex_temp`, + tempIndexAlias: `${indexPrefix}_${kibanaVersion}_reindex_temp_alias`, kibanaVersion, preMigrationScript: Option.fromNullable(preMigrationScript), targetIndexMappings: targetMappings, diff --git a/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts b/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts index ad201af3c3a0d..eef0d5bc6d8b9 100644 --- a/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts +++ b/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts @@ -90,7 +90,7 @@ describe('migration v2', function () { await root.preboot(); await root.setup(); await expect(root.start()).rejects.toMatchInlineSnapshot( - `[Error: Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715277 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.]` + `[Error: Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715237 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.]` ); await retryAsync( @@ -103,7 +103,7 @@ describe('migration v2', function () { expect( records.find((rec) => rec.message.startsWith( - `Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715277 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.` + `Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715237 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.` ) ) ).toBeDefined(); diff --git a/src/core/server/saved_objects/migrationsv2/model/model.test.ts b/src/core/server/saved_objects/migrationsv2/model/model.test.ts index 3e48a7147bffd..b99b796f4dd8e 100644 --- a/src/core/server/saved_objects/migrationsv2/model/model.test.ts +++ b/src/core/server/saved_objects/migrationsv2/model/model.test.ts @@ -81,6 +81,7 @@ describe('migrations v2 model', () => { versionAlias: '.kibana_7.11.0', versionIndex: '.kibana_7.11.0_001', tempIndex: '.kibana_7.11.0_reindex_temp', + tempIndexAlias: '.kibana_7.11.0_reindex_temp_alias', unusedTypesQuery: { bool: { must_not: [ diff --git a/src/core/server/saved_objects/migrationsv2/model/model.ts b/src/core/server/saved_objects/migrationsv2/model/model.ts index 5d8862e48df1a..635cb052563f0 100644 --- a/src/core/server/saved_objects/migrationsv2/model/model.ts +++ b/src/core/server/saved_objects/migrationsv2/model/model.ts @@ -504,7 +504,7 @@ export const model = (currentState: State, resW: ResponseType): if (stateP.corruptDocumentIds.length === 0 && stateP.transformErrors.length === 0) { const batches = createBatches( res.right.processedDocs, - stateP.tempIndex, + stateP.tempIndexAlias, stateP.maxBatchSizeBytes ); if (Either.isRight(batches)) { diff --git a/src/core/server/saved_objects/migrationsv2/next.ts b/src/core/server/saved_objects/migrationsv2/next.ts index 433c0998f7567..bf93391620bb2 100644 --- a/src/core/server/saved_objects/migrationsv2/next.ts +++ b/src/core/server/saved_objects/migrationsv2/next.ts @@ -87,6 +87,7 @@ export const nextActionMap = (client: ElasticsearchClient, transformRawDocs: Tra Actions.createIndex({ client, indexName: state.tempIndex, + aliases: [state.tempIndexAlias], mappings: state.tempIndexMappings, }), REINDEX_SOURCE_TO_TEMP_OPEN_PIT: (state: ReindexSourceToTempOpenPit) => @@ -110,8 +111,14 @@ export const nextActionMap = (client: ElasticsearchClient, transformRawDocs: Tra REINDEX_SOURCE_TO_TEMP_INDEX_BULK: (state: ReindexSourceToTempIndexBulk) => Actions.bulkOverwriteTransformedDocuments({ client, - index: state.tempIndex, + index: state.tempIndexAlias, transformedDocs: state.transformedDocBatches[state.currentBatch], + /* + * Since other nodes can delete the temp index while we're busy writing + * to it, we use the alias to prevent the auto-creation of the index if + * it doesn't exist. + */ + useAliasToPreventAutoCreate: true, /** * Since we don't run a search against the target index, we disable "refresh" to speed up * the migration process. diff --git a/src/core/server/saved_objects/migrationsv2/types.ts b/src/core/server/saved_objects/migrationsv2/types.ts index 4f6419930c6cc..8160c7e2e9d1d 100644 --- a/src/core/server/saved_objects/migrationsv2/types.ts +++ b/src/core/server/saved_objects/migrationsv2/types.ts @@ -117,10 +117,16 @@ export interface BaseState extends ControlState { */ readonly versionIndex: string; /** - * An alias on the target index used as part of an "reindex block" that - * prevents lost deletes e.g. `.kibana_7.11.0_reindex`. + * A temporary index used as part of an "reindex block" that + * prevents lost deletes e.g. `.kibana_7.11.0_reindex_temp`. */ readonly tempIndex: string; + /** + * An alias to the tempIndex used to prevent ES from auto-creating the temp + * index if one node deletes it while another writes to it + * e.g. `.kibana_7.11.0_reindex_temp_alias`. + */ + readonly tempIndexAlias: string; /** * When reindexing we use a source query to exclude saved objects types which * are no longer used. These saved objects will still be kept in the outdated From 4f10d77d640d70b645cc7129015f4926ff7369c0 Mon Sep 17 00:00:00 2001 From: Gerard Soldevila Date: Thu, 18 Jul 2024 16:37:17 +0200 Subject: [PATCH 2/3] Adapt failing test --- .../integration_tests/batch_size_bytes.test.ts | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts b/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts index eef0d5bc6d8b9..b5469ab60da4d 100644 --- a/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts +++ b/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts @@ -43,7 +43,7 @@ describe('migration v2', function () { es: { license: 'basic', dataArchive: Path.join(__dirname, 'archives', '7.14.0_xpack_sample_saved_objects.zip'), - esArgs: ['http.max_content_length=1715277b'], + esArgs: ['http.max_content_length=1715329b'], }, }, })); @@ -61,7 +61,7 @@ describe('migration v2', function () { }); it('completes the migration even when a full batch would exceed ES http.max_content_length', async () => { - root = createRoot({ maxBatchSizeBytes: 1715277 }); + root = createRoot({ maxBatchSizeBytes: 1715329 }); esServer = await startES(); await root.preboot(); await root.setup(); @@ -90,7 +90,7 @@ describe('migration v2', function () { await root.preboot(); await root.setup(); await expect(root.start()).rejects.toMatchInlineSnapshot( - `[Error: Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715237 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.]` + `[Error: Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715283 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.]` ); await retryAsync( @@ -103,7 +103,7 @@ describe('migration v2', function () { expect( records.find((rec) => rec.message.startsWith( - `Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715237 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.` + `Unable to complete saved object migrations for the [.kibana] index: The document with _id "canvas-workpad-template:workpad-template-061d7868-2b4e-4dc8-8bf7-3772b52926e5" is 1715283 bytes which exceeds the configured maximum batch size of 1015275 bytes. To proceed, please increase the 'migrations.maxBatchSizeBytes' Kibana configuration option and ensure that the Elasticsearch 'http.max_content_length' configuration option is set to an equal or larger value.` ) ) ).toBeDefined(); From b195d3a4715e51d5ed369393fa89f8bb2d2d5fdb Mon Sep 17 00:00:00 2001 From: Gerard Soldevila Date: Fri, 19 Jul 2024 13:22:26 +0200 Subject: [PATCH 3/3] Adapt test to new size (alias is larger than index name) --- .../batch_size_bytes_exceeds_es_content_length.test.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes_exceeds_es_content_length.test.ts b/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes_exceeds_es_content_length.test.ts index 6d93d8c3ae9fc..a384b4ea07cfc 100644 --- a/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes_exceeds_es_content_length.test.ts +++ b/src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes_exceeds_es_content_length.test.ts @@ -54,7 +54,7 @@ describe('migration v2', () => { }); it('fails with a descriptive message when maxBatchSizeBytes exceeds ES http.max_content_length', async () => { - root = createRoot({ maxBatchSizeBytes: 1715277 }); + root = createRoot({ maxBatchSizeBytes: 1715329 }); esServer = await startES(); await root.preboot(); await root.setup();