Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logsdb and source only snapshots. #122199

Merged
merged 22 commits into from
Feb 13, 2025

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Feb 10, 2025

Addresses a few issues with logsdb and source only snapshots:

  • Avoid initializing index sorting, because sort fields will not have doc values.
  • Also disable doc value skippers when doc values get disabled.
  • As part of source only validation figure out what the nested parent field is.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.
* Avoid initializing _source.mode, because otherwise an empty _source:{} gets serialized in the restored mapping.

Also added a few more tests that snapshot and restore logsdb data streams.
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@martijnvg martijnvg force-pushed the logsdb_search_only_snapshots branch from 9bd067a to 385e821 Compare February 10, 2025 18:39
@@ -176,7 +176,8 @@ public Builder(
true,
() -> null,
(n, c, o) -> Mode.valueOf(o.toString().toUpperCase(Locale.ROOT)),
m -> toType(m).enabled.explicit() ? null : toType(m).mode,
// Avoid initializing _source.mode if it doesn't need to be serialized:
m -> toType(m).enabled.explicit() ? null : toType(m).serializeMode ? toType(m).mode : null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's add parentheses to denote the sequence of the conditional branches.

assertDocCount(client(), dataStreamName, 100);
}

static void snapshotAndFail(String sourceMode, String arrayType, boolean sourceOnly) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Params need to be "synthetic" and false, for this to fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the issues are with sourceOnly snapshots.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's then hard-code these and remove them from method args.


restoreSnapshot(repositoryName, snapshotName, true);
assertDataStream(dataStreamName, sourceMode);
assertDocCount(client(), dataStreamName, 100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also compare the source for the two old and the new index?

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, a few minor comments and questions.

String dataStreamName = "logs-my-test";
String repositoryName = "my-repository";
if (sourceOnly) {
if (true) {
var repositorySettings = Settings.builder().put("delegate_type", "fs").put("location", getRepoPath()).build();
registerRepository(repositoryName, "source", true, repositorySettings);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove, redundant.

.setting("xpack.license.self_generated.type", "trial")
// TODO: remove when initializing / serializing default SourceFieldMapper instance have been fixed:
// (SFM's mode attribute often gets initialized, even when mode attribute isn't set)
.jvmArg("-da:org.elasticsearch.index.mapper.DocumentMapper")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue that we need to address. We always serialize an empty _source, because the mode attribute's initializer initializes the mode even if we don't need to include/serialize the mode attribute. Changing that is tricky and we need to take bwc into account, because DocumentMapper's assertion will trip on older nodes.

The reason why we need to disable DocumentMapper's and MapperService's assertions, is because the mapping update that happens when source only snapshots (in SourceOnlySnapshotRepository#metadataToSnapshot(...)). Which is metadata only mapping update that triggers the mentioned assertions to trip. This isn't an issue in production, but should fixed.

I will work on a follow up pr that avoids serializing an empty source if _source defaults are active. That fixes the assertion from tripping (like the initial commit did in this PR). But given the trickiness is better to do in a followup pr.

@martijnvg martijnvg enabled auto-merge (squash) February 13, 2025 08:01
@martijnvg martijnvg merged commit 1b452c7 into elastic:main Feb 13, 2025
17 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2025
Backporting elastic#122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2025
Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.
martijnvg added a commit that referenced this pull request Feb 14, 2025
Backporting #122199 to 9.0 branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.
elasticsearchmachine pushed a commit that referenced this pull request Feb 14, 2025
* [8.x] Logsdb and source only snapshots.

Backporting #122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.

* fix test
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2025
* [8.x] Logsdb and source only snapshots.

Backporting elastic#122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.

* fix test
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2025
* [8.x] Logsdb and source only snapshots.

Backporting elastic#122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.

* fix test
elasticsearchmachine pushed a commit that referenced this pull request Feb 14, 2025
* [8.x] Logsdb and source only snapshots.

Backporting #122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.

* fix test
elasticsearchmachine pushed a commit that referenced this pull request Feb 14, 2025
* [8.x] Logsdb and source only snapshots.

Backporting #122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.

* fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants