[Derived Source] Add support for deriving source field in FieldMapper #17073

Open
rayshrey opened this issue Jan 21, 2025 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Performance untriaged

Comments

@rayshrey
Contributor

Is your feature request related to a problem? Please describe

docValues are a columnar storage format used by Lucene to store indexed data in a way that facilitates efficient aggregation, sorting, and similar operations.
Stored fields, on the other hand, store the actual values of fields as they were inserted into the index.

Currently, the _source field stores the original document as a stored field. We could skip storing the _source field in cases where the values are already stored in docValues, and retrieve the field values from docValues instead. This would reduce storage cost significantly. Depending on the nature of the query, we can skip, or fetch, some or all of the fields from docValues to serve search queries.

Describe the solution you'd like

Add a new parameter in MappedFieldType to indicate whether deriving source is supported for the field, similar to the following:

private final boolean derivedSourceSupported;

public boolean isDerivedSourceSupported() {
    return derivedSourceSupported;
}
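
As a rough illustration of how such a flag might be consumed, here is a minimal, self-contained sketch. `SimpleFieldType` and `fieldsWithoutDerivedSource` are hypothetical stand-ins invented for this example; they are not the real MappedFieldType or any existing OpenSearch API. The sketch assumes a caller wants to know which fields would still need a stored value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DerivedSourceFlagSketch {

    /** Hypothetical, simplified stand-in for MappedFieldType (not the real class). */
    static class SimpleFieldType {
        private final String name;
        private final boolean derivedSourceSupported;

        SimpleFieldType(String name, boolean derivedSourceSupported) {
            this.name = name;
            this.derivedSourceSupported = derivedSourceSupported;
        }

        public String name() {
            return name;
        }

        // Mirrors the accessor proposed in the issue.
        public boolean isDerivedSourceSupported() {
            return derivedSourceSupported;
        }
    }

    /** Returns the names of fields whose value cannot be derived (assumed helper). */
    static List<String> fieldsWithoutDerivedSource(List<SimpleFieldType> fieldTypes) {
        List<String> unsupported = new ArrayList<>();
        for (SimpleFieldType fieldType : fieldTypes) {
            if (!fieldType.isDerivedSourceSupported()) {
                unsupported.add(fieldType.name());
            }
        }
        return unsupported;
    }

    public static void main(String[] args) {
        List<SimpleFieldType> fieldTypes = Arrays.asList(
            new SimpleFieldType("status", true),
            new SimpleFieldType("message", false));
        System.out.println(fieldsWithoutDerivedSource(fieldTypes)); // prints [message]
    }
}
```

Whether unsupported fields should block derived source for the whole index or only fall back per field is left open here; the issue does not specify it.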

Add a new method in FieldMapper with the following signature:

public void buildDerivedSource(XContentBuilder builder, LeafReader leafReader, int docId) throws IOException {
    // Default is a no-op; mappers that support derived source override this method
}

We will implement this method in every subclass for which deriving source is supported.
Usage: the Search or Get path will call this method, passing the leafReader (to read the docValues from) and the docId (the document whose docValues are needed). It will also pass an XContentBuilder, to which each mapper adds its field after deriving it. The method will be called for every FieldMapper we have, and the source will finally be set from the resulting builder object.
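
The call flow described above could look roughly like the following self-contained sketch. Everything in it is an assumption made for illustration: `FakeLeafReader` stands in for Lucene's LeafReader, a plain `Map` stands in for XContentBuilder, and `DerivableMapper`, `LongFieldMapperSketch` and `deriveSource` are hypothetical names. The stand-in method also omits the `throws IOException` clause of the proposed signature to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DerivedSourceFlowSketch {

    /** Hypothetical stand-in for LeafReader: field name -> one value per docId. */
    static class FakeLeafReader {
        private final Map<String, long[]> numericDocValues = new HashMap<>();

        void put(String field, long... valuesByDocId) {
            numericDocValues.put(field, valuesByDocId);
        }

        long numericValue(String field, int docId) {
            return numericDocValues.get(field)[docId];
        }
    }

    /** Hypothetical stand-in for FieldMapper, carrying the proposed method. */
    interface DerivableMapper {
        // The Map plays the role of the XContentBuilder in the proposal.
        void buildDerivedSource(Map<String, Object> builder, FakeLeafReader leafReader, int docId);
    }

    /** Example mapper that derives a long field from its doc values. */
    static class LongFieldMapperSketch implements DerivableMapper {
        private final String name;

        LongFieldMapperSketch(String name) {
            this.name = name;
        }

        @Override
        public void buildDerivedSource(Map<String, Object> builder, FakeLeafReader leafReader, int docId) {
            builder.put(name, leafReader.numericValue(name, docId));
        }
    }

    /** Calls every mapper for one document and returns the assembled source. */
    static Map<String, Object> deriveSource(List<DerivableMapper> mappers, FakeLeafReader leafReader, int docId) {
        Map<String, Object> builder = new LinkedHashMap<>(); // keeps mapper order
        for (DerivableMapper mapper : mappers) {
            mapper.buildDerivedSource(builder, leafReader, docId);
        }
        return builder;
    }

    public static void main(String[] args) {
        FakeLeafReader leafReader = new FakeLeafReader();
        leafReader.put("status", 200L, 404L);
        leafReader.put("bytes", 512L, 2048L);

        List<DerivableMapper> mappers = Arrays.asList(
            new LongFieldMapperSketch("status"),
            new LongFieldMapperSketch("bytes"));

        System.out.println(deriveSource(mappers, leafReader, 1)); // prints {status=404, bytes=2048}
    }
}
```

Passing one shared builder to every mapper, as the proposal does, lets each mapper own the serialization of its field while the caller only decides which document to rebuild.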

Related component

Indexing:Performance

Describe alternatives you've considered

N/A

Additional context

#9568 (comment)

@HUSTERGS
Contributor

This issue seems to be a duplicate of #9568? The proposal looks like the one proposed by @bugmakerrrrrr #9558 (comment)

@bugmakerrrrrr
Contributor

> This issue seems to be a duplicate of #9568? The proposal looks like the one proposed by @bugmakerrrrrr #9558 (comment)

It does look like it, and I don't think we should create duplicate RFCs.

@rayshrey
Contributor Author

@HUSTERGS @bugmakerrrrrr The main purpose of creating this issue was to track the first task (creating the basic interfaces for supporting derived source). Supporting derived source will require multiple such PRs, which we will be tracking in this recently created meta issue: #17048

The intention was not to create a duplicate RFC but to start adding concrete PRs for better reviews and understanding.
