-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add semantic field mapper. #1225
base: main
Are you sure you want to change the base?
Conversation
@@ -250,6 +250,7 @@ def knnJarDirectory = "$buildDir/dependencies/opensearch-knn" | |||
|
|||
dependencies { | |||
api "org.opensearch:opensearch:${opensearch_version}" | |||
implementation group: 'org.opensearch.plugin', name:'mapper-extras-client', version: "${opensearch_version}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to access TokenCountFieldMapper to delegate work for token_count raw_field_type.
Signed-off-by: Bo Zhang <bzhangam@amazon.com>
|
||
import java.util.Map; | ||
|
||
public class FeatureFlagUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot use a feature flag like NEURAL_SEARCH_HYBRID_SEARCH_DISABLED. It's actually used in a wrong way here. I create an issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the distinction between feature flags and cluster settings? One is configurable at runtime and the other isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can view the settings in the feature flag is a private static variable which is only used to hold some feature flags while the cluster settings is shared across the cluster. And it can be any setting related to the cluster.
node.put(FieldConstants.TYPE, rawFieldType); | ||
return semanticConfig; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed we're extracting semantic config parameters and removing them from the original node before passing it to the delegate field mapper. This approach could potentially cause validation issues in the delegate mapper, which might expect to see all original parameters during its validation phase.
To ensure more robust behavior, consider:
- Using a copy of the node instead of modifying the original:
Map<String, Object> nodeCopy = new HashMap<>(node);
- Or creating a separate delegate node that preserves all original parameters while only changing the field type
This would prevent any unexpected side effects from parameter removal and ensure that both semantic and delegate mappers can perform complete validation with their expected parameter sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delegate field mapper cannot process the semantic parameters so we have to extract them. The semantic field mapper should validate the semantic parameters and the delegate field mapper only need to process the non-semantic parameters. If we pass all original parameters to the delegate field mapper it will always fail since it cannot handle the semantic parameters.
* FieldMapper for the semantic field. It will hold a delegate field mapper to delegate the data parsing and query work | ||
* based on the raw_field_type. | ||
*/ | ||
public class SemanticFieldMapper extends ParametrizedFieldMapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know this is extended from core, but it should be spelled: parameterized not Parametrized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I think it's a typo in core and change it will need all the plugin using it to change accordingly. So don't think we want to make it part of this PR. Probably we can create an issue in core for this.
MultiFields multiFields, | ||
CopyTo copyTo, | ||
ParametrizedFieldMapper delegateFieldMapper, | ||
SemanticFieldMapper.Builder builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why we're passing in Builder
, and not just SemanticFieldMapper
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we build the SemanticFieldMapper we only have the builder and we use the builder to build it. Here I pass the build to pass all the parameters to SemanticFieldMapper.
return CONTENT_TYPE; | ||
} | ||
|
||
public static class Builder extends ParametrizedFieldMapper.Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be static class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. In this way we can access it like SemanticFieldMapper.Builder. This is a common pattern used across OpenSearch.
} | ||
} | ||
|
||
public static class TypeParser implements Mapper.TypeParser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid defining multiple classes in a single file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can. But I think this is a common pattern used across OpenSearch.
@NonNull final String rawFieldType, | ||
SemanticFieldMapper.Builder builder | ||
) { | ||
if (BinaryFieldMapper.CONTENT_TYPE.equals(rawFieldType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] using switch/case
will make this cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with switch/case we cannot do STRING_FIELD_TYPES.contains(rawFieldType). So would like to keep this.
import org.opensearch.neuralsearch.mapper.SemanticFieldMapper; | ||
|
||
@Getter | ||
public class SemanticFieldTypeParameters { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if having Parameters as suffix to class name is necessary. The defined fields should already imply having parameters. Can we include these fields in SemanticFieldType
instead??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view this as a class to help us hold all the parameters. In this way when we need to add new parameters we don't need to change it in all field types. We just need to update it here.
Description
Add semantic field mapper to support the semantic field type.
Test will fail until this PR is merged into core.
Related Issues
Resolves #[803]
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.