Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic field mapper. #1225

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

bzhangam
Copy link
Contributor

@bzhangam bzhangam commented Mar 11, 2025

Description

Add semantic field mapper to support the semantic field type.

Test will fail until this PR is merged into core.

Related Issues

Resolves #[803]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -250,6 +250,7 @@ def knnJarDirectory = "$buildDir/dependencies/opensearch-knn"

dependencies {
api "org.opensearch:opensearch:${opensearch_version}"
implementation group: 'org.opensearch.plugin', name:'mapper-extras-client', version: "${opensearch_version}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to access TokenCountFieldMapper to delegate work for token_count raw_field_type.

Signed-off-by: Bo Zhang <bzhangam@amazon.com>
@bzhangam bzhangam reopened this Mar 11, 2025

import java.util.Map;

public class FeatureFlagUtil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot use a feature flag like NEURAL_SEARCH_HYBRID_SEARCH_DISABLED. It's actually used in a wrong way here. I create an issue for it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the distinction between feature flags and cluster settings? One is configurable at runtime and the other isn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can view the settings in the feature flag is a private static variable which is only used to hold some feature flags while the cluster settings is shared across the cluster. And it can be any setting related to the cluster.

node.put(FieldConstants.TYPE, rawFieldType);
return semanticConfig;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we're extracting semantic config parameters and removing them from the original node before passing it to the delegate field mapper. This approach could potentially cause validation issues in the delegate mapper, which might expect to see all original parameters during its validation phase.

To ensure more robust behavior, consider:

  1. Using a copy of the node instead of modifying the original: Map<String, Object> nodeCopy = new HashMap<>(node);
  2. Or creating a separate delegate node that preserves all original parameters while only changing the field type

This would prevent any unexpected side effects from parameter removal and ensure that both semantic and delegate mappers can perform complete validation with their expected parameter sets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delegate field mapper cannot process the semantic parameters so we have to extract them. The semantic field mapper should validate the semantic parameters and the delegate field mapper only need to process the non-semantic parameters. If we pass all original parameters to the delegate field mapper it will always fail since it cannot handle the semantic parameters.

* FieldMapper for the semantic field. It will hold a delegate field mapper to delegate the data parsing and query work
* based on the raw_field_type.
*/
public class SemanticFieldMapper extends ParametrizedFieldMapper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this is extended from core, but it should be spelled: parameterized not Parametrized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think it's a typo in core and change it will need all the plugin using it to change accordingly. So don't think we want to make it part of this PR. Probably we can create an issue in core for this.

MultiFields multiFields,
CopyTo copyTo,
ParametrizedFieldMapper delegateFieldMapper,
SemanticFieldMapper.Builder builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why we're passing in Builder, and not just SemanticFieldMapper directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we build the SemanticFieldMapper we only have the builder and we use the builder to build it. Here I pass the build to pass all the parameters to SemanticFieldMapper.

return CONTENT_TYPE;
}

public static class Builder extends ParametrizedFieldMapper.Builder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be static class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. In this way we can access it like SemanticFieldMapper.Builder. This is a common pattern used across OpenSearch.

}
}

public static class TypeParser implements Mapper.TypeParser {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid defining multiple classes in a single file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can. But I think this is a common pattern used across OpenSearch.

@NonNull final String rawFieldType,
SemanticFieldMapper.Builder builder
) {
if (BinaryFieldMapper.CONTENT_TYPE.equals(rawFieldType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] using switch/case will make this cleaner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with switch/case we cannot do STRING_FIELD_TYPES.contains(rawFieldType). So would like to keep this.

import org.opensearch.neuralsearch.mapper.SemanticFieldMapper;

@Getter
public class SemanticFieldTypeParameters {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if having Parameters as suffix to class name is necessary. The defined fields should already imply having parameters. Can we include these fields in SemanticFieldType instead??

Copy link
Contributor Author

@bzhangam bzhangam Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view this as a class to help us hold all the parameters. In this way when we need to add new parameters we don't need to change it in all field types. We just need to update it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancements Increases software capabilities beyond original client specifications neural-search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PROPOSAL] Neural Search field type
4 participants