Commit 064ded4

Add semantic field mapper.
Signed-off-by: Bo Zhang <bzhangam@amazon.com>
1 parent 5f25d6c commit 064ded4

19 files changed (+1487, -14 lines)

DEVELOPER_GUIDE.md

+12-12
```diff
@@ -351,9 +351,9 @@ through the same build issue.
 
 ### Class and package names
 
-Class names should use `CamelCase`.
+Class names should use `CamelCase`.
 
-Try to put new classes into existing packages if package name abstracts the purpose of the class.
+Try to put new classes into existing packages if package name abstracts the purpose of the class.
 
 Example of good class file name and package utilization:
 
```
```diff
@@ -371,7 +371,7 @@ methods rather than a long single one and does everything.
 ### Documentation
 
 Document your code. That includes purpose of new classes, every public method and code sections that have critical or non-trivial
-logic (check this example https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java#L238).
+logic (check this example https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java#L238).
 
 When you submit a feature PR, please submit a new
 [documentation issue](https://github.com/opensearch-project/documentation-website/issues/new/choose). This is a path for the documentation to be published as part of https://opensearch.org/docs/latest/ documentation site.
```
```diff
@@ -384,17 +384,17 @@ For the most part, we're using common conventions for Java projects. Here are a
 
 1. Use descriptive names for classes, methods, fields, and variables.
 2. Avoid abbreviations unless they are widely accepted
-3. Use `final` on all method arguments unless it's absolutely necessary
+3. Use `final` on all method arguments unless it's absolutely necessary
 4. Wildcard imports are not allowed.
 5. Static imports are preferred over qualified imports when using static methods
 6. Prefer creating non-static public methods whenever possible. Avoid static methods in general, as they can often serve as shortcuts.
 Static methods are acceptable if they are private and do not access class state.
-7. Use functional programming style inside methods unless it's a performance critical section.
+7. Use functional programming style inside methods unless it's a performance critical section.
 8. For parameters of lambda expression please use meaningful names instead of short, cryptic ones.
 9. Use Optional for return values if the value may not be present. This should be preferred to returning null.
 10. Do not create checked exceptions, and do not throw checked exceptions from public methods whenever possible. In general, if you call a method with a checked exception, you should wrap that exception into an unchecked exception.
 11. Throwing checked exceptions from private methods is acceptable.
-12. Use String.format when a string includes parameters, and prefer this over direct string concatenation. Always specify a Locale with String.format;
+12. Use String.format when a string includes parameters, and prefer this over direct string concatenation. Always specify a Locale with String.format;
 as a rule of thumb, use Locale.ROOT.
 13. Prefer Lombok annotations to the manually written boilerplate code
 14. When throwing an exception, avoid including user-provided content in the exception message. For secure coding practices,
```
```diff
@@ -440,17 +440,17 @@ Fix any new warnings before submitting your PR to ensure proper code documentati
 
 ### Tests
 
-Write unit and integration tests for your new functionality.
+Write unit and integration tests for your new functionality.
 
 Unit tests are preferred as they are cheap and fast, try to use them to cover all possible
-combinations of parameters. Utilize mocks to mimic dependencies.
+combinations of parameters. Utilize mocks to mimic dependencies.
 
-Integration tests should be used sparingly, focusing primarily on the main (happy path) scenario or cases where extensive
-mocking is impractical. Include one or two unhappy paths to confirm that correct response codes are returned to the user.
-Whenever possible, favor scenarios that do not require model deployment. If model deployment is necessary, use an existing
+Integration tests should be used sparingly, focusing primarily on the main (happy path) scenario or cases where extensive
+mocking is impractical. Include one or two unhappy paths to confirm that correct response codes are returned to the user.
+Whenever possible, favor scenarios that do not require model deployment. If model deployment is necessary, use an existing
 model, as tests involving new model deployments are the most resource-intensive.
 
-If your changes could affect backward compatibility, please include relevant backward compatibility tests along with your
+If your changes could affect backward compatibility, please include relevant backward compatibility tests along with your
 PR. For guidance on adding these tests, refer to the [Backwards Compatibility Testing](#backwards-compatibility-testing) section in this guide.
 
 ### Outdated or irrelevant code
```
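The Java conventions in the DEVELOPER_GUIDE.md diff above (items 3, 8 to 12, and 14) are easier to apply with a concrete picture. The following is a minimal sketch written for illustration only, not code from neural-search; the class `ConventionsExample` and its methods are hypothetical names. It shows `final` arguments, a descriptive lambda parameter, an `Optional` return instead of `null`, `String.format` with `Locale.ROOT`, and a checked `IOException` that is thrown only from a private method and wrapped in an unchecked exception at the public boundary, with a message that leaves out the user-supplied path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical class illustrating DEVELOPER_GUIDE items 3, 8-12 and 14; not part of neural-search.
public class ConventionsExample {

    private final Map<String, String> modelIdsByField;

    // Item 3: method and constructor arguments are final.
    public ConventionsExample(final Map<String, String> modelIdsByField) {
        this.modelIdsByField = modelIdsByField;
    }

    // Item 9: return Optional instead of null when the value may be absent.
    public Optional<String> findModelId(final String fieldName) {
        return Optional.ofNullable(modelIdsByField.get(fieldName));
    }

    // Items 8 and 12: meaningful lambda parameter name, String.format with an explicit Locale.ROOT.
    public String describe(final String fieldName) {
        return findModelId(fieldName)
            .map(modelId -> String.format(Locale.ROOT, "field [%s] uses model [%s]", fieldName, modelId))
            .orElseGet(() -> String.format(Locale.ROOT, "field [%s] has no model", fieldName));
    }

    // Items 10 and 14: the public method throws no checked exception, and the message omits the user-supplied path.
    public String readConfig(final Path path) {
        try {
            return readConfigChecked(path);
        } catch (IOException e) {
            throw new IllegalStateException("Failed to read config file", e);
        }
    }

    // Item 11: a private method may throw a checked exception.
    private String readConfigChecked(final Path path) throws IOException {
        return Files.readString(path);
    }

    public static void main(final String[] args) {
        final ConventionsExample example = new ConventionsExample(Map.of("passage", "model-1"));
        System.out.println(example.describe("passage")); // prints "field [passage] uses model [model-1]"
        System.out.println(example.describe("title"));   // prints "field [title] has no model"
    }
}
```

Wrapping at the public boundary keeps callers free of `throws` clauses while the original `IOException` stays reachable through `getCause()`.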

build.gradle

+2-1
```diff
@@ -249,7 +249,8 @@ validateNebulaPom.enabled = false
 def knnJarDirectory = "$buildDir/dependencies/opensearch-knn"
 
 dependencies {
-    api "org.opensearch:opensearch:${opensearch_version}"
+    implementation "org.opensearch:opensearch:${opensearch_version}"
+    implementation group: 'org.opensearch.plugin', name:'mapper-extras-client', version: "${opensearch_version}"
     zipArchive group: 'org.opensearch.plugin', name:'opensearch-job-scheduler', version: "${opensearch_build}"
     zipArchive group: 'org.opensearch.plugin', name:'opensearch-knn', version: "${opensearch_build}"
     zipArchive group: 'org.opensearch.plugin', name:'opensearch-ml-plugin', version: "${opensearch_build}"
```

FieldConstants.java

@@ -0,0 +1,9 @@
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.constants;

public class FieldConstants {
    public static final String TYPE = "type";
}
```

SemanticFieldConstants.java

@@ -0,0 +1,12 @@
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.constants;

public class SemanticFieldConstants {
    public static final String MODEL_ID = "model_id";
    public static final String SEARCH_MODEL_ID = "search_model_id";
    public static final String RAW_FIELD_TYPE = "raw_field_type";
    public static final String SEMANTIC_INFO_FIELD_NAME = "semantic_info_field_name";
}
```

SemanticFieldMapper.java

@@ -0,0 +1,246 @@
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.mapper;

import lombok.Getter;
import lombok.Setter;
import org.opensearch.core.xcontent.XContentBuilder;
import org.opensearch.index.mapper.BinaryFieldMapper;
import org.opensearch.index.mapper.KeywordFieldMapper;
import org.opensearch.index.mapper.MappedFieldType;
import org.opensearch.index.mapper.Mapper;
import org.opensearch.index.mapper.MapperParsingException;
import org.opensearch.index.mapper.MatchOnlyTextFieldMapper;
import org.opensearch.index.mapper.ParametrizedFieldMapper;
import org.opensearch.index.mapper.ParseContext;
import org.opensearch.index.mapper.TextFieldMapper;
import org.opensearch.index.mapper.TokenCountFieldMapper;
import org.opensearch.index.mapper.WildcardFieldMapper;
import org.opensearch.neuralsearch.constants.FieldConstants;
import org.opensearch.neuralsearch.mapper.semanticFieldTypes.SemanticFieldTypeFactory;

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.MODEL_ID;
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.RAW_FIELD_TYPE;
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.SEARCH_MODEL_ID;
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.SEMANTIC_INFO_FIELD_NAME;

/**
 * FieldMapper for the semantic field. It will hold a delegate field mapper to delegate the data parsing and query work
 * based on the raw_field_type.
 */
public class SemanticFieldMapper extends ParametrizedFieldMapper {
    public static final String CONTENT_TYPE = "semantic";
    private final String modelId;
    private final String rawFieldType;
    private final String semanticInfoFieldName;
    private final String searchModelId;

    @Setter
    @Getter
    private ParametrizedFieldMapper delegateFieldMapper;

    protected SemanticFieldMapper(
        String simpleName,
        MappedFieldType mappedFieldType,
        MultiFields multiFields,
        CopyTo copyTo,
        ParametrizedFieldMapper delegateFieldMapper,
        SemanticFieldMapper.Builder builder
    ) {
        super(simpleName, mappedFieldType, multiFields, copyTo);
        this.delegateFieldMapper = delegateFieldMapper;
        this.modelId = builder.modelId.getValue();
        this.rawFieldType = builder.rawFieldType.getValue();
        this.semanticInfoFieldName = builder.semanticInfoFieldName.getValue();
        this.searchModelId = builder.searchModelId.getValue();
    }

    @Override
    public Builder getMergeBuilder() {
        Builder semanticFieldMapperBuilder = (Builder) new Builder(simpleName()).init(this);
        ParametrizedFieldMapper.Builder delegateBuilder = delegateFieldMapper.getMergeBuilder();
        semanticFieldMapperBuilder.setDelegateBuilder(delegateBuilder);
        return semanticFieldMapperBuilder;
    }

    @Override
    public final ParametrizedFieldMapper merge(Mapper mergeWith) {
        if (mergeWith instanceof SemanticFieldMapper) {
            try {
                delegateFieldMapper = delegateFieldMapper.merge(((SemanticFieldMapper) mergeWith).delegateFieldMapper);
            } catch (IllegalArgumentException e) {
                String err = "Failed to update the mapper ["
                    + this.name()
                    + "] because failed to update the delegate "
                    + "mapper for the raw_field_type "
                    + this.rawFieldType
                    + ". "
                    + e.getMessage();
                throw new IllegalArgumentException(err, e);
            }
        }
        return super.merge(mergeWith);
    }

    @Override
    protected void parseCreateField(ParseContext context) throws IOException {
        delegateFieldMapper.parse(context);
    }

    @Override
    protected String contentType() {
        return CONTENT_TYPE;
    }

    public static class Builder extends ParametrizedFieldMapper.Builder {
        @Getter
        protected final Parameter<String> modelId = Parameter.stringParam(MODEL_ID, true, m -> ((SemanticFieldMapper) m).modelId, null);
        @Getter
        protected final Parameter<String> searchModelId = Parameter.stringParam(
            SEARCH_MODEL_ID,
            true,
            m -> ((SemanticFieldMapper) m).searchModelId,
            null
        );
        @Getter
        protected final Parameter<String> rawFieldType = Parameter.stringParam(
            RAW_FIELD_TYPE,
            false,
            m -> ((SemanticFieldMapper) m).rawFieldType,
            TextFieldMapper.CONTENT_TYPE
        );
        @Getter
        protected final Parameter<String> semanticInfoFieldName = Parameter.stringParam(
            SEMANTIC_INFO_FIELD_NAME,
            false,
            m -> ((SemanticFieldMapper) m).semanticInfoFieldName,
            null
        ).acceptsNull();

        @Setter
        protected ParametrizedFieldMapper.Builder delegateBuilder;
        private final SemanticFieldTypeFactory semanticFieldTypeFactory;

        protected Builder(String name) {
            super(name);
            semanticFieldTypeFactory = new SemanticFieldTypeFactory();
        }

        @Override
        protected List<Parameter<?>> getParameters() {
            return List.of(modelId, searchModelId, rawFieldType, semanticInfoFieldName);
        }

        @Override
        public SemanticFieldMapper build(BuilderContext context) {
            ParametrizedFieldMapper delegateMapper = delegateBuilder.build(context);

            MappedFieldType semanticFieldType = semanticFieldTypeFactory.createSemanticFieldType(
                delegateMapper,
                rawFieldType.getValue(),
                this
            );

            return new SemanticFieldMapper(
                name,
                semanticFieldType,
                multiFieldsBuilder.build(this, context),
                copyTo.build(),
                delegateMapper,
                this
            );
        }
    }

    public static class TypeParser implements Mapper.TypeParser {

        private final static List<String> SUPPORTED_RAW_FIELD_TYPE = List.of(
            TextFieldMapper.CONTENT_TYPE,
            KeywordFieldMapper.CONTENT_TYPE,
            MatchOnlyTextFieldMapper.CONTENT_TYPE,
            WildcardFieldMapper.CONTENT_TYPE,
            TokenCountFieldMapper.CONTENT_TYPE,
            BinaryFieldMapper.CONTENT_TYPE
        );

        @Override
        public Builder parse(String name, Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
            final String rawFieldType = (String) node.getOrDefault(RAW_FIELD_TYPE, TextFieldMapper.CONTENT_TYPE);

            validateRawFieldType(rawFieldType);

            final ParametrizedFieldMapper.TypeParser typeParser = (ParametrizedFieldMapper.TypeParser) parserContext.typeParser(
                rawFieldType
            );
            final Builder semanticFieldMapperBuilder = new Builder(name);

            // semantic field mapper builder parses semantic fields
            Map<String, Object> semanticConfig = extractSemanticConfig(node, semanticFieldMapperBuilder.getParameters(), rawFieldType);
            semanticFieldMapperBuilder.parse(name, parserContext, semanticConfig);

            // delegate field mapper builder parses remaining fields
            ParametrizedFieldMapper.Builder delegateBuilder = typeParser.parse(name, node, parserContext);
            semanticFieldMapperBuilder.setDelegateBuilder(delegateBuilder);

            return semanticFieldMapperBuilder;
        }

        private void validateRawFieldType(final String rawFieldType) {
            if (rawFieldType == null || !SUPPORTED_RAW_FIELD_TYPE.contains(rawFieldType)) {
                throw new IllegalArgumentException(
                    RAW_FIELD_TYPE
                        + ": ["
                        + rawFieldType
                        + "] is not supported. It "
                        + "should be one of ["
                        + String.join(",", SUPPORTED_RAW_FIELD_TYPE)
                        + "]"
                );
            }
        }

        private Map<String, Object> extractSemanticConfig(Map<String, Object> node, List<Parameter<?>> parameters, String rawFieldType) {
            final Map<String, Object> semanticConfig = new HashMap<>();
            for (Parameter<?> parameter : parameters) {
                Object config = node.get(parameter.name);
                if (config != null) {
                    semanticConfig.put(parameter.name, config);
                    node.remove(parameter.name);
                }
            }
            semanticConfig.put(FieldConstants.TYPE, SemanticFieldMapper.CONTENT_TYPE);
            node.put(FieldConstants.TYPE, rawFieldType);
            return semanticConfig;
        }
    }

    @Override
    protected void doXContentBody(XContentBuilder builder, boolean includeDefaults, Params params) throws IOException {
        builder.field(FieldConstants.TYPE, contentType());

        // semantic parameters
        final List<Parameter<?>> parameters = getMergeBuilder().getParameters();
        for (Parameter<?> parameter : parameters) {
            // By default, we will not return the default value. But raw_field_type is useful info to let users know how
            // we will handle the raw data. So we explicitly return it even if it is using the default value.
            if (RAW_FIELD_TYPE.equals(parameter.name)) {
                parameter.toXContent(builder, true);
            } else {
                parameter.toXContent(builder, includeDefaults);
            }
        }

        // non-semantic parameters
        // semantic field mapper itself does not handle multi fields or copy to. The delegate field mapper will handle it.
        delegateFieldMapper.multiFields().toXContent(builder, params);
        delegateFieldMapper.copyTo().toXContent(builder, params);
        delegateFieldMapper.getMergeBuilder().toXContent(builder, includeDefaults);
    }
}
```
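`SemanticFieldMapper.TypeParser.parse` splits one mapping node in two: the semantic parameters (`model_id`, `search_model_id`, `raw_field_type`, `semantic_info_field_name`) go to the semantic builder, and everything else, with `type` rewritten to the raw field type, goes to the delegate type parser. The sketch below reproduces only that map-splitting step with plain `java.util` maps so it can run outside OpenSearch. `SemanticConfigSplitter` and `split` are hypothetical names, the real code iterates OpenSearch `Parameter` objects rather than a list of strings, and the supported type strings are assumed to match the `CONTENT_TYPE` constants of the mappers in `SUPPORTED_RAW_FIELD_TYPE`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, OpenSearch-free sketch of TypeParser#parse's validate-then-split logic.
public class SemanticConfigSplitter {

    public static final String TYPE = "type";
    public static final String RAW_FIELD_TYPE = "raw_field_type";
    public static final List<String> SEMANTIC_PARAMETERS = List.of(
        "model_id", "search_model_id", RAW_FIELD_TYPE, "semantic_info_field_name"
    );
    // Assumed to equal the CONTENT_TYPE values of the supported OpenSearch mappers.
    public static final List<String> SUPPORTED_RAW_FIELD_TYPES = List.of(
        "text", "keyword", "match_only_text", "wildcard", "token_count", "binary"
    );

    /**
     * Moves the semantic parameters out of {@code node} into the returned map, then rewrites
     * {@code node}'s "type" so the remaining entries can be handed to the delegate type parser.
     */
    public static Map<String, Object> split(final Map<String, Object> node) {
        // raw_field_type defaults to "text", mirroring the default in the mapper.
        final String rawFieldType = (String) node.getOrDefault(RAW_FIELD_TYPE, "text");
        if (!SUPPORTED_RAW_FIELD_TYPES.contains(rawFieldType)) {
            // Message omits the user-provided value, per DEVELOPER_GUIDE item 14.
            throw new IllegalArgumentException(
                String.format(Locale.ROOT, "[%s] is not supported, it should be one of %s", RAW_FIELD_TYPE, SUPPORTED_RAW_FIELD_TYPES)
            );
        }
        final Map<String, Object> semanticConfig = new HashMap<>();
        for (String parameterName : SEMANTIC_PARAMETERS) {
            final Object value = node.remove(parameterName);
            if (value != null) {
                semanticConfig.put(parameterName, value);
            }
        }
        semanticConfig.put(TYPE, "semantic");
        node.put(TYPE, rawFieldType);
        return semanticConfig;
    }

    public static void main(final String[] args) {
        final Map<String, Object> node = new HashMap<>(
            Map.of("model_id", "my-model", "raw_field_type", "keyword", "ignore_above", 256)
        );
        final Map<String, Object> semanticConfig = split(node);
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println("semantic: " + semanticConfig);
        System.out.println("delegate: " + node);
    }
}
```

Stripping the semantic keys from `node` presumably matters because the delegate's `ParametrizedFieldMapper` parser rejects parameters it does not recognize; after the split, the delegate sees only options that belong to its own field type (here `ignore_above` for `keyword`).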
