Commit a0c82c6

vibrantvarun, shatejas, chishui, conggguan, and martin-gaievski authored
Rebasing with main (#826)
* Adds method_parameters in neural search query to support ef_search (#787) (#814)
  Signed-off-by: Tejas Shah <shatejas@amazon.com>

* Add BWC for batch ingestion (#769)
  * Add BWC for batch ingestion
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Update Changelog
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Fix spotlessLicenseCheck
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Fix comments
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Reuse the same code
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Rename some functions
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Rename a function
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  * Minor change to trigger rebuild
    Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
  ---------
  Signed-off-by: Liyun Xiu <xiliyun@amazon.com>

* Neural sparse query two-phase search processor's bwc test (#777)
  * Poc of pipeline
    Signed-off-by: conggguan <congguan@amazon.com>
  * Complete some settings for two phase pipeline.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Change the implement of two-phase from QueryBuilderVistor to custom process funciton.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add It and fix some bug on the state of multy same neuralsparsequerybuilder.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Simplify some logic, and correct some format.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Optimize some format.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add some test case.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Optimize some logic for zhichao-aws's comments.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Optimize a line without application.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add some comments, remove some redundant lines, fix some format.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Remove a redundant null check, fix a if format.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Fix a typo for a comment, camelcase format for some variable.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add some comments to illustrate the influence of the modify on 2-phase search pipeline to neural sparse query builder.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add restart and rolling upgrade bwc test for neural sparse two phase processor.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Spotless on qa.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Update change log for two-phase BWC test.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Remove redundant lines of two-phase BWC test.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add changelog.
    Signed-off-by: conggguan <congguan@amazon.com>
  * Add the PR link and number for the CHANGELOG.md.
    Signed-off-by: conggguan <congguan@amazon.com>
  * [Fix] NeuralSparseTwoPhaseProcessorIT created wrong ingest pipeline, fix it to correct API.
    Signed-off-by: conggguan <congguan@amazon.com>
  ---------
  Signed-off-by: conggguan <congguan@amazon.com>
  Signed-off-by: conggguan <157357330+conggguan@users.noreply.github.com>

* Enable '.' for nested field in text embedding processor (#811)
  * Added nested structure for text embed processor mapping
    Signed-off-by: Martin Gaievski <gaievski@amazon.com>

* Fix linux build CI error due to action runner env upgrade node 20 (#821)
  * Fix linux build CI error due to action runner env upgrade node 20
    Signed-off-by: Varun Jain <varunudr@amazon.com>
  * Fix linux build on additional integ tests
    Signed-off-by: Varun Jain <varunudr@amazon.com>
  ---------
  Signed-off-by: Varun Jain <varunudr@amazon.com>

---------
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Liyun Xiu <xiliyun@amazon.com>
Signed-off-by: conggguan <congguan@amazon.com>
Signed-off-by: conggguan <157357330+conggguan@users.noreply.github.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Co-authored-by: Tejas Shah <shatejas@amazon.com>
Co-authored-by: Liyun Xiu <chishui2@gmail.com>
Co-authored-by: conggguan <157357330+conggguan@users.noreply.github.com>
Co-authored-by: Martin Gaievski <gaievski@amazon.com>
1 parent ded2788 commit a0c82c6

35 files changed: +1101 -119 lines changed

.github/workflows/CI.yml

+2
@@ -10,6 +10,8 @@ on:
     branches:
       - "*"
       - "feature/**"
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true

 jobs:
   Get-CI-Image-Tag:

.github/workflows/test_aggregations.yml

+2
@@ -10,6 +10,8 @@ on:
     branches:
       - "*"
       - "feature/**"
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true

 jobs:
   Get-CI-Image-Tag:

.github/workflows/test_security.yml

+2
@@ -10,6 +10,8 @@ on:
     branches:
       - "*"
       - "feature/**"
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true

 jobs:
   Get-CI-Image-Tag:

CHANGELOG.md

+4
@@ -15,8 +15,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 ## [Unreleased 2.x](https://github.com/opensearch-project/neural-search/compare/2.15...2.x)
 ### Features
 ### Enhancements
+- Adds dynamic knn query parameters efsearch and nprobes [#814](https://github.com/opensearch-project/neural-search/pull/814/)
+- Enable '.' for nested field in text embedding processor ([#811](https://github.com/opensearch-project/neural-search/pull/811))
 ### Bug Fixes
 ### Infrastructure
+- Add BWC for batch ingestion ([#769](https://github.com/opensearch-project/neural-search/pull/769))
+- Add backward test cases for neural sparse two phase processor ([#777](https://github.com/opensearch-project/neural-search/pull/777))
 ### Documentation
 ### Maintenance
 ### Refactoring

qa/restart-upgrade/build.gradle

+18-2
@@ -90,10 +90,18 @@ task testAgainstOldCluster(type: StandaloneRestIntegTestTask) {
         }
     }

-    // Excluding the k-NN radial search tests because we introduce this feature in 2.14
+    // Excluding the k-NN radial search tests and batch ingestion tests because we introduce these features in 2.14
     if (ext.neural_search_bwc_version.startsWith("2.9") || ext.neural_search_bwc_version.startsWith("2.10") || ext.neural_search_bwc_version.startsWith("2.11") || ext.neural_search_bwc_version.startsWith("2.12") || ext.neural_search_bwc_version.startsWith("2.13")){
         filter {
             excludeTestsMatching "org.opensearch.neuralsearch.bwc.KnnRadialSearchIT.*"
+            excludeTestsMatching "org.opensearch.neuralsearch.bwc.BatchIngestionIT.*"
+        }
+    }
+
+    // Excluding the NeuralSparseQuery two-phase search pipeline tests because we introduce this feature in 2.15
+    if (ext.neural_search_bwc_version.startsWith("2.9") || ext.neural_search_bwc_version.startsWith("2.10") || ext.neural_search_bwc_version.startsWith("2.11") || ext.neural_search_bwc_version.startsWith("2.12") || ext.neural_search_bwc_version.startsWith("2.13") || ext.neural_search_bwc_version.startsWith("2.14")){
+        filter {
+            excludeTestsMatching "org.opensearch.neuralsearch.bwc.NeuralSparseTwoPhaseProcessorIT.*"
         }
     }

@@ -146,10 +154,18 @@ task testAgainstNewCluster(type: StandaloneRestIntegTestTask) {
         }
     }

-    // Excluding the k-NN radial search tests because we introduce this feature in 2.14
+    // Excluding the k-NN radial search tests and batch ingestion tests because we introduce these features in 2.14
     if (ext.neural_search_bwc_version.startsWith("2.9") || ext.neural_search_bwc_version.startsWith("2.10") || ext.neural_search_bwc_version.startsWith("2.11") || ext.neural_search_bwc_version.startsWith("2.12") || ext.neural_search_bwc_version.startsWith("2.13")){
         filter {
             excludeTestsMatching "org.opensearch.neuralsearch.bwc.KnnRadialSearchIT.*"
+            excludeTestsMatching "org.opensearch.neuralsearch.bwc.BatchIngestionIT.*"
+        }
+    }
+
+    // Excluding the NeuralSparseQuery two-phase search pipeline tests because we introduce this feature in 2.15
+    if (ext.neural_search_bwc_version.startsWith("2.9") || ext.neural_search_bwc_version.startsWith("2.10") || ext.neural_search_bwc_version.startsWith("2.11") || ext.neural_search_bwc_version.startsWith("2.12") || ext.neural_search_bwc_version.startsWith("2.13") || ext.neural_search_bwc_version.startsWith("2.14")){
+        filter {
+            excludeTestsMatching "org.opensearch.neuralsearch.bwc.NeuralSparseTwoPhaseProcessorIT.*"
         }
     }
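The chained `startsWith` checks in the hunks above gate each BWC suite on the release that introduced the feature. As a rough illustration only, the same rule could be expressed as a numeric comparison on the minor version. The class `BwcVersionGate` and its methods are hypothetical and not part of this commit; the sketch assumes a `2.x` version string of the form `major.minor[.patch][-qualifier]`, and unlike the Gradle conditions it would also exclude versions below 2.9.

```java
/** Hypothetical helper, not part of the commit: decides whether a BWC suite is skipped. */
final class BwcVersionGate {
    private BwcVersionGate() {}

    /** Extracts the minor number from e.g. "2.13.0" or "2.9.0-SNAPSHOT" (assumes "major.minor..."). */
    static int minorOf(String version) {
        String[] parts = version.split("\\.");
        // Drop anything after the leading digits, e.g. "14-SNAPSHOT" becomes "14".
        return Integer.parseInt(parts[1].replaceAll("\\D.*", ""));
    }

    /** A suite is excluded when the old cluster predates the minor release that introduced the feature. */
    static boolean isExcluded(String bwcVersion, int introducedInMinor) {
        return minorOf(bwcVersion) < introducedInMinor;
    }
}
```

Under these assumptions, `isExcluded("2.13.0", 14)` corresponds to skipping `KnnRadialSearchIT` and `BatchIngestionIT`, and `isExcluded("2.14.0", 15)` corresponds to skipping `NeuralSparseTwoPhaseProcessorIT`.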

@@ -0,0 +1,53 @@
+/*
+ * Copyright OpenSearch Contributors
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package org.opensearch.neuralsearch.bwc;
+
+import org.opensearch.neuralsearch.util.TestUtils;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.List;
+import java.util.Map;
+
+import static org.opensearch.neuralsearch.util.BatchIngestionUtils.prepareDataForBulkIngestion;
+import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER;
+import static org.opensearch.neuralsearch.util.TestUtils.SPARSE_ENCODING_PROCESSOR;
+
+public class BatchIngestionIT extends AbstractRestartUpgradeRestTestCase {
+    private static final String PIPELINE_NAME = "pipeline-BatchIngestionIT";
+    private static final String TEXT_FIELD_NAME = "passage_text";
+    private static final String EMBEDDING_FIELD_NAME = "passage_embedding";
+    private static final int batchSize = 3;
+
+    public void testBatchIngestionWithNeuralSparseProcessor_E2EFlow() throws Exception {
+        waitForClusterHealthGreen(NODES_BWC_CLUSTER);
+        String indexName = getIndexNameForTest();
+        if (isRunningAgainstOldCluster()) {
+            String modelId = uploadSparseEncodingModel();
+            loadModel(modelId);
+            createPipelineForSparseEncodingProcessor(modelId, PIPELINE_NAME);
+            createIndexWithConfiguration(
+                indexName,
+                Files.readString(Path.of(classLoader.getResource("processor/SparseIndexMappings.json").toURI())),
+                PIPELINE_NAME
+            );
+            List<Map<String, String>> docs = prepareDataForBulkIngestion(0, 5);
+            bulkAddDocuments(indexName, TEXT_FIELD_NAME, PIPELINE_NAME, docs, batchSize);
+            validateDocCountAndInfo(indexName, 5, () -> getDocById(indexName, "4"), EMBEDDING_FIELD_NAME, Map.class);
+        } else {
+            String modelId = null;
+            modelId = TestUtils.getModelId(getIngestionPipeline(PIPELINE_NAME), SPARSE_ENCODING_PROCESSOR);
+            loadModel(modelId);
+            try {
+                List<Map<String, String>> docs = prepareDataForBulkIngestion(5, 5);
+                bulkAddDocuments(indexName, TEXT_FIELD_NAME, PIPELINE_NAME, docs, batchSize);
+                validateDocCountAndInfo(indexName, 10, () -> getDocById(indexName, "9"), EMBEDDING_FIELD_NAME, Map.class);
+            } finally {
+                wipeOfTestResources(indexName, PIPELINE_NAME, modelId, null);
+            }
+        }
+    }
+
+}

qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/HybridSearchIT.java

+12-3
@@ -10,6 +10,7 @@
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
+
 import org.opensearch.index.query.MatchQueryBuilder;
 import static org.opensearch.neuralsearch.util.TestUtils.getModelId;
 import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER;
@@ -69,6 +70,7 @@ private void validateNormalizationProcessor(final String fileName, final String
                 loadModel(modelId);
                 addDocuments(getIndexNameForTest(), false);
                 validateTestIndex(modelId, getIndexNameForTest(), searchPipelineName);
+                validateTestIndex(modelId, getIndexNameForTest(), searchPipelineName, Map.of("ef_search", 100));
             } finally {
                 wipeOfTestResources(getIndexNameForTest(), pipelineName, modelId, searchPipelineName);
             }
@@ -96,10 +98,14 @@ private void createSearchPipeline(final String pipelineName) {
         );
     }

-    private void validateTestIndex(final String modelId, final String index, final String searchPipeline) throws Exception {
+    private void validateTestIndex(final String modelId, final String index, final String searchPipeline) {
+        validateTestIndex(modelId, index, searchPipeline, null);
+    }
+
+    private void validateTestIndex(final String modelId, final String index, final String searchPipeline, Map<String, ?> methodParameters) {
         int docCount = getDocCount(index);
         assertEquals(6, docCount);
-        HybridQueryBuilder hybridQueryBuilder = getQueryBuilder(modelId);
+        HybridQueryBuilder hybridQueryBuilder = getQueryBuilder(modelId, methodParameters);
         Map<String, Object> searchResponseAsMap = search(index, hybridQueryBuilder, null, 1, Map.of("search_pipeline", searchPipeline));
         assertNotNull(searchResponseAsMap);
         int hits = getHitCount(searchResponseAsMap);
@@ -110,12 +116,15 @@ private void validateTestIndex(final String modelId, final String index, final S
         }
     }

-    private HybridQueryBuilder getQueryBuilder(final String modelId) {
+    private HybridQueryBuilder getQueryBuilder(final String modelId, Map<String, ?> methodParameters) {
         NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder();
         neuralQueryBuilder.fieldName("passage_embedding");
         neuralQueryBuilder.modelId(modelId);
         neuralQueryBuilder.queryText(QUERY);
         neuralQueryBuilder.k(5);
+        if (methodParameters != null) {
+            neuralQueryBuilder.methodParameters(methodParameters);
+        }

         MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("text", QUERY);
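The hunks above keep the old three-argument call path working by delegating to a new overload with `null`, and only set `method_parameters` when a map is supplied. A minimal stand-alone sketch of that pattern follows. It uses a plain `Map` in place of `NeuralQueryBuilder`, and the class name `NeuralQuerySketch` and its method are hypothetical illustrations rather than code from the plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the builder calls in the diff; builds a plain map instead of a query. */
final class NeuralQuerySketch {
    private NeuralQuerySketch() {}

    /** Existing callers keep the short form; it delegates with no method parameters. */
    static Map<String, Object> neuralQuery(String modelId, String queryText, int k) {
        return neuralQuery(modelId, queryText, k, null);
    }

    /** New callers may pass method parameters such as ef_search; null means "leave the field out". */
    static Map<String, Object> neuralQuery(String modelId, String queryText, int k, Map<String, ?> methodParameters) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model_id", modelId);
        body.put("query_text", queryText);
        body.put("k", k);
        if (methodParameters != null) {
            body.put("method_parameters", methodParameters);
        }
        return body;
    }
}
```

Omitting the key entirely when no parameters are given, rather than sending an empty object, means a request built through the short form looks the same as it did before the overload was added.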

qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/KnnRadialSearchIT.java

+2
@@ -60,6 +60,7 @@ private void validateIndexQuery(final String modelId) {
             null,
             0.01f,
             null,
+            null,
             null
         );
         Map<String, Object> responseWithMinScoreQuery = search(getIndexNameForTest(), neuralQueryBuilderWithMinScoreQuery, 1);
@@ -74,6 +75,7 @@ private void validateIndexQuery(final String modelId) {
             100000f,
             null,
             null,
+            null,
             null
         );
         Map<String, Object> responseWithMaxDistanceQuery = search(getIndexNameForTest(), neuralQueryBuilderWithMaxDistanceQuery, 1);

qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/MultiModalSearchIT.java

+1
@@ -62,6 +62,7 @@ private void validateTestIndex(final String modelId) throws Exception {
             null,
             null,
             null,
+            null,
             null
         );
         Map<String, Object> response = search(getIndexNameForTest(), neuralQueryBuilder, 1);
@@ -0,0 +1,65 @@
+/*
+ * Copyright OpenSearch Contributors
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package org.opensearch.neuralsearch.bwc;
+
+import org.opensearch.common.settings.Settings;
+import org.opensearch.neuralsearch.query.NeuralSparseQueryBuilder;
+import org.opensearch.neuralsearch.util.TestUtils;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.List;
+
+import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER;
+import static org.opensearch.neuralsearch.util.TestUtils.SPARSE_ENCODING_PROCESSOR;
+
+public class NeuralSparseTwoPhaseProcessorIT extends AbstractRestartUpgradeRestTestCase {
+
+    private static final String NEURAL_SPARSE_INGEST_PIPELINE_NAME = "nstp-nlp-ingest-pipeline-dense";
+    private static final String NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME = "nstp-nlp-two-phase-search-pipeline-sparse";
+    private static final String TEST_ENCODING_FIELD = "passage_embedding";
+    private static final String TEST_TEXT_FIELD = "passage_text";
+    private static final String TEXT_1 = "Hello world a b";
+
+    public void testNeuralSparseQueryTwoPhaseProcessor_NeuralSearch_E2EFlow() throws Exception {
+        waitForClusterHealthGreen(NODES_BWC_CLUSTER);
+        NeuralSparseQueryBuilder neuralSparseQueryBuilder = new NeuralSparseQueryBuilder().fieldName(TEST_ENCODING_FIELD).queryText(TEXT_1);
+        if (isRunningAgainstOldCluster()) {
+            String modelId = uploadSparseEncodingModel();
+            loadModel(modelId);
+            neuralSparseQueryBuilder.modelId(modelId);
+            createPipelineForSparseEncodingProcessor(modelId, NEURAL_SPARSE_INGEST_PIPELINE_NAME);
+            createIndexWithConfiguration(
+                getIndexNameForTest(),
+                Files.readString(Path.of(classLoader.getResource("processor/SparseIndexMappings.json").toURI())),
+                NEURAL_SPARSE_INGEST_PIPELINE_NAME
+            );
+            addSparseEncodingDoc(getIndexNameForTest(), "0", List.of(), List.of(), List.of(TEST_TEXT_FIELD), List.of(TEXT_1));
+            createNeuralSparseTwoPhaseSearchProcessor(NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME);
+            updateIndexSettings(
+                getIndexNameForTest(),
+                Settings.builder().put("index.search.default_pipeline", NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME)
+            );
+            Object resultWith2PhasePipeline = search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits");
+            assertNotNull(resultWith2PhasePipeline);
+        } else {
+            String modelId = null;
+            try {
+                modelId = TestUtils.getModelId(getIngestionPipeline(NEURAL_SPARSE_INGEST_PIPELINE_NAME), SPARSE_ENCODING_PROCESSOR);
+                loadModel(modelId);
+                neuralSparseQueryBuilder.modelId(modelId);
+                Object resultWith2PhasePipeline = search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits");
+                assertNotNull(resultWith2PhasePipeline);
+            } finally {
+                wipeOfTestResources(
+                    getIndexNameForTest(),
+                    NEURAL_SPARSE_INGEST_PIPELINE_NAME,
+                    modelId,
+                    NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME
+                );
+            }
+        }
+    }
+}

qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/TextChunkingProcessorIT.java

+14-13
@@ -56,20 +56,21 @@ private void createChunkingIndex(String indexName) throws Exception {
         createIndexWithConfiguration(indexName, indexSetting, PIPELINE_NAME);
     }

-    private void validateTestIndex(String indexName, String fieldName, int documentCount, Object expected) {
-        int docCount = getDocCount(indexName);
-        assertEquals(documentCount, docCount);
+    private Map<String, Object> getFirstDocumentInQuery(String indexName, int resultSize) {
         MatchAllQueryBuilder query = new MatchAllQueryBuilder();
-        Map<String, Object> searchResults = search(indexName, query, 10);
+        Map<String, Object> searchResults = search(indexName, query, resultSize);
         assertNotNull(searchResults);
-        Map<String, Object> document = getFirstInnerHit(searchResults);
-        assertNotNull(document);
-        Object documentSource = document.get("_source");
-        assert (documentSource instanceof Map);
-        @SuppressWarnings("unchecked")
-        Map<String, Object> documentSourceMap = (Map<String, Object>) documentSource;
-        assert (documentSourceMap).containsKey(fieldName);
-        Object ingestOutputs = documentSourceMap.get(fieldName);
-        assertEquals(expected, ingestOutputs);
+        return getFirstInnerHit(searchResults);
+    }
+
+    private void validateTestIndex(String indexName, String fieldName, int documentCount, Object expected) {
+        Object outputs = validateDocCountAndInfo(
+            indexName,
+            documentCount,
+            () -> getFirstDocumentInQuery(indexName, 10),
+            fieldName,
+            List.class
+        );
+        assertEquals(expected, outputs);
     }
 }
@@ -0,0 +1,16 @@
+{
+  "request_processors": [
+    {
+      "neural_sparse_two_phase_processor": {
+        "tag": "neural-sparse",
+        "description": "This processor is making two-phase rescorer.",
+        "enabled": true,
+        "two_phase_parameter": {
+          "prune_ratio": %f,
+          "expansion_rate": %f,
+          "max_window_size": %d
+        }
+      }
+    }
+  ]
+}
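
The `%f` and `%d` placeholders in the template above suggest it is filled in with a format call before being sent as the search pipeline body. The sketch below shows one way that could be done with `String.format`. The class `TwoPhasePipelineTemplate`, its `render` method, and the argument values in the usage note are assumptions for illustration; the commit's own helper (`createNeuralSparseTwoPhaseSearchProcessor`) is not shown in this diff and may work differently.

```java
import java.util.Locale;

/** Hypothetical renderer for the two-phase pipeline template; not taken from the commit. */
final class TwoPhasePipelineTemplate {
    private TwoPhasePipelineTemplate() {}

    // Same shape as the resource file above, inlined here so the sketch is self-contained.
    static final String TEMPLATE = "{\n"
        + "  \"request_processors\": [\n"
        + "    {\n"
        + "      \"neural_sparse_two_phase_processor\": {\n"
        + "        \"tag\": \"neural-sparse\",\n"
        + "        \"description\": \"This processor is making two-phase rescorer.\",\n"
        + "        \"enabled\": true,\n"
        + "        \"two_phase_parameter\": {\n"
        + "          \"prune_ratio\": %f,\n"
        + "          \"expansion_rate\": %f,\n"
        + "          \"max_window_size\": %d\n"
        + "        }\n"
        + "      }\n"
        + "    }\n"
        + "  ]\n"
        + "}\n";

    /** Locale.ROOT keeps the decimal separator a '.', so the output stays valid JSON on any machine. */
    static String render(double pruneRatio, double expansionRate, int maxWindowSize) {
        return String.format(Locale.ROOT, TEMPLATE, pruneRatio, expansionRate, maxWindowSize);
    }
}
```

For example, `TwoPhasePipelineTemplate.render(0.4, 5.0, 10000)` would yield a body containing `"prune_ratio": 0.400000`. Passing an explicit locale matters because the default-locale form of `String.format` can emit a comma as the decimal separator, which would break the JSON.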
