[RFC] Lower bound for min-max normalization technique in Hybrid query #1189
Comments
We have found that an upper bound would work well for our project too, if it could be applied to one or more of the queries. Or alternatively, a way to indicate that one set of scores is already normalised. Currently we use a hybrid query, combining a radial KNN query with a min_score of 0.75, and a keyword query using BM25. The KNN query always returns results with a score in the range [0.75, 1.0], where we know that 0.75 corresponds to a poor match and that 1.0 is a very good match. However, we find that after normalisation poor matches are ranked higher than expected. For example, if we get a string of low scoring results from the KNN query such as [0.77, 0.77, 0.76, 0.75, 0.75], then these are currently normalised to [1.0, 1.0, 0.5, 0.0, 0.0], which indicates the result with a cosine similarity of 0.77 is a strong match - when it is not. This results in weak KNN results ranking higher than strong keyword results. If we could specify both a lower and upper bound, then we could set them to the known range coming back from the KNN search (i.e. lower=0.75, upper=1.0). In the previous example, the scores would then be normalised to [0.08, 0.08, 0.04, 0.0, 0.0] - preserving the fact that the results are weak matches and should be ranked lower. Open to other ideas on how one might solve this of course!
@marcus-bcl thanks for your feedback, I see how raw min-max normalization results are not that great for your scenario. This is the fundamental limitation of the min-max normalization; it's challenging to improve it without knowing the potential upper bound. I believe our other normalization techniques, L2 and incoming z-score, will face similar issues. I've created a feature request for adding upper_bound parameter for min-max #1210, please +1 if that request is something you're looking for.
Introduction
This document describes the design details for Explainability in Hybrid Query. This feature has been requested through GitHub issues #150 and #299.
Overview
Hybrid search combines multiple query types, such as keyword and neural search, to improve search relevance. In version 2.11 the team released the hybrid query as part of the neural-search plugin. The main responsibility of the hybrid query is to return combined scores from multiple queries. In a common scenario those queries represent different search types, such as lexical and semantic.
The hybrid query uses multiple techniques for preparing the final list of matched documents; the two main types are score-based normalization and rank-based combination. For score-based normalization, the most effective technique is min-max normalization. In the scope of this proposal we want to improve the search relevance of min-max normalization by allowing a lower bound to be set.
Problem Statement
The min-max normalization technique is based on the maximum and minimum scores from all matched documents, using the following formula.
normalizedScore = (score - minScore) / (maxScore - minScore);
In the context of OpenSearch, finding the minimum score relies on an assumption that may not be the most effective one. While handling a search request, the system retrieves a limited number of matching documents from each shard; this limit is defined by the query parameter size. The minimum score is identified as the minimum over all scores of all collected documents. When the overall number of matching documents is much higher than the number of retrieved documents, the delta between the real and the retrieved minimum score can be significant, which negatively influences the final normalized score.
The following graphs illustrate the described scenario: in shard 1 the retrieved min score is 4.0, while the actual lower bound is 0.0. Similarly, for shard 2 the retrieved min score and the actual lower bound are 2.0 and 1.0.
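The effect can be reproduced with a small Python sketch (illustrative only, not plugin code; the concrete score values are assumed, chosen so the retrieved minimum is 4.0 and the actual minimum is 0.0, as in the shard 1 example):

```python
def min_max_normalize(scores):
    """Plain min-max normalization, as the processor applies it today."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Assumed shard scores: the query actually matches documents scoring
# down to 0.0, but only the top `size` (here 3) documents are retrieved.
all_scores = [9.0, 7.0, 4.0, 2.5, 1.0, 0.0]
retrieved = all_scores[:3]  # observed minimum is 4.0, actual minimum is 0.0

print(min_max_normalize(retrieved))   # [1.0, 0.6, 0.0]
print(min_max_normalize(all_scores))  # the same three docs now score 1.0, ~0.78, ~0.44
```

The weakest retrieved document is pushed down to 0.0 even though, relative to the full score range, it is a mid-strength match.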
Requirements
Functional Requirements
We want to introduce a lower limit that reflects the actual minimum score for matching document.
Non-Functional Requirements
Current state
Today, normalization in hybrid query is performed by the normalization processor, which is a phase results processor. A search pipeline with this processor can be defined with the following request; see more details on how the normalization processor can be configured here.
The following formula is used in the min-max technique
normalizedScore = (score - minScore) / (maxScore - minScore);
where min score is min(scores[0 ... size)) and max score is max(scores[0 ... size))
The min-max technique uses all matched documents from all shards to find the maximum and minimum scores. OpenSearch retrieves up to size documents from each shard; in most scenarios they are sorted in descending order, and documents with lower scores are dropped. While the maximum score used by the processor always belongs to one of the retrieved documents, the real minimum score can lie outside the retrieved subset of documents.
The following table shows one example of such a scenario, when size for the query is set to 3 and there are more than size matching documents in each shard.
The two major issues, detailed as challenges below, are retrieving the actual lower bound score for the query and handling documents whose scores fall below that bound.
Challenges
Retrieve actual lower bound score for the query
This is a problem when the number of matched documents at shard level is greater than size. In that case the actual min score will not be part of the document set, and if the number of matched documents exceeds the max size limit it will not even be collected. An example is the knn query, where for a typical configuration any document will have some positive score.
We could perform an exhaustive search and retrieve all matching documents at shard level. The problem with this approach is performance degradation: for big datasets latency can degrade drastically, and memory consumption can be high. Based on these considerations we recommend avoiding exhaustive retrieval.
How to deal with documents that have a score lower than the lower bound
If we use a lower bound value that is not based on actual data, documents may end up with scores that are lower than the lower bound.
There are multiple ways to address this; we can:
Solution Overview
Implementing the lower bound score in the calculation is straightforward; with our change the calculation will look like this:
float normalizedScore = (score - customMinScore) / (maxScore - customMinScore);
Essentially we replace the actual minScore with a user-provided number. The value for the minimum score can be set as a new parameter of the normalization technique. We can use a format based on the position (index) of the sub-query, similar to the existing weights parameter.
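A minimal Python sketch of the proposed calculation (illustrative; the actual implementation is Java inside the neural-search plugin, and the function name here is invented):

```python
def min_max_with_lower_bound(scores, lower_bound):
    """Min-max normalization where a user-provided lower bound replaces
    the observed minimum score of the retrieved documents."""
    hi = max(scores)
    if hi == lower_bound:
        return [1.0 for _ in scores]
    return [(s - lower_bound) / (hi - lower_bound) for s in scores]

# With lower_bound=0.0 the truncated shard result set no longer forces the
# weakest retrieved document down to a normalized score of 0.0:
print(min_max_with_lower_bound([9.0, 7.0, 4.0], 0.0))  # [1.0, ~0.78, ~0.44]
```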
These changes will be implemented at the phase results processor level. This component is responsible for running computations when the min-max technique is set up by the user. The following diagram shows the high-level components and the specific location where the change needs to be made.
To avoid confusion with the existing OpenSearch feature of the same name, min_score, I suggest we pick a different name.
The recommended name is lower_bound.
Expert-level configuration for lower bound
The configuration of the lower bound for the min-max normalization technique is considered an expert-level task. This feature is part of the search pipeline setup and should be used with caution, as improper configuration can lead to less relevant hybrid scores. Determining the optimal lower bound value requires a deep understanding of the data distribution and the specific search use case. It involves analyzing the score distribution and experimenting with different values to find the most effective lower bound. Incorrect configuration can significantly impact the relevance of the search results, and the computation of the lower bound may introduce additional latency and resource consumption. Users should be aware of these potential impacts and monitor their systems accordingly.
To give users maximum flexibility, we are going to allow configuring lower_bounds at the sub-query level, and to skip it / not apply it where needed.
Option 1: Configurable score retention or clipping [Recommended]
Pros:
Cons:
It is possible that actual shard-level scores are lower than the lower bound score we defined. In this case we can take one of the following actions:
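Both behaviors can be sketched in a few lines of Python (illustrative; the function and parameter names are assumptions rather than the plugin's API, and the retention branch reflects one possible reading of "use the actual score"):

```python
def normalize_single_score(score, max_score, lower_bound, clip=False):
    """Normalize one score against a configured lower bound.

    clip=True:  clipping - any score below the bound maps to the bound
                itself, i.e. a normalized score of 0.0.
    clip=False: score retention - the actual score stays in the formula,
                so values below the bound normalize below 0.0 and keep
                their relative order.
    """
    normalized = (score - lower_bound) / (max_score - lower_bound)
    if clip:
        return max(normalized, 0.0)
    return normalized
```

For example, with max_score=10.0 and lower_bound=2.0, a raw score of 1.0 normalizes to -0.125 under retention and to 0.0 under clipping.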
The following graphs illustrate how the lower bound works when:
I have done a POC and collected NDCG metric values for several datasets; the following table shows these results.
Summary of experiments
See appendix for detailed dataset statistics.
Based on the data from the POC, we recommend the solution that uses the lower bound for scores greater than the min_score, and uses the actual score when it is lower than the lower bound. The recommendation is to make this approach the default, with clipping mode as an option.
The solution with a decay function based on IQR gives similar results, but is more computationally intensive, so it is not recommended.
API changes
The new feature needs to be configurable by the user. That can be done via technique parameters for the processor as part of the search pipeline definition.
Parameter details
For completeness, the following request shows the processor with all defaults, meaning we apply the lower bound with a min_score of 0.0 for all sub-queries.
Option 2: Clipping
We can simply clip the low scores, meaning we return the lower bound score if the actual score is less than the lower bound.
The request will look like the following:
Pros:
Cons:
Option 3: Clipping with IQR based decay
We can address low scores by keeping them, while also applying a penalty to scores that are lower than the lower bound.
There are many options for applying a penalty to the low scores; some of the most popular and promising are:
I have done a POC and collected NDCG metric values for several datasets; the following table shows these results, see the table for exact numbers. The most promising approach is the one based on IQR.
Pros:
Cons:
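The document does not spell out the exact decay formula; as one hypothetical interpretation (the function names and the exponential form are assumptions, not the POC's actual code), scores below the bound could be damped at a rate scaled by the interquartile range of the retrieved scores:

```python
import math
import statistics

def iqr(scores):
    """Interquartile range (Q3 - Q1) of the retrieved scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1

def decay_below_bound(score, lower_bound, scores):
    """Hypothetical decay: scores at or above the bound pass through
    unchanged; scores below it are damped exponentially, with the rate
    scaled by the spread (IQR) of the score distribution."""
    if score >= lower_bound:
        return score
    spread = iqr(scores) or 1.0  # guard against a flat distribution
    return score * math.exp(-(lower_bound - score) / spread)
```

The point of tying the rate to the IQR is that a tight score distribution penalizes out-of-range scores more aggressively; the extra statistics pass is also where the additional computational cost comes from.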
User scenarios
The following data shows how each solution option affects the final score. Initial score values are taken from this table showing how scores are calculated today.
Lower bound [0.0, 0.0]
Lower bound [30.0, 2.0]
Lower bound [30.0, 2.0]
decay rate based on standard deviation
Low Level Design
How to setup
We need minor adjustments in the factory class, ScoreNormalizationFactory, to read and parse the new parameters.
How to compute normalized scores
All logic related changes will be done in MinMaxScoreNormalizationTechnique class.
First we compute the minimum score depending on the mode flag and the min_score limit value. Changes for single-score normalization should be done in the normalizeSingleScore method.
Potential Issues
Knowing the lower bound that gives the most relevant results can be challenging for a user. The existing logic provides decent results in general, so this parameter should be an expert-level setting rather than a default recommendation. We should think about a heuristic to derive the most effective lower bound from the indexed data.
Metrics
Adding a specific metric is not possible at the moment; we should add one once the stats API for neural search is ready. It is in the design phase (#1104 and #1146). As per early reviews of the stats API (draft design), adding a new metric will be straightforward, as simple as making one call to the static method.
Backward Compatibility
The new solution is backward compatible with today's approach: if no details are specified for lower bounds, the actual shard-level min score will be used.
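This compatibility property can be expressed as a small sketch (illustrative parameter names, not the plugin's API):

```python
def normalize(score, max_score, observed_min, lower_bound=None):
    # With no lower bound configured, fall back to the observed shard-level
    # minimum, which reproduces today's min-max behavior exactly.
    floor = observed_min if lower_bound is None else lower_bound
    return (score - floor) / (max_score - floor)

# Identical result to the current formula when lower_bound is unset:
assert normalize(7.0, 9.0, 4.0) == (7.0 - 4.0) / (9.0 - 4.0)
```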
Testability
The new functionality should be covered by unit tests and integration tests. Unit tests will take care of the computation logic and edge cases in the input data. Integration tests will exercise the end-to-end flow; one test should be enough for a sanity check.
Full-scale benchmarking is needed to measure how this feature affects relevancy and resource utilization. Some benchmarks were done as part of the POC, using 4 datasets; the average improvement in NDCG is 3.5%.
Appendix A
Dataset statistics
References
Feedback Required
We greatly value feedback from the community to ensure that this proposal addresses real-world use cases effectively. Here are a few specific points where your input would be particularly helpful:
Defaults for lower bounds
We plan to use defaults for the lower_bound feature, applying the lower bound score without a penalty and setting the default min_score to 0.0.
Are these defaults suitable for all query types?
Do you have any suggestions for alternative defaults?
Need for extra features
Should we consider adding extra features such as an upper_bound score?
What other features do you think would be beneficial?
Benefit of other techniques
Currently, we are adding the lower_bound feature to min-max normalization but not to L2 normalization.
Do you think it would be beneficial to add the lower_bound feature to L2 normalization as well?
Your insights will help us refine the proposal to better meet the needs of our users. Thank you for your valuable feedback!