Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NPE when LeafReader return null VectorValues #13162

Merged
merged 2 commits into from
Mar 11, 2024

Conversation

bugmakerrrrrr
Copy link
Contributor

Description

LeafReader#getXXXVectorValues may return null value.

Reproduction:

public class TestKnnByteVectorQuery extends BaseKnnVectorQueryTestCase {
  public void testVectorEncodingMismatch() throws IOException {
    try (Directory indexStore =
        getIndexStore("field", new float[] {0, 1}, new float[] {1, 2}, new float[] {0, 0});
        IndexReader reader = DirectoryReader.open(indexStore)) {
      AbstractKnnVectorQuery query =
          new KnnFloatVectorQuery("field", new float[] {0, 1}, 10);
      IndexSearcher searcher = newSearcher(reader);
      searcher.search(query, 10);
    }
  }
}

Output:

java.lang.NullPointerException: Cannot invoke "org.apache.lucene.index.FloatVectorValues.size()" because the return value of "org.apache.lucene.index.LeafReader.getFloatVectorValues(String)" is null
	at __randomizedtesting.SeedInfo.seed([A63D89233AB7E539:F77A9739A165CB8D]:0)
	at org.apache.lucene.search.KnnFloatVectorQuery.approximateSearch(KnnFloatVectorQuery.java:92)
	at org.apache.lucene.search.AbstractKnnVectorQuery.getLeafResults(AbstractKnnVectorQuery.java:120)
	at org.apache.lucene.search.AbstractKnnVectorQuery.searchLeaf(AbstractKnnVectorQuery.java:104)
	at org.apache.lucene.search.AbstractKnnVectorQuery.lambda$rewrite$0(AbstractKnnVectorQuery.java:89)
	at org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(TaskExecutor.java:117)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at org.apache.lucene.search.TaskExecutor$TaskGroup.invokeAll(TaskExecutor.java:152)
	at org.apache.lucene.search.TaskExecutor.invokeAll(TaskExecutor.java:76)
	at org.apache.lucene.search.AbstractKnnVectorQuery.rewrite(AbstractKnnVectorQuery.java:91)
	at org.apache.lucene.search.KnnFloatVectorQuery.rewrite(KnnFloatVectorQuery.java:45)
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:731)
	at org.apache.lucene.tests.search.AssertingIndexSearcher.rewrite(AssertingIndexSearcher.java:69)
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:742)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:626)
	at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:483)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:498)
	at org.apache.lucene.search.TestKnnByteVectorQuery.testVectorEncodingMismatch(TestKnnByteVectorQuery.java:47)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change overall is good. We shouldn't accidentally throw NPEs and we should protect users from querying with incorrect vector types.

Could you add a CHANGES.txt entry? It belongs under bug fix & lucene 9.11

// The field does not exist or does not index vectors
FloatVectorValues floatVectorValues = getFloatVectorValues(fi.name);
if (floatVectorValues == null) {
FloatVectorValues.checkField(this, field);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leaf reader here shouldn't throw. Especially since the companion method that accepts a KnnCollector doesn't.

// The field does not exist or does not index vectors
ByteVectorValues byteVectorValues = getByteVectorValues(fi.name);
if (byteVectorValues == null) {
ByteVectorValues.checkField(this, field);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leaf reader here shouldn't throw. Especially since the companion method that accepts a KnnCollector doesn't.

Comment on lines 43 to 45
if (vectorValues == null) {
return DoubleValues.EMPTY;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you throw in the byte similarity source, but not here? We need to be consistent. I think throwing here is acceptable as well (via FloatVectorValues.check).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I will fix it.

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🎉 🎉 Thanks for the contribution! 🎉 🎉 🎉

I will merge and backport in a little while

@benwtrent benwtrent merged commit 5b5815a into apache:main Mar 11, 2024
3 checks passed
benwtrent pushed a commit that referenced this pull request Mar 11, 2024
`LeafReader#getXXXVectorValues` may return null value.

**Reproduction**:
```
public class TestKnnByteVectorQuery extends BaseKnnVectorQueryTestCase {
  public void testVectorEncodingMismatch() throws IOException {
    try (Directory indexStore =
        getIndexStore("field", new float[] {0, 1}, new float[] {1, 2}, new float[] {0, 0});
        IndexReader reader = DirectoryReader.open(indexStore)) {
      AbstractKnnVectorQuery query =
          new KnnFloatVectorQuery("field", new float[] {0, 1}, 10);
      IndexSearcher searcher = newSearcher(reader);
      searcher.search(query, 10);
    }
  }
}
```
**Output**:
```
java.lang.NullPointerException: Cannot invoke "org.apache.lucene.index.FloatVectorValues.size()" because the return value of "org.apache.lucene.index.LeafReader.getFloatVectorValues(String)" is null
```
benwtrent added a commit that referenced this pull request Mar 20, 2024
Related to: #13162

Since this is unreleased, no changelog entry is necessary.
benwtrent added a commit that referenced this pull request Mar 20, 2024
Related to: #13162

Since this is unreleased, no changelog entry is necessary.
@hossman
Copy link
Member

hossman commented Mar 20, 2024

FWIW: This commit seems to have duplicated checkField logic that was previously added in #13105 -- compare VectorFieldFunction.checkField with FloatVectorValues.checkField and ByteVectorValues.checkField.

These should probably be refactored to eliminate the duplication?

hossman@slate:~/lucene/lucene [j11] [main] $ tail -20 ./lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/VectorFieldFunction.java
   * @lucene.internal
   * @lucene.experimental
   */
  static void checkField(LeafReader in, String field, VectorEncoding expectedEncoding) {
    FieldInfo fi = in.getFieldInfos().fieldInfo(field);
    if (fi != null) {
      final VectorEncoding actual = fi.hasVectorValues() ? fi.getVectorEncoding() : null;
      if (expectedEncoding != actual) {
        throw new IllegalStateException(
            "Unexpected vector encoding ("
                + actual
                + ") for field "
                + field
                + "(expected="
                + expectedEncoding
                + ")");
      }
    }
  }
}
hossman@slate:~/lucene/lucene [j11] [main] $ tail -20 ./lucene/core/src/java/org/apache/lucene/index/FloatVectorValues.java
   * Checks the Vector Encoding of a field
   *
   * @throws IllegalStateException if {@code field} has vectors, but using a different encoding
   * @lucene.internal
   * @lucene.experimental
   */
  public static void checkField(LeafReader in, String field) {
    FieldInfo fi = in.getFieldInfos().fieldInfo(field);
    if (fi != null && fi.hasVectorValues() && fi.getVectorEncoding() != VectorEncoding.FLOAT32) {
      throw new IllegalStateException(
          "Unexpected vector encoding ("
              + fi.getVectorEncoding()
              + ") for field "
              + field
              + "(expected="
              + VectorEncoding.FLOAT32
              + ")");
    }
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants