Merge branch_9x into fs/branch_9x (up to `ccf4b198ec328095d45d2746189dc8ca633e8bcf`) #20

patsonluk · 2023-07-17T23:56:28Z

To create a new fs/branch_9_7, we are following the steps documented in
https://www.notion.so/OSS-Solr-Development-Guide-41a77cf7bc7246d0bf56177e6986c174#7dc4bc1c52ad40c5ad33d77aba46789b

In particular, we first need to bring fs/branch_9x up to date with all the changes on branch_9x up to the common ancestor ccf4b198ec328095d45d2746189dc8ca633e8bcf.

This PR includes all the commits from branch_9x since the last merge, up to that commit.

The term member of TermAndBoost used to be a Term instance and became a BytesRef with apache#11941, so its equals implementation no longer takes the field name into account. The SynonymQuery equals implementation needs to be updated to take the field into account as well; otherwise synonym queries with the same terms and boosts on different fields compare as equal, which is a bug.
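
The equality problem described above can be illustrated with a minimal sketch. `TermBytesAndBoost` and `FieldSynonyms` are hypothetical stand-ins written for this example, not Lucene's actual `TermAndBoost` or `SynonymQuery`; the only point is that once the per-term objects carry raw bytes without a field, the enclosing object has to compare the field itself.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical term-and-boost pair that holds raw bytes only (no field name). */
final class TermBytesAndBoost {
  final byte[] term;
  final float boost;

  TermBytesAndBoost(byte[] term, float boost) {
    this.term = term.clone();
    this.boost = boost;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof TermBytesAndBoost)) return false;
    TermBytesAndBoost other = (TermBytesAndBoost) o;
    // Only bytes and boost are compared: the field is unknown at this level.
    return Float.compare(boost, other.boost) == 0 && Arrays.equals(term, other.term);
  }

  @Override
  public int hashCode() {
    return 31 * Arrays.hashCode(term) + Float.hashCode(boost);
  }
}

/** Hypothetical query-like holder: the field must take part in equals and hashCode. */
final class FieldSynonyms {
  final String field;
  final TermBytesAndBoost[] terms;

  FieldSynonyms(String field, TermBytesAndBoost... terms) {
    this.field = Objects.requireNonNull(field);
    this.terms = terms.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FieldSynonyms)) return false;
    FieldSynonyms other = (FieldSynonyms) o;
    // Comparing the field here is what keeps "title:fast" distinct from "body:fast".
    return field.equals(other.field) && Arrays.equals(terms, other.terms);
  }

  @Override
  public int hashCode() {
    return 31 * field.hashCode() + Arrays.hashCode(terms);
  }
}
```

With this shape, two `FieldSynonyms` built from the same term and boost but different fields are not equal, while two built on the same field are.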

The only behaviour that QueueSizeBasedExecutor overrides from SliceExecutor is when to execute on the caller thread. There is no need to override the whole invokeAll method for that. Instead, this commit introduces a shouldExecuteOnCallerThread method that can be overridden.
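
The refactoring described above is essentially a template method: the base class keeps the loop and exposes one overridable decision. The sketch below uses hypothetical classes (`SketchSliceExecutor`, `SketchQueueSizeBasedExecutor`) rather than Lucene's real ones, and both the default policy (run only the last task on the caller thread) and the queue-size threshold are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.IntSupplier;

/** Hypothetical base class: owns the loop and delegates only the threading decision. */
class SketchSliceExecutor {
  private final Executor executor;

  SketchSliceExecutor(Executor executor) {
    this.executor = executor;
  }

  /** Runs each task either on the calling thread or on the executor. */
  final void invokeAll(List<? extends Runnable> tasks) {
    for (int i = 0; i < tasks.size(); i++) {
      Runnable task = tasks.get(i);
      if (shouldExecuteOnCallerThread(i, tasks.size())) {
        task.run();
      } else {
        executor.execute(task);
      }
    }
  }

  /** Assumed default for this sketch: only the last task runs on the caller thread. */
  boolean shouldExecuteOnCallerThread(int index, int numTasks) {
    return index == numTasks - 1;
  }
}

/** Hypothetical subclass: also runs on the caller thread once the queue is too long. */
class SketchQueueSizeBasedExecutor extends SketchSliceExecutor {
  private final IntSupplier queueSize; // e.g. () -> pool.getQueue().size()
  private final int maxQueued; // assumed threshold, chosen by the caller

  SketchQueueSizeBasedExecutor(Executor executor, IntSupplier queueSize, int maxQueued) {
    super(executor);
    this.queueSize = queueSize;
    this.maxQueued = maxQueued;
  }

  @Override
  boolean shouldExecuteOnCallerThread(int index, int numTasks) {
    return super.shouldExecuteOnCallerThread(index, numTasks)
        || queueSize.getAsInt() >= maxQueued;
  }
}
```

Passing the queue size as an `IntSupplier` keeps the sketch independent of any particular thread pool; with a `ThreadPoolExecutor` named `pool`, one could pass `() -> pool.getQueue().size()`.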

…ache#12197)

* GITHUB-11838 Change API to allow concurrent query rewrite (apache#11840)

Replace Query#rewrite(IndexReader) with Query#rewrite(IndexSearcher)

Co-authored-by: Patrick Zhai <zhaih@users.noreply.github.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>

Backport of apache#11840

Changes from original:
- Query keeps `rewrite(IndexReader)`, but it is now deprecated
- VirtualMethod is used to correctly delegate to the overridden methods
- The changes to `RewriteMethod` type classes are reverted, since they increased the backwards-compatibility impact.

------------------------------

### Description

Issue: apache#11838

#### Updated Proposal

* Change signature of rewrite to `rewrite(IndexSearcher)`
* How did I migrate the usage:
  * Use IntelliJ to do the preliminary refactoring for me
  * For test usage, use a searcher whenever one is available, otherwise create one using `newSearcher(reader)`
  * For the very few non-test classes which don't have an IndexSearcher available but call rewrite, create a searcher using `new IndexSearcher(reader)`, and try to avoid creating it repeatedly (especially in `FieldQuery`)
  * For queries that implement rewrite and use some part of the reader's functionality, use a shortcut method when possible, otherwise pull the reader out of the IndexSearcher.

…e#12288)

* Concurrent rewrite for KnnVectorQuery (apache#12160)
  - Reduce overhead of non-concurrent search by preserving original execution
  - Improve readability by factoring into separate functions

  Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>
* adjusting for backport

Co-authored-by: Kaival Parikh <46070017+kaivalnp@users.noreply.github.com>
Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>

…pache#12327) Leverage accelerated vector hardware instructions in Vector Search.

Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the preview Panama Foreign API. This change expands this mechanism to include the incubating Panama Vector API. When the jdk.incubator.vector module is present at run time, the Panama-based version of the low-level primitives used by Vector Search is enabled. If it is not present, the default scalar version of these low-level primitives is used (as it was previously).

Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21.

Co-authored-by: Uwe Schindler <uschindler@apache.org>
Co-authored-by: Robert Muir <rmuir@apache.org>
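
As a rough illustration of the "module present at run time" check mentioned above, the sketch below asks the boot module layer whether a named module was resolved. `VectorModuleCheck` and its method names are hypothetical and this is not Lucene's actual provider lookup; `ModuleLayer.boot().findModule(String)` is the standard JDK API (Java 9+). The incubator module is typically only resolved when the JVM is started with `--add-modules jdk.incubator.vector`.

```java
/** Hypothetical helper: picks an implementation based on which modules were resolved. */
final class VectorModuleCheck {
  private VectorModuleCheck() {}

  /** Returns true if the named module is part of the boot layer of this JVM. */
  static boolean isModulePresent(String moduleName) {
    return ModuleLayer.boot().findModule(moduleName).isPresent();
  }

  /** Returns "panama" when the incubating vector module is available, else "scalar". */
  static String chooseImplementation() {
    return isModulePresent("jdk.incubator.vector") ? "panama" : "scalar";
  }

  public static void main(String[] args) {
    System.out.println("Selected implementation: " + chooseImplementation());
  }
}
```

A real selection would presumably also consider the JDK version and hardware capabilities; this sketch covers only the module check.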

…che#12325) The concurrent query rewrite for KNN vector queries introduced with apache#12160 requests one thread per segment from the executor. To align this with the IndexSearcher parallel behaviour, we should rather parallelize across slices. Also, we can reuse the same slice executor instance that the index searcher already holds, so that a QueueSizeBasedExecutor is used when a thread pool executor is provided.

This method is showing up as a little hot when profiling some queries. Almost all the time spent in this method is just burnt on ceremony around stream indirections that don't inline. Moving this to iterators, simplifying the check for same doc id and also saving one iteration (for the min cost) makes this method far cheaper and easier to read.

When reading data from outside the buffer, BufferedIndexInput always resets its buffer to start at the new read position. If we are reading backwards (for example, using an OffHeapFSTStore for a terms dictionary) then this can have the effect of re-reading the same data over and over again. This commit changes BufferedIndexInput to use paging when reading backwards, so that if we ask for a byte that is before the current buffer, we read a block of data of bufferSize that ends at the previous buffer start. Fixes apache#12356
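
The paging rule described above comes down to a small calculation of where the refilled buffer should start. The sketch below is a hypothetical helper, not the actual `BufferedIndexInput` code; in particular, how a position more than one block behind the buffer is handled is an assumption of this example (it simply restarts the buffer at the requested position).

```java
/** Hypothetical helper that decides where a refilled read buffer should start. */
final class BackwardPaging {
  private BackwardPaging() {}

  /**
   * @param pos          absolute file position being requested
   * @param bufferStart  absolute file position of the first buffered byte
   * @param bufferLength number of valid bytes currently in the buffer
   * @param bufferSize   capacity of the buffer
   * @return the file position at which the buffer should start after the read
   */
  static long newBufferStart(long pos, long bufferStart, int bufferLength, int bufferSize) {
    if (pos >= bufferStart && pos < bufferStart + bufferLength) {
      return bufferStart; // already buffered, nothing to refill
    }
    if (pos < bufferStart) {
      // Reading backwards: load a block that ends at the previous buffer start,
      // so the next few backward reads hit the buffer instead of re-reading.
      long pagedStart = Math.max(bufferStart - bufferSize, 0L);
      // Assumption of this sketch: if pos is further back than one block,
      // fall back to starting the buffer at pos.
      return pos >= pagedStart ? pagedStart : pos;
    }
    return pos; // reading forwards past the buffer: start at the new position
  }
}
```

For example, with a 1024-byte buffer starting at position 4096, a request for byte 4095 yields a new buffer start of 3072, so bytes 3072 to 4095 are available without another refill.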

…pache#12363) (apache#12365) This commit enables the Panama Vector API for Java 21. The version of VectorUtilPanamaProvider for Java 21 is identical to that of Java 20. As such, there is no specific 21 version - the Java 20 version will be loaded from the MRJAR.

* TestHunspell: reduce the flakiness probability

We need to check how the timeout interacts with a custom exception-throwing checkCanceled. The default timeout does not seem to be enough for some CI agents, so let's increase it.

Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>

(cherry picked from commit 5b63a18)

hiteshk25

LGTM

romseygeek and others added 30 commits April 26, 2023 16:38

Add next minor version 9.7.0

eb31c4e

Fix MMapDirectory documentation for Java 20 (apache#12265)

15568ca

Don't generate stacktrace in CollectionTerminatedException (apache#12270

ae305f8

) CollectionTerminatedException is always caught and never exposed to users so there's no point in filling in a stack-trace for it.
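
One common way to avoid filling in a stack trace for a control-flow exception is the protected four-argument `RuntimeException` constructor with `writableStackTrace` set to `false`. The sketch below uses a hypothetical `StacklessSignal` class and is not necessarily how `CollectionTerminatedException` itself was changed.

```java
/** Hypothetical control-flow exception that never captures a stack trace. */
final class StacklessSignal extends RuntimeException {
  private static final long serialVersionUID = 1L;

  StacklessSignal() {
    // message = null, cause = null, enableSuppression = false, writableStackTrace = false
    super(null, null, false, false);
  }

  public static void main(String[] args) {
    try {
      throw new StacklessSignal();
    } catch (StacklessSignal e) {
      // The trace was never filled in, so the array is empty.
      System.out.println("Stack trace elements: " + e.getStackTrace().length);
    }
  }
}
```

Because the exception is always caught internally, skipping the stack walk saves work without losing diagnostic information that anyone would see.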

add missing changelog entry for apache#12260

363f349

Add missing author to changelog entry for apache#12220

f074dc9

Make query timeout members final in ExitableDirectoryReader (apache#1…

3e6ab76

…2274) There are a couple of places in the Exitable wrapper classes where queryTimeout is set within the constructor and never modified. This commit makes such members final.

Update javadocs for QueryTimeout (apache#12272)

48986d1

QueryTimeout was introduced together with ExitableDirectoryReader but is now also optionally set on the IndexSearcher to wrap the bulk scorer with a TimeLimitingBulkScorer. Its javadocs need updating.

Make TimeExceededException members final (apache#12271)

447cfea

TimeExceededException has three members that are set within its constructor and never modified. They can be made final.

DOAP changes for release 9.6.0

cf2dfd2

Add back-compat indices for 9.6.0

8e726f7

ToParentBlockJoinQuery Explain Support Score Mode (apache#12245) (a…

37b97a6

…pache#12283) * `ToParentBlockJoinQuery` Explain Support Score Mode --------- Co-authored-by: Marcus <marcuseagan@gmail.com>

toposort use iterator to avoid stackoverflow (apache#12286)

76e8a42

Co-authored-by: tangdonghai <tangdonghai@meituan.com> # Conflicts: # lucene/CHANGES.txt

Fix test to compile with Java 11 after backport of apache#12286

b895c24

Update Javadoc for topoSortStates method after apache#12286 (apache#1…

afacf23

…2292)

Optimize HNSW diversity calculation (apache#12235)

310360c

Minor cleanup and improvements to DaciukMihovAutomatonBuilder (apache…

59110fc

…#12305)

GITHUB-12291: Skip blank lines from stopwords list. (apache#12299)

d1db558

Wrap Query rewrite backwards layer with AccessController (apache#12308)

4d1ed9e

Make sure APIJAR reproduces with different timezone (unfortunately ja…

b21b396

…va encodes the date using local timezone) (apache#12315)

Add multi-thread searchability to OnHeapHnswGraph (apache#12257)

25a908d

Fix backport error

3eb7b04

[MINOR] Update javadoc in Query class (apache#12233)

7db2c12

- add a few missing full stops - update wording in the description of Query#equals method

Update changes to be correct with ARM (it is called NEON there)

02db735

gashutos and others added 23 commits May 31, 2023 15:20

Fix searchafter high latency when after value is out of range for seg…

327997b

…ment (apache#12334)

Make memory fence in ByteBufferGuard explicit (apache#12290)

204acc3

Add "direct to binary" option for DaciukMihovAutomatonBuilder and use…

349b458

… it in TermInSetQuery#visit (apache#12320)

Add updateDocuments API which accept a query (reopen) (apache#12346)

d76dd26

GITHUB#11350: Handle backward compatibility when merging segments wit…

9500438

…h different FieldInfo This commit restores Lucene 9's ability to handle indices created with Lucene 8 where there are discrepancies in FieldInfos, such as different IndexOptions.

[Tessellator] Improve the checks that validate the diagonal between t…

84ea3aa

…wo polygon nodes (apache#12353) # Conflicts: # lucene/CHANGES.txt

feat: soft delete optimize (apache#12339)

1107aa2

Work around SecurityManager issues during initialization of vector ap…

41cd1f7

…i (JDK-8309727) (apache#12362)

Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recu…

b218b81

…rsion depth (apache#12249)

Implement MMapDirectory with Java 21 Project Panama Preview API (apac…

adc3740

…he#12294) Backport incl JDK21 apijar file with java.util.Objects regenerated

remove relic in apijar folder caused by vector additions

d8f3be1

Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step ad…

7f16909

…vance (apache#12324)

Add checks in KNNVectorField / KNNVectorQuery to only allow non-null,…

8c00149

… non-empty and finite vectors (apache#12281) --------- Co-authored-by: Uwe Schindler <uschindler@apache.org>

Add CHANGES.txt for apache#12334 Honor after value for skipping docum…

c2e5ef4

…ents even if queue is not full for PagingFieldCollector (apache#12368) Signed-off-by: gashutos <gashutos@amazon.com>

Move TermAndBoost back to its original location. (apache#12366)

7f46d58

PR apache#12169 accidentally moved the `TermAndBoost` class to a different location, which would break custom sub-classes of `QueryBuilder`. This commit moves it back to its original location.

GITHUB-12252: Add function queries for computing similarity scores be…

433aa49

…tween knn vectors (apache#12253) Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>

hunspell (minor): reduce allocations when processing compound rules (a…

27d480f

…pache#12316) (cherry picked from commit a454388)

hunspell (minor): reduce allocations when reading the dictionary's mo…

a2b47b0

…rphological data (apache#12323) There can be many entries with morph data, so we'd better avoid compiling and matching regexes, and even stream allocation. (cherry picked from commit 4bf1b94)

This allows VectorUtilProvider tests to be executed although hardware…

ccf4b19

… may not fully support vectorization or if C2 is not enabled (apache#12376)

Merge branch 'branch_9x' into fs/branch_9x

1a9f441

patsonluk assigned hiteshk25 Jul 17, 2023

hiteshk25 approved these changes Jul 18, 2023

hiteshk25 merged commit a7f4238 into fs/branch_9x Jul 18, 2023

patsonluk deleted the patsonluk/fs_branch_9x_merge branch July 18, 2023 00:30

hiteshk25 added a commit that referenced this pull request Jul 18, 2023

Revert "Merge branch_9x into fs/branch_9x (up to `ccf4b198ec328095d45…

171bdc9

…d2746189dc8ca633e8bcf`) (#20)" This reverts commit a7f4238.

hiteshk25 mentioned this pull request Jul 18, 2023

Revert "Merge branch_9x into fs/branch_9x (up to ccf4b198ec328095d45d2746189dc8ca633e8bcf)" #21

Merged

hiteshk25 added a commit that referenced this pull request Jul 18, 2023

Revert "Merge branch_9x into fs/branch_9x (up to `ccf4b198ec328095d45…

5697c49

…d2746189dc8ca633e8bcf`) (#20)" (#21) This reverts commit a7f4238.
