Concurrent Searching #80693
Comments
Pinging @elastic/es-search (Team:Search)
This has been discussed multiple times before, and here is a good summary of why we have not done it so far. In short, we focus parallelization on processing multiple requests concurrently, thus increasing throughput, rather than parallelizing a single request. @LifeIsStrange I am curious, what is your use case where you need to improve the latency of a single request?
I am not very qualified to defend the optimization, but maybe @mikemccand could provide some justification?
Note, by the way, that the cost of parallelism can be much lower than that of Java platform threads, either by leveraging a minimal amount of Kotlin code (using coroutines) or through the soon-to-come Loom green (user-space) threads.
Edit: also, the linked summary was based on tests from 2018; Lucene releases frequently optimize and lower the cost of intra-query parallelism.
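To illustrate the lightweight-threads point, here is a minimal sketch using Project Loom's virtual threads, which landed in JDK 21 (class and method names are my own, purely illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadSketch {
    // Run `tasks` trivial jobs, one virtual thread each, and return how many ran.
    static int runAll(int tasks) {
        AtomicInteger counter = new AtomicInteger();
        // Virtual threads (Project Loom, JDK 21+) are cheap enough to
        // create one per task, unlike heavyweight platform threads.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(counter::incrementAndGet);
            }
        } // close() blocks until all submitted tasks complete
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(10_000));
    }
}
```

Spawning ten thousand platform threads this way would be prohibitively expensive; with virtual threads it is routine.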
@javanna I see this was privately discussed in March; any feedback about what was said? By the way, as a little update to my original comment,
@LifeIsStrange not much to add; this is quite high on the list of things we want to do. The list is long :) We'd prefer not to compromise by applying concurrency only to the query; we want it for aggregations as well.
Elasticsearch has historically performed search sequentially across the segments. Lucene supports parallelizing search across segments when collecting hits (via collector managers) as well as when rewriting certain queries (e.g. knn queries). Elasticsearch is now ready to support concurrency within a single shard too. Search is already performed using collector managers, and the last missing piece is providing an executor to the index searcher so that it can offload the concurrent computation to it.

This commit introduces a secondary executor, used exclusively to execute the concurrent bits of search. The search threads are still the ones that coordinate the search (where the call to search originates), but the actual work is offloaded to the newly introduced executor. We offload not only parallel execution but also sequential execution, to make the workload more predictable: it would be surprising to have bits of search executed in either of the two thread pools, and that would also make it possible to suddenly run a higher number of heavy operations overall (some in the caller thread and some in the separate threads), which could overload the system and make sizing of thread pools more difficult. Note that fetch, together with other actions, is still executed in the search thread pool. This commit does not make the search thread pool a merely coordinating thread pool; it does so only for the IndexSearcher#search operation itself, which is nonetheless a big portion of the different phases of search API execution.

Given that the searcher blocks waiting for all tasks to complete, we take the simple approach of introducing a bounded executor, which blocks whenever there are no free threads to directly execute tasks. This simplifies handling of the thread pool queue and rejections: we can guarantee that the secondary thread pool won't reject, and delegate queueing entirely to the search thread pool, which is the entry point for every search operation anyway. The principle behind this is that if you got a slot in the search thread pool, you should be able to complete your search. As part of this commit we also introduce the ability to cancel tasks that have not started yet, so that if any task throws an exception, the other tasks are prevented from starting needless computation.

This commit also enables concurrent search execution in the DFS phase, which improves resource usage as well as the performance of knn queries, which benefit from both concurrent rewrite and concurrent collection. We will enable concurrent execution for the query phase in a subsequent commit. Relates to elastic#80693 Relates to elastic#90700
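The idea of a bounded executor that blocks submitters instead of queueing or rejecting can be sketched in plain Java with a semaphore (hypothetical names; this is not the actual Elasticsearch class, just an illustration of the principle):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch: an executor wrapper that blocks the submitting thread while all
// workers are busy, so the secondary pool never queues and never rejects.
public class BlockingBoundedExecutor {
    private final ExecutorService delegate;
    private final Semaphore permits;

    public BlockingBoundedExecutor(int threads) {
        this.delegate = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(threads);
    }

    public void execute(Runnable task) throws InterruptedException {
        permits.acquire(); // block until a worker thread is free
        delegate.execute(() -> {
            try {
                task.run();
            } finally {
                permits.release();
            }
        });
    }

    public void shutdown() {
        delegate.shutdown();
    }

    public boolean awaitTermination(long seconds) throws InterruptedException {
        return delegate.awaitTermination(seconds, TimeUnit.SECONDS);
    }
}
```

Because submission blocks on the caller (a search thread), queueing and rejection are effectively delegated to the search thread pool, which matches the behaviour the commit message describes.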
This commit enables concurrent search execution in the DFS phase, which improves resource usage as well as the performance of knn queries, which benefit from both concurrent rewrite and concurrent collection. We will enable concurrent execution for the query phase in a subsequent commit. While this commit does not introduce parallelism for the query phase, it does offload sequential computation to the newly introduced executor. This is true both when a single slice needs to be searched and when a specific request does not support concurrency (currently only the DFS phase does, regardless of the request). Sequential collection is not offloaded only if the request includes aggregations that don't support offloading: composite, nested and cardinality, as their post-collection method must be executed in the same thread as the collection, or we'll trip a Lucene assertion that verifies that doc_values are pulled and consumed from the same thread.

## Technical details

This commit introduces a secondary executor, used exclusively to execute the concurrent bits of search. The search threads are still the ones that coordinate the search (where the call to search originates), but the actual work is offloaded to the newly introduced executor. We offload not only parallel execution but also sequential execution, to make the workload more predictable: it would be surprising to have bits of search executed in either of the two thread pools, and that would also make it possible to suddenly run a higher number of heavy operations overall (some in the caller thread and some in the separate threads), which could overload the system and make sizing of thread pools more difficult. Note that fetch, together with other actions, is still executed in the search thread pool.

This commit does not make the search thread pool a merely coordinating thread pool; it does so only for the IndexSearcher#search operation itself, which is nonetheless a big portion of the different phases of search API execution. Given that the searcher blocks waiting for all tasks to complete, we take the simple approach of introducing a thread pool executor that has the same size as the existing search thread pool but relies on an unbounded queue. This simplifies handling of the thread pool queue and rejections: we'd like to guarantee that the secondary thread pool won't reject, and delegate queuing entirely to the search thread pool, which is the entry point for every search operation anyway. The principle behind this is that if you got a slot in the search thread pool, you should be able to complete your search, and rather quickly. As part of this commit we also introduce the ability to cancel tasks that have not started yet, so that if any task throws an exception, the other tasks are prevented from starting needless computation. Relates to #80693 Relates to #90700
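The flow described above — a caller (search) thread slices the work, offloads each slice to a secondary executor, blocks until all slices finish, then reduces the per-slice results — can be sketched in plain Java. The int arrays below stand in for Lucene segments, and all names are illustrative, not Elasticsearch's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch of the collector-manager pattern: fan out per-slice collection to
// an executor, then reduce the partial results on the coordinating thread.
public class SliceSearchSketch {
    static int search(List<int[]> segments, ExecutorService executor) throws Exception {
        List<Future<Integer>> perSlice = new ArrayList<>();
        for (int[] segment : segments) {
            // One task per slice; a single slice degenerates to sequential
            // execution that is still offloaded to the secondary pool.
            perSlice.add(executor.submit(() -> {
                int hits = 0;
                for (int doc : segment) {
                    if (doc % 2 == 0) hits++; // stand-in for matching a query
                }
                return hits;
            }));
        }
        int total = 0;
        for (Future<Integer> f : perSlice) {
            total += f.get(); // the caller thread blocks, coordinating the search
        }
        return total; // the "reduce" step of a collector manager
    }
}
```

Note how the coordinating thread does no collection itself and only blocks on the futures, which is why the secondary executor must never reject a slice once the search has been admitted.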
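The "cancel tasks that have not started yet" behaviour mentioned in the commit message can be sketched with a shared flag that queued tasks check before doing any work (hypothetical names, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: once one task throws, tasks still waiting in the queue see the
// flag and return immediately instead of starting needless computation.
public class CancelUnstartedSketch {
    static int runCounting(int tasks, int failAt, ExecutorService executor) throws Exception {
        AtomicBoolean failed = new AtomicBoolean(false);
        AtomicInteger executed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            futures.add(executor.submit(() -> {
                if (failed.get()) return; // "cancelled" before starting: skip
                if (id == failAt) {
                    failed.set(true); // signal the tasks that have not started yet
                    throw new RuntimeException("task " + id + " failed");
                }
                executed.incrementAndGet();
            }));
        }
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (Exception ignored) {
                // the first failure is what set the flag; surfacing it is omitted here
            }
        }
        return executed.get();
    }
}
```

On a single-threaded executor, a failure in the second task means only the first ever does real work; the rest are skipped.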
It looks like Elasticsearch currently lacks support for concurrent searching, and this is a major prospective optimization for reducing latency.
Digression: this is not the topic at hand, but I personally consider the OpenSearch fork a historical tragedy: a duplication of work and a segregation of improvements. What would be even more anti-utilitarian would be for the projects to blindly ignore each other's advances toward making better search.
Lucene's concurrent search APIs are probably among the most important features/optimizations currently not leveraged by Elasticsearch.
The OpenSearch fork has a pull request implementing support for it.
See also the corresponding issue.
I believe it would be great for end users if you could take inspiration from the PR and work on this feature.
A great blog post about the Lucene feature:
https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html?m=1
See also these performance results:
https://engineeringblog.yelp.com/2021/09/nrtsearch-yelps-fast-scalable-and-cost-effective-search-engine.html