
Understand/Improve the performance of numeric range queries #9541

Open
jainankitk opened this issue Aug 24, 2023 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request Performance This is for any performance related enhancements or bugs Search:Performance v2.16.0 Issues and PRs related to version 2.16.0 v3.0.0 Issues and PRs related to version 3.0.0

Comments

@jainankitk (Collaborator)

I have been looking at understanding and improving the performance of numeric range queries in OpenSearch. For this purpose, I set up a single-node cluster and ingested the nyc_taxis dataset.

While running a single range query request in a loop, I collected the following CPU flamegraph:
[CPU flamegraph screenshot, Aug 24, 2023]

I am trying to understand how we can reduce the cost of readInts for this query.

@jainankitk (Collaborator, Author)

After digging further into the code for readInts24, I noticed that it makes multiple calls to readLong. I added logic to read all the longs together to reduce the per-call overhead, and saw about a 15% improvement in query latency:

Before:

|                                                 Max Throughput |  range |        0.71 |  ops/s |
|                                        50th percentile latency |  range |     245.533 |     ms |
|                                        90th percentile latency |  range |     248.005 |     ms |
|                                        99th percentile latency |  range |     254.824 |     ms |
|                                       100th percentile latency |  range |     256.902 |     ms |
|                                   50th percentile service time |  range |     243.585 |     ms |
|                                   90th percentile service time |  range |     246.178 |     ms |
|                                   99th percentile service time |  range |     252.672 |     ms |
|                                  100th percentile service time |  range |     255.072 |     ms |
|                                                     error rate |  range |           0 |      % |

After:

|                                              Median Throughput |  range |         0.7 |  ops/s |
|                                                 Max Throughput |  range |        0.71 |  ops/s |
|                                        50th percentile latency |  range |     207.554 |     ms |
|                                        90th percentile latency |  range |     209.392 |     ms |
|                                        99th percentile latency |  range |     213.157 |     ms |
|                                       100th percentile latency |  range |     219.398 |     ms |
|                                   50th percentile service time |  range |     205.421 |     ms |
|                                   90th percentile service time |  range |     207.361 |     ms |
|                                   99th percentile service time |  range |     211.164 |     ms |
|                                  100th percentile service time |  range |     217.787 |     ms |
|                                                     error rate |  range |           0 |      % |
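The change above follows a common pattern: replace a loop of per-element reads with one bulk read into a scratch array, so the per-call overhead is paid once per leaf block instead of once per long. A minimal standalone sketch of the same idea using java.nio (not the Lucene IndexInput API; class and method names here are mine, for illustration only):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BulkReadDemo {

    // "After" pattern: one bulk copy fills the whole scratch array.
    static long[] bulkRead(ByteBuffer data, int count) {
        long[] scratch = new long[count];
        data.asLongBuffer().get(scratch, 0, count);
        return scratch;
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(3 * Long.BYTES);
        data.putLong(0, 1L).putLong(8, 2L).putLong(16, 3L);

        // "Before" pattern: one call per long.
        long[] oneAtATime = new long[3];
        for (int i = 0; i < 3; i++) {
            oneAtATime[i] = data.getLong(i * Long.BYTES);
        }

        // Both paths decode the same values; only the call count differs.
        long[] bulk = bulkRead(data, 3);
        System.out.println(Arrays.equals(oneAtATime, bulk)); // prints "true"
    }
}
```

Whether the bulk variant actually wins depends on how expensive each read call is in the underlying Directory implementation, which is why benchmarking (as above) is needed.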

@jainankitk (Collaborator, Author)

Lucene changes made:

diff --git a/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java b/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
index 40db4c0069d..40ee7a1c968 100644
--- a/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
+++ b/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
@@ -325,11 +325,14 @@ final class DocIdsWriter {
 
   private static void readInts24(IndexInput in, int count, IntersectVisitor visitor)
       throws IOException {
+    long[] scratchLong = new long[(count/8) * 3];
+    in.readLongs(scratchLong, 0, (count/8) * 3);
     int i;
     for (i = 0; i < count - 7; i += 8) {
-      long l1 = in.readLong();
-      long l2 = in.readLong();
-      long l3 = in.readLong();
+      int li = (i/8) * 3;
+      long l1 = scratchLong[li];
+      long l2 = scratchLong[li+1];
+      long l3 = scratchLong[li+2];
       visitor.visit((int) (l1 >>> 40));
       visitor.visit((int) (l1 >>> 16) & 0xffffff);
       visitor.visit((int) (((l1 & 0xffff) << 8) | (l2 >>> 56)));
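For reference, the shift-and-mask arithmetic in readInts24 decodes eight 24-bit doc IDs packed big-endian into three longs (8 × 24 bits = 3 × 64 bits). A hypothetical standalone round-trip (class and method names are mine, not Lucene's) checks that unpack logic matching the diff's shifts agrees with that layout:

```java
import java.util.Arrays;

public class Ints24Demo {

    // Pack eight nonnegative 24-bit values into three longs, big-endian
    // across the 192-bit group (the layout readInts24 expects).
    static long[] pack(int[] v) {
        long l1 = ((long) v[0] << 40) | ((long) v[1] << 16) | (v[2] >>> 8);
        long l2 = ((long) (v[2] & 0xff) << 56) | ((long) v[3] << 32)
                | ((long) v[4] << 8) | (v[5] >>> 16);
        long l3 = ((long) (v[5] & 0xffff) << 48) | ((long) v[6] << 24) | v[7];
        return new long[] {l1, l2, l3};
    }

    // The same shift/mask logic as the readInts24 visit calls, collected
    // into an array instead of being pushed to an IntersectVisitor.
    static int[] unpack(long[] l) {
        long l1 = l[0], l2 = l[1], l3 = l[2];
        return new int[] {
            (int) (l1 >>> 40),
            (int) (l1 >>> 16) & 0xffffff,
            (int) (((l1 & 0xffff) << 8) | (l2 >>> 56)),
            (int) (l2 >>> 32) & 0xffffff,
            (int) (l2 >>> 8) & 0xffffff,
            (int) (((l2 & 0xff) << 16) | (l3 >>> 48)),
            (int) (l3 >>> 24) & 0xffffff,
            (int) l3 & 0xffffff,
        };
    }

    public static void main(String[] args) {
        int[] docIds = {1, 42, 0xffffff, 123456, 0, 7, 0xabcdef, 99};
        if (!Arrays.equals(docIds, unpack(pack(docIds)))) throw new AssertionError();
        System.out.println("round-trip ok");
    }
}
```

Because the batched readLongs only changes where the longs come from, not these shifts, the decoded doc IDs are identical before and after the patch.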

@gashutos (Contributor) commented Sep 3, 2023

Anything around a 10% improvement is very hard to conclude from just one run.
But if this change does improve performance, it should benefit sort queries as well, since they also read doc ID values from the BKD tree.
We can run sort queries on pickup_time or drop_off_time, or on the http_logs @timestamp field.
The http_logs @timestamp field has much higher cardinality than any field in nyc_taxis, so the effect of the optimization may be more pronounced there, if you would like to try @jainankitk.

Projects: Now (This Quarter)