
Understand/Improve the performance of numeric range queries #9541

Open
jainankitk opened this issue Aug 24, 2023 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request Performance This is for any performance related enhancements or bugs Search:Performance v2.16.0 Issues and PRs related to version 2.16.0 v3.0.0 Issues and PRs related to version 3.0.0

Comments

@jainankitk (Collaborator)

I have been looking at understanding and improving the performance of numeric range queries in OpenSearch. For this purpose, I set up a single-node cluster and ingested the nyc_taxis dataset.

While running a single range query request in a loop, I collected the following CPU flamegraph:
[CPU flamegraph screenshot, Aug 24, 2023]

I am trying to understand how we can reduce the cost of readInts for this query.

@jainankitk (Collaborator, Author)

After digging further into the code for readInts24, I noticed that it makes multiple calls to readLong. I added logic to read all the longs together to reduce the per-call overhead, and saw about a 15% improvement in query latency:

Before:

|                                                 Max Throughput |  range |        0.71 |  ops/s |
|                                        50th percentile latency |  range |     245.533 |     ms |
|                                        90th percentile latency |  range |     248.005 |     ms |
|                                        99th percentile latency |  range |     254.824 |     ms |
|                                       100th percentile latency |  range |     256.902 |     ms |
|                                   50th percentile service time |  range |     243.585 |     ms |
|                                   90th percentile service time |  range |     246.178 |     ms |
|                                   99th percentile service time |  range |     252.672 |     ms |
|                                  100th percentile service time |  range |     255.072 |     ms |
|                                                     error rate |  range |           0 |      % |

After:

|                                              Median Throughput |  range |         0.7 |  ops/s |
|                                                 Max Throughput |  range |        0.71 |  ops/s |
|                                        50th percentile latency |  range |     207.554 |     ms |
|                                        90th percentile latency |  range |     209.392 |     ms |
|                                        99th percentile latency |  range |     213.157 |     ms |
|                                       100th percentile latency |  range |     219.398 |     ms |
|                                   50th percentile service time |  range |     205.421 |     ms |
|                                   90th percentile service time |  range |     207.361 |     ms |
|                                   99th percentile service time |  range |     211.164 |     ms |
|                                  100th percentile service time |  range |     217.787 |     ms |
|                                                     error rate |  range |           0 |      % |
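The change above follows a common pattern: replace a loop of per-element reads with one bulk read into a scratch array, so the per-call overhead is paid once per leaf block instead of once per long. A minimal standalone sketch of the same idea using java.nio (not the Lucene IndexInput API; class and method names here are mine, for illustration only):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BulkReadDemo {

    // "After" pattern: one bulk copy fills the whole scratch array.
    static long[] bulkRead(ByteBuffer data, int count) {
        long[] scratch = new long[count];
        data.asLongBuffer().get(scratch, 0, count);
        return scratch;
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(3 * Long.BYTES);
        data.putLong(0, 1L).putLong(8, 2L).putLong(16, 3L);

        // "Before" pattern: one call per long.
        long[] oneAtATime = new long[3];
        for (int i = 0; i < 3; i++) {
            oneAtATime[i] = data.getLong(i * Long.BYTES);
        }

        // Both paths decode the same values; only the call count differs.
        long[] bulk = bulkRead(data, 3);
        System.out.println(Arrays.equals(oneAtATime, bulk)); // prints "true"
    }
}
```

Whether the bulk variant actually wins depends on how expensive each read call is in the underlying Directory implementation, which is why benchmarking (as above) is needed.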

@jainankitk (Collaborator, Author)

Lucene changes made:

diff --git a/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java b/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
index 40db4c0069d..40ee7a1c968 100644
--- a/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
+++ b/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
@@ -325,11 +325,14 @@ final class DocIdsWriter {
 
   private static void readInts24(IndexInput in, int count, IntersectVisitor visitor)
       throws IOException {
+    long[] scratchLong = new long[(count/8) * 3];
+    in.readLongs(scratchLong, 0, (count/8) * 3);
     int i;
     for (i = 0; i < count - 7; i += 8) {
-      long l1 = in.readLong();
-      long l2 = in.readLong();
-      long l3 = in.readLong();
+      int li = (i/8) * 3;
+      long l1 = scratchLong[li];
+      long l2 = scratchLong[li+1];
+      long l3 = scratchLong[li+2];
       visitor.visit((int) (l1 >>> 40));
       visitor.visit((int) (l1 >>> 16) & 0xffffff);
       visitor.visit((int) (((l1 & 0xffff) << 8) | (l2 >>> 56)));
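For reference, the shift-and-mask arithmetic in readInts24 decodes eight 24-bit doc IDs packed big-endian into three longs (8 × 24 bits = 3 × 64 bits). A hypothetical standalone round-trip (class and method names are mine, not Lucene's) checks that unpack logic matching the diff's shifts agrees with that layout:

```java
import java.util.Arrays;

public class Ints24Demo {

    // Pack eight nonnegative 24-bit values into three longs, big-endian
    // across the 192-bit group (the layout readInts24 expects).
    static long[] pack(int[] v) {
        long l1 = ((long) v[0] << 40) | ((long) v[1] << 16) | (v[2] >>> 8);
        long l2 = ((long) (v[2] & 0xff) << 56) | ((long) v[3] << 32)
                | ((long) v[4] << 8) | (v[5] >>> 16);
        long l3 = ((long) (v[5] & 0xffff) << 48) | ((long) v[6] << 24) | v[7];
        return new long[] {l1, l2, l3};
    }

    // The same shift/mask logic as the readInts24 visit calls, collected
    // into an array instead of being pushed to an IntersectVisitor.
    static int[] unpack(long[] l) {
        long l1 = l[0], l2 = l[1], l3 = l[2];
        return new int[] {
            (int) (l1 >>> 40),
            (int) (l1 >>> 16) & 0xffffff,
            (int) (((l1 & 0xffff) << 8) | (l2 >>> 56)),
            (int) (l2 >>> 32) & 0xffffff,
            (int) (l2 >>> 8) & 0xffffff,
            (int) (((l2 & 0xff) << 16) | (l3 >>> 48)),
            (int) (l3 >>> 24) & 0xffffff,
            (int) l3 & 0xffffff,
        };
    }

    public static void main(String[] args) {
        int[] docIds = {1, 42, 0xffffff, 123456, 0, 7, 0xabcdef, 99};
        if (!Arrays.equals(docIds, unpack(pack(docIds)))) throw new AssertionError();
        System.out.println("round-trip ok");
    }
}
```

Because the batched readLongs only changes where the longs come from, not these shifts, the decoded doc IDs are identical before and after the patch.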

@gashutos (Contributor) commented Sep 3, 2023

Anything around a 10% improvement is very hard to conclude from just one run.
But if this change does improve performance, it should benefit sort queries as well, since they also read doc ID values from the BKD tree.
We can run sort queries on pickup_time or drop_off_time, or on the http_logs @timestamp field.
The http_logs @timestamp field has much higher cardinality than any field in nyc_taxis, so the effect of the optimization may be more pronounced there, if you would like to try @jainankitk.

Projects: Now (This Quarter)