Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip optimization if the index has duplicate data #43121

Conversation

mayya-sharipova
Copy link
Contributor

Skip sort optimization if the index has 50% or more data
with the same value.
When index has a lot of docs with the same value, sort
optimization doesn't make sense, as DistanceFeatureQuery
will produce same scores for these docs, and Lucene
will use the second sort to tie-break. This could be slower
than usual sorting.

@mayya-sharipova mayya-sharipova added the :Search/Search Search-related issues that do not fall into other categories label Jun 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/1

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first round of review. I left some comments but the change looks good overall. Can you also add a non-randomized test that checks the return value of indexFieldHasDuplicateData ?

noDuplicateSegments++;
}
}
return (duplicateSegments >= noDuplicateSegments);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we compute the global count of all medians and compare with the total doc count divided by 2 ? Currently the heuristic does not take the size of segments into account.

@@ -652,9 +654,9 @@ public void testNumericLongOrDateSortOptimization() throws Exception {
TestSearchContext searchContext = spy(new TestSearchContext(null, indexShard));
when(searchContext.mapperService()).thenReturn(mapperService);

final int numDocs = scaledRandomIntBetween(50, 100);
final int numDocs = 10000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a big number, can we test with smaller values ?

@mayya-sharipova
Copy link
Contributor Author

Can you also add a non-randomized test that checks the return value of indexFieldHasDuplicateData ?

@jimczi What kind of test do you have in mind? QueryPhaseTests::testIndexFieldHasDuplicateData is not enough?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating @mayya-sharipova. I left another comment regarding additional tests.

@@ -708,6 +710,39 @@ public void testNumericLongOrDateSortOptimization() throws Exception {
dir.close();
}

public void testIndexFieldHasDuplicateData() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use BKDWriter/Reader directly here since we could set the max number of docs per leave to a small value. This would ensure that we create enough leaves to test with fewer docs. Even with 10,000 docs here the number of leaves that we create is quite small (less than 10) so the test is a bit too simplistic in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi Thanks Jim for the review. This is addressed in the last commit

}
writer.close();
final IndexReader reader = DirectoryReader.open(dir);
assertTrue(indexFieldHasDuplicateData(reader, fieldName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also check the value returned to compute the heuristic (the median and the count) and compare it with the expectation with some error bounds. It could be another test but if we generate values randomly we should be able to compute the median and the number of duplicated values accurately and then compare the result with the approximation that is computed in the BKD, wdyt ?

@mayya-sharipova mayya-sharipova force-pushed the skip_optimization_on_homogeneous_dataset branch from 251423c to 1b0277d Compare June 25, 2019 13:25
@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/1

Skip sort optimization if the index has 50% or more data
with the same value.
When index has a lot of docs with the same value, sort
optimization doesn't make sense, as DistanceFeatureQuery
will produce same scores for these docs, and Lucene
will use the second sort to tie-break. This could be slower
than usual sorting.
@mayya-sharipova mayya-sharipova force-pushed the skip_optimization_on_homogeneous_dataset branch from dfef7fa to 6f39a40 Compare June 25, 2019 18:53
This allows to control the number of points in the leaf node
@mayya-sharipova mayya-sharipova force-pushed the skip_optimization_on_homogeneous_dataset branch from 6f39a40 to 9c71827 Compare June 25, 2019 19:03
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, LGTM otherwise.
Not related to this pr but I wonder if we can simplify the logic if we move it to ContextIndexSearcher ? It would require some refactoring but I don't like the fact that we need to check all the options in the SearchContext while we just need to ensure that the Collector is a TopFieldCollector. I have another change in flight that does that, I'll open a pr when I have something to share but wanted to let you know first.

for (LeafReaderContext lrc : reader.leaves()) {
PointValues pointValues = lrc.reader().getPointValues(field);
if (pointValues == null) continue;
int docCount = pointValues.getDocCount();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment (or an assert) that the doc count is equals to the number of points. This is important since we'll need to change the logic here if we handle multiple values per docs (https://github.com/elastic/elasticsearch/pull/43121/files#diff-ec88da77f16eaf2fff65965789ea44beR398). Or maybe you can use PointValues#size here and add an assert that PointValues#size == PointValues#getDocCount to ensure that we don't forget to revise the logic if/when we handle multiple values per doc.

@@ -236,6 +236,9 @@ static boolean execute(SearchContext searchContext,
System.arraycopy(oldFormats, 0, newFormats, 1, oldFormats.length);
sortAndFormatsForRewrittenNumericSort = searchContext.sort(); // stash SortAndFormats to restore it later
searchContext.sort(new SortAndFormats(new Sort(newSortFields), newFormats));
if (LOGGER.isTraceEnabled()) {
LOGGER.trace("Sort optimization on the field [" + oldSortFields[0].getField() + "] was enabled!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that this helps the debugging ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi What would be the way to see if the optimization was used? LOGGER.trace is a not a good way?

@mayya-sharipova
Copy link
Contributor Author

@jimczi Thanks Jim. The last commit addresses your last feedback.

I wonder if we can simplify the logic if we move it to ContextIndexSearcher ? It would require some refactoring but I don't like the fact that we need to check all the options in the SearchContext while we just need to ensure that the Collector is a TopFieldCollector.

When you say that we check all the options in the SearchContext, are you talking about all the checks we do in tryRewriteLongSort? Would they be unnecessary if we are in ContextIndexSearcher? I am not sure how ContextIndexSearcher can simplify the checks

@mayya-sharipova
Copy link
Contributor Author

@elasticmachine test this please

1 similar comment
@mayya-sharipova
Copy link
Contributor Author

@elasticmachine test this please

@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

@mayya-sharipova mayya-sharipova merged commit 3c29734 into elastic:long_sort_optimization Jul 3, 2019
mayya-sharipova added a commit that referenced this pull request Nov 26, 2019
* Optimize sort on numeric long and date fields (#39770)

Optimize sort on numeric long and date fields, when 
the system property `es.search.long_sort_optimized` is true.

* Skip optimization if the index has duplicate data (#43121)

Skip sort optimization if the index has 50% or more data
with the same value.
When index has a lot of docs with the same value, sort
optimization doesn't make sense, as DistanceFeatureQuery
will produce same scores for these docs, and Lucene
will use the second sort to tie-break. This could be slower
than usual sorting.

* Sort leaves on search according to the primary numeric sort field (#44021)

This change pre-sort the index reader leaves (segment) prior to search
when the primary sort is a numeric field eligible to the distance feature
optimization. It also adds a tie breaker on `_doc` to the rewritten sort
in order to bypass the fact that leaves will be collected in a random order.
I ran this patch on the http_logs benchmark and the results are very promising:

```
|                                       50th percentile latency | desc_sort_timestamp |    220.706 |      136544 |   136324 |     ms |
|                                       90th percentile latency | desc_sort_timestamp |    244.847 |      162084 |   161839 |     ms |
|                                       99th percentile latency | desc_sort_timestamp |    316.627 |      172005 |   171688 |     ms |
|                                      100th percentile latency | desc_sort_timestamp |    335.306 |      173325 |   172989 |     ms |
|                                  50th percentile service time | desc_sort_timestamp |    218.369 |     1968.11 |  1749.74 |     ms |
|                                  90th percentile service time | desc_sort_timestamp |    244.182 |      2447.2 |  2203.02 |     ms |
|                                  99th percentile service time | desc_sort_timestamp |    313.176 |     2950.85 |  2637.67 |     ms |
|                                 100th percentile service time | desc_sort_timestamp |    332.924 |     2959.38 |  2626.45 |     ms |
|                                                    error rate | desc_sort_timestamp |          0 |           0 |        0 |      % |
|                                                Min Throughput |  asc_sort_timestamp |   0.801824 |    0.800855 | -0.00097 |  ops/s |
|                                             Median Throughput |  asc_sort_timestamp |   0.802595 |    0.801104 | -0.00149 |  ops/s |
|                                                Max Throughput |  asc_sort_timestamp |   0.803282 |    0.801351 | -0.00193 |  ops/s |
|                                       50th percentile latency |  asc_sort_timestamp |    220.761 |     824.098 |  603.336 |     ms |
|                                       90th percentile latency |  asc_sort_timestamp |    251.741 |     853.984 |  602.243 |     ms |
|                                       99th percentile latency |  asc_sort_timestamp |    368.761 |     893.943 |  525.182 |     ms |
|                                      100th percentile latency |  asc_sort_timestamp |    431.042 |      908.85 |  477.808 |     ms |
|                                  50th percentile service time |  asc_sort_timestamp |    218.547 |     820.757 |  602.211 |     ms |
|                                  90th percentile service time |  asc_sort_timestamp |    249.578 |     849.886 |  600.308 |     ms |
|                                  99th percentile service time |  asc_sort_timestamp |    366.317 |     888.894 |  522.577 |     ms |
|                                 100th percentile service time |  asc_sort_timestamp |    430.952 |     908.401 |   477.45 |     ms |
|                                                    error rate |  asc_sort_timestamp |          0 |           0 |        0 |      % |
```

So roughly 10x faster for the descending sort and 2-3x faster in the ascending case. Note
that I indexed the http_logs with a single client in order to simulate real time-based indices
where document are indexed in their timestamp order.

Relates #37043

* Remove nested collector in docs response

As we don't use cancellableCollector anymore, it should be removed from
the expected docs response.

* Use collector manager for search when necessary (#45829)

When we optimize sort, we sort segments by their min/max value.
As a collector expects to have segments in order,
we can not use a single collector for sorted segments.
Thus for such a case, we use collectorManager,
where for every segment a dedicated collector will be created.

* Use shared TopFieldCollector manager

Use shared TopFieldCollector manager for sort optimization.
This collector manager is able to exchange minimum competitive
score between collectors

* Correct calculation of avg value to avoid overflow

* Optimize calculating if index has duplicate data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants