Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searchable snapshot read FST performance issue. #88366

Closed
howardhuanghua opened this issue Jul 8, 2022 · 6 comments
Closed

Searchable snapshot read FST performance issue. #88366

howardhuanghua opened this issue Jul 8, 2022 · 6 comments
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@howardhuanghua
Copy link
Contributor

Elasticsearch Version

8.0

Installed Plugins

No response

Java Version

bundled

OS Version

CentOS

Problem Description

Searchable snapshot read tip file has performance issue, currently BaseSearchableSnapshotIndexInput extends from BufferedIndexInput.
When we trigger the follow terms query,

GET /index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "terms": {
      "id": [ "7129519", "7191131" ]
    }
  }
}
'

FST read tip file would use ReverseRandomAccessReader which read from the tail.
I have printed the doReadInternal details:

[2022-07-07T17:05:24,351][INFO ][o.e.x.s.s.i.MetadataCachingIndexInput] [node] cache readInternal: read [_28_Lucene84_0.tip][4134192-4135216] ([1024] bytes) from [randomaccess[length=1847637, file pointer=102073, offset=4032119]] name[randomaccess]
[2022-07-07T17:05:24,351][INFO ][o.e.x.s.s.i.MetadataCachingIndexInput] [node] cache readInternal: read [_28_Lucene84_0.tip][4134191-4135215] ([1024] bytes) from [randomaccess[length=1847637, file pointer=102072, offset=4032119]] name[randomaccess]
[2022-07-07T17:05:24,351][INFO ][o.e.x.s.s.i.MetadataCachingIndexInput] [node] cache readInternal: read [_28_Lucene84_0.tip][4134190-4135214] ([1024] bytes) from [randomaccess[length=1847637, file pointer=102071, offset=4032119]] name[randomaccess]
[2022-07-07T17:05:24,351][INFO ][o.e.x.s.s.i.MetadataCachingIndexInput] [node] cache readInternal: read [_28_Lucene84_0.tip][4134189-4135213] ([1024] bytes) from [randomaccess[length=1847637, file pointer=102070, offset=4032119]] name[randomaccess]
[2022-07-07T17:05:24,351][INFO ][o.e.x.s.s.i.MetadataCachingIndexInput] [node] cache readInternal: read [_28_Lucene84_0.tip][4134188-4135212] ([1024] bytes) from [randomaccess[length=1847637, file pointer=102069, offset=4032119]] name[randomaccess]
[2022-07-07T17:05:24,351][INFO ][o.e.x.s.s.i.MetadataCachingIndexInput] [node] cache readInternal: read [_28_Lucene84_0.tip][4134187-4135211] ([1024] bytes) from [randomaccess[length=1847637, file pointer=102068, offset=4032119]] name[randomaccess]

We could see that, each time read 1024 bytes, and only single byte is useful, and 1023 are duplicated reading, 4134192-4135216, 4134191-4135215..., that's because we use ReverseRandomAccessReader read from the tail.

Could we use ByteBufferIndexInput instead of BufferedIndexInput for BaseSearchableSnapshotIndexInput ?

Steps to Reproduce

  1. Create a searchable snapshot index.
  2. Trigger terms query.

Logs (if relevant)

No response

@howardhuanghua howardhuanghua added >bug needs:triage Requires assignment of a team area label labels Jul 8, 2022
@DaveCTurner DaveCTurner added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed needs:triage Requires assignment of a team area label labels Jul 8, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@howardhuanghua
Copy link
Contributor Author

I got a proposal, for searchable snapshot .tip file reading, enhance BufferedIndexInput to support reverse cache:
image

@tlrx
Copy link
Member

tlrx commented Jul 26, 2022

Thanks @howardhuanghua for reporting.

I agree that ReverseRandomAccessReader works inefficiently when the wrapped RandomAccessInput is a BufferedIndexInput: the internal buffer of BufferedIndexInput is forced to be refilled from the frozen/cold cache for every backward byte read. In Lucene the RandomAccessInput is created using IndexInput#randomAccessSlice(long offset, long length) and there is no indication of the forward/backward nature of reads (it's random, after all :) so enhancing BufferedIndexInput to support reverse cache reads might not be straightforward.

A possible solution could be to detect successive backward (or forward) reads and make a bet that the next read will also be backward (or forward) and refill the internal buffer accordingly. I can give this a try but I'd like to wait for an on-going effort we have that should add frozen tier to our nightly benchmarks. Once this is done we should hopefully better see the benefits of implementing this change.

@howardhuanghua
Copy link
Contributor Author

Thanks @tlrx , I mean we could still read forward, but just reverse fill 1k buffer.
For example, for tip file, if we read one byte from position 3000, than we read 1977-3000 into buffer, and retrive the last byte.
Here is the override readByte function demo in BaseCacheIndexInput.

+    @Override
+    public byte readByte(long pos) throws IOException {
+        long index = pos - bufferStart;
+        if (index < 0 || index >= buffer.limit()) {
+            if (fileInfo.physicalName().endsWith(TERMS_INDEX_EXTENSION)) {
+                // tip use ReverseRandomAccessReader, reverse fill buffer
+                long reverseStart = pos - bufferSize + 1;
+                bufferStart = reverseStart > 0 ? reverseStart : 0;
+                buffer.limit(0);  // trigger refill() on read
+                seekInternal(bufferStart);
+                refill();
+                index = reverseStart > 0 ? bufferSize - 1 : pos;
+            } else {
+                bufferStart = pos;
+                buffer.limit(0);  // trigger refill() on read
+                seekInternal(pos);
+                refill();
+                index = 0;
+            }
+        }
+        return buffer.get((int) index);
+    }

@romseygeek
Copy link
Contributor

This should be fixed by apache/lucene#12357

@nik9000
Copy link
Member

nik9000 commented Jun 20, 2023

That Lucene change should go live in Elasticsearch 8.9.0.

@nik9000 nik9000 closed this as completed Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
Development

No branches or pull requests

6 participants