-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace boolean flags on IOContext
with an enum.
#13219
Conversation
This replaces the `load`, `randomAccess` and `readOnce` flags with a `ReadAdvice` enum, whose values are aligned with the allowed values to (f|m)advise. Closes apache#13211
@@ -53,7 +54,7 @@ public VariableGapTermsIndexReader(SegmentReadState state) throws IOException { | |||
state.segmentInfo.name, | |||
state.segmentSuffix, | |||
VariableGapTermsIndexWriter.TERMS_INDEX_EXTENSION); | |||
final IndexInput in = state.directory.openInput(fileName, state.context.toReadOnce()); | |||
final IndexInput in = state.directory.openInput(fileName, IOContext.READONCE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A change to this class wouldn't be necessary anymore once #13216 is merged.
* loads the content of the file into the page cache at open time. This should only be used on | ||
* very small files that can be expected to fit in RAM with very high confidence. | ||
*/ | ||
LOAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a better name for this that is more aligned with other constant names. RANDOM_ACCESS_MEMORY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like load, should be preload, maybe RANDOM_PRELOAD
?
|
||
public static final IOContext RANDOM = new IOContext(false, false, true); | ||
public static final IOContext RANDOM = new IOContext(ReadAdvice.RANDOM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW kept these constant names as-is rather than align them with ReadAdvice
constant names on purpose as they convey stronger expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still refactor and rename them a bit. The LOAD should really be PRELOAD.
return POSIX_MADV_SEQUENTIAL; | ||
} | ||
return null; | ||
return switch (ctx.readAdvice()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the context from the signature and change it to madvise(MemorySegment, ReadAdvice)
. MemorySegmentIndexInputProvider
would just pass context.readAdvice()
to madvise()
then.
@@ -54,58 +43,74 @@ public enum Context { | |||
DEFAULT | |||
}; | |||
|
|||
public static final IOContext DEFAULT = new IOContext(Context.DEFAULT); | |||
/** Advice regarding the read access pattern. */ | |||
public enum ReadAdvice { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this toplevel class!? I am tempting between both variants.
Could we maybe rename the inner Context
as the name IOContext
is so similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind, I'm leaving changes to Context
to a separate PR to keep this one focused.
} | ||
if (readOnce && randomAccess) { | ||
throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive"); | ||
if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really a good idea! It makes code much easier and the merge case needs no special handling in MMapDir.
* loads the content of the file into the page cache at open time. This should only be used on | ||
* very small files that can be expected to fit in RAM with very high confidence. | ||
*/ | ||
LOAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like load, should be preload, maybe RANDOM_PRELOAD
?
* to provide stronger guarantees on query latency. | ||
* @param randomAccess This flag indicates that the file will be accessed randomly. If this flag is | ||
* set, then readOnce will be false. | ||
* @param readAdvice Advice regarding the read access pattern. Write operations should disregard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing in our case is always sequential (OutputStream). If we have a solutions for fadvise when writing files we can add another enum.
* or by reading data through the {@link RandomAccessInput} abstraction in random order. | ||
*/ | ||
RANDOM, | ||
/** Data is expected to be read sequentially with very little seeking at most. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The madvise flags also say "Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)"
The second sentence is important as this is exactly our use case
This is also the reason why we don't use sequential for preloaded files, as it's a "read once" like approach.
P.S.: Are we using RANDOM at the moment? I also found elastic/elasticsearch#27748, this person suggests to pass RANDOM for everything. |
@@ -135,12 +135,12 @@ public void madvise(MemorySegment segment, IOContext context) throws IOException | |||
} | |||
} | |||
|
|||
private Integer mapIOContext(IOContext ctx) { | |||
return switch (ctx.readAdvice()) { | |||
private Integer mapIOContext(ReadAdvice readAdvice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be mapReadAdvice()
:-)
|
||
public static final IOContext LOAD = new IOContext(false, true, true); | ||
public static final IOContext PRELOAD = new IOContext(ReadAdvice.RANDOM_PRELOAD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need a MIGRATE.md entry.
Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD.
Yeah, Wikimedia also did testing and they report getting best performance with a mmap readahead of 16kB instead of the default of 128kB (it's shared on the same thread). It feels a bit like a bug to me that mmap has such a higher readahead than regular read operations, I wonder if we should recommend lowering this default readahead in our wiki / javadocs instead of trying to work around it by passing RANDOM everywhere. My preference would be to not index too much on how the various hints perform in practice and try to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, stored fields, term vectors and vectors data should probably use RANDOM, etc. |
The question that I have about this: How to handle merging then? If we use random access for some files and then want to merge away the segments. As you said before, the problem is with reused NRT readers for merging. I think, we should not hardcode the RANDOM flag now on all files?! It is good that IOContext with MergeInfo always requires SEQUENTIAL, but is this really used in all cases when we merge? When its hardcoded while opening index files we have a problem. The example of the vargaps reader has exactly that problem: It always uses readOnce. I think you are more familar with how the merging works, these are just some points to consider. |
This is a big question to me too. With reader pooling, if you open a reader and then it gets included in a merge, we'll reuse the same
My understanding is that it will only be used if the index input is created with the |
FYI I tried to start switching some files to it at #13222 and discussed some limitations. |
This is a follow-up of a discussion on apache#13219. `mmap` has a higher readahead than regular `read()` operations by default, e.g. 128kB instead of 16kB on my Linux box. On indexes that exceed the size of the page cache, this may trigger performance issues due to page cache trashing and additional page cache contention. Rather than forcing `MMapDirectory` to use `MADV_RANDOM` on all files, it would make more sense to configure a lower `mmap` readahead at the system level, e.g. the same readahead value as `read()` operations use.
This replaces the `load`, `randomAccess` and `readOnce` flags with a `ReadAdvice` enum, whose values are aligned with the allowed values to (f|m)advise. Closes apache#13211
This replaces the `load`, `randomAccess` and `readOnce` flags with a `ReadAdvice` enum, whose values are aligned with the allowed values to (f|m)advise. Closes apache#13211
This replaces the
load
,randomAccess
andreadOnce
flags with aReadAdvice
enum, whose values are aligned with the allowed values to (f|m)advise.Closes #13211