Replace boolean flags on `IOContext` with an enum. #13219

jpountz · 2024-03-26T14:01:37Z

This replaces the `load`, `randomAccess` and `readOnce` flags with a `ReadAdvice` enum, whose values are aligned with the values accepted by (f|m)advise.

Closes #13211
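A small, self-contained sketch can show how the three booleans collapse into a single value. This is not Lucene's code. The enum name `ReadAdvice` and its constants come from this PR's discussion. The `LegacyFlags.fromLegacyFlags` helper, the argument order `(readOnce, load, randomAccess)` and the choice to map `readOnce` to `SEQUENTIAL` are assumptions made for illustration.

```java
/** Simplified stand-in for the ReadAdvice enum discussed in this PR. */
enum ReadAdvice {
  /** No particular expectation about the access pattern. */
  NORMAL,
  /** Data is read in random order. */
  RANDOM,
  /** Data is read sequentially with very little seeking. */
  SEQUENTIAL,
  /** Random access, and the file is loaded into the page cache when it is opened. */
  RANDOM_PRELOAD
}

/** Hypothetical helper: translates the legacy boolean flags into a single advice value. */
final class LegacyFlags {
  private LegacyFlags() {}

  // Assumed argument order: (readOnce, load, randomAccess).
  static ReadAdvice fromLegacyFlags(boolean readOnce, boolean load, boolean randomAccess) {
    if (readOnce && randomAccess) {
      // The old constructor had to reject this combination at runtime.
      throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive");
    }
    if (readOnce) {
      return ReadAdvice.SEQUENTIAL; // assumption: read-once files are read front to back
    }
    if (load) {
      return ReadAdvice.RANDOM_PRELOAD; // the old LOAD constant was (false, true, true)
    }
    return randomAccess ? ReadAdvice.RANDOM : ReadAdvice.NORMAL;
  }
}
```

With a single enum, the invalid `readOnce && randomAccess` combination can no longer be expressed by callers; only a translation layer like the one sketched here would still need that check.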

jpountz · 2024-03-26T14:07:11Z

lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/VariableGapTermsIndexReader.java

@@ -53,7 +54,7 @@ public VariableGapTermsIndexReader(SegmentReadState state) throws IOException {
            state.segmentInfo.name,
            state.segmentSuffix,
            VariableGapTermsIndexWriter.TERMS_INDEX_EXTENSION);
-    final IndexInput in = state.directory.openInput(fileName, state.context.toReadOnce());
+    final IndexInput in = state.directory.openInput(fileName, IOContext.READONCE);


A change to this class wouldn't be necessary anymore once #13216 is merged.

jpountz · 2024-03-26T14:07:50Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java

+     * loads the content of the file into the page cache at open time. This should only be used on
+     * very small files that can be expected to fit in RAM with very high confidence.
+     */
+    LOAD


I wonder if there's a better name for this that is more aligned with other constant names. RANDOM_ACCESS_MEMORY?

jpountz · 2024-03-26T14:13:20Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java


-  public static final IOContext RANDOM = new IOContext(false, false, true);
+  public static final IOContext RANDOM = new IOContext(ReadAdvice.RANDOM);


FWIW I kept these constant names as-is on purpose, rather than aligning them with the `ReadAdvice` constant names, as they convey stronger expectations.

We can still refactor and rename them a bit. `LOAD` should really be `PRELOAD`.

uschindler · 2024-03-26T14:29:05Z

lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java

-      return POSIX_MADV_SEQUENTIAL;
-    }
-    return null;
+    return switch (ctx.readAdvice()) {


I think we can remove the context from the signature and change it to `madvise(MemorySegment, ReadAdvice)`. `MemorySegmentIndexInputProvider` would then just pass `context.readAdvice()` to `madvise()`.

uschindler · 2024-03-26T14:32:57Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java

@@ -54,58 +43,74 @@ public enum Context {
    DEFAULT
  };

-  public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+  /** Advice regarding the read access pattern. */
+  public enum ReadAdvice {


Maybe make this a top-level class? I am torn between both variants.

Could we maybe also rename the inner `Context` enum, as its name is so similar to `IOContext`?

If you don't mind, I'm leaving changes to Context to a separate PR to keep this one focused.

uschindler · 2024-03-26T14:34:20Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java

-    }
-    if (readOnce && randomAccess) {
-      throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive");
+    if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {


This is a really good idea! It makes the code much simpler, and the merge case needs no special handling in `MMapDirectory`.
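The constructor check in the diff hunk above can be illustrated with a minimal sketch. `IOContextSketch` is a hypothetical record, not Lucene's `IOContext`; the `Context` values are assumed to match Lucene 9's `IOContext.Context` (`MERGE`, `READ`, `FLUSH`, `DEFAULT`).

```java
/** Assumed to mirror the inner IOContext.Context enum. */
enum Context { MERGE, READ, FLUSH, DEFAULT }

/** Simplified stand-in for the ReadAdvice enum. */
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, RANDOM_PRELOAD }

/** Hypothetical record that enforces the MERGE => SEQUENTIAL invariant at construction time. */
record IOContextSketch(Context context, ReadAdvice readAdvice) {
  IOContextSketch {
    if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {
      throw new IllegalArgumentException(
          "MERGE contexts must use the SEQUENTIAL read advice, got " + readAdvice);
    }
  }
}
```

Because the invariant is checked once, when the context is built, a directory implementation can simply honor `readAdvice()` without special-casing merges.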

uschindler · 2024-03-26T14:35:48Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java

+     * loads the content of the file into the page cache at open time. This should only be used on
+     * very small files that can be expected to fit in RAM with very high confidence.
+     */
+    LOAD


I don't like `LOAD`; it should be `PRELOAD`. Maybe `RANDOM_PRELOAD`?

uschindler · 2024-03-26T14:37:00Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java

- *     to provide stronger guarantees on query latency.
- * @param randomAccess This flag indicates that the file will be accessed randomly. If this flag is
- *     set, then readOnce will be false.
+ * @param readAdvice Advice regarding the read access pattern. Write operations should disregard


Writing in our case is always sequential (OutputStream). If we find a solution for using fadvise when writing files, we can add another enum.

uschindler · 2024-03-26T14:40:17Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java

+     * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+     */
+    RANDOM,
+    /** Data is expected to be read sequentially with very little seeking at most. */


The madvise documentation for MADV_SEQUENTIAL also says: "Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)"

The second sentence is important, as this is exactly our use case.

This is also why we don't use SEQUENTIAL for preloaded files: it is a "read once"-like approach, where pages may be freed soon after they are read.

uschindler · 2024-03-26T15:06:36Z

P.S.: Are we using RANDOM at the moment?

I also found elastic/elasticsearch#27748, this person suggests to pass RANDOM for everything.

uschindler · 2024-03-26T15:08:36Z

lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java

@@ -135,12 +135,12 @@ public void madvise(MemorySegment segment, IOContext context) throws IOException
    }
  }

-  private Integer mapIOContext(IOContext ctx) {
-    return switch (ctx.readAdvice()) {
+  private Integer mapIOContext(ReadAdvice readAdvice) {


should be mapReadAdvice() :-)
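Taking both suggestions together (pass only the `ReadAdvice`, and name the helper `mapReadAdvice()`), the mapping could look roughly like the sketch below. The numeric constants are the values glibc uses on Linux and are hardcoded here only for illustration; mapping `NORMAL` and `RANDOM_PRELOAD` to "no call" is an assumption, based on `NORMAL` being the kernel default and on preloading being handled by a separate load step.

```java
/** Simplified stand-in for the ReadAdvice enum. */
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, RANDOM_PRELOAD }

final class MadviseMapping {
  // Assumption: glibc/Linux values of the posix_madvise() flags.
  static final int POSIX_MADV_RANDOM = 1;
  static final int POSIX_MADV_SEQUENTIAL = 2;

  private MadviseMapping() {}

  /** Returns the posix_madvise() flag for the given advice, or null if no call is needed. */
  static Integer mapReadAdvice(ReadAdvice readAdvice) {
    return switch (readAdvice) {
      case NORMAL -> null; // the kernel default, nothing to advise
      case RANDOM -> POSIX_MADV_RANDOM;
      case SEQUENTIAL -> POSIX_MADV_SEQUENTIAL;
      case RANDOM_PRELOAD -> null; // assumption: preloading is done separately at open time
    };
  }
}
```

Since the switch covers every constant without a `default` branch, adding a new `ReadAdvice` value makes this method fail to compile until the new case is handled.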

uschindler · 2024-03-26T15:12:41Z

lucene/core/src/java/org/apache/lucene/store/IOContext.java


-  public static final IOContext LOAD = new IOContext(false, true, true);
+  public static final IOContext PRELOAD = new IOContext(ReadAdvice.RANDOM_PRELOAD);


This may need a MIGRATE.md entry.

jpountz · 2024-03-26T15:54:49Z

> P.S.: Are we using RANDOM at the moment?

Not yet, we'd need to start using it where it makes sense, like we do for (PRE)LOAD.

> I also found elastic/elasticsearch#27748, this person suggests to pass RANDOM for everything.

Yeah, Wikimedia also did some testing (this is shared on the same thread) and they report getting the best performance with an mmap readahead of 16kB instead of the default of 128kB. It feels a bit like a bug to me that mmap has so much higher a readahead than regular read operations. I wonder if we should recommend lowering this default readahead in our wiki / javadocs instead of working around it by passing RANDOM everywhere. My preference would be not to over-index on how the various hints perform in practice, and to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, while stored fields, term vectors and vectors data should probably use RANDOM, etc.
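The per-file suggestion in the last sentences could be expressed as a static policy. Everything in this sketch besides the `ReadAdvice` constant names is hypothetical: `IndexData` and `AdvicePolicy` do not exist in Lucene, and the mapping only restates the suggestion above.

```java
/** Simplified stand-in for the ReadAdvice enum. */
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, RANDOM_PRELOAD }

/** Hypothetical classification of index data, for illustration only. */
enum IndexData { POSTINGS, DOC_VALUES, STORED_FIELDS, TERM_VECTORS, VECTORS }

final class AdvicePolicy {
  private AdvicePolicy() {}

  /** Picks a read advice from the expected access pattern of each kind of data. */
  static ReadAdvice adviceFor(IndexData data) {
    return switch (data) {
      // Suggested: keep the default behavior for postings and doc values.
      case POSTINGS, DOC_VALUES -> ReadAdvice.NORMAL;
      // Suggested: these are expected to be read in random order.
      case STORED_FIELDS, TERM_VECTORS, VECTORS -> ReadAdvice.RANDOM;
    };
  }
}
```

A fixed mapping like this is what runs into the merging concern raised in the following reply: an input opened with RANDOM for searching may later be reused by a merge that reads it sequentially.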

uschindler · 2024-03-26T16:21:07Z

> My preference would be to not index too much on how the various hints perform in practice and try to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, stored fields, term vectors and vectors data should probably use RANDOM, etc.

The question that I have about this: how do we handle merging then? We may use random access for some files and later want to merge those segments away. As you said before, the problem is with NRT readers that are reused for merging. I think we should not hardcode the RANDOM flag on all files for now.

It is good that an IOContext with MergeInfo always requires SEQUENTIAL, but is this really used in all cases when we merge? When the advice is hardcoded while opening index files, we have a problem. The VariableGapTermsIndexReader example has exactly that problem: it always uses readOnce.

I think you are more familiar with how merging works; these are just some points to consider.

jpountz · 2024-03-26T16:37:12Z

> The question that I have about this: How to handle merging then?

This is a big question to me too. With reader pooling, if you open a reader and it then gets included in a merge, we'll reuse the same SegmentReader and its existing open index inputs, which have likely been opened with a NORMAL or RANDOM advice. Ideally there would be a way for our getMergeInstance() APIs to return a clone that has a different read advice.
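The idea of a merge-specific clone can be sketched as follows. `AdvisedInput` and its `getMergeInstance()` method are hypothetical and only model the advice value, not actual I/O. One caveat worth keeping in mind: with mmap, advice applies to a mapped address range, so a clone that shares the original's mapping could not easily carry different advice without affecting searches.

```java
/** Simplified stand-in for the ReadAdvice enum. */
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, RANDOM_PRELOAD }

/** Hypothetical stand-in for an index input opened with a given read advice. */
final class AdvisedInput {
  private final String name;
  private final ReadAdvice readAdvice;

  AdvisedInput(String name, ReadAdvice readAdvice) {
    this.name = name;
    this.readAdvice = readAdvice;
  }

  String name() {
    return name;
  }

  ReadAdvice readAdvice() {
    return readAdvice;
  }

  /** Returns a merge-time view of the same file with SEQUENTIAL advice; this instance is unchanged. */
  AdvisedInput getMergeInstance() {
    // A real implementation would also need to re-issue the advice for the data it reads.
    return new AdvisedInput(name, ReadAdvice.SEQUENTIAL);
  }
}
```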

> It is good that IOContext with MergeInfo always requires SEQUENTIAL, but is this really used in all cases when we merge?

My understanding is that it will only be used if the index input is created with the IOContext that is set on the SegmentReadState and the reader has been opened specifically for merging. In other words, the index has not been reopened between the time when the segment got written and the time when the same segment got merged away. This is only going to cover a small subset of the segments that we write, which doesn't look good enough to me.

jpountz · 2024-03-26T16:38:02Z

> P.S.: Are we using RANDOM at the moment?

FYI I tried to start switching some files to it in #13222 and discussed some limitations there.

This is a follow-up of a discussion on apache#13219. `mmap` has a higher readahead than regular `read()` operations by default, e.g. 128kB instead of 16kB on my Linux box. On indexes that exceed the size of the page cache, this may trigger performance issues due to page cache thrashing and additional page cache contention. Rather than forcing `MMapDirectory` to use `MADV_RANDOM` on all files, it would make more sense to configure a lower `mmap` readahead at the system level, e.g. the same readahead value that `read()` operations use.

Replace boolean flags on IOContext with an enum.

e61a168

jpountz requested a review from uschindler March 26, 2024 14:01

jpountz commented Mar 26, 2024

uschindler reviewed Mar 26, 2024

uschindler added this to the 10.0.0 milestone Mar 26, 2024

uschindler added the type:enhancement label Mar 26, 2024

feedback

cc3b902

uschindler approved these changes Mar 26, 2024

feedback

94ca7c2

uschindler reviewed Mar 26, 2024

CHANGELOG + MIGRATE

97eb122

jpountz requested a review from uschindler March 26, 2024 15:59

uschindler approved these changes Mar 26, 2024

jpountz merged commit 8558934 into apache:main Mar 27, 2024
3 checks passed

jpountz deleted the replace_boolean_flags_IOContext branch March 27, 2024 08:13

jpountz mentioned this pull request Mar 27, 2024

Recommend lowering the default mmap readahead. #13223

Closed
