Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace boolean flags on IOContext with an enum. #13219

Merged
merged 4 commits into from
Mar 27, 2024

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Mar 26, 2024

This replaces the load, randomAccess and readOnce flags with a ReadAdvice enum, whose values are aligned with the allowed values to (f|m)advise.

Closes #13211

This replaces the `load`, `randomAccess` and `readOnce` flags with a
`ReadAdvice` enum, whose values are aligned with the allowed values to
(f|m)advise.

Closes apache#13211
@jpountz jpountz requested a review from uschindler March 26, 2024 14:01
@@ -53,7 +54,7 @@ public VariableGapTermsIndexReader(SegmentReadState state) throws IOException {
state.segmentInfo.name,
state.segmentSuffix,
VariableGapTermsIndexWriter.TERMS_INDEX_EXTENSION);
final IndexInput in = state.directory.openInput(fileName, state.context.toReadOnce());
final IndexInput in = state.directory.openInput(fileName, IOContext.READONCE);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change to this class wouldn't be necessary anymore once #13216 is merged.

* loads the content of the file into the page cache at open time. This should only be used on
* very small files that can be expected to fit in RAM with very high confidence.
*/
LOAD
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a better name for this that is more aligned with other constant names. RANDOM_ACCESS_MEMORY?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like load, should be preload, maybe RANDOM_PRELOAD?


public static final IOContext RANDOM = new IOContext(false, false, true);
public static final IOContext RANDOM = new IOContext(ReadAdvice.RANDOM);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW kept these constant names as-is rather than align them with ReadAdvice constant names on purpose as they convey stronger expectations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still refactor and rename them a bit. The LOAD should really be PRELOAD.

return POSIX_MADV_SEQUENTIAL;
}
return null;
return switch (ctx.readAdvice()) {
Copy link
Contributor

@uschindler uschindler Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the context from the signature and change it to madvise(MemorySegment, ReadAdvice). MemorySegmentIndexInputProvider would just pass context.readAdvice() to madvise() then.

@@ -54,58 +43,74 @@ public enum Context {
DEFAULT
};

public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
/** Advice regarding the read access pattern. */
public enum ReadAdvice {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this toplevel class!? I am tempting between both variants.

Could we maybe rename the inner Context as the name IOContext is so similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, I'm leaving changes to Context to a separate PR to keep this one focused.

}
if (readOnce && randomAccess) {
throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive");
if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a good idea! It makes code much easier and the merge case needs no special handling in MMapDir.

* loads the content of the file into the page cache at open time. This should only be used on
* very small files that can be expected to fit in RAM with very high confidence.
*/
LOAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like load, should be preload, maybe RANDOM_PRELOAD?

* to provide stronger guarantees on query latency.
* @param randomAccess This flag indicates that the file will be accessed randomly. If this flag is
* set, then readOnce will be false.
* @param readAdvice Advice regarding the read access pattern. Write operations should disregard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing in our case is always sequential (OutputStream). If we have a solutions for fadvise when writing files we can add another enum.

* or by reading data through the {@link RandomAccessInput} abstraction in random order.
*/
RANDOM,
/** Data is expected to be read sequentially with very little seeking at most. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The madvise flags also say "Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)"

The second sentence is important as this is exactly our use case

This is also the reason why we don't use sequential for preloaded files, as it's a "read once" like approach.

@uschindler uschindler added this to the 10.0.0 milestone Mar 26, 2024
@uschindler
Copy link
Contributor

P.S.: Are we using RANDOM at the moment?

I also found elastic/elasticsearch#27748, this person suggests to pass RANDOM for everything.

@@ -135,12 +135,12 @@ public void madvise(MemorySegment segment, IOContext context) throws IOException
}
}

private Integer mapIOContext(IOContext ctx) {
return switch (ctx.readAdvice()) {
private Integer mapIOContext(ReadAdvice readAdvice) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be mapReadAdvice() :-)


public static final IOContext LOAD = new IOContext(false, true, true);
public static final IOContext PRELOAD = new IOContext(ReadAdvice.RANDOM_PRELOAD);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need a MIGRATE.md entry.

@jpountz
Copy link
Contributor Author

jpountz commented Mar 26, 2024

P.S.: Are we using RANDOM at the moment?

Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD.

I also found elastic/elasticsearch#27748, this person suggests to pass RANDOM for everything.

Yeah, Wikimedia also did testing and they report getting best performance with a mmap readahead of 16kB instead of the default of 128kB (it's shared on the same thread). It feels a bit like a bug to me that mmap has such a higher readahead than regular read operations, I wonder if we should recommend lowering this default readahead in our wiki / javadocs instead of trying to work around it by passing RANDOM everywhere. My preference would be to not index too much on how the various hints perform in practice and try to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, stored fields, term vectors and vectors data should probably use RANDOM, etc.

@jpountz jpountz requested a review from uschindler March 26, 2024 15:59
@uschindler
Copy link
Contributor

P.S.: Are we using RANDOM at the moment?

Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD.

I also found elastic/elasticsearch#27748, this person suggests to pass RANDOM for everything.

Yeah, Wikimedia also did testing and they report getting best performance with a mmap readahead of 16kB instead of the default of 128kB (it's shared on the same thread). It feels a bit like a bug to me that mmap has such a higher readahead than regular read operations, I wonder if we should recommend lowering this default readahead in our wiki / javadocs instead of trying to work around it by passing RANDOM everywhere. My preference would be to not index too much on how the various hints perform in practice and try to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, stored fields, term vectors and vectors data should probably use RANDOM, etc.

The question that I have about this: How to handle merging then? If we use random access for some files and then want to merge away the segments. As you said before, the problem is with reused NRT readers for merging. I think, we should not hardcode the RANDOM flag now on all files?!

It is good that IOContext with MergeInfo always requires SEQUENTIAL, but is this really used in all cases when we merge? When its hardcoded while opening index files we have a problem. The example of the vargaps reader has exactly that problem: It always uses readOnce.

I think you are more familar with how the merging works, these are just some points to consider.

@jpountz
Copy link
Contributor Author

jpountz commented Mar 26, 2024

The question that I have about this: How to handle merging then?

This is a big question to me too. With reader pooling, if you open a reader and then it gets included in a merge, we'll reuse the same SegmentReader and its existing open index inputs, which have likely been open with a NORMAL or RANDOM advice. Ideally there would be a way for our getMergeInstance() APIs to somehow return a clone that has a different read advice.

It is good that IOContext with MergeInfo always requires SEQUENTIAL, but is this really used in all cases when we merge?

My understanding is that it will only be used if the index input is created with the IOContext that is set on the SegmentReadState and this reader has been open specifically for merging, said otherwise the index has not been reopened between the time when the segment got written and the time when the same segment got merged away. This is only going to cover a small subset of the segments that we write, this doesn't look good enough to me.

@jpountz
Copy link
Contributor Author

jpountz commented Mar 26, 2024

P.S.: Are we using RANDOM at the moment?

FYI I tried to start switching some files to it at #13222 and discussed some limitations.

@jpountz jpountz merged commit 8558934 into apache:main Mar 27, 2024
3 checks passed
@jpountz jpountz deleted the replace_boolean_flags_IOContext branch March 27, 2024 08:13
jpountz added a commit to jpountz/lucene that referenced this pull request Mar 27, 2024
This is a follow-up of a discussion on apache#13219. `mmap` has a higher readahead
than regular `read()` operations by default, e.g. 128kB instead of 16kB on my
Linux box. On indexes that exceed the size of the page cache, this may trigger
performance issues due to page cache trashing and additional page cache
contention. Rather than forcing `MMapDirectory` to use `MADV_RANDOM` on all
files, it would make more sense to configure a lower `mmap` readahead at the
system level, e.g. the same readahead value as `read()` operations use.
shatejas pushed a commit to shatejas/lucene that referenced this pull request Oct 13, 2024
This replaces the `load`, `randomAccess` and `readOnce` flags with a
`ReadAdvice` enum, whose values are aligned with the allowed values to
(f|m)advise.

Closes apache#13211
shatejas pushed a commit to shatejas/lucene that referenced this pull request Nov 17, 2024
This replaces the `load`, `randomAccess` and `readOnce` flags with a
`ReadAdvice` enum, whose values are aligned with the allowed values to
(f|m)advise.

Closes apache#13211
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace boolean flags on IOContext with an enum
2 participants