Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After update to Lucene 9.4 use --enable-preview on Java==19 (exact) to allow mmap use new JDK APIs #4637

Closed
uschindler opened this issue Sep 29, 2022 · 34 comments · Fixed by #4973
Labels
Build Libraries & Interfaces enhancement Enhancement or improvement to existing feature or request

Comments

@uschindler
Copy link
Contributor

Description

Apache Lucene 9.4 will have support for Java 19 Panama APIs to mmap index files (using a MR-JAR). See apache/lucene#912 for more information.

As those APIs are not yet enabled by default in the JDK, we have to still use some opt-in approach, controlled by Java's command line:

  • Lucene by default uses the old implementation using MappedByteBuffer and several hacks which may also risk crushing the JDK if an open index is closed from another thread while a search is running (this is well known). If Java 19 is detected, Lucene will log a warning through JUL when MMapDirectory is initialized (see below).
  • If you pass --enable-preview to the Java command line (next to heap settings), it will enable preview APIs in JDK (https://openjdk.org/jeps/12). Lucene detects this and switches MMapDirectory and uses a new implementation MemorySegmentIndexInput for the inputs to use those new APIs (at moment it will also log this as "info" message to JUL). The new APIs are safe and can no longer crush the JVM. But most importantly, all index files are now mapped in portions of 16 GiB instead of 1 GiB into memory. In fact, unless an index is force-merged to one segment, all index files will then consist only of one memory mapping spawning the whole file! This will help Hotspot to further optimize reading as only one implementation on MemorySegmentIndexInput ist used. In addition, because the number of mappings is dramatically reduced (approximately 5 times less mappings, because the maximum segment size is 5 Gigabytes by default and all such segments now use one instead of 5 mappings). This may allow users to no longer change sysctl (see max_map_count @ https://opensearch.org/docs/latest/opensearch/install/important-settings/) and go with defaults of OS. On the other hand users may host more indexes with many more segments on one node.

Some TODOs:

  • Make sure that Opensearch also redirects stuff logged via java.util.logging (JUL) to its own log file, so they do not land in console. This can be done with log4j by adding the log4j-jul adapter and install it using a system property in the Bootstrap classes. I have not checked if this is already done. The reason for this is that Apache Lucene now logs some events using java.util.logging since Lucene 9.0. Some of those events are MMapDirectory messages (e.g., when unmapping was not working) or few others like some module system settings are incorrect. Logging is very seldom, but for this feature it will definitely log using JUL, so it would be good to make sure Opensearch redirects JUL logging correctly to its own loggers. This could be a separate issue!
  • The Opensearch startup script should pass --enable-preview as command line flag if exactly Java 19 is used to start up Opensearch. If this is not done, a warning gets logged (see above).

Important: Lucene 9.4 only supports this on Java 19 (exactly), because the APIs are in flux. If you start with Java 20, it falls back to the classical MMapDirectory. We will add support for Java 20 in a later release. The reason for this is that the class files of new implementation are marked by some special version numbers that make them ONLY compatible to Java 19, not earlier or later, to allow the JDK to apply changes to the API before final release in Java 21. But passing --enable-preview to later versions won't hurt, so maybe enable it on all versions >= 19.

A last note: The downside of this new code is that closing and unmapping an index file gets more heavy (it will trigger an safepoint in the JVM). We have not yet found out how much this impacts servers opening/closing index files a lot. Because of this we would really like Amazon/Opensearch to do benchmarking on this, ideally if their users and customers could optionally enable it. But benchmarking should be done now, because with hopefully Java 21, Lucene will use the new implementation by default. Java 20 will be the second and last preview round.

@uschindler uschindler added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 29, 2022
@dblock
Copy link
Member

dblock commented Sep 29, 2022

Thanks for the detailed proposal. Alongside these changes I think we need to start running at least some CI on both JDK 17 and JDK19. Currently we run 11 and 17 via https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/version.properties#L5.

@uschindler
Copy link
Contributor Author

uschindler commented Sep 29, 2022

Ah OK, I did not know which versions you are running. Because the sister project ES is already on JDK 19 for testing and bundles 18 at moment.

@reta
Copy link
Collaborator

reta commented Sep 29, 2022

We did some work to support JDK-18 [1], somehow overlooked JDK-19, @dblock I can take this part, will update CI images, etc.

[1] #1710

@dblock
Copy link
Member

dblock commented Sep 29, 2022

@reta What do you think about extracting a small subset of tests (not ITs for example) to run on GHA, so we can make the JDK matrix easy and visible in the repo, just like we do for plugins? (e.g. this one). Something more than precommit but less than gradle check? and reducing gradle check to ITs

@reta
Copy link
Collaborator

reta commented Sep 29, 2022

@reta What do you think about extracting a small subset of tests (not ITs for example) to run on GHA, so we can make the JDK matrix easy and visible in the repo, just like we do for plugins? (e.g. this one). Something more than precommit but less than gradle check? and reducing gradle check to ITs

@dblock certainly +1 to give it a try, I think we should be able to run everything on GHA but starting with subset is probably a good idea.

@uschindler
Copy link
Contributor Author

Of course when Lucene comes out tomorrow (hopefully, the schedule suggests it, but something bad can always happen), please update Lucene to 9.4.0 first. I have no idea at which stage the snapshot currently used is: 9.4.0-snapshot-ddf0d0a

@dblock
Copy link
Member

dblock commented Sep 29, 2022

@uschindler Yes, of course, but don't let us stop you :)

Read more about Lucene Snapshots here: https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#lucene-snapshots

@dbwiddis
Copy link
Member

+1 on this. I've been playing around with JDK19 and there's some great performance gains to be had. Downside is the temporary nature of anything we do, as we'll have to tweak it for 20 and then 21, but will be good to work out the kinks with an opt-in.

@uschindler
Copy link
Contributor Author

@dbwiddis You can use a similar approach for/with JNA vs. panama-foreign like we do by using an MR-JAR and adding some try/catch logic when loading the classes.

@dbwiddis
Copy link
Member

You can use a similar approach for/with JNA vs. panama-foreign like we do by using an MR-JAR

@uschindler it's less the practical aspect (it can be done) but more the future question of "what do we do a year from now when JDK 21 LTS is out". Or even 6 months from now when JDK 20 is out and JDK 19 is EOL and no longer receiving updates.

Are we going to continue to "support" older versions that are not getting security updates (which I might suggest are more likely with native memory access than other features)? Do we back out the "19" branch from the MRJAR? If it's the same API in 20 but there's a security fix in 20 that's not in 19, which version do we enforce in the MRJAR?

I'm all for trying to enable features (early) that we think will be in JDK 21 (with some possible tweaks to the API) but I think it should be with the expectation that we will flip 19 to 20, and then 21, and not make any attempt to "support" EOL versions.

@uschindler
Copy link
Contributor Author

uschindler commented Oct 1, 2022

Hi,
My plan is to have another Gradle SourceSet in Lucene for 20 and then one for 21. The logic to load will be the same (no changes needed).

If I figure out that in JDK 20 there are no API changes (most likely for the part of Panama we use), we may just add a Gradle task to copy the v19 class files to the v20 MR-JAR folder while patching classfile major and preview minor version (directly or via ASM visitor). This spares us to autoprovision JDK20. I will see what works. When JDK 21 LTS comes out we may drop support for the 19 and 20 in newer Lucene versions and use a plain MR-JAR variant without any preview supporting logic. But this would not break any code here. JDK 19 would fallback to the old code. 🤣

In short: I will support 20 in addition later.

@dbwiddis
Copy link
Member

dbwiddis commented Oct 1, 2022

When JDK 21 LTS comes out we may drop support for the 19 and 20 in newer Lucene versions

I think that aligns with my thinking. Of course I'm not the one to convince. :-)

@saratvemulapalli
Copy link
Member

Looks like we do want to use --enable-preview with Lucene 9.4, which means OpenSearch has to run on Java 19.
Is everybody agreeing with the change, looking for thoughts from @nknize.
Please do 👍🏻 👎🏻 .

@dblock
Copy link
Member

dblock commented Oct 11, 2022

So OpenSearch 3.x would be JDK 19 only as a breaking change (I think 👎 for me), or would support --enable-preview as an option and continue to be runnable on JDK11, 17 and 19 (definitely 👍 and could be on 2.x?)?

@reta
Copy link
Collaborator

reta commented Oct 11, 2022

Oh I think we should be testing with JDK 19 but not make it the default JDK bundle ... People could use it at will but we should not push it

@dbwiddis
Copy link
Member

So OpenSearch 3.x would be JDK 19 only as a breaking change (I think 👎 for me), or would support --enable-preview as an option and continue to be runnable on JDK11, 17 and 19 (definitely 👍 and could be on 2.x?)?

The request is in the original Issue text:

  • support JDK 11 and 17 with no changes
  • If, and only if, the user is running on JDK 19, add the --enable-preview on the command line

(Original issue also notes some logging changes required.)

@nknize
Copy link
Collaborator

nknize commented Oct 12, 2022

Make sure that Opensearch also redirects stuff logged via java.util.logging (JUL) to its own log file,...I have not checked if this is already done.

JUL log4j adapter was not configured so java.util.logging was only using stderr handler (note stdout / stderr are redirected to opensearch.stdout.log / opensearch.stderr.log but logger.fine in JUL was never logging since the log4j configuration wasn't propagating to the JUL logger). I opened a PR to install and configure the JUL log4j adapter so it logs through the same logger used across OpenSearch.

Oh I think we should be testing with JDK 19 but not make it the default JDK bundle ... People could use it at will but we should not push it

I think this is a good first step. However 3.0 is scheduled to go GA January next year and this is a good time to introduce any breaking changes (e.g., maybe start working on jigsaw modularization of OpenSearch?). So perhaps we think about opt in for OpenSearch 2.4 but maybe it becomes the default for 3.0 (we'll do this in a separate PR targeted for main branch only)? @uschindler, any concerns with cutting over to 19 as default bundled jdk for 3.0 GA? Maybe we continue to let --enable-preview be opt in in 3.0 even w/ default 19?

@nknize
Copy link
Collaborator

nknize commented Oct 12, 2022

We have not yet found out how much this impacts servers opening/closing index files a lot. Because of this we would really like Amazon/Opensearch to do benchmarking on this...

@anasalkouz @adnapibar are we publishing nightly benchmarks yet? We should be helping out the Lucene project w/ performance benchmarks like this using opensearch-benchmarks. /cc @mikemccand

@uschindler
Copy link
Contributor Author

That was my plan: bundle and test with JDK 19 and enable the preview mode only for that combination.

I think the magic can be done in this java bootstrap part that precalculates the initial java command before launching the main server.

@nknize
Copy link
Collaborator

nknize commented Oct 12, 2022

That was my plan: bundle and test with JDK 19 and enable the preview mode only for that combination.

+1 to follow suit and default bundle jdk 19 for OpenSearch 3.0 (main branch); and make it opt-in only for 2.x.

@reta
Copy link
Collaborator

reta commented Oct 27, 2022

JDK-19 support on Jenkins: opensearch-project/opensearch-build#2808

@reta
Copy link
Collaborator

reta commented Nov 8, 2022

The main is on JDK 19: #4973

@uschindler
Copy link
Contributor Author

Do we also pass enable-preview at runtime?

In the patch I only see it passed to javadoc, but why?

@reta
Copy link
Collaborator

reta commented Nov 8, 2022

Do we also pass enable-preview at runtime?

Not yet, will do that shortly

In the patch I only see it passed to javadoc, but why?

> Task :server:javadoc                                                                         
error: class file for /org.apache.lucene/lucene-core/9.5.0-snapshot-a4ef70f/e949897fa24e14d2701a3c41fe27a4f094681b81/lucene-core-9.5.0-snapshot-a4ef70f.jar(/org/apache/lucene/store/MemorySegmentIndexInputProvider.class) uses preview features of Java SE 19.
  (use --enable-preview to allow loading of class files which contain preview features)
error: class file for /org.apache.lucene/lucene-core/9.5.0-snapshot-a4ef70f/e949897fa24e14d2701a3c41fe27a4f094681b81/lucene-core-9.5.0-snapshot-a4ef70f.jar(/org/apache/lucene/store/MemorySegmentIndexInput.class) uses preview features of Java SE 19.
  (use --enable-preview to allow loading of class files which contain preview features)
error: class file for /org.apache.lucene/lucene-core/9.5.0-snapshot-a4ef70f/e949897fa24e14d2701a3c41fe27a4f094681b81/lucene-core-9.5.0-snapshot-a4ef70f.jar(/org/apache/lucene/store/MemorySegmentIndexInput$MultiSegmentImpl.class) uses preview features of Java SE 19.
  (use --enable-preview to allow loading of class files which contain preview features)

@uschindler
Copy link
Contributor Author

Do we also pass enable-preview at runtime?

Not yet, will do that shortly

In the patch I only see it passed to javadoc, but why?

> Task :server:javadoc                                                                         
error: class file for /org.apache.lucene/lucene-core/9.5.0-snapshot-a4ef70f/e949897fa24e14d2701a3c41fe27a4f094681b81/lucene-core-9.5.0-snapshot-a4ef70f.jar(/org/apache/lucene/store/MemorySegmentIndexInputProvider.class) uses preview features of Java SE 19.
  (use --enable-preview to allow loading of class files which contain preview features)
error: class file for /org.apache.lucene/lucene-core/9.5.0-snapshot-a4ef70f/e949897fa24e14d2701a3c41fe27a4f094681b81/lucene-core-9.5.0-snapshot-a4ef70f.jar(/org/apache/lucene/store/MemorySegmentIndexInput.class) uses preview features of Java SE 19.
  (use --enable-preview to allow loading of class files which contain preview features)
error: class file for /org.apache.lucene/lucene-core/9.5.0-snapshot-a4ef70f/e949897fa24e14d2701a3c41fe27a4f094681b81/lucene-core-9.5.0-snapshot-a4ef70f.jar(/org/apache/lucene/store/MemorySegmentIndexInput$MultiSegmentImpl.class) uses preview features of Java SE 19.
  (use --enable-preview to allow loading of class files which contain preview features)

The problem is because during javadoc you do not pass --release 17 or like that. So it fails because it can "see" those classes during javadoc, although it won't use them. That's strange, looks like a bug in javadoc to me. I thought the target JDK for compile is still 17, or was this changed to release 19?

@reta
Copy link
Collaborator

reta commented Nov 8, 2022

I thought the target JDK for compile is still 17, or was this changed to release 19?

The target is actually set to 11, but we didn't pass anything to javadoc as target

@uschindler
Copy link
Contributor Author

As far as I see you do not pass any release version to the compiler. So your java files are marked as 19. Therefore you need java 19 to build javadoc. And therefore it reads all class files from Lucene, also the preview ones.

I doubt that it will work with JDK 20. If you execute compilation with 20 and then enforce javadoc to 19 it will also fail.

So be consistent: pass consistent predefined release version to both compiler and javadoc. Just run tests with latest version. This makes build reproducible.

In java we compile against release 11 (Lucene 9) and against release 17 (Lucenes main), but run tests with what Gradle provides.

@reta
Copy link
Collaborator

reta commented Nov 8, 2022

So be consistent: pass consistent predefined release version to both compiler and javadoc. Just run tests with latest version. This makes build reproducible.

Fixed that for Javadoc (#5151), for compiler we use Gradle's source/target compatibility settings.

@reta
Copy link
Collaborator

reta commented Nov 8, 2022

In java we compile against release 11 (Lucene 9) and against release 17 (Lucenes main), but run tests with what Gradle provides.

You use toolchain, correct?

@uschindler
Copy link
Contributor Author

So be consistent: pass consistent predefined release version to both compiler and javadoc. Just run tests with latest version. This makes build reproducible.

Fixed that for Javadoc (#5151), for compiler we use Gradle's source/target compatibility settings.

That's the correct variant. I did not find the target compatibility in your Gradle file. Gradle does tot automatically pass release to java and javadoc globally, you need to configure it in every task. To be 100% safe, you should also pass 11 as release to javac. This also ensures that you compile against 11 API. Target and source only produce class files for 11 but it compiles against actual java library.

But the fix looks much better now.

@reta
Copy link
Collaborator

reta commented Nov 8, 2022

Thanks @uschindler here we are https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/build.gradle#L163 (OpenSearch build is quite complex) :-)

@uschindler
Copy link
Contributor Author

uschindler commented Nov 8, 2022

In java we compile against release 11 (Lucene 9) and against release 17 (Lucenes main), but run tests with what Gradle provides.

You use toolchain, correct?

Partially, yes, to compile java 19 classes for mmap. For the main sources we use --release 11 also on compiler. This will tell javac to compile against a signature file with java 11 class declarations (all versions after java 8 ship with some minimized jar file containing stripped class files without bodies for all major versions).

@dblock
Copy link
Member

dblock commented Dec 27, 2022

FYI apache/lucene#12033

@uschindler
Copy link
Contributor Author

Sorry, forgot to post this here. After update to Lucenes 9.5 you can remove the flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Libraries & Interfaces enhancement Enhancement or improvement to existing feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants