Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: [Java] Require Java 22 to build Arrow #43050

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

laurentgo
Copy link
Contributor

@laurentgo laurentgo commented Jun 25, 2024

** This is work in progress **

Rationale for this change

This change demonstrates how to modernize Java toolchain to build Arrow and open opportunities to introduce support for some of the recent Java new features while maintaining compatibility with Java 11.

Java tools used by our build system (like spotless, google-java-format, error-prone, etc.) have dropped support for Java 8 and may do so as well with Java 11 causing the build system to use unsupported versions.

The lack of recent java versions in our toolchains also means that adopting recent Java features (through mechanisms like multi-release jars for example) is currently not possible.

What changes are included in this PR?

This change makes Java 22 the minimum version required to build Arrow while Java 11 remains the minimum Java version required to use Arrow libraries and tools.

For the purpose of demonstrating the end result, this change actually captures multiple changes:

  • Compile arrow code base with --release flag, allowing true Java 11 compilation independently of the JDK version used by the build
  • Add support for cross version testing: Introduce a property arrow.test.jdk-version which accepts a JDK version to run unit and integration tests with (the JDK needs to be declared in Maven toolchains first), which is different from the JDK version used to build Arrow
  • Finally make Java 22 the minimum version required to build Arrow and update CI builds accordingly (there are several image changes needed to support Java 22. This change also introduces a new multi-JDK docker image)

Are these changes tested?

Those changes have been tested via github actions as much as possible. I haven't tested with crossbow yet/

Are there any user-facing changes?

No direct user-facing changes but developers will be required to update their Java toolchain.

Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

Copy link
Member

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Laurent! This looks like a nice bit of cleanup. It'll make it easier to drop Java 8 in the future as well.

# specific language governing permissions and limitations
# under the License.

FROM maven:3.9.6-eclipse-temurin-22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to parameterize the version numbers in a central location?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, it should be possible :)

env JAVA11_HOME /opt/java/openjdk11
env JAVA17_HOME /opt/java/openjdk17
env JAVA21_HOME /opt/java/openjdk21
env JAVA22_HOME /opt/java/openjdk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we call this JAVA_LATEST?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a convention about using JAVAX_HOME where X represents the Java version. This convention is used by Maven JDK toolchains discovery mechanism. At the same time, JAVA_HOME already points to the latest JDK version.

For those reasons, I'm not sure we need JAVA_LATEST?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I agree if there is already a convention to follow.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 26, 2024
@danepitkin
Copy link
Member

@github-actions crossbow submit -g java

Copy link

Revision: e31e655

Submitted crossbow builds: ursacomputing/crossbow @ actions-7d97f6c645

Task Status
java-jars GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@laurentgo
Copy link
Contributor Author

Looks like crossbow tests are failing (which is expected since I didn't try to run them and modified all of the actions).

I will try and see if I can modify, but I am also trying to gauge the interest for this work, and if yes, if I should decomposing this work into sub issues

@lidavidm
Copy link
Member

So we'd use Java 21, but we'd still be limited to Java 8 features and dependencies, other than maybe things like being able to upgrade ErrorProne now that we don't build with Java 8 - is that right?

@laurentgo
Copy link
Contributor Author

laurentgo commented Jun 26, 2024

So we'd use Java 21, but we'd still be limited to Java 8 features and dependencies, other than maybe things like being able to upgrade ErrorProne now that we don't build with Java 8 - is that right?

Not necessarily. By leveraging MRJAR feature, you could create src/main/java21 directory and instruct the maven-compiler-plugin to use Java21 to build that code and get it packaged inside the jar under META-INF/versions/M/.... Using this mechanism you can now have two implementations of the same class, one for Java 8 up to Java 20, and one for java 21 and higher (the right one being picked up by the JVM automatically)

We could start with module-info.java which are Java9+ files for example...

@lidavidm
Copy link
Member

Ah, I see.

@danepitkin the plan was to drop Java 8 for the 18.0.0 release, right? Or was it 19.0.0?

@danepitkin
Copy link
Member

The current plan is to drop in v18. It would be nice to get some help on this though.

@lidavidm
Copy link
Member

I will try and see if I can modify, but I am also trying to gauge the interest for this work, and if yes, if I should decomposing this work into sub issues

I think this work is valuable and I am at least OK with requiring Java 21 for building. But if we are going to drop JDK8 support entirely soon-ish then it sounds like there's less to be gained from this work.

@laurentgo
Copy link
Contributor Author

There are still some long term gains:

  • Some plugins are already considering dropping Java 11 support as the official OpenJDK end of life date is October 2024 (while Java 8 is supported until November 2026...)
  • I believe a couple of recent Java features could be of great interest to the Arrow project:
  • Foreign Function and Memory API: FFI could provide a better solution than JNI for some of adapters and Gandiva. The new memory API could provide a better memory allocator than the current Netty one
  • Vector API: Could possibly help with vectorized operations

But to be able to experiment/introduce support for those new API, a prerequisite is still to have a modern toolchain

@lidavidm
Copy link
Member

lidavidm commented Jul 1, 2024

Yes, I think this is still helpful - but dropping Java 8 may simplify this?

@laurentgo
Copy link
Contributor Author

Yes. it would not be necessary to refactor sun.misc.Unsafe access although it would be still worth it to remove it as a public field from MemoryUtil IMHO

Change miminum java build version to 22. This doesn't change the minimum
version to use Arrow which is still Java 11

Update docker images for java jni/conda integration tests to use recent
python image in order to install java 22
@laurentgo laurentgo force-pushed the laurentgo/wip-update-java-version branch from e31e655 to 9bfdd59 Compare July 29, 2024 22:37
@laurentgo laurentgo requested a review from jonkeane as a code owner July 29, 2024 22:37
@laurentgo
Copy link
Contributor Author

I just rebased on top of current master and remove already merged code and the changes to MemoryUtil. Will update the description as well

@@ -32,8 +32,12 @@ Arrow Java uses the `Maven <https://maven.apache.org/>`_ build system.

Building requires:

* JDK 22+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it 21 or 22? (preferably we would at least build with the latest LTS?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to test with 22, one of the latest Java features I'd like see Arrow adopt is JEP 454 - Foreign Function and Memory API which was released as part of Java 22

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing with 22 is fine, but I'd like the minimum version to at least be an LTS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have an optionally-enabled submodule for FFM experiments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible but what about the release process? wouldn't that require Java 22?

Copy link
Member

@assignUser assignUser Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delaying any (optional) use of FFM/FFI to Java 25 so not before september 2025.

hm, I see your point. The main issue I see is when we need to re-build the binaries either for a patch release or other reasons, if we originally release with a non lts version and this version e.g. 22 is no longer available additional changes will be required that potentially have other issues that need to be fixed (e.g. dependencies). vs. just using the lts that will continue to be available for much longer.

Though this is extrapolated from previous patch/backport experience and not actually based on java, so maybe my fears are unfounded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as docker images or adoptium/temurin availability of non-LTS version. it seems older versions are still available. It is to be the same for the Github setup-java action. The issue may be homebrew though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK packaging with 22, I just want to leave it open for people to build with 21 (or ideally, even 17) if needed, minus any FFM code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So for now I think I will pause on this change and will wait until I'm ready to propose a change requiring the use of a newer Java version

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurentgo what would be the expected changes with this PR, if we break it down to the list of proposed changes?
(Removing UNSAFE, etc)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 30, 2024
@assignUser assignUser marked this pull request as draft July 30, 2024 01:31
@laurentgo laurentgo changed the title WIP: [Java] Require Java 21 to build Arrow WIP: [Java] Require Java 22 to build Arrow Jul 30, 2024
@danepitkin
Copy link
Member

@github-actions crossbow submit test-debian-12-docs

Copy link

Unable to match any tasks for `test-debian-12-docs`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/10372038877

@lidavidm
Copy link
Member

lidavidm commented Oct 8, 2024

It seems error-prone is requiring Java 17 for building (can still target Java 11) so we may want to consider this again if other tools are going to do the same thing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants