Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel 6 Remote JDK Toolchains Incorrect Configuration #17085

Closed
jlaxson opened this issue Dec 29, 2022 · 6 comments
Closed

Bazel 6 Remote JDK Toolchains Incorrect Configuration #17085

jlaxson opened this issue Dec 29, 2022 · 6 comments
Labels
next_month Issues that we will review next month. This is currently mainly used by Configurability team P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: bug

Comments

@jlaxson
Copy link
Contributor

jlaxson commented Dec 29, 2022

Description of the bug:

Upgrading to Bazel 6.0.0, I get some toolchain resolution and build failures around java_library/java_binary rules. My environment is cross-compile heavy and has remote executors on a number of platforms (linux, darwin both x86 and arm64, windows) and bazel itself is invoked on any of those platforms building for any other.

I get issues when, due to peculiarities in execution platform selection order, an exec platform is chosen for java that is distinct from the target platform. For instance, target platform is macos_arm64, but the exec platform is @local_config_platform//:host on linux, or a remote linux platform. For instance, here is toolchain_resolution_debug from a cross compile:

INFO: ToolchainResolution:   Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform //:macos_arm: execution //:macos_arm: Selected toolchain @remotejdk11_macos_aarch64//:jdk
INFO: ToolchainResolution:   Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform //:macos_arm: execution //:macos: Selected toolchain @remotejdk11_macos_aarch64//:jdk
INFO: ToolchainResolution:   Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform //:macos_arm: execution //:linux: Selected toolchain @remotejdk11_macos_aarch64//:jdk
INFO: ToolchainResolution:   Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform //:macos_arm: execution //test:downloader: Selected toolchain @remotejdk11_macos_aarch64//:jdk
INFO: ToolchainResolution:   Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform //:macos_arm: execution @local_config_platform//:host: Selected toolchain @remotejdk11_macos_aarch64//:jdk

A number of these selections are wrong: remotejdk11_macos_aarch64 will not run on the host platform (it is linux), nor will it run on the //:linux remote execution platform. Nor will it run on the //test:downloader platform which specifies no OS constraints to match at all. If bazel happens to choose any but the first configuration with identical target and exec platform, it will try to execute a darwin arm64 binary on linux with predictable fireworks. In the ultimate case where the target and set of exec platforms are distinct, this is guaranteed to fail.

This also fails when targeting a platform that has no JDK such as Android. In Bazel 5.1 it worked just fine to have a java_library depended on by an android_library, now toolchain resolution just fails if you set --platforms to one with @platforms//os:android because there is no openjdk for android (and if there were, my remote exec environment has no android runners)

It seems that d5559c1 changed the definition of the java toolchains to specify only target_compatible_with, not exec_compatible_with, even though the jdk itself is platform-specific. I'm not sure I understand the benefits of that change, but at a minimum it seems like exec_compatible_with should also be specified for those toolchains.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

See https://github.com/jlaxson/bazel_java_toolchain_bug for the simple reduction involving android platforms.

Which operating system are you running Bazel on?

Macos, linux

What is the output of bazel info release?

release 6.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Copy link
Collaborator

fmeum commented Mar 3, 2023

Definitely not an expert on this, so take any of the following with a grain of salt.

Note that there are two relevant Java toolchain types: the one for the JDK used for compilation (@bazel_tools//tools/jdk:toolchain_type) and the one for the JDK/JRE used as the runtime for executing a java_binary (@bazel_tools//tools/jdk:runtime_toolchain_type).

The toolchain_type JDK is used for compilation and should thus be selected based on the exec platform. Since Java compilation outputs are platform-independent, the target platform doesn't matter here.

The runtime_toolchain_type JDK is added to the runfiles of a java_binary and is executed when you do bazel run, so this actually needs to be selected based on the target platform. If you execute a Java tool during the build, there will be a transition to the exec configuration at some point, which will make the exec platform the target platform and thus again select the correct runtime.

Based on my understanding of the points above, java_library should not have a toolchain dependency on @bazel_tools//tools/jdk:runtime_toolchain_type, only on @bazel_tools//tools/jdk:toolchain_type. That would allow you to have a java_library in the deps of an android_* target even though there is no runtime for Android. I checked the Starlark implementation of the java_library rules and it should allow this, but as far as I know the native implementation is enabled by default and I have no idea how it handles toolchains.

It's worrying that your reproducer seems to indicate that this doesn't work, which looks like it could indeed be a regression in Bazel 6.

@cushon Could you take a look?

Regarding the other toolchain resolution failures you are seeing: Is it possible that some custom rules or genrules you are using aren't correctly transitioning to the exec configuration? Do you also have a reproducer for these non-Android issues?

@fmeum
Copy link
Collaborator

fmeum commented Mar 3, 2023

Just noticed that the reproducer you linked has an empty main branch, but the content lives on the master branch.

@jlaxson
Copy link
Contributor Author

jlaxson commented Mar 3, 2023

Thanks @fmeum for the attention. I fixed up the reproducer branch and split it into two issues, one showing the problem with simply a java_library and a bespoke target platform.

I think I loosely understand the goal, but some issues I still see:

  • The reproducer breaks at java_library, not even java_binary needing to build out its runfiles tree
  • Even if only java_binary should have the runtime_toolchain_type dependency, I think you should still be able to build a java_binary without requiring a matching available toolchain?
  • When using bazel run, If you're going to be platform-sensitive with runfiles, wouldn't you want runfiles compatible with the host platform, not target? (If you're not using a hermetic jre, you're going to get the locally installed host-platform-compatible jre)

I also pulled in the starlark rules_java and the error persists (resolving the current_java_runtime alias). You can toggle the load statement in not_android_example/BUILD. I think an issue is that since the rule declares its need for the runtime_toolchain_type, that must always be valid at analysis time.

Regarding the other toolchain resolution failures you are seeing: Is it possible that some custom rules or genrules you are using aren't correctly transitioning to the exec configuration? Do you also have a reproducer for these non-Android issues?

Are you referring to the toolchain_resolution_debug output from above?

@cushon
Copy link
Contributor

cushon commented Mar 4, 2023

ERROR: <output_base>/external/bazel_tools/tools/jdk/BUILD:28:19: While resolving toolchains for target @bazel_tools//tools/jdk:current_java_runtime

You can use cquery to see the patch to the target that's resolving the toolchain:

bazel cquery 'somepath(//not_android_example/..., @bazel_tools//tools/jdk:current_java_runtime)'
...
//not_android_example:java_test (93797ec)
@bazel_tools//tools/jdk:toolchain_java8 (93797ec)
@bazel_tools//tools/jdk:platformclasspath (93797ec)
@bazel_tools//tools/jdk:current_java_runtime (9d8e7aa)

The platformclasspath stuff depends on a JDK in the target configuration so Java compilation can cross-compile to the right JDK version (this corresponds to javac's --bootclasspath/--system flags). The Java compilation output is architecture-independent, so we only really need a JDK with the same major version as the target platform, not necessarily the same architecture, but the way the provided Java toolchains are set up right now the two are tied together.

fmeum added a commit to fmeum/bazel that referenced this issue Jun 8, 2023
fmeum added a commit to fmeum/bazel that referenced this issue Jun 9, 2023
The `java_binary` Starlark implementation now only directly depends on a
Java runtime toolchain when it is used as an executable rule.

This work will eventually allow `java_binary` with
`create_executable = False` to be compiled for platforms that do not
provide a regular JDK, such as Android.

Work towards bazelbuild#17085
Work towards bazelbuild/rules_java#64
Split off from bazelbuild#18262
copybara-service bot pushed a commit that referenced this issue Jun 16, 2023
The `java_binary` Starlark implementation now only directly depends on a Java runtime toolchain when it is used as an executable rule.

This work will eventually allow `java_binary` with `create_executable = False` to be compiled for platforms that do not provide a regular JDK, such as Android.

Work towards #17085
Work towards bazelbuild/rules_java#64
Split off from #18262

Closes #18620.

PiperOrigin-RevId: 540884239
Change-Id: I3c06c02d1a9dc1be2775f409c393a386ac2260b7
copybara-service bot pushed a commit that referenced this issue Jun 19, 2023
Work towards #17085
Work towards bazelbuild/rules_java#64
Split off from #18262

Closes #18618.

PiperOrigin-RevId: 541562617
Change-Id: I46a3057dcd5eec19cfd6f4ca6da2f76f0c3c2e3f
copybara-service bot pushed a commit that referenced this issue Jul 5, 2023
The `bootclasspath` rule in `rules_java` will soon use this toolchain type instead of the regular Java runtime toolchain type, which naturally carries a constraint on the target platform.

Also adds more detailed explanations of the now three Java toolchain types.

Work towards #17085
Work towards #18265
Work towards bazelbuild/rules_java#64
Split off from #18262

Closes #18841.

PiperOrigin-RevId: 545756139
Change-Id: Ib9dd7a1c20c32315375722d6a023a89859daea9c
fmeum added a commit to CodeIntelligenceTesting/jazzer that referenced this issue Aug 1, 2023
Executable `java_binary` targets, i.e., those that can be `bazel run` or
used in build actions, require a Java runtime for the target platform.
Since there is no standalone Java runtime for Android, all Java targets
in the transitive closure of an `android_binary` must not be executable
for cross-compilation to work.

Note: Cross-compilation without flags is currently blocked on
bazelbuild/bazel#17085.
fmeum added a commit to CodeIntelligenceTesting/jazzer that referenced this issue Aug 1, 2023
Executable `java_binary` targets, i.e., those that can be `bazel run` or
used in build actions, require a Java runtime for the target platform.
Since there is no standalone Java runtime for Android, all Java targets
in the transitive closure of an `android_binary` must not be executable
for cross-compilation to work.

Note: Cross-compilation without flags is currently blocked on
bazelbuild/bazel#17085.
fmeum added a commit to CodeIntelligenceTesting/jazzer that referenced this issue Aug 2, 2023
Executable `java_binary` targets, i.e., those that can be `bazel run` or
used in build actions, require a Java runtime for the target platform.
Since there is no standalone Java runtime for Android, all Java targets
in the transitive closure of an `android_binary` must not be executable
for cross-compilation to work.

Note: Cross-compilation without flags is currently blocked on
bazelbuild/bazel#17085.
copybara-service bot pushed a commit that referenced this issue Aug 4, 2023
This allows deploy jars for `java_binary` targets with `create_executable = False` to compile for target platforms for which no standalone Java runtime is available (e.g. Android).

Along the way, this removes a redundant check (`runtime.hermetic_files` is never `None`) and ensures that files coming from the hermetic runtime are only added if `hermetic` is set to `True`.

Work towards #17085

Closes #19140.

PiperOrigin-RevId: 553855531
Change-Id: I56ce730190498e2adb2f8acba709f02ec9c13ef4
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Aug 8, 2023
The `bootclasspath` rule in `rules_java` will soon use this toolchain type instead of the regular Java runtime toolchain type, which naturally carries a constraint on the target platform.

Also adds more detailed explanations of the now three Java toolchain types.

Work towards bazelbuild#17085
Work towards bazelbuild#18265
Work towards bazelbuild/rules_java#64
Split off from bazelbuild#18262

Closes bazelbuild#18841.

PiperOrigin-RevId: 545756139
Change-Id: Ib9dd7a1c20c32315375722d6a023a89859daea9c
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Aug 9, 2023
The `bootclasspath` rule in `rules_java` will soon use this toolchain type instead of the regular Java runtime toolchain type, which naturally carries a constraint on the target platform.

Also adds more detailed explanations of the now three Java toolchain types.

Work towards bazelbuild#17085
Work towards bazelbuild#18265
Work towards bazelbuild/rules_java#64
Split off from bazelbuild#18262

Closes bazelbuild#18841.

PiperOrigin-RevId: 545756139
Change-Id: Ib9dd7a1c20c32315375722d6a023a89859daea9c
fmeum added a commit to fmeum/bazel that referenced this issue Aug 10, 2023
The `bootclasspath` rule in `rules_java` will soon use this toolchain type instead of the regular Java runtime toolchain type, which naturally carries a constraint on the target platform.

Also adds more detailed explanations of the now three Java toolchain types.

Work towards bazelbuild#17085
Work towards bazelbuild#18265
Work towards bazelbuild/rules_java#64
Split off from bazelbuild#18262

Closes bazelbuild#18841.

PiperOrigin-RevId: 545756139
Change-Id: Ib9dd7a1c20c32315375722d6a023a89859daea9c
iancha1992 pushed a commit that referenced this issue Aug 14, 2023
The `bootclasspath` rule in `rules_java` will soon use this toolchain
type instead of the regular Java runtime toolchain type, which naturally
carries a constraint on the target platform.

Also adds more detailed explanations of the now three Java toolchain
types.

Work towards #17085 Work
towards #18265 Work towards
bazelbuild/rules_java#64 Split off from
#18262

Closes #18841.

PiperOrigin-RevId: 545756139
Change-Id: Ib9dd7a1c20c32315375722d6a023a89859daea9c

Fixes #19201
@hvadehra hvadehra added next_month Issues that we will review next month. This is currently mainly used by Configurability team and removed untriaged labels Sep 15, 2023
@hvadehra
Copy link
Member

Waiting for bazel 6.4 to try #18262

@hvadehra hvadehra added the P2 We'll consider working on this in future. (Assignee optional) label Sep 28, 2023
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Oct 19, 2023
The Java rules used toolchains of type `@bazel_tools//tools/jdk:runtime_toolchain_type` for two different purposes requiring different exec/target platform constraints, which led to issues when cross-compiling: The runtime added to the runfiles of an executable Java target has to have constraints on the target platform, whereas the runtime used to extract the bootclasspath should not have constraints on the target platform. As a result:
1. `@local_jdk` did not define any target constraints for its runtime, which allowed the runtime to provide the bootclasspath required by Android rules, but could also end up building `java_binary` targets that wouldn't run on the target (see bazelbuild#18265).
2. Remote JDKs defined a target constraint for the their runtimes, which prevented them from being added to the runfiles of targets they couldn't run on, but broke Android compilation due to the failure to resolve a runtime for bootclasspath extraction (see bazelbuild#17085).

This is resolved by adding a third toolchain type, `bootstrap_runtime_toolchain_type`, that is only used by the `bootclasspath` rule (realizes bazelbuild#17085 (comment)). Detailed explanations of the different types and their intended constraints have been added to `@bazel_tools//tools/jdk:BUILD`.

Addressing the cross-compilation errors then required the following changes:
1. Direct dependencies on the Java runtime have been removed from rules that do not need them (e.g. `android_library` and `java_binary` with `create_executable = False`).
3. Implicit dependencies on the Java runtime through the `bootclasspath` rule have been replaced with dependencies on the bootstrap runtime.
4. `{local,remote}_java_repository` now use the user-provided exec constraints of the Java toolchain as the target constraints of the associated runtime.
5. `@local_jdk` uses the auto-detected host constraints as exec constraints for the local Java toolchain.

Fixes bazelbuild#17085
Fixes bazelbuild#18265
Fixes bazelbuild/rules_java#64

RELNOTES[INC]: Java runtime toolchains created via `local_java_repository` from `@bazel_tools//tools/jdk:local_java_repository.bzl`, which includes `local_jdk`, now have `target_compatible_with` set to the auto-detected host constraints. This can result in errors about toolchain resolution failures for `@bazel_tools//tools/jdk:runtime_toolchain_type`, especially when cross-compiling. These failures can be fixed in the following ways (listed in decreasing order of preference):
* Replace `java_binary` targets that aren't meant to be run with `bazel run` or as tools during the build with `java_single_jar` (available in `@rules_java//java:java_single_jar.bzl`). Such targets do not require a Java runtime for the target configuration.
* Set `--java_runtime_version=remotejdk_N` for some Java version `N` to let Bazel choose and download an appropriate remote JDK for the current target platform. This setting defaults to `local_jdk`, which means that Bazel can only use the local JDK, which isn't compatible with any other platform.
* Manually define and register a `local_java_runtime` with no value set for `exec_compatible_with` (defaults to `[]`) and select it by setting `--java_runtime_version` to its `name`. This fully restores the previous behavior, but can result in incorrect results when cross-compiling (see bazelbuild#18265).

Closes bazelbuild#18262.

PiperOrigin-RevId: 574914446
Change-Id: I6cbfb7ffa2fbfd62e5f6fb49532b36be658dfa40
iancha1992 added a commit that referenced this issue Oct 19, 2023
…enarios (#19900)

The Java rules used toolchains of type
`@bazel_tools//tools/jdk:runtime_toolchain_type` for two different
purposes requiring different exec/target platform constraints, which led
to issues when cross-compiling: The runtime added to the runfiles of an
executable Java target has to have constraints on the target platform,
whereas the runtime used to extract the bootclasspath should not have
constraints on the target platform. As a result:
1. `@local_jdk` did not define any target constraints for its runtime,
which allowed the runtime to provide the bootclasspath required by
Android rules, but could also end up building `java_binary` targets that
wouldn't run on the target (see #18265).
2. Remote JDKs defined a target constraint for the their runtimes, which
prevented them from being added to the runfiles of targets they couldn't
run on, but broke Android compilation due to the failure to resolve a
runtime for bootclasspath extraction (see #17085).

This is resolved by adding a third toolchain type,
`bootstrap_runtime_toolchain_type`, that is only used by the
`bootclasspath` rule (realizes
#17085 (comment)).
Detailed explanations of the different types and their intended
constraints have been added to `@bazel_tools//tools/jdk:BUILD`.

Addressing the cross-compilation errors then required the following
changes:
1. Direct dependencies on the Java runtime have been removed from rules
that do not need them (e.g. `android_library` and `java_binary` with
`create_executable = False`).
3. Implicit dependencies on the Java runtime through the `bootclasspath`
rule have been replaced with dependencies on the bootstrap runtime.
4. `{local,remote}_java_repository` now use the user-provided exec
constraints of the Java toolchain as the target constraints of the
associated runtime.
5. `@local_jdk` uses the auto-detected host constraints as exec
constraints for the local Java toolchain.

Fixes #17085
Fixes #18265
Fixes bazelbuild/rules_java#64

RELNOTES[INC]: Java runtime toolchains created via
`local_java_repository` from
`@bazel_tools//tools/jdk:local_java_repository.bzl`, which includes
`local_jdk`, now have `target_compatible_with` set to the auto-detected
host constraints. This can result in errors about toolchain resolution
failures for `@bazel_tools//tools/jdk:runtime_toolchain_type`,
especially when cross-compiling. These failures can be fixed in the
following ways (listed in decreasing order of preference):
* Replace `java_binary` targets that aren't meant to be run with `bazel
run` or as tools during the build with `java_single_jar` (available in
`@rules_java//java:java_single_jar.bzl`). Such targets do not require a
Java runtime for the target configuration.
* Set `--java_runtime_version=remotejdk_N` for some Java version `N` to
let Bazel choose and download an appropriate remote JDK for the current
target platform. This setting defaults to `local_jdk`, which means that
Bazel can only use the local JDK, which isn't compatible with any other
platform.
* Manually define and register a `local_java_runtime` with no value set
for `exec_compatible_with` (defaults to `[]`) and select it by setting
`--java_runtime_version` to its `name`. This fully restores the previous
behavior, but can result in incorrect results when cross-compiling (see
#18265).

Closes #18262.

Commit
f79ca02

PiperOrigin-RevId: 574914446
Change-Id: I6cbfb7ffa2fbfd62e5f6fb49532b36be658dfa40

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.0.0 RC2. Please test out the release candidate and report any issues as soon as possible. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next_month Issues that we will review next month. This is currently mainly used by Configurability team P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: bug
Projects
None yet
7 participants
@jlaxson @cushon @hvadehra @fmeum @sgowroji @iancha1992 and others