GraalVM 24.0+ breaks the native-image driver (in native mode) #556

Closed
jerboaa opened this issue Aug 28, 2023 · 9 comments


jerboaa commented Aug 28, 2023

Description

After the move of GraalVM master (24.0) to use the JDK's metrics classes for container support, there is a cyclic dependency issue that needs to be solved for Mandrel.

  1. PhysicalMemory class newly depends on the JDK's metrics classes, which use NIO in unpatched OpenJDK.
  2. NIO, via Target_jdk_internal_misc_VM.java, uses Runtime.getRuntime().maxMemory() (that is, the maximum heap size) for the initial direct memory size, unless it is overridden with -R:MaxDirectMemorySize.
  3. The implementation of Runtime.getRuntime().maxMemory() in HeapImpl.java uses PhysicalMemory.size().

Therefore, there is an initialization cycle resulting in a stack overflow. At runtime, it looks like this:

  i SP 0x00007ffdf838e070 IP 0x00000000008bf8b9 size=16 java.nio.file.Files.readAllLines(Files.java:3433)
  i SP 0x00007ffdf838e070 IP 0x00000000008bf8b9 size=16 jdk.internal.platform.CgroupUtil.lambda$readAllLinesPrivileged$2(CgroupUtil.java:83)
  A SP 0x00007ffdf838e070 IP 0x00000000008bf8b9 size=16 jdk.internal.platform.CgroupUtil$$Lambda$a0a20a05f23dafb9435259313f4041cb74d30494.run(Unknown Source)
  A SP 0x00007ffdf838e080 IP 0x000000000068283e size=32 java.security.AccessController.executePrivileged(AccessController.java:114)
  A SP 0x00007ffdf838e0a0 IP 0x00000000006822b9 size=32 java.security.AccessController.doPrivileged(AccessController.java:571)
  A SP 0x00007ffdf838e0c0 IP 0x00000000008bfb19 size=48 jdk.internal.platform.CgroupUtil.readAllLinesPrivileged(CgroupUtil.java:84)
  A SP 0x00007ffdf838e0f0 IP 0x00000000008bc715 size=176 jdk.internal.platform.CgroupSubsystemFactory.determineType(CgroupSubsystemFactory.java:143)
  A SP 0x00007ffdf838e1a0 IP 0x00000000008bbe2d size=16 jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:85)
  i SP 0x00007ffdf838e1b0 IP 0x0000000000414cb3 size=16 jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:193)
  i SP 0x00007ffdf838e1b0 IP 0x0000000000414cb3 size=16 jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
  i SP 0x00007ffdf838e1b0 IP 0x0000000000414cb3 size=16 jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
  i SP 0x00007ffdf838e1b0 IP 0x0000000000414cb3 size=16 jdk.internal.platform.Container.metrics(Container.java:43)
  A SP 0x00007ffdf838e1b0 IP 0x0000000000414cb3 size=16 com.oracle.svm.core.Containers.memoryLimitInBytes(Containers.java:121)
  A SP 0x00007ffdf838e1c0 IP 0x0000000000475885 size=16 com.oracle.svm.core.heap.PhysicalMemory.size(PhysicalMemory.java:92)
  A SP 0x00007ffdf838e1d0 IP 0x0000000000565153 size=32 java.lang.Runtime.maxMemory(Runtime.java:932)
  A SP 0x00007ffdf838e1f0 IP 0x00000000004792d3 size=16 com.oracle.svm.core.jdk.DirectMemoryAccessors.initialize(Target_jdk_internal_misc_VM.java:112)
  i SP 0x00007ffdf838e200 IP 0x0000000000671b1a size=80 com.oracle.svm.core.jdk.DirectMemoryAccessors.getPageAlignDirectMemory(Target_jdk_internal_misc_VM.java:94)
  i SP 0x00007ffdf838e200 IP 0x0000000000671b1a size=80 jdk.internal.misc.VM.isDirectMemoryPageAligned(VM.java:156)
  A SP 0x00007ffdf838e200 IP 0x0000000000671b1a size=80 java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:124)
  i SP 0x00007ffdf838e250 IP 0x0000000000903265 size=48 java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
  A SP 0x00007ffdf838e250 IP 0x0000000000903265 size=48 sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:242)
  A SP 0x00007ffdf838e280 IP 0x00000000008ff35e size=80 sun.nio.ch.IOUtil.read(IOUtil.java:303)
  i SP 0x00007ffdf838e2d0 IP 0x00000000008fda4d size=96 sun.nio.ch.IOUtil.read(IOUtil.java:283)
  A SP 0x00007ffdf838e2d0 IP 0x00000000008fda4d size=96 sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:234)
  i SP 0x00007ffdf838e330 IP 0x00000000008f8946 size=64 sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:74)
  A SP 0x00007ffdf838e330 IP 0x00000000008f8946 size=64 sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
  A SP 0x00007ffdf838e370 IP 0x000000000090601f size=64 sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:329)
  A SP 0x00007ffdf838e3b0 IP 0x00000000009052c3 size=48 sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:372)
  A SP 0x00007ffdf838e3e0 IP 0x0000000000905640 size=64 sun.nio.cs.StreamDecoder.lockedRead(StreamDecoder.java:215)
  A SP 0x00007ffdf838e420 IP 0x0000000000905ba9 size=64 sun.nio.cs.StreamDecoder.read(StreamDecoder.java:169)
  A SP 0x00007ffdf838e460 IP 0x00000000005281f5 size=16 java.io.InputStreamReader.read(InputStreamReader.java:188)
  A SP 0x00007ffdf838e470 IP 0x000000000051d331 size=48 java.io.BufferedReader.fill(BufferedReader.java:160)
  A SP 0x00007ffdf838e4a0 IP 0x000000000051d648 size=80 java.io.BufferedReader.implReadLine(BufferedReader.java:370) 
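
The cycle in the trace above can be reduced to a few mutually recursive calls. The following is a minimal Java sketch, not the actual SubstrateVM code: the class and method names (InitCycleSketch, physicalMemorySize, readCgroupLimit, initialDirectMemory, maxMemory) are hypothetical stand-ins for PhysicalMemory.size(), the JDK cgroup metrics code, the direct memory initialization, and Runtime.maxMemory().

```java
// Simplified, hypothetical model of the initialization cycle.
// None of these methods are the real SubstrateVM or JDK implementations.
public class InitCycleSketch {

    // Stands in for PhysicalMemory.size(): asks the container metrics for a limit.
    static long physicalMemorySize() {
        return readCgroupLimit();
    }

    // Stands in for the JDK cgroup metrics code, which reads files via NIO
    // and therefore needs to know the direct memory limit first.
    static long readCgroupLimit() {
        long bufferLimit = initialDirectMemory();
        return Math.min(bufferLimit, 1L << 30);
    }

    // Stands in for the direct memory initialization, which defaults to the
    // maximum heap size.
    static long initialDirectMemory() {
        return maxMemory();
    }

    // Stands in for Runtime.maxMemory(), which is derived from physical memory.
    static long maxMemory() {
        return physicalMemorySize();
    }

    // Returns true if resolving the physical memory size never terminates
    // and ends in a StackOverflowError.
    static boolean overflows() {
        try {
            physicalMemorySize();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stack overflow: " + overflows());
    }
}
```

Each method is reasonable in isolation; the problem only appears once all three dependencies are wired together, which matches the situation described in the list above.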

We need to find a way to set the initial direct memory size without using NIO, or at least without creating this cycle. This isn't a problem in GraalVM Community since it uses labsjdk as a base, which has a patch avoiding this issue. Thus, it's a Mandrel-only issue.

The issue can be reproduced by building the native-image generator as a native image itself on a recent Mandrel build and then running --version on the result:

$ native-image --macro:native-image-launcher
$ native-image --version
@jerboaa jerboaa added the bug label Aug 28, 2023
@jerboaa jerboaa added this to the 23.1.0.0-Final milestone Aug 28, 2023

jerboaa commented Aug 28, 2023

@jerboaa jerboaa changed the title GraalVM master (23.1) fails the native-image generator launcher generation GraalVM master (23.1) breaks the native-image diver (in native mode) Aug 29, 2023
@jerboaa jerboaa changed the title GraalVM master (23.1) breaks the native-image diver (in native mode) GraalVM master (23.1) breaks the native-image driver (in native mode) Aug 29, 2023

jerboaa commented Aug 29, 2023

Note that the macro builds with -H:-ParseRuntimeOptions. Thus, runtime options don't seem to be usable as a workaround for the stack overflow.

@jerboaa jerboaa self-assigned this Aug 31, 2023

jerboaa commented Sep 1, 2023

Draft fix for this. Needs more testing.

jerboaa/graal@no-jlinking-fixes-7085...jerboaa:graal:mandrel-cgroups-bb-dm

@jerboaa jerboaa changed the title GraalVM master (23.1) breaks the native-image driver (in native mode) GraalVM master (24.0) breaks the native-image driver (in native mode) Sep 4, 2023

jerboaa commented Sep 4, 2023

It looks like the cgroup change is a 24.0-only feature for now. It doesn't seem to affect 23.1 (current release branch as of today at fd442af), so I'm adjusting the milestone.

@jerboaa jerboaa modified the milestones: 23.1.0.0-Final, 24.0.0.0-Final Sep 4, 2023

jerboaa commented Sep 4, 2023

Draft fix for this. Needs more testing.

jerboaa/graal@no-jlinking-fixes-7085...jerboaa:graal:mandrel-cgroups-bb-dm

Test run with this patch including quarkus native tests:
https://github.com/graalvm/mandrel/actions/runs/6074785555

@jerboaa jerboaa changed the title GraalVM master (24.0) breaks the native-image driver (in native mode) GraalVM 23.1+ breaks the native-image driver (in native mode) Sep 13, 2023

jerboaa commented Sep 13, 2023

It looks like the cgroup change is a 24.0-only feature for now. It doesn't seem to affect 23.1 (current release branch as of today at fd442af), so I'm adjusting the milestone.

It's in the 23.1 release tree now as well: oracle@efbbbe7

jerboaa added a commit that referenced this issue Sep 14, 2023
First, we use a separate accessor for page-alignedness as it doesn't
need the more sophisticated initialization of the directMemory field.

Next, ensure PhysicalMemory initialization is serialized and when it is,
set directMemory to a static value so that the container code can finish
initialization without introducing a cycle. The final directMemory value
based on the heap size is then published to JDK code by setting the VM
init level to 1. Therefore, application code would use the non-static
value as the upper bound.

Closes: #556
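
The approach described in the commit message above can be illustrated roughly as follows. This is a simplified sketch under assumptions, not the code from the fix: the class CycleBreakSketch, its method names, the placeholder value, and the pretend container limit are all hypothetical, and the step of publishing the final value by setting the VM init level is not modeled.

```java
// Hypothetical sketch of breaking an initialization cycle with a static
// placeholder value. This is not the SubstrateVM implementation.
public class CycleBreakSketch {

    // Assumed placeholder handed out only while physical memory is being determined.
    static final long PLACEHOLDER_DIRECT_MEMORY = 25L * 1024 * 1024;
    // Assumed container limit, standing in for what the cgroup files would report.
    static final long PRETEND_CONTAINER_LIMIT = 512L * 1024 * 1024;
    static final long UNINITIALIZED = -1;

    private static boolean initializing = false;
    private static long cachedPhysicalMemory = UNINITIALIZED;
    private static long directMemory = UNINITIALIZED;

    // Serialized initialization of the physical memory size.
    static synchronized long physicalMemorySize() {
        if (cachedPhysicalMemory == UNINITIALIZED) {
            initializing = true;
            try {
                cachedPhysicalMemory = readCgroupLimit();
            } finally {
                initializing = false;
            }
        }
        return cachedPhysicalMemory;
    }

    // Stands in for the container code, which needs a direct memory limit
    // before it can read anything via NIO.
    static long readCgroupLimit() {
        long bufferLimit = directMemoryLimit();
        if (bufferLimit <= 0) {
            throw new IllegalStateException("no direct memory available for the read");
        }
        return PRETEND_CONTAINER_LIMIT;
    }

    // While initialization is in progress, return the placeholder instead of
    // asking for the heap size again; afterwards, use the heap-based value.
    static synchronized long directMemoryLimit() {
        if (initializing) {
            return PLACEHOLDER_DIRECT_MEMORY;
        }
        if (directMemory == UNINITIALIZED) {
            directMemory = maxMemory();
        }
        return directMemory;
    }

    // Stands in for Runtime.maxMemory(), derived from physical memory.
    static long maxMemory() {
        return physicalMemorySize();
    }

    public static void main(String[] args) {
        System.out.println("physical memory: " + physicalMemorySize());
        System.out.println("direct memory limit: " + directMemoryLimit());
    }
}
```

The placeholder is only visible to code running during initialization; later callers see the value derived from the heap size, which corresponds to the "non-static value as the upper bound" in the commit message.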

zakkak commented Sep 14, 2023

Fixed for 23.1 in #569, still an issue with 24.0-dev

@zakkak zakkak modified the milestones: 23.1.0.0-Final, 24.0.0.0-Final Sep 14, 2023
@zakkak zakkak changed the title GraalVM 23.1+ breaks the native-image driver (in native mode) GraalVM 24.0+ breaks the native-image driver (in native mode) Sep 15, 2023

jerboaa commented Sep 22, 2023

oracle#7478 is the upstream PR for this.

github-actions bot pushed a commit that referenced this issue Oct 8, 2023
First, we use a separate accessor for page-alignedness as it doesn't
need the more sophisticated initialization of the directMemory field.

Next, ensure PhysicalMemory initialization is serialized and when it is,
set directMemory to a static value so that the container code can finish
initialization without introducing a cycle. The final directMemory value
based on the heap size is then published to JDK code by setting the VM
init level to 1. Therefore, application code would use the non-static
value as the upper bound.

Closes: #556

jerboaa commented Oct 9, 2023

oracle#7553 is the fix for master (24.0+) which got merged recently. Closing.

@jerboaa jerboaa closed this as completed Oct 9, 2023
zakkak pushed a commit that referenced this issue Dec 13, 2023
First, we use a separate accessor for page-alignedness as it doesn't
need the more sophisticated initialization of the directMemory field.

Next, ensure PhysicalMemory initialization is serialized and when it is,
set directMemory to a static value so that the container code can finish
initialization without introducing a cycle. The final directMemory value
based on the heap size is then published to JDK code by setting the VM
init level to 1. Therefore, application code would use the non-static
value as the upper bound.

Closes: #556
(cherry picked from commit 1ae2dc0)