Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elytron Security, Main, etc. native integration tests fail with "guarantee failed" with latest GraalVM master (23.1) #35508

Closed
jerboaa opened this issue Aug 23, 2023 · 18 comments
Labels

Comments

@jerboaa
Copy link
Contributor

jerboaa commented Aug 23, 2023

Describe the bug

We see a couple of native IT tests fail with latest GraalVM master, which is GraalVM for JDK 21 (or internal version 23.1). Seems to happen with Elytron Security, Main, Micrometer Prometheus, Spring Web and possibly others. Failure looks like this:

Caused by: com.oracle.svm.core.util.VMError$HostedError: guarantee failed
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:78)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.util.VMError.guarantee(VMError.java:115)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.methodhandles.MethodHandleInvokerRenamingSubstitutionProcessor.lambda$getSubstitution$0(MethodHandleInvokerRenamingSubstitutionProcessor.java:94)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.methodhandles.MethodHandleInvokerRenamingSubstitutionProcessor.getSubstitution(MethodHandleInvokerRenamingSubstitutionProcessor.java:90)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.methodhandles.MethodHandleInvokerRenamingSubstitutionProcessor.lookup(MethodHandleInvokerRenamingSubstitutionProcessor.java:74)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.infrastructure.SubstitutionProcessor$ChainedSubstitutionProcessor.lookup(SubstitutionProcessor.java:125)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.AnalysisUniverse.lookupAllowUnresolved(AnalysisUniverse.java:217)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.AnalysisUniverse.lookup(AnalysisUniverse.java:197)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.AnalysisMethod.<init>(AnalysisMethod.java:182)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.PointsToAnalysisMethod.<init>(PointsToAnalysisMethod.java:70)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.PointsToAnalysisFactory.createMethod(PointsToAnalysisFactory.java:35)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.AnalysisUniverse.createMethod(AnalysisUniverse.java:453)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.AnalysisUniverse.lookupAllowUnresolved(AnalysisUniverse.java:441)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.meta.AnalysisUniverse.lookup(AnalysisUniverse.java:417)
	at org.graalvm.nativeimage.builder/com.oracle.svm.hosted.ameta.AnalysisMethodHandleAccessProvider.resolveInvokeBasicTarget(AnalysisMethodHandleAccessProvider.java:70)
	at jdk.internal.vm.compiler/org.graalvm.compiler.replacements.nodes.MethodHandleNode.getInvokeBasicTarget(MethodHandleNode.java:214)
	at jdk.internal.vm.compiler/org.graalvm.compiler.replacements.nodes.MethodHandleNode.tryResolveTargetInvoke(MethodHandleNode.java:106)
	at jdk.internal.vm.compiler/org.graalvm.compiler.replacements.nodes.MethodHandleWithExceptionNode.trySimplify(MethodHandleWithExceptionNode.java:74)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.phases.InlineBeforeAnalysisGraphDecoder.handleMethodHandle(InlineBeforeAnalysisGraphDecoder.java:350)
	at jdk.internal.vm.compiler/org.graalvm.compiler.nodes.GraphDecoder.processNextNode(GraphDecoder.java:924)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.phases.InlineBeforeAnalysisGraphDecoder.processNextNode(InlineBeforeAnalysisGraphDecoder.java:344)
	at jdk.internal.vm.compiler/org.graalvm.compiler.nodes.GraphDecoder.decode(GraphDecoder.java:650)
	at jdk.internal.vm.compiler/org.graalvm.compiler.replacements.PEGraphDecoder.decode(PEGraphDecoder.java:892)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.phases.InlineBeforeAnalysis.decodeGraph(InlineBeforeAnalysis.java:76)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.parse(MethodTypeFlowBuilder.java:195)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.apply(MethodTypeFlowBuilder.java:621)
	at org.graalvm.nativeimage.pointsto/com.oracle.graal.pointsto.flow.MethodTypeFlow.createFlowsGraph(MethodTypeFlow.java:167)
	... 23 more

See for example these CI runs:
https://github.com/graalvm/mandrel/actions/runs/5945859580/job/16126064976#step:12:759
https://github.com/graalvm/mandrel/actions/runs/5945859580/job/16126063503#step:12:385
https://github.com/graalvm/mandrel/actions/runs/5945859580/job/16126066006#step:12:848

It's not yet clear if this is an upstream GraalVM issue or not, but I'm filing this here for now so as to be able to track these failures.

@jerboaa jerboaa added the kind/bug Something isn't working label Aug 23, 2023
@quarkus-bot
Copy link

quarkus-bot bot commented Aug 23, 2023

/cc @Karm (mandrel), @galderz (mandrel), @sberyozkin (security), @zakkak (mandrel)

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 23, 2023

Seems to be caused by oracle/graal#7196

@sberyozkin
Copy link
Member

sberyozkin commented Aug 23, 2023

Thanks for doing the investigation, @jerboaa

@peter-hofer
Copy link

As a workaround, use option UseOldMethodHandleIntrinsics. I will investigate this shortly.

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 24, 2023

Thanks, @peter-hofer. I'll see if I can get that added in a temp CI build.

@peter-hofer
Copy link

On second thought UseOldMethodHandleIntrinsics still leaves MethodHandleInvokerRenamingSubstitutionProcessor enabled, so you might run into the same problem.

@peter-hofer
Copy link

It looks like the classData of some direct method handle invokers that we're looking at consists of more than just a single LambdaForm. Judging by the code in InvokerBytecodeGenerator it's probably a List.of several of them, but potentially also other objects which have no usable hash code, which would defeat the purpose of the renaming. We haven't encountered this before, so if you could find out what the failing classData object is by changing the guarantee to something like VMError.guarantee(LAMBDA_FORM_CLASS.isInstance(classData), String.valueOf(classData));, that would be very helpful.

Also, it might be better if we disable MethodHandleInvokerRenamingSubstitutionProcessor when UseOldMethodHandleIntrinsics is active even after we fix this.

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 28, 2023

On second thought UseOldMethodHandleIntrinsics still leaves MethodHandleInvokerRenamingSubstitutionProcessor enabled, so you might run into the same problem.

A bit of good news: I've finally had some time to reproduce locally and I can report, adding -H:+UseOldMethodHandleIntrinsics, fixes the native image build. Working on the the details of the guarantee failure next...

@peter-hofer
Copy link

I can report, adding -H:+UseOldMethodHandleIntrinsics, fixes the native image build

Thanks @jerboaa , this likely means that the old intrinsification actually hides these classes from the rest of the image build so that renaming never encounters them and therefore also doesn't fail the guarantee.

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 28, 2023

Also this code seems to trigger the guarantee failure (specifically instantiating WildFlyElytronPasswordProvider):

    public void registerPasswordProvider() {
        //we don't remove this, as there is no correct place where it can be removed
        //as continuous testing can be running alongside the dev mode app, but there is
        //only ever one provider
        WildFlyElytronPasswordProvider provider = new WildFlyElytronPasswordProvider();
        if (Security.getProvider(provider.getName()) == null) {
            Security.addProvider(provider);
        }
    }

@sberyozkin Do you know where the source for this provider lives?

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 28, 2023

Another data point. I needed labsjdk-ce-21-jvmci-23.1-b14 (21+35) to reproduce. labsjdk-ce-21-jvmci-23.1-b12 (21+34) didn't reproduce the issue, weirdly enough.

Update: Seems more likely caused by the issue not being reproducible 100% when I tried both versions (only once).

@peter-hofer
Copy link

@jerboaa it looks like this class is what triggers the problem. There's two identical lambdas in it, this::putService for a target type of Consumer<Service>.
https://github.com/wildfly-security/wildfly-elytron/blob/master/base/src/main/java/org/wildfly/security/WildFlyElytronBaseProvider.java

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 28, 2023

@peter-hofer Glad you found the class and the potential issue. Would you be able to deduce a minimal repro from that?

Another observation on my end is that it doesn't seem to reliably reproduce 100%. It's more like 80-90% of the runs I've tried. Therefore, I'm no longer sure if -H:+UseOldMethodHandleIntrinsics has the effect I said earlier, nor if labs JDK build version matters.

Either way, a build with more verbose printing if the guarantee fails produced this:

Caused by: com.oracle.svm.core.util.VMError$HostedError: [DMH.invokeVirtual=Lambda(a0:L,a1:L,a2:L)=>{
    t3:L=DirectMethodHandle.internalMemberName(a0:L);
    t4:V=MethodHandle.linkToVirtual(a1:L,a2:L,t3:L);void}, MethodHandle(WildFlyElytronBaseProvider,Service)void]
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:78)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.util.VMError.guarantee(VMError.java:122)

@peter-hofer
Copy link

@jerboaa I didn't find anything between b12 and b14 that should affect method handles or lambdas other than a change in StringConcatFactory, but it could be triggered by another change.
I don't think I can derive a minimal reproducer because nothing about it seems particularly unusual. I suspect that there's some automatic customization of the method handle or such which triggers during the build, which would also explain that this doesn't happen consistently. But with the output you posted, I should be able to find out a bit more, thanks.

@jerboaa
Copy link
Contributor Author

jerboaa commented Aug 28, 2023

@peter-hofer Steps to reproduce should be something along the lines of this:

$ git clone https://github.com/quarkusio/quarkus
$ cd quarkus
$ ./mvnw clean install -Dquickly
$ cat run_itest.sh 
#!/bin/bash
GRAALVM=$1
SPECIFIC_TEST=$2

if [ "${GRAALVM}_" == "_" ] || [ "${SPECIFIC_TEST}_" == "_" ]; then
  echo "Usage ./$0 <path-to-graal-home> <itest-path>"
  exit 1
fi
export JAVA_HOME="$GRAALVM"
export GRAALVM_HOME="$JAVA_HOME"
export PATH="$PATH:$JAVA_HOME/bin"
podman system service --time=0 unix:/run/user/$(id -u)/podman.sock &
export DOCKER_HOST="unix:/run/user/$(id -u)/podman.sock"
pushd integration-tests
../mvnw --fae -pl "$SPECIFIC_TEST" -Dquarkus.native.additional-build-args="-Djava.lang.invoke.MethodHandle.DEBUG_NAMES=true" -Dnative -Dtest-containers -Dmaven.test.failure.ignore=true -Dstart-containers -Dquarkus.native.enable-reports -Dnative.surefire.skip -Dformat.skip -Dno-descriptor-tests  -Dlog.level=ALL clean verify
popd
$ bash run_itest.sh /path/to/graalvm-home elytron-security

Sorry, my attempts at coming up with a smaller one wasn't successful.

@peter-hofer
Copy link

Thanks for your help @jerboaa , I have pushed a fix to oracle/graal#7262. It works locally and also improves the stability of generated invokers' names across image builds.

@jerboaa
Copy link
Contributor Author

jerboaa commented Sep 1, 2023

@peter-hofer Excellent. Thank you! We no longer see those failures in our CI:
https://github.com/graalvm/mandrel/actions/runs/6044323984/job/16403307103

Closing as fixed.

@jerboaa jerboaa closed this as completed Sep 1, 2023
@peter-hofer
Copy link

Great news @jerboaa , thanks for reporting back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants