System.Security.Cryptography.Xml.Tests fail on mono linux ARM64 #50300

Closed
ericstj opened this issue Mar 26, 2021 · 7 comments
Labels
area-System.Security needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration

@ericstj
Member

ericstj commented Mar 26, 2021

Tests are crashing with segfault while initializing OpenSSL.

https://dev.azure.com/dnceng/public/_build/results?buildId=1058396&view=ms.vss-test-web.build-test-results-tab&runId=32633134&resultId=180106&paneView=attachments

core.1001.99:
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-50288-merge-db294f34f17e4673ad/System.Security.Cryptography.Xml.Tests/core.1001.99?sv=2019-07-07&se=2021-04-15T18%3A31%3A06Z&sr=c&sp=rl&sig=y51iybChEIg%2B9i1C8IWdd8szXZPbfdQZ9pqMpk3XUz8%3D

console.3a06eda1.log:
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-50288-merge-db294f34f17e4673ad/System.Security.Cryptography.Xml.Tests/console.3a06eda1.log?sv=2019-07-07&se=2021-04-15T18%3A31%3A06Z&sr=c&sp=rl&sig=y51iybChEIg%2B9i1C8IWdd8szXZPbfdQZ9pqMpk3XUz8%3D

Interesting snippets from log.

=================================================================
	Managed Stacktrace:
=================================================================
	  at <unknown> <0xffffffff>
	  at CryptoInitializer:EnsureOpenSslInitialized <0x00007>
	  at CryptoInitializer:.cctor <0x00023>
Thread 10 (Thread 0x7f728a41e0 (LWP 110)):
#0  0x0000007f71c29f6c in bn_mul_add_words () from /lib/aarch64-linux-gnu/libcrypto.so.1.0.0
#1  0x0000007f71c2c0ec in ?? () from /lib/aarch64-linux-gnu/libcrypto.so.1.0.0
#2  0x0000007f5010cdf8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 11 (Thread 0x7f726a41e0 (LWP 111)):
#0  0x0000007f7cd94a38 in __waitpid (pid=116, stat_loc=0x7f7269ca44, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x0000007f7c842fcc in dump_native_stacktrace (signal=<optimized out>, mctx=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:966
#2  mono_dump_native_crash_info (signal=<optimized out>, mctx=<optimized out>, info=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:1010
#3  0x0000007f7c801e14 in mono_handle_native_crash (signal=0x7f7c8b85c8 "SIGSEGV", mctx=0x7f7269d5d0, info=0x7f7269d930) at /__w/1/s/src/mono/mono/mini/mini-exceptions.c:3395
#4  0x0000007f7c772764 in mono_sigsegv_signal_handler_debug (_dummy=11, _info=0x7f7269d930, context=0x7f7269d9b0, debug_fault_addr=0x3d4) at /__w/1/s/src/mono/mono/mini/mini-runtime.c:3556
#5  <signal handler called>
#6  _dl_close (_map=0x0) at dl-close.c:809
#7  0x0000007f7cdbc9f8 in _dl_catch_error (objname=0x7f7cd71038 <dlclose_doit>, errstring=0x7f7cd82000, mallocedp=0x7f5402a7e0, operate=0x7f7269ec24, args=0x7f5402a7e8) at dl-error.c:187
#8  0x0000007f7cd71610 in _dlerror_run (operate=operate@entry=0x7f7cd71038 <dlclose_doit>, args=0x0) at dlerror.c:163
#9  0x0000007f7cd7106c in __dlclose (handle=<optimized out>) at dlclose.c:46
#10 0x0000007f71dce2a4 in DlOpen (libraryName=0x7f71dd6dfe "libssl.so.1.1") at /__w/1/s/src/libraries/Native/Unix/System.Security.Cryptography.Native/opensslshim.c:50
#11 0x0000007f71dce178 in OpenLibrary () at /__w/1/s/src/libraries/Native/Unix/System.Security.Cryptography.Native/opensslshim.c:82
#12 0x0000007f71dc8074 in InitializeOpenSSLShim () at /__w/1/s/src/libraries/Native/Unix/System.Security.Cryptography.Native/opensslshim.c:127
#13 0x0000007f71dbc620 in CryptoNative_EnsureOpenSslInitialized () at /__w/1/s/src/libraries/Native/Unix/System.Security.Cryptography.Native/openssl.c:1318
#14 0x0000007f71e4172c in ?? ()
#15 0x8020080280200802 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 13 (Thread 0x7f722a41e0 (LWP 113)):
#0  0x0000007f71c29fb4 in bn_mul_add_words () from /lib/aarch64-linux-gnu/libcrypto.so.1.0.0
#1  0x0000007f71c2c0ec in ?? () from /lib/aarch64-linux-gnu/libcrypto.so.1.0.0
#2  0x0000007f4c0b8390 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Observation from @bartonjs is that threads 10 and 13 should not be inside libcrypto before thread 11 completes initialization.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Security untriaged New issue has not been triaged by the area owner labels Mar 26, 2021
@ghost

ghost commented Mar 26, 2021

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq, @GrabYourPitchforks
See info in area-owners.md if you want to be subscribed.


Author: ericstj
Assignees: -
Labels:

area-System.Security, untriaged

Milestone: -

@bartonjs
Member

It's also weird that the already-running OpenSSL is libcrypto.so.1.0.0... I'd expect Ubuntu 18.04 to have libcrypto.so.1.1 (with maybe also having 1.0.0 as a fallback).

I don't see any obvious linking of the mono CLR to OpenSSL, so I don't know how it would have gotten into the process already to be running things.

The failing thread is in CryptoNative_EnsureOpenSslInitialized, called by the CryptoInitializer cctor, so that should be gating down to single-threaded access (though I suppose it could end up being called concurrently from up to 4 different libraries' versions of the cctor: S.S.C.Algorithms, S.S.C.OpenSsl, S.S.C.X509Certificates, and System.Net.Security all end up having one, IIRC). Not sure what about that would cause dlopen to segfault, though.

@stephentoub
Member

Looks like the same issue I just hit in another PR:
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-55499-merge-be24a0ec13b44c1f85/System.Security.Cryptography.OpenSsl.Tests/console.c9c21603.log?sv=2019-07-07&se=2021-08-01T11%3A13%3A45Z&sr=c&sp=rl&sig=j%2FC9Z%2FKgea3jtDad34ERDD14Sj8ahWBvUpOOIO%2BcV%2BE%3D

Console log: 'System.Security.Cryptography.OpenSsl.Tests' from job be24a0ec-13b4-4c1f-850c-f1fb9e66679d workitem 4a3219a3-8cca-4175-afb0-7f0b3e9ac445 (osx.1014.amd64.open) executed on machine dci-mac-build-004.local
+ ./RunTests.sh --runtime-path /tmp/helix/working/B80509D6/p
----- start Mon Jul 12 04:14:36 PDT 2021 =============== To repro directly: =====================================================
pushd .
/tmp/helix/working/B80509D6/p/dotnet exec --runtimeconfig System.Security.Cryptography.OpenSsl.Tests.runtimeconfig.json --depsfile System.Security.Cryptography.OpenSsl.Tests.deps.json xunit.console.dll System.Security.Cryptography.OpenSsl.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/private/tmp/helix/working/B80509D6/w/ADC80997/e /private/tmp/helix/working/B80509D6/w/ADC80997/e
  Discovering: System.Security.Cryptography.OpenSsl.Tests (method display = ClassAndMethod, method display options = None)
No usable version of libssl was found

=================================================================
	Native Crash Reporting
=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

=================================================================
	Native stacktrace:
=================================================================
	0x10e6cc186 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_dump_native_crash_info
	0x10e66b7de - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_handle_native_crash
	0x10e6cba82 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : sigabrt_signal_handler
	0x7fff74aabb5d - /usr/lib/system/libsystem_platform.dylib : _sigtramp
	0x0 - Unknown
	0x7fff749656a6 - /usr/lib/system/libsystem_c.dylib : abort
	0x11299444b - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0//libSystem.Security.Cryptography.Native.OpenSsl.dylib : InitializeOpenSSLShim
	0x1129886dd - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0//libSystem.Security.Cryptography.Native.OpenSsl.dylib : EnsureOpenSslInitializedCore
	0x1129886b9 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0//libSystem.Security.Cryptography.Native.OpenSsl.dylib : EnsureOpenSslInitializedOnce
	0x7fff74ab2ce3 - /usr/lib/system/libsystem_pthread.dylib : __pthread_once_handler
	0x7fff74aa8aab - /usr/lib/system/libsystem_platform.dylib : _os_once_callout
	0x7fff74ab2c7f - /usr/lib/system/libsystem_pthread.dylib : pthread_once
	0x11298869b - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0//libSystem.Security.Cryptography.Native.OpenSsl.dylib : CryptoNative_EnsureOpenSslInitialized
	0x11292e770 - Unknown
	0x10e5c4c30 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_jit_runtime_invoke
	0x10e4dbadf - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_runtime_try_invoke
	0x10e4da65e - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_runtime_class_init_full
	0x10e5bc5fa - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_jit_compile_method_inner
	0x10e5c0a87 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : jit_compile_method_with_opt
	0x10e5bfefa - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_jit_compile_method
	0x10e66e385 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : common_call_trampoline
	0x10e66de50 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_magic_trampoline
	0x10e880396 - Unknown
	0x11292e613 - Unknown
	0x10e5c4c30 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_jit_runtime_invoke
	0x10e4dbadf - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_runtime_try_invoke
	0x10e4da65e - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_runtime_class_init_full
	0x10e5bc5fa - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_jit_compile_method_inner
	0x10e5c0a87 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : jit_compile_method_with_opt
	0x10e5c0520 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : jit_compile_method_with_opt
	0x10e5bfefa - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_jit_compile_method
	0x10e66e385 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : common_call_trampoline
	0x10e66de50 - /private/tmp/helix/working/B80509D6/p/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.dylib : mono_magic_trampoline
	0x10e880396 - Unknown
	0x11292e083 - Unknown
	0x112917656 - Unknown
	0x1129140e3 - Unknown
	0x112913ee3 - Unknown
	0x11291367d - Unknown
	0x11290c054 - Unknown
	0x11290bd5f - Unknown
	0x11290844b - Unknown
	0x1129079e7 - Unknown
	0x112907946 - Unknown
	0x112906c73 - Unknown
	0x11290635b - Unknown

@bartonjs
Member

@stephentoub Your callstack looks a bit different. We hit the abort() call in InitializeOpenSSLShim, meaning we couldn't find an appropriate libssl. The previous ones were SIGSEGV via a dlopen call.
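
To make the two failure modes concrete, here is a minimal sketch in C. It is not the actual opensslshim.c code: the function names `open_first_available` and `close_if_open` are hypothetical, and the list of candidate library names is left to the caller. The first helper returns NULL when no candidate loads, which is the situation where a caller would report "No usable version of libssl was found" and abort. The second guards `dlclose`, since the Linux trace shows the fault inside `_dl_close (_map=0x0)`, which looks like a NULL handle being closed.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate name in order and return the
   first handle that loads, or NULL if none of them can be opened. */
void *open_first_available(const char *names[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(names[i], RTLD_LAZY);
        if (handle != NULL)
            return handle;
    }
    return NULL; /* the caller decides whether this is fatal */
}

/* Hypothetical helper: only hand non-NULL handles to dlclose. This is an
   assumption about the crash, based on the _map=0x0 frame in the trace. */
void close_if_open(void *handle)
{
    if (handle != NULL)
        dlclose(handle);
}
```

A caller would pass its own ordered list of library names and treat a NULL result as the "no usable libssl" case.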

@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jul 13, 2021
@jeffhandley jeffhandley added this to the 6.0.0 milestone Jul 13, 2021
@bartonjs
Member

I believe ("hope"?) that the DlOpen-related errors should stop happening as of #55370. We only ever go through library loading and initialization once each now.

If I'm wrong, then at the very least, we'll stop getting red herring callstacks, and maybe we can make more sense of them.
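
As an illustration of "once each", here is a rough sketch of gating initialization with `pthread_once`, which is also what the macOS stack above shows under CryptoNative_EnsureOpenSslInitialized. The names (`ensure_initialized`, `init_core`, `run_concurrent_init`) and the counter are hypothetical and exist only for the example; this is not the code from #55370.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state for the sketch. */
static pthread_once_t g_init_once = PTHREAD_ONCE_INIT;
static int g_init_count = 0;   /* how many times the initializer body ran */
static int g_init_result = -1; /* result observed by every caller */

static void init_core(void)
{
    /* Library loading and setup would happen here, exactly once. */
    g_init_count++;
    g_init_result = 0;
}

/* Safe to call from any number of threads: late callers block until
   init_core has finished, then see the same result. */
int ensure_initialized(void)
{
    int rc = pthread_once(&g_init_once, init_core);
    assert(rc == 0);
    (void)rc;
    return g_init_result;
}

static void *worker(void *arg)
{
    (void)arg;
    ensure_initialized();
    return NULL;
}

/* Calls ensure_initialized from four threads (standing in for the four
   cctors mentioned earlier) and returns how often init_core ran. */
int run_concurrent_init(void)
{
    pthread_t threads[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return g_init_count;
}
```

With this shape, concurrent callers cannot both be inside the loading code, so a second thread never sees a half-initialized library.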

@bartonjs bartonjs removed their assignment Jul 26, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 25, 2021