globus-url-copy fails during GSI handshake with dCache when using TLS v1.3 #174
Comments
My laptop is running Debian:
Here is the OpenSSL package:
|
I can reproduce the error using prometheus:
If it helps, I can create accounts for anyone interested in testing further. |
Here's the output with all the debug environment variables cranked up to 99.
|
I only quickly scanned through but I thought the error message looked familiar and indeed, dCache/dcache#5939 from June 2021 looks very much related. But AFAIR we solved that with @ellert's patch in July 2021 back then. And as the first bad commit is from March 2021 - so before the fix happened - I wonder how this could have gotten reintroduced into the GCT. @paulmillar I will now have a closer look at your material. |
My approving comment for #155 is from 2021-07-20, so my testing against "prometheus.desy.de" happened at that date or earlier. |
Just to be sure: Did you compile the tested I think we never tested any static builds of the GCT. Can you please provide the full build command(s) for your static builds? If there are differences between your manual build and the builds done by your script for the bisecting, please provide both then. |
Hi @fscheiner, Indeed, I agree this is "oddly" familiar. On your specific questions: yes, this is Debian (see above for Here is the build script I use:
#!/bin/bash
set -e
rm -rf $(dirname $0)/build
mkdir $(dirname $0)/build
cd $(dirname $0)/build
../configure --prefix=/home/paul/local --disable-gsi-openssh
make -j4
make install
Just to be absolutely sure, I've built and started dCache v6.2.0 on my laptop (the I've also removed all traces of Debian globus packages on my laptop. I had some libraries installed, but was quite careful to check they weren't being used. Nevertheless, they are all now uninstalled. I've also rebuilt gct (current tip of Using the filesystem timestamp, I've verified that the Here's the output from ldd:
paul@sprocket:~$ which globus-url-copy
/home/paul/local/bin/globus-url-copy
paul@sprocket:~$ ldd `which globus-url-copy`
linux-vdso.so.1 (0x00007fff9b0fd000)
libglobus_gass_copy.so.2 => /home/paul/local/lib/libglobus_gass_copy.so.2 (0x00007f150c686000)
libglobus_common.so.0 => /home/paul/local/lib/libglobus_common.so.0 (0x00007f150c639000)
libglobus_ftp_client.so.2 => /home/paul/local/lib/libglobus_ftp_client.so.2 (0x00007f150c5f7000)
libglobus_gsi_sysconfig.so.1 => /home/paul/local/lib/libglobus_gsi_sysconfig.so.1 (0x00007f150c5e7000)
libglobus_gass_transfer.so.2 => /home/paul/local/lib/libglobus_gass_transfer.so.2 (0x00007f150c5d0000)
libglobus_io.so.3 => /home/paul/local/lib/libglobus_io.so.3 (0x00007f150c5b3000)
libglobus_gssapi_gsi.so.4 => /home/paul/local/lib/libglobus_gssapi_gsi.so.4 (0x00007f150c58c000)
libglobus_gssapi_error.so.2 => /home/paul/local/lib/libglobus_gssapi_error.so.2 (0x00007f150c587000)
libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f150c261000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f150c256000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f150c234000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f150c06f000)
libglobus_ftp_control.so.1 => /home/paul/local/lib/libglobus_ftp_control.so.1 (0x00007f150c043000)
libglobus_gsi_callback.so.0 => /home/paul/local/lib/libglobus_gsi_callback.so.0 (0x00007f150c035000)
libglobus_gsi_credential.so.1 => /home/paul/local/lib/libglobus_gsi_credential.so.1 (0x00007f150c021000)
libglobus_xio.so.0 => /home/paul/local/lib/libglobus_xio.so.0 (0x00007f150bfbb000)
libglobus_openssl_error.so.0 => /home/paul/local/lib/libglobus_openssl_error.so.0 (0x00007f150bfb4000)
libglobus_gss_assist.so.3 => /home/paul/local/lib/libglobus_gss_assist.so.3 (0x00007f150bfa2000)
libglobus_openssl.so.0 => /home/paul/local/lib/libglobus_openssl.so.0 (0x00007f150bf9d000)
libglobus_gsi_cert_utils.so.0 => /home/paul/local/lib/libglobus_gsi_cert_utils.so.0 (0x00007f150bf96000)
libglobus_gsi_proxy_core.so.0 => /home/paul/local/lib/libglobus_gsi_proxy_core.so.0 (0x00007f150bf81000)
libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f150beee000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f150bee8000)
/lib64/ld-linux-x86-64.so.2 (0x00007f150c6b8000)
libglobus_oldgaa.so.0 => /home/paul/local/lib/libglobus_oldgaa.so.0 (0x00007f150beda000)
libglobus_callout.so.0 => /home/paul/local/lib/libglobus_callout.so.0 (0x00007f150bed3000)
libglobus_proxy_ssl.so.1 => /home/paul/local/lib/libglobus_proxy_ssl.so.1 (0x00007f150becd000)
paul@sprocket:~$
Nevertheless, I still see this problem.
paul@sprocket:~$ which globus-url-copy
/home/paul/local/bin/globus-url-copy
paul@sprocket:~$ globus-url-copy file:///bin/bash gsiftp://localhost/public/test-1
error: globus_l_ftp_control_send_cmd_cb: gss_init_sec_context failed to generate output token
paul@sprocket:~$ |
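The manual check of the ldd output above (every libglobus_* entry should resolve under the install prefix) can be automated. The following is only a sketch: `foreign_globus_libs` is a hypothetical helper name, and the prefix `/home/paul/local/lib` is taken from the output above.

```shell
#!/bin/bash
# Hypothetical helper (not part of gct): read `ldd` output on stdin and print
# every libglobus_* line that does NOT resolve under the given library prefix.
#   $1: library directory of the local install, e.g. /home/paul/local/lib
foreign_globus_libs() {
    local prefix="$1"
    # grep exits 1 when nothing is selected; "|| true" keeps an empty result
    # (the good case) from looking like a failure.
    grep 'libglobus_' | grep -v -F "=> ${prefix}/" || true
}
```

For example, `ldd "$(which globus-url-copy)" | foreign_globus_libs /home/paul/local/lib` should print nothing if only the locally built libraries are picked up.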
Doing the following from CentOS 7 using a freshly installed "globus-gass-copy-progs" package (from EPEL7 based on our last GCT release from August 2021) and deps works for me:
|
Thanks for testing @fscheiner. This is useful information. Could you rerun the You should see a line (It's just to be certain.) My other question is which version of OpenSSL does CentOS 7 use? |
I asked because you had differing prompts in #174 (comment) and #174 (comment), so I wasn't sure if the things in your first comment happened on the same OS as the things in your second comment. Hm, it might as well be an issue with Debian. We don't actually build test on Debian, but only on CentOS7 (currently) and in the future also on RockyLinux 8 and CentOS Stream 8 and 9. I'll do some testing on Debian, too. For now I'll retry the above from Fedora and openSUSE Leap.
Right, it's "OpenSSL 1.0.2k-fips 26 Jan 2017" so no TLS v1.3 here. I forgot about that. Here's the output:
|
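Since TLS v1.3 support arrived with OpenSSL 1.1.1, a quick way to tell whether a client host can even reach the failing code path is to compare the `openssl version` string against that release. This is a rough sketch: `supports_tls13` is a hypothetical helper, it relies on `sort -V` (GNU coreutils), and it cannot detect a distribution that disables TLS v1.3 at build or configuration time.

```shell
#!/bin/bash
# Hypothetical helper: decide from an `openssl version` string whether the
# library is new enough (>= 1.1.1) to offer TLS v1.3.
#   returns 0: new enough, 1: too old, 2: string not recognised
supports_tls13() {
    local ver
    # Keep only the numeric X.Y.Z part, dropping suffixes such as "k-fips".
    ver=$(printf '%s\n' "$1" | sed -n 's/^OpenSSL \([0-9]*\.[0-9]*\.[0-9]*\).*/\1/p')
    [ -n "$ver" ] || return 2
    # If 1.1.1 sorts first (or equal), the installed version is >= 1.1.1.
    [ "$(printf '%s\n' "1.1.1" "$ver" | sort -V | head -n 1)" = "1.1.1" ]
}
```

Typical use would be `supports_tls13 "$(openssl version)" && echo "TLS v1.3 capable"`.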
I can confirm, that it also doesn't work with the
And I also dug out the log from my testing on 2021-07-20 with a locally installed version based on https://github.com/gridcf/gct/tree/90b16680704527b79e499fb99470b8601d4dcf5e:
Notice the difference between:
...and:
|
I can confirm I see the exact same error and output on my openSUSE Leap 15.3 which has (a patched) OpenSSL 1.1.1d. |
The installation used on 2021-07-20 (on openSUSE Leap 15.2 back then) is not available anymore and I also don't have any snapshots that go back that far. I have a GCT installation from a few weeks later (2021-08-12), but it does not work:
The question remains, why did it work on 2021-07-20? @paulmillar: I had a look into the dCache master branch (as "prometheus.desy.de" is based on that according to https://confluence.desy.de/pages/viewpage.action?pageId=228761933) and are these changes:
...related to the TLS functionality we touch here? And could they (specifically the second one) perhaps play a role here? |
@fscheiner I'm pretty sure this isn't a regression in dCache. Yes, prometheus is rebuilt daily using the latest (successful) build from our CI. I've also tested The problem is present with all these versions of dCache.
On the two patches you found (dCache/dcache@ddd6c88 and dCache/dcache@57c277a), the first was an attempt to improve the performance of TLS support within our xroot server by using a platform native TLS implementation (BoringSSL). It turns out that the way we created SSLContext objects for FTP triggered a bug in this native library support that resulted in a memory leak, so the change was reverted for the ftp doors. This means that from 7.2.0 to 7.2.2 (inclusive), the gsiftp door was using BoringSSL, which is a fork of OpenSSL.
I just tested 7.2.0 and a TLS v1.3 upload works fine:
✔ 10:00:10 dCache [7.2.0 ✔] $ GLOBUS_GSSAPI_DEBUG_LEVEL=2 globus-url-copy file:///bin/bash gsiftp://localhost/public/test-1
Disabling SSLv2 and SSLv3.
acquire_cred: MICV2 MECH OID
Creating context w/ GSS_C_NO_CREDENTIAL.
init_sec_context: no mech_type requested; using MICV2 MECH OID
Disabling SSLv2 and SSLv3.
acquire_cred: MICV2 MECH OID
Ciphers available:
init_sec_context:major_status:00000001:gss_state:0 req_flags=00001013:ret_flags=00000000
SSL handshake finished
Using TLSv1.3.
cred_usage=1
Cipher being used:
TLS_AES_256_GCM_SHA384 TLSv1.3 Kx=any Au=any Enc=AESGCM(256) Mac=AEAD
X509 subject after proxy : /DC=org/DC=dCache/CN=host/localhost
Comparing names:
Name 1 is of type GLOBUS_GSS_C_NT_X509:
Name 2 is of type GLOBUS_GSS_C_NT_HOST_IP:
Compared 1
init_sec_context:major_status:00000001:gss_state:1 req_flags=00001013:ret_flags=000000ff
init_sec_context:major_status:00000001:gss_state:4 req_flags=00001013:ret_flags=000000ff
init_sec_context:major_status:00000000:gss_state:6 req_flags=00001013:ret_flags=000000ff
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
gss_wrap conf_req_flag=0 qop_req=0
✔ 10:00:23 dCache [7.2.0 ✔] $
While nice, this is perhaps not surprising. After all, an OpenSSL client should be compatible with an The problem still remains with compatibility with the Java implementation of TLS v1.3. |
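When comparing runs against different servers, it helps to pull the negotiated protocol out of the debug trace rather than reading it by eye. The sketch below keys on the "Using TLSv1.3." line shown in the trace above. `negotiated_tls_version` is a hypothetical helper, and it assumes other protocol versions are reported in the same "Using TLSvX.Y." form.

```shell
#!/bin/bash
# Hypothetical helper: read a GLOBUS_GSSAPI_DEBUG_LEVEL=2 trace on stdin and
# print the protocol from the last "Using TLSvX.Y." line, if there is one.
negotiated_tls_version() {
    # Capture "TLSvX.Y" and drop the trailing full stop of the log line.
    sed -n 's/^Using \(TLSv[0-9.]*\)\.$/\1/p' | tail -n 1
}
```

A possible invocation is `GLOBUS_GSSAPI_DEBUG_LEVEL=2 globus-url-copy file:///bin/bash gsiftp://localhost/public/test-1 2>&1 | negotiated_tls_version`. The `2>&1` is there so the trace is captured whichever stream it is written to.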
This perfectly explains what we are seeing today: The first change - use BoringSSL - was done on 2021-06-02 and reverted on 2021-11-04, basically restoring the situation from 2021-06-01 (the date when dCache/dcache#5939 was created) and before. My testing of the GCT with #155 merged happened on 2021-07-20, so against a dCache instance using BoringSSL at that time. #155 is from 2021-06-15, so @ellert most likely patched against a dCache version using BoringSSL, too. Your test against a dCache 7.2.0 instance seems to confirm that, and comparing these messages:
...to what I wrote in #174 (comment) seems to confirm that the Java implementation of TLS v1.3 works differently (enough) from the OpenSSL/BoringSSL one to break interoperability with the GCT. For reference, IIUIC the
...from
I hope Mattias (@ellert) can come up with a patch that can handle both TLS implementations. But this time for testing we need to make sure that the |
For the time being the GCT and dCache stay interoperable when using TLS v1.2 only (according to #174 (comment)). I hence changed the title of this issue and appended " when using TLS v1.3". |
I now have successfully tested a manually compiled version based on @ellert's
...on both Fedora 34 and openSUSE Leap 15.3 against:
|
Just for reference, these are the logs for the dCache case on:
|
Thanks @ellert , I can confirm the problem is fixed for me, too. |
Fixed in GCT 6.2.20220524 maintenance release. |
When testing with the current tip of the gct master branch (commit 2217e6ec24) I cannot upload data to dCache. I see the following error:
I've tested this with all currently supported versions of dCache: the problem is present (on my laptop) in all cases.
Note that the above command is statically built:
Therefore, it does not depend on external globus libraries (libglobus_*), making testing easier.
I ran git bisect run with a script that builds gct and runs the above globus-url-copy command. The run completed with the following output:
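For anyone wanting to repeat the bisection, a wrapper along these lines could serve as the script passed to git bisect run. It is a sketch and not the script actually used: `bisect_step` is a hypothetical helper, and it assumes globus-url-copy exits non-zero when the transfer fails. The exit codes follow the git bisect run convention: 0 marks a commit good, 125 skips it, and other codes from 1 to 127 mark it bad.

```shell
#!/bin/bash
# Sketch of one `git bisect run` step. bisect_step is a hypothetical helper,
# not part of gct or git.
#   $1: shell command that builds and installs the checked-out tree
#   $2: shell command that exercises the bug
bisect_step() {
    local build_cmd="$1" test_cmd="$2"
    # A tree that does not build says nothing about the bug:
    # 125 asks git bisect to skip this commit.
    if ! bash -c "$build_cmd" >/dev/null 2>&1; then
        return 125
    fi
    # 0 marks the commit good, 1 marks it bad.
    if bash -c "$test_cmd" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

A wrapper script would end with something like `bisect_step ./build.sh 'globus-url-copy file:///bin/bash gsiftp://localhost/public/test-1'; exit $?`, where `./build.sh` is an assumed name for a build script such as the one posted in the comments.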