[Bug] SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment #3299
Comments
Known issue in JDK: https://bugs.openjdk.org/browse/JDK-8221218. Maybe it's been resolved in JDK20 |
I have the same issue using the latest helm charts and docker images. Interestingly, it worked for a while; after re-creating the CA and certs it stopped working consistently. |
Got the same issue. During cluster migration from 2.8 to 2.9, one of the nodes could not start. The root cause is not yet clear. |
[Triage] Going to leave this untriaged since we don't really know how to move forward yet. We can keep the issue though and add more info if we encounter this further. |
[Triage] Per @willyborankin's suggestion, you can reproduce it by starting a migration and adding a new node during migration with the same certificate. Any fixes for the issues will be accepted. Likely a change around 1.7.6 or jdk20. |
PR with BC 1.76 was merged in OpenSearch. |
Hi guys. |
Also having this issue using the latest tag. And I am using proper plugins.security.nodes_dn settings. |
Bug not resolved (15.01.2024); use TLS 1.2 instead of TLS 1.3. |
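The TLS 1.2 workaround mentioned above can be sketched as configuration. The setting `plugins.security.ssl.http.enabled_protocols` is quoted by another commenter in this thread; using its transport-layer counterpart, `plugins.security.ssl.transport.enabled_protocols`, is an assumption about what the commenter meant, made because the handshake failures are reported between nodes. The helper name `tls12_settings` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints opensearch.yml lines that restrict both the
# transport and HTTP layers to TLS 1.2, which excludes TLS 1.3.
tls12_settings() {
  cat <<'EOF'
plugins.security.ssl.transport.enabled_protocols:
  - "TLSv1.2"
plugins.security.ssl.http.enabled_protocols:
  - "TLSv1.2"
EOF
}

# Print the fragment for review before merging it into opensearch.yml.
tls12_settings
```

The fragment would need to be merged into opensearch.yml on every node, followed by a restart of each node; reports later in this thread suggest the workaround does not help in every setup.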
Seems like a bug in JDK: https://bugs.openjdk.java.net/browse/JDK-8221218 See this forum post for more details: https://forum.opensearch.org/t/cluster-does-not-initialize-javax-net-ssl-sslhandshakeexception-insufficient-buffer-remaining-for-aead-cipher-fragment/2845/5 |
Like others have said this seems to be a known issue with how the JDK handles TLS: https://bugs.openjdk.org/browse/JDK-8221218 If you look at the comments here, they seem to suggest fixes have occurred but obviously this is not the case... It is also worth pointing out that neither of the fixes were actually intended to address this specific issue. I am not sure why they closed this issue as resolved when the linked changes were for separate bugs... Further examples of the issue being known: Oracle support page (https://support.oracle.com/knowledge/Middleware/2519569_1.html)
Another project running into this issue: https://forum.portswigger.net/thread/complete-proxy-failure-due-to-java-tls-bug-1e334581
One last attempt to fix this would be looking at increasing the Bouncycastle version:
I will try to do this and see if it is possible but I am not sure about reproducing the issue consistently so it may be challenging to test. |
@LHozzan @Thrallix @VovkaSOL We've been having no luck with this issue. One thing I'm trying to understand is how impactful this issue is to you. From our evidence it looks like this has only happened during cluster startup. If it's a startup issue, that is unfortunate but limited in overall impact. If, on the other hand, this issue happens intermittently on a cluster and takes down a node, then we should invest more time. Can you help provide us with details of your reproduction? |
I am seeing this issue consistently after trying to change cert providers. |
@reshippie (and anyone else experiencing this issue) could you include the operating system version / JDK version / OpenSearch distro version, the basic cluster topology (e.g. 3 data nodes, 2 cluster managers), and anything interesting about your security configuration? If you don't feel comfortable posting that information publicly, feel free to reach out to me first on our slack instance, I'm |
We're running: I don't think there's anything interesting in our security config
I tried the solution posted by @VovkaSOL. Adding |
I looked into updating the bouncycastle version as mentioned above. We would need to follow something similar to when it was moved to opensearch-project/OpenSearch#8247. At the time, @willyborankin only bumped to 15to18 because of the multi-release jars. I don't know if it is feasible to move past that point, or whether opensearch can handle the later version. @willyborankin do you know? |
@scrawfor99 Not sure about it, we still support JDK 1.8 build AFAIK. |
@willyborankin, I think 18on will still work with 1.8. I saw you made the swap to 15to18 though and not 18on in the linked PR so was not sure whether you knew what was or was not compatible. |
### Description
[Describe what this change achieves]
Following: opensearch-project/OpenSearch#12317 in core, this PR increases the version used for bouncycastle in the Security plugin. This is an attempt to correct the intermittent failures described here: [#3299](#3299)

### Check List
- [ ] ~New functionality includes testing~
- [ ] ~New functionality has been documented~
- [x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
With the updates to Bouncy Castle, I am going to close this issue, as this is the most we can currently do to resolve the exception. Based on some other discussions, the update to Bouncy Castle should help resolve the failures. |
Hi @peternied. Sorry for the delayed response.
In our infrastructure this problem occurs randomly on nodes of all roles. If it occurs on only one coordinator node, the second replica keeps working, but if both replicas are hit, the cluster is basically useless, even though the managers and data nodes are working fine. We are using the default community Docker image
The problem occurs in both of the setups we use. I mean:
Based on my observation, it seems to occur more often on multirole nodes, but I do not have exact data. @scrawfor99 OK, let's wait for the next release (2.12.x); hopefully the problem will be fixed there. If it persists, I will let you know. |
Hi @LHozzan, do you use |
After installation (2 data nodes, 1 manager node) with the demo config, I updated opensearch.yml with the following:
plugins.security.ssl.transport.pemcert_filepath: tls.crt
plugins.security.ssl.transport.pemkey_filepath: tls.key
plugins.security.ssl.transport.pemtrustedcas_filepath: ca.crt
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: tls.crt
plugins.security.ssl.http.pemkey_filepath: tls.key
plugins.security.ssl.http.pemtrustedcas_filepath: ca.crt
plugins.security.allow_unsafe_democertificates: false
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn: ['CN=admin']
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: [all_access, security_rest_api_access]
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices:
- .plugins-ml-agent
- .plugins-ml-config
- .plugins-ml-connector
- .plugins-ml-controller
- .plugins-ml-model-group
- .plugins-ml-model
- .plugins-ml-task
- .plugins-ml-conversation-meta
- .plugins-ml-conversation-interactions
- .plugins-ml-memory-meta
- .plugins-ml-memory-message
- .plugins-ml-stop-words
- .opendistro-alerting-config
- .opendistro-alerting-alert*
- .opendistro-anomaly-results*
- .opendistro-anomaly-detector*
- .opendistro-anomaly-checkpoints
- .opendistro-anomaly-detection-state
- .opendistro-reports-*
- .opensearch-notifications-*
- .opensearch-notebooks
- .opensearch-observability
- .ql-datasources
- .opendistro-asynchronous-search-response*
- .replication-metadata-store
- .opensearch-knn-models
- .geospatial-ip2geo-data*
- .plugins-flow-framework-config
- .plugins-flow-framework-templates
- .plugins-flow-framework-state
plugins.security.ssl.http.enabled_protocols:
- "TLSv1.2"
plugins.security.nodes_dn:
- 'CN=node'
Then I ran:
/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -icl -nhnv \
-cd "/usr/share/opensearch/config/opensearch-security" \
-key "/usr/share/opensearch/config/kirk-key.pem" \
-cert "/usr/share/opensearch/config/kirk.pem" \
-cacert "/usr/share/opensearch/config/root-ca.pem"
After that point, I keep getting errors. The following makefile generates my keys:
# Root CA private key
keys/root-ca.key:
	mkdir -p keys;
	openssl genrsa -out keys/root-ca.key 2048;

# Self-signed CA certificate
keys/ca.crt: keys/root-ca.key
	openssl req -new -x509 -sha256 -key keys/root-ca.key -out keys/ca.crt -days 730 -subj "/CN=ca.local";

# Admin key, converted to unencrypted PKCS#8
keys/admin.key:
	mkdir -p keys;
	openssl genrsa -out keys/admin-temp.key 2048;
	openssl pkcs8 -inform PEM -outform PEM -in keys/admin-temp.key -topk8 -nocrypt -v1 PBE-SHA1-3DES -out keys/admin.key
	rm keys/admin-temp.key;

# Admin certificate signed by the CA
keys/admin.crt: keys/admin.key keys/ca.crt keys/root-ca.key
	openssl req -new -key keys/admin.key -out keys/admin.csr -subj "/CN=admin";
	openssl x509 -req -in keys/admin.csr -CA keys/ca.crt -CAkey keys/root-ca.key -CAcreateserial -sha256 -out keys/admin.crt -days 730;
	rm keys/admin.csr;

# Node key, converted to unencrypted PKCS#8
keys/tls.key:
	openssl genrsa -out keys/tls-temp.key 2048;
	openssl pkcs8 -inform PEM -outform PEM -in keys/tls-temp.key -topk8 -nocrypt -v1 PBE-SHA1-3DES -out keys/tls.key
	rm keys/tls-temp.key;

# Node certificate signed by the CA
keys/tls.crt: keys/tls.key keys/ca.crt keys/root-ca.key
	openssl req -new -key keys/tls.key -out keys/tls.csr -subj "/CN=node";
	openssl x509 -req -in keys/tls.csr -CA keys/ca.crt -CAkey keys/root-ca.key -CAcreateserial -sha256 -out keys/tls.crt -days 730;
	rm keys/tls.csr;

removeoldkeys:
	rm -rf keys;

makekeys: removeoldkeys keys/admin.key keys/admin.crt keys/tls.key keys/tls.crt keys/ca.crt
	@echo "Keys are generated.";
I am stuck here for a while, please help! 🙏 |
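One thing that may be worth checking in the setup above: opensearch.yml points at tls.crt and ca.crt and lists `CN=admin` as the admin DN, while the securityadmin.sh call passes kirk.pem, kirk-key.pem and root-ca.pem, which look like the demo certificates. Whether that mismatch explains the errors is not confirmed in this thread. A minimal sketch for checking that each generated certificate chains to the generated CA and for printing its subject follows; the helper name `check_cert` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that a certificate chains to the given CA and
# print its subject so it can be compared with nodes_dn / admin_dn.
check_cert() {
  ca="$1"
  cert="$2"
  if ! openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "FAIL: $cert does not verify against $ca"
    return 1
  fi
  echo "OK: $cert"
  openssl x509 -in "$cert" -noout -subject
}

# Usage with the file names from the makefile above (run from the directory
# that contains keys/):
#   check_cert keys/ca.crt keys/tls.crt
#   check_cert keys/ca.crt keys/admin.crt
```

If both checks pass, passing the same admin certificate, key and CA to securityadmin.sh via `-cert`, `-key` and `-cacert` would rule out a trust mismatch as the cause.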
…project#4052)

### Description
[Describe what this change achieves]
Following: opensearch-project/OpenSearch#12317 in core, this PR increases the version used for bouncycastle in the Security plugin. This is an attempt to correct the intermittent failures described here: [opensearch-project#3299](opensearch-project#3299)

### Check List
- [ ] ~New functionality includes testing~
- [ ] ~New functionality has been documented~
- [x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
I'm seeing errors like this in master node logs:
Here's the expanded stack trace:
I'm using container image
I don't think this is fixed. Could someone please re-open? |
The same error happened here. What triggered it for me was using a cert with SANs for all my cluster nodes. I've used this kind of cert for other services without any problems. I hope you can fix this issue! |
## Issue
When a new TLS certificate authority (CA) certificate is issued, the opensearch-operator should add this new CA to all its units and request new certificates. The new certificates (including the CA certificate) should be distributed to all OpenSearch nodes in a rolling restart manner, without downtime to the entire cluster.

Due to limitations on the self-signed-certificates operator it is not possible to:
- get a notice if a CA certificate is about to expire
- request a new CA when the current one is about to expire or has expired
- request an intermediate CA and sign future certificates with it

There is currently no support for renewing a root / CA certificate on the self-signed-certificates operator. A new root / CA certificate will only be generated and issued if the common_name of the CA changes.

We have decided to implement the logic so that we check each certificate to see if it includes a new CA. If so, we store the new CA and initiate the CA rotation workflow on OpenSearch.

## Solution
This PR implements the following workflow:
- check each `CertificateAvailableEvent` if it includes a new CA
- add the new CA to the truststore
- add a notice `tls_ca_renewing` to the unit's peer data
- initiate a restart of OpenSearch (using the locking mechanism to coordinate cluster availability during the restart)
- after restarting, add a notice `tls_ca_renewed` to the unit's peer data
- when the restart is done on all of the cluster nodes, request new TLS certificates and apply them to the node

During the phase of renewing the CA, all incoming `CertificateAvailableEvents` will be deferred in order to avoid incompatibilities in communication between the nodes.

Please also see the flow of events and actions that has been documented here: https://github.com/canonical/opensearch-operator/wiki/TLS-CA-rotation-flow

## Notes
- There is a dependency on #367 because during the rolling restart when the CA is rotated it is very likely that the voting exclusion issue shows up (at least in 3-node clusters). Therefore the integration test is currently running only with two nodes. Once the voting exclusions issue is resolved, this can be updated to the usual three nodes.
- Due to an upstream bug with JDK it is necessary to use TLS v1.2 (for more details see opensearch-project/security#3299).
- This PR introduces a method to append configuration to the jvm options file of OpenSearch (used to set TLS config to v1.2).

---------
Co-authored-by: Mehdi Bendriss <bendrissmehdi@gmail.com>
Co-authored-by: Judit Novak <judit.novak@canonical.com>
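The note about appending configuration to the JVM options file can be illustrated with a small idempotent append. This is only a sketch: the PR's code is not shown in this thread, so the function name `append_jvm_option`, the file path in the usage comment, and the choice of the JDK system property `jdk.tls.client.protocols` are all assumptions rather than the operator's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a JVM options file only if that exact
# line is not already present, so repeated runs do not duplicate it.
append_jvm_option() {
  file="$1"
  option="$2"
  touch "$file"
  # -x matches whole lines, -F treats the option as a fixed string,
  # -e keeps a leading "-D" from being parsed as a grep flag.
  grep -qxF -e "$option" "$file" || printf '%s\n' "$option" >> "$file"
}

# Usage (path and property are assumptions, see the paragraph above):
#   append_jvm_option /usr/share/opensearch/config/jvm.options \
#     "-Djdk.tls.client.protocols=TLSv1.2"
```

Checking for the exact line before appending keeps the file stable across repeated runs, which matters when the same hook can fire more than once during a rolling restart. Whether a JVM-level property alone is sufficient depends on how the TLS context is built, so the `enabled_protocols` settings in opensearch.yml may still be needed.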
Exact same issue here on 2.18.0. Seems to start occurring more frequently when I start shipping logs from fluent-bit, effectively nuking my cluster. A client decimating a server with a faulty TLS handshake seems like a super critical vulnerability to me. |
I just installed Graylog and two Datanodes. I'm seeing this issue on one of the datanodes but the other works fine. Does anyone have a fix that works? I've tried most of the suggestions above to resolve this but no luck. |
The fix for me was to disable hostname verification, which is unfortunate.
@khamilton59 if you're still having this issue, please post on the Graylog Community forum and we'll try to help as best we can over there! |
Seeing error
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
during OpenSearch startup
Expected result
Should not see errors from underlying system configuration
Additional context