[Bug] SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment #3299

Open
peternied opened this issue Sep 4, 2023 · 27 comments
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@peternied
Member

peternied commented Sep 4, 2023

Seeing error
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16) during OpenSearch startup

Error: 9-04T06:39:28,837][ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [smoketestnode] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
	at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:360) ~[?:?]
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:303) ~[?:?]
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:298) ~[?:?]
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:134) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:681) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:636) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:454) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:433) ~[?:?]
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:637) ~[?:?]
	at io.netty.handler.ssl.JdkSslEngine.unwrap(JdkSslEngine.java:92) ~[netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.ssl.JdkAlpnSslEngine.unwrap(JdkAlpnSslEngine.java:163) ~[netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:309) ~[netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) ~[netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1329) ~[netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1378) ~[netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) ~[netty-codec-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.97.Final.jar:4.1.97.Final]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: javax.crypto.BadPaddingException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
	at sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1894) ~[?:?]
	at sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:240) ~[?:?]
	at sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:197) ~[?:?]
	at sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:160) ~[?:?]
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[?:?]
	... 29 more
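
For readers unfamiliar with the message: the two numbers are the bytes left in the received record fragment (2) and the AES-GCM authentication tag length (16). An AEAD-protected record must be longer than its tag, so a 2-byte fragment is rejected before any decryption is attempted. The sketch below is only a paraphrase of that kind of length check, assumed to sit in the `GcmReadCipher.decrypt` frame shown in the trace; `AeadFragmentCheck` and `requireRoomForTag` are hypothetical names, not JDK API.

```java
import java.nio.ByteBuffer;
import javax.crypto.BadPaddingException;

// Illustrative paraphrase of the length check behind the error message.
// Class and method names are hypothetical and not part of the JDK.
final class AeadFragmentCheck {
    // AES-GCM authentication tag length in bytes.
    static final int GCM_TAG_SIZE = 16;

    // Rejects a record fragment that is too short to hold ciphertext plus tag.
    static void requireRoomForTag(ByteBuffer fragment, int tagSize) throws BadPaddingException {
        if (fragment.remaining() <= tagSize) {
            throw new BadPaddingException(
                "Insufficient buffer remaining for AEAD cipher fragment ("
                    + fragment.remaining() + "). Needs to be more than tag size ("
                    + tagSize + ")");
        }
    }

    public static void main(String[] args) {
        // A 2-byte fragment, as reported in the log above.
        ByteBuffer twoBytes = ByteBuffer.allocate(2);
        try {
            requireRoomForTag(twoBytes, GCM_TAG_SIZE);
        } catch (BadPaddingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Two bytes happens to be the size of a TLS alert payload (level and description), which would be consistent with the receiver getting something other than an encrypted record at that point. That reading is an assumption; nothing in this report confirms it.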

Expected result

Should not see errors from underlying system configuration

Additional context

@github-actions github-actions bot added the untriaged Require the attention of the repository maintainers and may need to be prioritized label Sep 4, 2023
@peternied peternied added the bug Something isn't working label Sep 4, 2023
@peternied peternied changed the title SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment [Bug] SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment Sep 4, 2023
@willyborankin
Collaborator

willyborankin commented Sep 5, 2023

Known issue in JDK: https://bugs.openjdk.org/browse/JDK-8221218. Maybe it's been resolved in JDK20

@waza-ari

waza-ari commented Sep 8, 2023

I have the same issue using the latest Helm charts and Docker images. Interestingly, it worked for a while; after re-creating the CA and certs it stopped working consistently.

@willyborankin
Collaborator

willyborankin commented Sep 8, 2023

Got the same issue. During cluster migration from 2.8 to 2.9 one of the nodes could not start. The root cause is not clear so far.

@stephen-crawford
Contributor

[Triage] Going to leave this untriaged since we don't really know how to move forward yet. We can keep the issue open, though, and add more info if we encounter this again.

@stephen-crawford stephen-crawford removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Sep 11, 2023
@stephen-crawford
Contributor

[Triage] Per @willyborankin's suggestion, you can reproduce it by starting a migration and adding a new node during migration with the same certificate. Any fixes for the issues will be accepted. Likely a change around 1.7.6 or jdk20.

@stephen-crawford stephen-crawford added triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. bug Something isn't working and removed bug Something isn't working labels Sep 18, 2023
@willyborankin
Collaborator

PR with BC 1.76 was merged in OpenSearch.

@LHozzan

LHozzan commented Dec 13, 2023

Hi guys.
The problem still persists in v2.11.0.
Could you please let us know in which version a fix will be available?

@Thrallix

Also having this issue using the latest tag.
Note that hostname verification is turned off:
plugins.security.ssl.transport.enforce_hostname_verification: false

And I am using proper plugins.security.nodes_dn settings.

@VovkaSOL

The bug is still not resolved (15.01.2024). As a workaround, use TLS 1.2 instead of TLS 1.3.
Use the VM argument: -Djdk.tls.client.protocols=TLSv1.2
Or, if you configure the Netty SSL handler yourself:
SslHandler handler = sslContext.newHandler(socketChannel.alloc());
handler.engine().setEnabledProtocols(new String[] {"TLSv1.2"});
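
The Netty snippet above comes down to one call on the underlying `SSLEngine`. Here is a minimal sketch of the same restriction using only JDK classes, assuming the JVM's default `SSLContext` is acceptable for illustration; `Tls12OnlyEngine` and `newTls12Engine` are made-up names. Whether this avoids the exception in OpenSearch itself is not confirmed anywhere in this thread.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Sketch only: restricts a plain JDK SSLEngine to TLS 1.2.
// Class and method names are hypothetical.
final class Tls12OnlyEngine {
    static SSLEngine newTls12Engine(boolean clientMode) throws NoSuchAlgorithmException {
        // Assumption: the JVM default context is good enough for a demo;
        // a real deployment would build its context from its own key material.
        SSLContext context = SSLContext.getDefault();
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(clientMode);
        // Same call as in the Netty example: only TLS 1.2 stays enabled.
        engine.setEnabledProtocols(new String[] {"TLSv1.2"});
        return engine;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        SSLEngine engine = newTls12Engine(false);
        System.out.println(Arrays.toString(engine.getEnabledProtocols()));
    }
}
```

Setting the protocols on the engine applies to that engine only, so it has to be done for every engine the application creates, on both the connecting and the accepting side.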

@stephen-crawford stephen-crawford self-assigned this Feb 7, 2024
@stephen-crawford
Contributor

Like others have said this seems to be a known issue with how the JDK handles TLS:

https://bugs.openjdk.org/browse/JDK-8221218

If you look at the comments here, they seem to suggest fixes have occurred but obviously this is not the case... It is also worth pointing out that neither of the fixes were actually intended to address this specific issue. I am not sure why they closed this issue as resolved when the linked changes were for separate bugs...

Further examples of the issue being known:

Oracle support page (https://support.oracle.com/knowledge/Middleware/2519569_1.html)

> Applies to: Oracle WebLogic Server - Version 12.1.3.0.0 and later

Another project running into this issue:

https://forum.portswigger.net/thread/complete-proxy-failure-due-to-java-tls-bug-1e334581

> Thanks for reporting this. It is a known unresolved bug in OpenJDK

One last attempt to fix this would be looking at increasing the Bouncycastle version:

tkohegyi/mitmJavaProxy#12

> I use JDK15 and later + org.bouncycastle/bcpkix-jdk18on/1.71.1 and I cannot repro it anymore

I will try to do this and see if it is possible but I am not sure about reproducing the issue consistently so it may be challenging to test.

@peternied
Member Author

@LHozzan @Thrallix @VovkaSOL We've been having no luck with this issue. One thing I'm trying to understand is how impactful this issue is to you. From our evidence it looks like this has only happened during cluster startup. If it's a startup issue, that is unfortunate but limited in overall impact. If, on the other hand, this issue happens intermittently on a cluster and takes down a node, then we should invest more time. Can you help provide us with details of your reproduction?

@reshippie
Contributor

reshippie commented Feb 8, 2024

I am seeing this issue consistently after trying to change cert providers.
I did a full cluster restart and I'm getting that error on all of my nodes.
I don't know if it's relevant but the old certs we were using were RSA, while the new certs are id-ecPublicKey

@peternied
Member Author

@reshippie (and anyone else experiencing this issue) could you include the operating system version / JDK version / OpenSearch distro version, your basic cluster topology (e.g. 3 data nodes, 2 cluster managers), and anything interesting about your security configuration?

If you don't feel comfortable posting that information publicly, feel free to reach out to me first on our Slack instance (I'm Peter Nied) or email pet ern @ am az on .co m (remove the spaces)

@reshippie
Contributor

reshippie commented Feb 8, 2024

We're running:
Debian 10.13
Opensearch 2.9.0
bundled Java 17.0.7
6 data nodes, 3 managers, 1 coordinating node (for Dashboards)

I don't think there's anything interesting in our security config

plugins.security.ssl_cert_reload_enabled: true
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.advanced_modules_enabled: true
plugins.security.nodes_dn:
  - 'CN=dashboards-*-mgmt'
  - 'CN=esmaster-*-mgmt'
  - 'CN=elasticsearch-*-mgmt'
  - 'CN=osdata-*-mgmt'
# Transport layer TLS
plugins.security.ssl.transport.enabled: true
plugins.security.ssl.transport.pemkey_filepath: ssl/{{ ansible_hostname }}-mgmt.pk8
plugins.security.ssl.transport.pemcert_filepath: ssl/{{ ansible_hostname }}-mgmt.crt
plugins.security.ssl.transport.pemtrustedcas_filepath: ssl/{{ ansible_hostname }}-mgmt.issuer.crt
plugins.security.ssl.transport.truststore_filepath: cacerts
#
# REST layer TLS
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemkey_filepath: ssl/{{ ansible_hostname }}-mgmt.pk8
plugins.security.ssl.http.pemcert_filepath: ssl/{{ ansible_hostname }}-mgmt.crt
plugins.security.ssl.http.pemtrustedcas_filepath: ssl/{{ ansible_hostname }}-mgmt.issuer.crt
plugins.security.restapi.roles_enabled: ["admin_role", "security_rest_api_access"]
plugins.security.authcz.admin_dn: CN=DOMAIN.org

I tried the solution posted by @VovkaSOL. Adding -Djdk.tls.client.protocols=TLSv1.2 did not make the error go away.
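
One possible reason the flag made no difference (an assumption, not something confirmed in this thread): `jdk.tls.client.protocols` only changes the JDK's default protocol list for client-side connections, and it does not apply to engines whose protocols the application sets explicitly. The sketch below uses only JDK classes and prints the enabled protocols of a client-mode and a server-mode engine, so the two lists can be compared with and without the flag; `DefaultProtocols` and `enabledProtocols` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Sketch only: shows which protocols the JVM enables by default
// for client-mode and server-mode engines. Names are hypothetical.
final class DefaultProtocols {
    static List<String> enabledProtocols(boolean clientMode) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(clientMode);
        return Arrays.asList(engine.getEnabledProtocols());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client: " + enabledProtocols(true));
        System.out.println("server: " + enabledProtocols(false));
    }
}
```

Running it once as `java DefaultProtocols.java` and once as `java -Djdk.tls.client.protocols=TLSv1.2 DefaultProtocols.java` shows which of the two lists the flag affects on a given JDK. The security plugin also has its own `enabled_protocols` settings (the `http` variant appears in a later comment); the transport-layer counterpart would be the more direct thing to try, but check the documentation for your version.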

@stephen-crawford
Contributor

I looked into updating the bouncycastle version as mentioned above. We would need to follow something similar to when it was moved to opensearch-project/OpenSearch#8247

At the time, @willyborankin only bumped to 15to18 because of the multi-release jars. I don't know if it is feasible to move past that point/if opensearch can handle the later version. @willyborankin do you know?

@willyborankin
Collaborator

willyborankin commented Feb 12, 2024

> I looked into updating the bouncycastle version as mentioned above. We would need to follow something similar to when it was moved to opensearch-project/OpenSearch#8247
>
> At the time, @willyborankin only bumped to 15to18 because of the multi-release jars. I don't know if it is feasible to move past that point/if opensearch can handle the later version. @willyborankin do you know?

@scrawfor99 Not sure about it, we still support JDK 1.8 build AFAIK.

@stephen-crawford
Contributor

@willyborankin, I think 18on will still work with 1.8. I saw you made the swap to 15to18 though and not 18on in the linked PR so was not sure whether you knew what was or was not compatible.

stephen-crawford added a commit that referenced this issue Feb 14, 2024
### Description
[Describe what this change achieves]
Following: opensearch-project/OpenSearch#12317
in core, this PR increases the version used for bouncycastle in the
Security plugin. This is an attempt to correct the intermittent failures
described here:
[#3299](#3299)

### Check List
- [ ] ~New functionality includes testing~
- [ ] ~New functionality has been documented~
- [x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
@stephen-crawford
Contributor

With the update to Bouncy Castle, I am going to close this issue, as this is the most we can currently do to resolve the exception. Based on some other discussions, the update to Bouncy Castle should help resolve the failures.

@LHozzan

LHozzan commented Feb 16, 2024

Hi @peternied .

Sorry for the delayed response.

> We've been having no luck with this issue. One thing I'm trying to understand is how impactful this issue is to you. From our evidence it looks like this has only happened during cluster startup. If it's a startup issue, that is unfortunate but limited in overall impact. If, on the other hand, this issue happens intermittently on a cluster and takes down a node, then we should invest more time. Can you help provide us with details of your reproduction?

In our infrastructure this problem occurs randomly on nodes of all roles. If it hits only one coordinator node, the second replica keeps working, but if both replicas are hit, the cluster is basically useless, even though the managers and data nodes are working fine.
The same applies if nodes with other roles are affected at the same time or with some delay.
We have monitoring that watches whether the components in front of the OpenSearch cluster can connect to it, but it is inconvenient.

We are currently using the default community Docker image opensearchproject/opensearch:2.11.1, but only for a short time so far. We currently have clusters only in AWS and M$, and I can observe the same problem on both providers.

> Basic cluster topology (3 data nodes, 2 cluster managers). Anything interesting about your security configuration.

The problem occurs in both of the setups we use, namely:

  • one multirole node
  • 2 coordinators, 2 manager, 2 data nodes

Based on my observations it seems to occur more often on the multirole setup, but I do not have any exact data.

@scrawfor99 OK, let's wait for the next release (2.12.x); hopefully the problem will be fixed there. If it persists, I will let you know.

@willyborankin
Collaborator

Hi @LHozzan, do you use Wireguard/IPSec as an additional encryption mechanism for the communication between nodes? If yes, the problem could be related to the Wireguard/IPSec configuration.

@malayh

malayh commented Apr 12, 2024

After installation (2 data nodes, 1 manager node) with the demo config, I updated opensearch.yml with the following:

plugins.security.ssl.transport.pemcert_filepath: tls.crt
plugins.security.ssl.transport.pemkey_filepath: tls.key
plugins.security.ssl.transport.pemtrustedcas_filepath: ca.crt
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: tls.crt
plugins.security.ssl.http.pemkey_filepath: tls.key
plugins.security.ssl.http.pemtrustedcas_filepath: ca.crt
plugins.security.allow_unsafe_democertificates: false
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn: ['CN=admin']
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: [all_access, security_rest_api_access]
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices:
    - .plugins-ml-agent
    - .plugins-ml-config
    - .plugins-ml-connector
    - .plugins-ml-controller
    - .plugins-ml-model-group
    - .plugins-ml-model
    - .plugins-ml-task
    - .plugins-ml-conversation-meta
    - .plugins-ml-conversation-interactions
    - .plugins-ml-memory-meta
    - .plugins-ml-memory-message
    - .plugins-ml-stop-words
    - .opendistro-alerting-config
    - .opendistro-alerting-alert*
    - .opendistro-anomaly-results*
    - .opendistro-anomaly-detector*
    - .opendistro-anomaly-checkpoints
    - .opendistro-anomaly-detection-state
    - .opendistro-reports-*
    - .opensearch-notifications-*
    - .opensearch-notebooks
    - .opensearch-observability
    - .ql-datasources
    - .opendistro-asynchronous-search-response*
    - .replication-metadata-store
    - .opensearch-knn-models
    - .geospatial-ip2geo-data*
    - .plugins-flow-framework-config
    - .plugins-flow-framework-templates
    - .plugins-flow-framework-state
plugins.security.ssl.http.enabled_protocols:
  - "TLSv1.2"
plugins.security.nodes_dn:
  - 'CN=node'

Then I ran

/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -icl -nhnv \
-cd "/usr/share/opensearch/config/opensearch-security" \
-key "/usr/share/opensearch/config/kirk-key.pem" \
-cert "/usr/share/opensearch/config/kirk.pem" \
-cacert "/usr/share/opensearch/config/root-ca.pem"

After that point, I keep getting errors.

The following makefile generates my keys

keys/root-ca.key:
	mkdir -p keys;
	openssl genrsa -out keys/root-ca.key 2048;
keys/ca.crt: keys/root-ca.key
	openssl req -new -x509 -sha256 -key keys/root-ca.key -out keys/ca.crt -days 730 -subj "/CN=ca.local";

keys/admin.key:
	mkdir -p keys;
	openssl genrsa -out keys/admin-temp.key 2048;
	openssl pkcs8 -inform PEM -outform PEM -in keys/admin-temp.key -topk8 -nocrypt -v1 PBE-SHA1-3DES -out keys/admin.key
	rm keys/admin-temp.key;	
keys/admin.crt: keys/admin.key keys/ca.crt keys/root-ca.key
	openssl req -new -key keys/admin.key -out keys/admin.csr -subj "/CN=admin";
	openssl x509 -req -in keys/admin.csr -CA keys/ca.crt -CAkey keys/root-ca.key -CAcreateserial -sha256 -out keys/admin.crt -days 730;
	rm keys/admin.csr;

keys/tls.key:
	openssl genrsa -out keys/tls-temp.key 2048;
	openssl pkcs8 -inform PEM -outform PEM -in keys/tls-temp.key -topk8 -nocrypt -v1 PBE-SHA1-3DES -out keys/tls.key
	rm keys/tls-temp.key;
keys/tls.crt: keys/tls.key keys/ca.crt keys/root-ca.key
	openssl req -new -key keys/tls.key -out keys/tls.csr -subj "/CN=node";
	openssl x509 -req -in keys/tls.csr -CA keys/ca.crt -CAkey keys/root-ca.key -CAcreateserial -sha256 -out keys/tls.crt -days 730;
	rm keys/tls.csr;
removeoldkeys:
	rm -rf keys;
makekeys: removeoldkeys keys/admin.key keys/admin.crt keys/tls.key keys/tls.crt keys/ca.crt
	@echo "Keys are generated.";

I am stuck here for a while, please help! 🙏

dlin2028 pushed a commit to dlin2028/security that referenced this issue May 1, 2024
…project#4052)

@smlx

smlx commented Jun 5, 2024

I'm seeing errors like this in master node logs:

[2024-06-05T01:05:39,152][INFO ][o.o.s.a.s.DebugSink      ] [opensearch-cluster-master-2] AUDIT_LOG: {
  "audit_node_id" : "lP5ZYpVDR1O9n8EDWhKe1g",
  "audit_request_layer" : "TRANSPORT",
  "audit_request_exception_stacktrace" : "javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)\n\tat java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)\n\tat java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:134)\n\tat java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736)\n\tat java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691)\n\tat java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506)\n\tat java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482)\n\tat java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679)\n\tat io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:310)\n\tat io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1445)\n\tat io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)\n\tat io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)\n\tat io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)\n\tat io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: javax.crypto.BadPaddingException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)\n\tat java.base/sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1864)\n\tat java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:239)\n\tat java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:196)\n\tat java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:159)\n\tat java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)\n\t... 27 more\n",
  "@timestamp" : "2024-06-05T01:00:55.484+00:00",
  "audit_request_effective_user_is_admin" : false,
  "audit_cluster_name" : "opensearch-cluster",
  "audit_format_version" : 4,
  "audit_node_host_address" : "10.200.2.124",
  "audit_node_name" : "opensearch-cluster-master-2",
  "audit_category" : "SSL_EXCEPTION",
  "audit_request_origin" : "TRANSPORT",
  "audit_node_host_name" : "10.200.2.124"
}

Here's the expanded stack trace:

javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:134)
	at java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736)
	at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691)
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506)
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482)
	at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679)
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:310)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1445)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: javax.crypto.BadPaddingException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
	at java.base/sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1864)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:239)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:196)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:159)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	... 27 more

I'm using container image docker.io/opensearchproject/opensearch:2.14.0@sha256:96af4ace999e20f3f74b1675e501d7dba46f2e7c185cfcffd4626898b00e6743 on linux/arm64.

I don't think this is fixed. Could someone please re-open?

@farhadson

farhadson commented Jul 20, 2024

The same error happened here. What caused it in my case was using a single certificate with SANs for all my cluster nodes. I've used this kind of certificate for other services without any problems. I hope you can fix this issue!

DarshitChanpura pushed a commit to DarshitChanpura/security that referenced this issue Jul 30, 2024
…project#4052)

(cherry picked from commit b7b49b9)
DarshitChanpura pushed a commit to DarshitChanpura/security that referenced this issue Jul 31, 2024
…project#4052)

(cherry picked from commit b7b49b9)
Signed-off-by: Darshit Chanpura <dchanp@amazon.com>
reneradoi added a commit to canonical/opensearch-operator that referenced this issue Sep 6, 2024
## Issue
When a new TLS certificate authority (CA) certificate is issued, the
opensearch-operator should add this new CA to all its units and request
new certificates. The new certificates (including the CA certificate)
should be distributed to all OpenSearch nodes in a rolling restart
manner, without downtime to the entire cluster.

Due to limitations on the self-signed-certificates operator it is not
possible to:
- get a notice if a CA certificate is about to expire
- request a new CA when the current one is about to or has expired
- request an intermediate CA and sign future certificates with it

There is currently no support for renewing a root / CA certificate on
the self-signed-certificates operator. A new root / CA certificate will
only be generated and issued if the common_name of the CA changes.

We have decided to implement the logic so that each certificate is
checked for a new CA. If it includes one, we store the new CA and
initiate the CA rotation workflow on OpenSearch.

## Solution

This PR implements the following workflow:
- check each `CertificateAvailableEvent` if it includes a new CA
- add the new CA to the truststore
- add a notice `tls_ca_renewing` to the unit's peer data
- initiate a restart of OpenSearch (using the locking mechanism to
coordinate cluster availability during the restart)
- after restarting, add a notice `tls_ca_renewed` to the unit's peer
data
- when the restart is done on all of the cluster nodes, request new TLS
certificates and apply them to the node

While the CA is being renewed, all incoming
`CertificateAvailableEvents` will be deferred in order to avoid
incompatibilities in communication between the nodes.
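
The workflow above can be read as a small per-unit state machine. The following is only a sketch of that reading, not the operator's code: the `Unit` class, its method names, and the returned strings are hypothetical, and only the notice names `tls_ca_renewing` and `tls_ca_renewed` come from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CaState(Enum):
    IDLE = "idle"                  # hypothetical: no rotation in progress
    RENEWING = "tls_ca_renewing"   # notice name taken from the PR text
    RENEWED = "tls_ca_renewed"     # notice name taken from the PR text


@dataclass
class Unit:
    """Hypothetical stand-in for one OpenSearch unit and its peer data."""

    truststore: List[str] = field(default_factory=list)
    state: CaState = CaState.IDLE
    deferred: List[str] = field(default_factory=list)

    def on_certificate_available(self, ca: str, cert: str) -> str:
        # While a CA rotation is in flight, every incoming event is deferred.
        if self.state is CaState.RENEWING:
            self.deferred.append(cert)
            return "deferred"
        # A CA that is not yet trusted starts the rotation workflow.
        if ca not in self.truststore:
            self.truststore.append(ca)
            self.state = CaState.RENEWING
            return "restart_requested"
        return "certificate_applied"

    def on_restart_done(self) -> None:
        # After the restart, the unit reports that it now trusts the new CA.
        if self.state is CaState.RENEWING:
            self.state = CaState.RENEWED


def ready_for_new_certificates(units: List[Unit]) -> bool:
    # New certificates are requested only once every unit has restarted.
    return all(unit.state is CaState.RENEWED for unit in units)
```

In this sketch a unit only leaves the renewing state through `on_restart_done`, which mirrors the rule that events are deferred until the restart has finished.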

Please also see the flow of events and actions that has been documented
here:
https://github.com/canonical/opensearch-operator/wiki/TLS-CA-rotation-flow

## Notes
- This PR depends on
#367: during the rolling restart for a CA rotation, the voting
exclusion issue is very likely to show up (at least in 3-node
clusters). The integration test therefore currently runs with only two
nodes. Once the voting exclusions issue is resolved, it can be updated
to the usual three nodes.
- Due to an upstream JDK bug, it is necessary to use TLS v1.2 (for more
details, see opensearch-project/security#3299).
- This PR introduces a method to append configuration to the JVM options
file of OpenSearch (used to set the TLS config to v1.2).
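
A minimal sketch of what appending a TLS v1.2 setting to a JVM options file could look like. The description does not name the option that is written, so the `jdk.tls.client.protocols` system property used here is an assumption, and `append_jvm_option` is a hypothetical helper rather than the operator's method.

```python
from pathlib import Path

# Assumption: the PR text does not say which option it writes. This JDK
# system property is one way to restrict the client side to TLS 1.2.
TLS12_OPTION = "-Djdk.tls.client.protocols=TLSv1.2"


def append_jvm_option(jvm_options: Path, option: str) -> bool:
    """Append `option` on its own line unless it is already present.

    Returns True when the file was modified.
    """
    text = jvm_options.read_text() if jvm_options.exists() else ""
    if option in (line.strip() for line in text.splitlines()):
        return False
    # Start a new line first if the file does not end with one.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with jvm_options.open("a") as handle:
        handle.write(f"{separator}{option}\n")
    return True
```

Checking for an existing line first keeps the helper idempotent, so repeated calls during a rolling restart would not stack up duplicate options.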

---------

Co-authored-by: Mehdi Bendriss <bendrissmehdi@gmail.com>
Co-authored-by: Judit Novak <judit.novak@canonical.com>
skourta pushed a commit to canonical/opensearch-operator that referenced this issue Sep 18, 2024
skourta pushed a commit to canonical/opensearch-operator that referenced this issue Sep 18, 2024
@cwperks cwperks reopened this Oct 15, 2024
@github-actions github-actions bot added the untriaged Require the attention of the repository maintainers and may need to be prioritized label Oct 15, 2024
@cwperks cwperks removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Oct 21, 2024
@rdvansloten

rdvansloten commented Dec 4, 2024

I'm seeing errors like this in master node logs:

[2024-06-05T01:05:39,152][INFO ][o.o.s.a.s.DebugSink      ] [opensearch-cluster-master-2] AUDIT_LOG: {
  "audit_node_id" : "lP5ZYpVDR1O9n8EDWhKe1g",
  "audit_request_layer" : "TRANSPORT",
  "audit_request_exception_stacktrace" : "javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)\n\tat java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)\n\tat java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)\n\tat java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:134)\n\tat java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736)\n\tat java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691)\n\tat java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506)\n\tat java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482)\n\tat java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679)\n\tat io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:310)\n\tat io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1445)\n\tat io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)\n\tat io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)\n\tat io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)\n\tat io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: javax.crypto.BadPaddingException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)\n\tat java.base/sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1864)\n\tat java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:239)\n\tat java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:196)\n\tat java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:159)\n\tat java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)\n\t... 27 more\n",
  "@timestamp" : "2024-06-05T01:00:55.484+00:00",
  "audit_request_effective_user_is_admin" : false,
  "audit_cluster_name" : "opensearch-cluster",
  "audit_format_version" : 4,
  "audit_node_host_address" : "10.200.2.124",
  "audit_node_name" : "opensearch-cluster-master-2",
  "audit_category" : "SSL_EXCEPTION",
  "audit_request_origin" : "TRANSPORT",
  "audit_node_host_name" : "10.200.2.124"
}

Here's the expanded stack trace:

javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:130)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:134)
	at java.base/sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736)
	at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691)
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506)
	at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482)
	at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:679)
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:310)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1445)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: javax.crypto.BadPaddingException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
	at java.base/sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1864)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:239)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:196)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:159)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	... 27 more
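
For readers decoding the message: the TLS 1.3 GCM read cipher in the trace was handed a record fragment with 2 bytes remaining, which cannot hold a 16-byte authentication tag plus any ciphertext. The sketch below illustrates that kind of length check only. It is not the JDK's code, and `check_aead_fragment` and `BadPadding` are hypothetical names.

```python
TAG_SIZE = 16  # AES-GCM authentication tag length in bytes, as in the log


class BadPadding(Exception):
    """Stand-in for javax.crypto.BadPaddingException."""


def check_aead_fragment(fragment: bytes, tag_size: int = TAG_SIZE) -> None:
    # An AEAD record must carry the tag plus at least one byte of ciphertext.
    remaining = len(fragment)
    if remaining <= tag_size:
        raise BadPadding(
            f"Insufficient buffer remaining for AEAD cipher fragment "
            f"({remaining}). Needs to be more than tag size ({tag_size})"
        )
```

Read this way, the error says the receiving side saw a truncated or misframed record, not that decryption with the right length failed.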

I'm using container image docker.io/opensearchproject/opensearch:2.14.0@sha256:96af4ace999e20f3f74b1675e501d7dba46f2e7c185cfcffd4626898b00e6743 on linux/arm64.

I don't think this is fixed. Could someone please re-open?

Exact same issue here on 2.18.0. Seems to start occurring more frequently when I start shipping logs from fluent-bit, effectively nuking my cluster. A client decimating a server with a faulty TLS handshake seems like a super critical vulnerability to me.

@khamilton59

I just installed Graylog and two Datanodes. I'm seeing this issue on one of the datanodes, but the other works fine. Does anyone have a fix that works? I've tried most of the suggestions above to resolve this, but no luck.

@williamtrelawny

The fix for me was to disable hostname verification, which is unfortunate (plugins.security.ssl.transport.enforce_hostname_verification: false). Also, in my case my cert CN was using a wildcard, so maybe there's a weird matching issue going on. Because obviously my CN *.test.com won't match my actual hostname opensearch.test.com.
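
For comparison, the common wildcard rule (RFC 6125) lets `*` stand for exactly one left-most label. Under that rule `*.test.com` would be expected to match `opensearch.test.com`, so the mismatch may lie in which name the verifier actually compares, which the comment does not show. The function below is a simplified sketch of that rule with a hypothetical name. It is not the JDK's or the security plugin's verifier.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified left-most-label wildcard match (sketch of RFC 6125)."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    # The wildcard never spans several labels, so label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # "*" covers one non-empty left-most label; the rest must be equal.
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels
```

With this rule, `wildcard_matches("*.test.com", "opensearch.test.com")` is true, while `a.b.test.com` and the bare `test.com` are rejected.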

@khamilton59 if you're still having this issue, please post on the Graylog Community forum and we'll try to help as best we can over there!
