
TLS error: unexpected EOF #6165

Open
petergvizd opened this issue Oct 9, 2022 · 24 comments
@petergvizd

petergvizd commented Oct 9, 2022

Bug Report

Describe the bug
There appears to be an issue when recycling multiple TLS connections (with only one open upstream connection, or with TLS disabled, everything works fine), which causes communication errors between fluent-bit and fluentd. According to the captured traffic, fluent-bit occasionally sends an encrypted alert (record type 21, probably a TLS close_notify) during the TLS handshake.

To Reproduce
With Docker, it can be reproduced as follows:

fluent-bit.conf

[SERVICE]
  Flush        5
  Grace        5
  Daemon       Off
  Log_Level    info
  Coro_Stack_Size    24576
  HTTP_Server  On
  HTTP_Listen  0.0.0.0
  HTTP_Port    9090
  storage.path  /tmp/fluent-bit-data

[INPUT]
  Name   dummy
  Tag    dummy1.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy2.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy3.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy4.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy5.log
  Rate   10

[OUTPUT]
  Name          forward
  Match         *
  Host          fluentd
  Port          24240
  Workers  1
  tls           On
  tls.verify    Off
  tls.ca_file   /fluent-bit/tls/ca.crt
  tls.crt_file  /fluent-bit/tls/tls.crt
  tls.key_file  /fluent-bit/tls/tls.key
  Empty_Shared_Key true
  net.keepalive on
  net.keepalive_max_recycle 2
  net.dns.mode UDP
  net.dns.resolver LEGACY
  Retry_Limit  False
  storage.total_limit_size  2G

fluentd.conf

<system>
  rpc_endpoint 127.0.0.1:24444
  log_level info
  workers 1
  root_dir /tmp/buffers
</system>

<source>
  @type forward
  @id main_forward
  bind 0.0.0.0
  port 24240
  <transport tls>
    ca_path /fluentd/tls/ca.crt
    cert_path /fluentd/tls/tls.crt
    client_cert_auth true
    private_key_path /fluentd/tls/tls.key
    version TLSv1_2
  </transport>
  <security>
    self_hostname fluentd
    shared_key
  </security>
</source>
<filter **>
  @type stdout
</filter>
<match **>
  @type null
</match>

Commands:

docker network create fluent
docker run --rm --name fluentd --net fluent -v ${PWD}/fluentd:/fluentd/tls fluentd:v1.14.0-1.0 -c /fluentd/tls/fluentd.conf
docker run --rm --net fluent -v ${PWD}/fluent-bit:/fluent-bit/tls fluent/fluent-bit:1.9.7-debug /fluent-bit/bin/fluent-bit -c /fluent-bit/tls/fluent-bit.conf

Error message on fluent-bit side

[2022/10/09 15:49:35] [error] [tls] error: unexpected EOF
[2022/10/09 15:49:35] [error] [output:forward:forward.0] no upstream connections available
[2022/10/09 15:49:35] [ warn] [engine] failed to flush chunk '1-1665330570.669279879.flb', retry in 9 seconds: task_id=4, input=dummy.4 > output=forward.0 (out_id=0)

Error message on fluentd side

2022-10-09 15:49:35 +0000 [warn]: #0 [main_forward] unexpected error before accepting TLS connection by OpenSSL addr="?" host="name resolution failed" port="?" error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=104 state=error: invalid alert"

Expected behavior
Multiple TLS connections should be recycled correctly without any errors.

Screenshots
Screenshot from captured communication (fluent-bit: 172.19.0.3, fluentd: 172.19.0.2)

Your Environment
Fluent-bit version: 1.9.7
Fluent-bit OpenSSL version: 1.1.1n
Fluentd version: 1.14.0
Fluentd OpenSSL version: 1.1.1q

Additional context
We would like the possibility to dynamically scale the aggregator part (fluentd) in a Kubernetes environment while using TLS. Typically we send logs collected by fluent-bit from multiple containers (opening multiple upstream connections) to the aggregator, and when the aggregator scales out, we would like fluent-bit to pick up the new addresses of the fluentd pods. To achieve this we tried using net.keepalive_max_recycle, but hit the issue above.

@moorthi07

Ubuntu OS, direct install (not Docker).
Config:

[INPUT]
  Name   cpu
  Tag    cpu

[OUTPUT]
  Name   opensearch
  Match  *
  Host   192.168.64.9
  Port   9200
  Index  mars11_index
  Type   mymars11_type
  TLS    on

Error:
/10/21 02:39:07] [error] [tls] error: unexpected EOF
/10/21 02:39:07] [error] [tls] error: unexpected EOF
/10/21 02:39:07] [ warn] [engine] chunk '3487-1666345138.721945093.flb' cannot be retried: task_id=>
/10/21 02:39:07] [ warn] [engine] chunk '3487-1666345139.665267196.flb' cannot be retried: task_id=>
/10/21 02:39:07] [error] [tls] error: unexpected EOF
/10/21 02:39:07] [ warn] [engine] failed to flush chunk '3487-1666345146.687375985.flb', retry in 1>
/10/21 02:39:08] [error] [tls] error: unexpected EOF
/10/21 02:39:08] [ warn] [engine] chunk '3487-1666345141.656286388.flb' cannot be retried: task_id=>
/10/21 02:39:08] [error] [tls] error: unexpected EOF
/10/21 02:39:08] [ warn] [engine] failed to flush chunk '3487-1666345147.724985021.flb', retry in 1>

@leowinterde

I can reproduce this bug when outputting from fluent-bit to fluentd via forward.

@leowinterde

FluentD:

2022-10-31 07:58:06 +0000 [warn]: #0 [input-forward-metric] unexpected error before accepting TLS connection by OpenSSL addr="10.35.112.143" host="HOSTNAME1" port=61136 error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=0 state=error: invalid alert"
2022-10-31 07:58:36 +0000 [warn]: #0 [input-forward-metric] unexpected error before accepting TLS connection by OpenSSL addr="10.35.112.143" host="HOSTNAME1" port=59134 error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=0 state=error: invalid alert"
2022-10-31 08:08:36 +0000 [warn]: #0 [input-forward-metric] unexpected error before accepting TLS connection by OpenSSL addr="10.35.112.143" host="HOSTNAME1" port=56636 error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=0 state=error: unexpected message"

FluentBit:

Oct 31 07:58:36 HOSTNAME1 fluent-bit[826]: [2022/10/31 07:58:36] [error] [output:forward:forward.1] no upstream connections available
Oct 31 07:58:36 HOSTNAME1 fluent-bit[826]: [2022/10/31 07:58:36] [ warn] [engine] failed to flush chunk '826-1667203111.514059781.flb', retry in 8 seconds: task_id=1, input=disk.1 > output=forward.1 (out_id=1)
Oct 31 07:58:44 HOSTNAME1 fluent-bit[826]: [2022/10/31 07:58:44] [ info] [engine] flush chunk '826-1667203111.514059781.flb' succeeded at retry 1: task_id=1, input=disk.1 > output=forward.1 (out_id=1)
Oct 31 08:00:06 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:00:06] [error] [tls] connection #43 SSL_connect: error in error
Oct 31 08:00:06 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:00:06] [error] [tls] error: unexpected EOF
Oct 31 08:00:06 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:00:06] [error] [output:forward:forward.1] no upstream connections available
Oct 31 08:00:06 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:00:06] [ warn] [engine] failed to flush chunk '826-1667203201.514215553.flb', retry in 7 seconds: task_id=1, input=disk.1 > output=forward.1 (out_id=1)
Oct 31 08:00:13 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:00:13] [ info] [engine] flush chunk '826-1667203201.514215553.flb' succeeded at retry 1: task_id=1, input=disk.1 > output=forward.1 (out_id=1)
Oct 31 08:08:36 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:08:36] [error] [tls] connection #43 SSL_connect: error in error
Oct 31 08:08:36 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:08:36] [error] [tls] error: unexpected EOF
Oct 31 08:08:36 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:08:36] [error] [output:forward:forward.1] no upstream connections available
Oct 31 08:08:36 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:08:36] [ warn] [engine] failed to flush chunk '826-1667203711.514047393.flb', retry in 9 seconds: task_id=1, input=disk.1 > output=forward.1 (out_id=1)
Oct 31 08:08:45 HOSTNAME1 fluent-bit[826]: [2022/10/31 08:08:45] [ info] [engine] flush chunk '826-1667203711.514047393.flb' succeeded at retry 1: task_id=1, input=disk.1 > output=forward.1 (out_id=1)

@leowinterde

The bug is gone (or fixed) for me with Fluent Bit version 2.0.3.

@leonardo-albertovich
Collaborator

leonardo-albertovich commented Oct 31, 2022

Is it or isn't it gone, @leowinterde? I worked on that layer recently, so I could take a look if it's still a problem.

@petergvizd
Author

I can still see the issue, even in version 2.0.3.

@tirelibirefe

Yes, things look like they're working, but I see this error if the output is ES.

@petergvizd
Author

petergvizd commented Dec 15, 2022

Looks like the issue is finally fixed in version 2.0.6, so I'm closing it.

@moorthi07

What was the issue, @petergvizd?

@salacr

salacr commented Mar 6, 2023

I have Fluent Bit 2.0.9 and the issue is still present:

[2023/03/06 15:41:31] [error] [tls] error: unexpected EOF
[2023/03/06 15:41:31] [ warn] [engine] failed to flush chunk '1-1678117291.286914214.flb', retry in 11 seconds: task_id=0, input=syslog.0 > output=es.1 (out_id=1)
[2023/03/06 15:41:42] [error] [tls] error: unexpected EOF
[2023/03/06 15:41:42] [error] [engine] chunk '1-1678117291.286914214.flb' cannot be retried: task_id=0, input=syslog.0 > output=es.1

[OUTPUT]
  Name                 es
  Match                be.php.monolog
  Host                 ${OPENSEARCH_HOST}
  Port                 9200
  Logstash_Format      On
  Logstash_Prefix      fluentbit
  Logstash_DateFormat  %Y.%m.%d
  Time_Key_Format      %Y-%m-%dT%H:%M:%S.%L
  Generate_ID          On
  HTTP_User            ${OPENSEARCH_USER}
  HTTP_Passwd          ${OPENSEARCH_PASSWORD}
  tls                  On
  tls.verify           Off

@leonardo-albertovich
Collaborator

Would you be able to share some information about your setup? If you prefer to do it in private you can message me in slack.

I'd like to know which operating system version you are running fluent-bit in and the same about the opensearch server.

You can probably get enough information from the OpenSearch server by running curl -vv https://opensearch_host_domain_or_address:9200 and copying the lines that start with an asterisk, up to the line that says > GET.

As for the fluent-bit host I'd like to know which operating system (distribution and version if linux) or container image you are running so I can determine if there is an issue with the openssl version.

Please don't hesitate to include as much information as possible. If this actually persists for you, it might be appropriate to open a new issue so it can be properly tracked.

@salacr

salacr commented Mar 6, 2023

Hi,

We are in containers fluentbit is fluent/fluent-bit:2.0.9
OpenSearch is opensearchproject/opensearch:2.5.0

I'm using self-signed certificates generated inside the OpenSearch container, and I hoped that tls.verify Off would be sufficient to overcome the limitations of the certificate being self-signed.

I also tried changing the plugin from es to opensearch, but there was no difference. I might open a ticket, but I'm not sure it's really a bug, since I'm using self-signed certs and I'm not providing any of the following:

  • tls.ca_file
  • tls.crt_file
  • tls.key_file

@leonardo-albertovich
Collaborator

You don't need to provide any of those TLS settings when fluent-bit is acting as a client, and since you disabled tls.verify it should be fine. I think you should create a new issue, and if you do, I'd urge you to include a detailed reproduction procedure to simplify the process.
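To illustrate, a minimal client-side sketch (plugin name, host, and port are placeholder values; tls.verify Off skips certificate validation, so it is only suitable for testing):

```
[OUTPUT]
  Name        opensearch
  Match       *
  Host        opensearch
  Port        9200
  tls         On
  tls.verify  Off
```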

@salacr

salacr commented Mar 6, 2023

Ok, it's a little embarrassing, but when I started working on repro steps and setting up minimal containers where the bug would be reproducible, I found that everything works in that "test" environment... so the bug must be somewhere in my configuration.
I will let you know when I find the real root cause. Anyway, thanks for your help!

@leonardo-albertovich
Collaborator

It's ok, thank you for letting us know. Any information is valuable, even a failure to reproduce the issue. Keep it up and don't hesitate to ask for help!

@charan1135

charan1135 commented Mar 14, 2023

@salacr were you able to sort out the issue? I get a similar error when trying to flush records from a Fluent Bit OCP container to a Logstash port. I'm also on v2.0.9. Below are my config and the error:
Error:
[2023/03/14 14:50:54] [error] [upstream] connection #82 to tcp://.... timed out after 10 seconds (connection timeout)
[2023/03/14 14:50:54] [error] [output:http:http.1] no upstream connections available to ##
[2023/03/14 14:50:54] [ warn] [engine] failed to flush chunk '1-1678805443.598972736.flb', retry in 6 seconds: task_id=0, input=tail.0 > output=http.1 (out_id=1)
[2023/03/14 14:51:00] [error] [tls] error: unexpected EOF

Config:
[SERVICE]
  Flush        1
  Log_Level    info
  Daemon       off
  HTTP_Server  On
  HTTP_Listen  0.0.0.0
  HTTP_Port    2020

[INPUT]
  Name              tail
  Tag               example.*
  Path              $logpath
  Skip_Long_Lines   On
  Refresh_Interval  10
  Inotify_Watcher   true
  read_from_head    false

[OUTPUT]
  Name              stdout
  Match             example.*
  Format            json_lines
  json_date_key     time
  json_date_format  iso8601

[OUTPUT]
  Name          http
  Match         *
  Host          $logstash.host
  Port          $logstash.port
  Format        json
  tls           on
  tls.verify    off
  tls.ca_file   /usr/share/fluentbit/certs/ca.crt
  tls.crt_file  /usr/share/fluentbit/certs/crt.cer
  tls.key_file  /usr/share/fluentbit/certs/file.key

Let me know if you or someone has any thoughts. Thanks

@salacr

salacr commented Mar 14, 2023

Nope, I started configuring everything from scratch and it "just works" now. Don't know what the problem was :/ (Actually, it might have been a problem with OpenSearch, as it's quite unfriendly in terms of Docker provisioning, so there might have been some bugs in my previous installation.)

@ccampo133

ccampo133 commented Mar 23, 2023

Ran into this earlier running Fluent Bit (statically linked) on an Alpine docker image. Turns out I needed to install the ca-certificates package (apk add ca-certificates). Probably similar for other distros if this (or a similar) package is not installed on the system.
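A quick way to check for that condition inside a container is a sketch like this (the bundle path below is the common Alpine/Debian default and is an assumption; other distros may place it elsewhere):

```shell
# Check whether the system CA bundle that OpenSSL typically reads is present.
# On Alpine it is installed by the ca-certificates package (apk add ca-certificates).
CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
if [ -f "$CA_BUNDLE" ]; then
  STATUS=present
else
  STATUS=missing
fi
echo "CA bundle is $STATUS ($CA_BUNDLE)"
```

If the bundle is missing, TLS verification against public CAs will fail even though the configuration is correct.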

For anybody in this thread, just a warning that setting tls.verify Off should not be considered a solution. That is not really any more secure than not using TLS at all.
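If you need to keep verification on with a private CA, a sketch like the following (host, port, and file path are placeholders) points fluent-bit at the CA certificate instead of disabling verification:

```
[OUTPUT]
  Name         forward
  Match        *
  Host         fluentd
  Port         24240
  tls          On
  tls.verify   On
  tls.ca_file  /fluent-bit/tls/ca.crt
```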

@dss010101

dss010101 commented May 13, 2023

I'm seeing this error as well. I was using this docker-compose setup:
https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/fluent-bit.conf

but needed to turn 'tls on' for OpenSearch to accept fluent-bit communication... and now I see the error this issue describes.

In addition, OpenSearch logs this exception later; I'm not sure if the two are related:

opensearch | [2023-05-13T00:42:04,313][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [23cfaf6da342] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

@mansi1597

I faced the same issue recently. My Fluent Bit pods were running behind a Kubernetes load balancer that was sending health probes, and those probes were causing the "[error] [tls] error: unexpected EOF" errors. To fix this, I set externalTrafficPolicy to Local and set healthCheckNodePort, which makes the Kubernetes LB send its health probes to a separate port. Refer to this for configuration: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
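A sketch of that Service change (the name, selector, and port values below are examples, not from the reporter's cluster; healthCheckNodePort only takes effect when externalTrafficPolicy is Local):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fluent-bit
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve client source IP; probes go to the node port below
  healthCheckNodePort: 32000     # example port, separate from the TLS data port
  selector:
    app: fluent-bit
  ports:
    - name: forward
      port: 24240
      targetPort: 24240
```

With this in place, the LB's plain-HTTP health probes no longer hit the TLS listener, so they stop producing truncated handshakes.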

@mlathara

mlathara commented May 9, 2024

@dss010101 I'm running into the same issue as you with both fluentbit and Opensearch. As you probably saw, the opensearch issue may be a JDK problem that seems to not be resolved opensearch-project/security#3299

As a workaround, I tried bumping the TLS version down to v1.2, but then OpenSearch complains about an expired certificate when fluentbit attempts the TLS handshake. What's weird is that the fluentbit certs aren't expired, and the OpenSearch certs weren't expired either (though they were recently renewed). Also, OpenSearch Dashboards is able to communicate with the OS cluster, so there must be something specific to the handshake fluentbit and OS are attempting.

All that said, did you ever figure this out @dss010101?

@mlathara

mlathara commented May 9, 2024

The issue is closed and all, but I'll update for future users: for me, this issue went away once we moved away from self-signed certs. For some reason, renewal with self-signed certs broke things, but after moving to Let's Encrypt certificates, fluentbit is able to talk to OS again.

@dss010101

> @dss010101 I'm running into the same issue as you with both fluentbit and Opensearch. As you probably saw, the opensearch issue may be a JDK problem that seems to not be resolved opensearch-project/security#3299
>
> As a workaround, I tried bumping TLS version down to v1.2, but then Opensearch complains about expired certificate when fluentbit attempts TLS handshake. What's weird is that fluentbit certs aren't expired, and Opensearch certs weren't expired either (but were recently renewed). Also, Opensearch dashboards is able to communicate with the OS cluster -- so there must be something specific to the handshake fluentbit and OS are attempting.
>
> All that said, did you ever figure this out @dss010101?

Unfortunately, no. Due to the lack of engagement on the issue, I decided to go with other libraries. I'm glad you figured it out, though.

@niedbalski niedbalski reopened this Oct 23, 2024
@hkhelif

hkhelif commented Dec 3, 2024

It seems this issue persists even when running Fluent Bit version 3.1.0. In my case, it occurred during a cluster update and continued afterward. Downgrading to 3.1.10 from 3.2.0 also resolved the problem, though some instances in the fleet running the same version (3.1.10) still experienced the error. Sharing this here in case it helps others troubleshoot similar scenarios.
