Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ipv4/ipv6] Periodical log spams when hitting IPV6 endpoints and IPV6 is not supported by the VM #4473

Closed
qingling128 opened this issue Dec 16, 2021 · 6 comments

Comments

@qingling128
Copy link
Collaborator

qingling128 commented Dec 16, 2021

Bug Report

Describe the bug
logging.googleapis.com returns both IPV4 and IPV6 addresses. IPV6 is not supported out of the box on GCE VMs though. When IPV6 address is selected, Fluent Bit running on such a VM emits an error like below:

[2021/12/16 18:05:06] [error] [upstream] connection #1948 to logging.googleapis.com:443 timed out after 10 seconds

For GCE, this is only happening on Windows 2016 VMs. (More details in #4300 (comment)).

It happens in a batch when the agent first starts up to init connections, and also in a batch every hour or so (This interval can change depending on how Fluent Bit is configured) when Fluent Bit needs to refresh connections.

GOOD (versions / commits that do not have this issue)

  • 1.8.4
  • cd5b0bf8b1acf35310f0a56140287ac4395043e6 (1-8-4-with-chunkio-fix)

BAD (versions that has this issue)

  • 1.8.11

Ultimately it will retry and succeed with an IPV4 address.

Full log

[2021/12/16 18:04:52] [ info] [engine] started (pid=3588)
[2021/12/16 18:04:52] [ info] [storage] version=1.1.5, initializing...
[2021/12/16 18:04:52] [ info] [storage] root path 'C:\ProgramData\Google\Cloud Operations\Ops Agent\run\buffers'
[2021/12/16 18:04:52] [ info] [storage] normal synchronization mode, checksum enabled, max_chunks_up=128
[2021/12/16 18:04:52] [ info] [storage] backlog input plugin: storage_backlog.2
[2021/12/16 18:04:52] [ info] [cmetrics] version=0.2.2
[2021/12/16 18:04:52] [ info] [input:storage_backlog:storage_backlog.2] queue memory limit: 47.7M
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] metadata_server set to http://metadata.google.internal
[2021/12/16 18:04:52] [ warn] [output:stackdriver:stackdriver.0] client_email is not defined, using a default one
[2021/12/16 18:04:52] [ warn] [output:stackdriver:stackdriver.0] private_key is not defined, fetching it from metadata server
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] metadata_server set to http://metadata.google.internal
[2021/12/16 18:04:52] [ warn] [output:stackdriver:stackdriver.1] client_email is not defined, using a default one
[2021/12/16 18:04:52] [ warn] [output:stackdriver:stackdriver.1] private_key is not defined, fetching it from metadata server
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #0 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #1 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #2 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #3 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #4 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #5 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #6 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.0] worker #7 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #0 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #1 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #2 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #3 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #4 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #5 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #6 started
[2021/12/16 18:04:52] [ info] [output:stackdriver:stackdriver.1] worker #7 started
[2021/12/16 18:04:52] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2021/12/16 18:04:52] [ info] [sp] stream processor started
[2021/12/16 18:05:03] [error] [upstream] connection #1920 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:05:06] [error] [upstream] connection #1948 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:05:13] [error] [upstream] connection #1976 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:05:24] [error] [upstream] connection #1948 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:05:27] [error] [upstream] connection #2008 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:05:37] [error] [upstream] connection #704 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:08:27] [error] [upstream] connection #2032 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:09:55] [error] [upstream] connection #2072 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:10:30] [error] [upstream] connection #2012 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:10:40] [error] [upstream] connection #2088 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:10:40] [error] [upstream] connection #2064 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:10:46] [error] [upstream] connection #2060 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:10:49] [ warn] [engine] failed to flush chunk '3588-1639678244.904108600.flb', retry in 8 seconds: task_id=13, input=emitter_for_rewrite_tag.6 > output=stackdriver.0 (out_id=0)
[2021/12/16 18:10:49] [ warn] [engine] failed to flush chunk '3588-1639678247.156873400.flb', retry in 8 seconds: task_id=14, input=tail.1 > output=stackdriver.1 (out_id=1)
[2021/12/16 18:10:49] [ warn] [engine] failed to flush chunk '3588-1639678241.156816000.flb', retry in 9 seconds: task_id=12, input=tail.1 > output=stackdriver.1 (out_id=1)
[2021/12/16 18:16:03] [error] [upstream] connection #2104 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:16:03] [error] [upstream] connection #2076 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:16:06] [error] [upstream] connection #2040 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:16:07] [error] [upstream] connection #2128 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:16:13] [error] [upstream] connection #2112 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 18:16:24] [error] [upstream] connection #2132 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:07] [error] [upstream] connection #2148 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:14] [ warn] [engine] failed to flush chunk '3588-1639682169.999706800.flb', retry in 11 seconds: task_id=21, input=emitter_for_rewrite_tag.6 > output=stackdriver.0 (out_id=0)
[2021/12/16 19:16:14] [ warn] [engine] failed to flush chunk '3588-1639682167.920950900.flb', retry in 10 seconds: task_id=20, input=tail.1 > output=stackdriver.1 (out_id=1)
[2021/12/16 19:16:14] [ warn] [engine] failed to flush chunk '3588-1639682165.66023800.flb', retry in 10 seconds: task_id=19, input=emitter_for_rewrite_tag.6 > output=stackdriver.0 (out_id=0)
[2021/12/16 19:16:24] [error] [upstream] connection #2064 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:30] [error] [upstream] connection #2040 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:33] [error] [upstream] connection #2168 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:34] [error] [upstream] connection #2148 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:39] [error] [upstream] connection #2176 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:41] [error] [upstream] connection #2112 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:43] [error] [upstream] connection #2076 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:47] [error] [upstream] connection #2172 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:50] [error] [upstream] connection #2132 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:50] [error] [upstream] connection #2156 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:16:57] [error] [upstream] connection #2208 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 19:17:07] [error] [upstream] connection #2172 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:07:01] [error] [upstream] connection #2212 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:33] [error] [upstream] connection #2212 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:40] [error] [upstream] connection #2160 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:40] [error] [upstream] connection #2184 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:42] [error] [upstream] connection #2172 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:44] [error] [upstream] connection #2088 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:45] [error] [upstream] connection #1908 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:46] [error] [upstream] connection #2132 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:51] [error] [upstream] connection #2124 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:51] [error] [upstream] connection #2216 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:54] [error] [upstream] connection #2280 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:56] [error] [upstream] connection #2296 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:57] [error] [upstream] connection #2172 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:16:57] [error] [upstream] connection #2304 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:17:02] [error] [upstream] connection #2260 to logging.googleapis.com:443 timed out after 10 seconds
[2021/12/16 20:17:08] [error] [upstream] connection #2296 to logging.googleapis.com:443 timed out after 10 seconds
@qingling128
Copy link
Collaborator Author

The failed to flush chunk errors only happened 3 times and never came back, indicating the retries are successful, so we don't need to worry about them.

@qingling128
Copy link
Collaborator Author

qingling128 commented Dec 16, 2021

It's now [2021/12/16 18:36:24] (UTC). The last occurrence of this error is still [2021/12/16 18:16:24] [error] [upstream] connection #2132 to logging.googleapis.com:443 timed out after 10 seconds. So the spam is not as severe as it originally appeared to be.

[Update @ 2021/12/16 19:06 (UTC)] - Still no new log spam like above since [2021/12/16 18:16:24]. aka the past 30 mins are quiet. It’s probably just that during the initial booting, a bunch of connections were established in parallel. And a few unlucky ones hit IPV6.

[Update @ 2021/12/16 20:30 (UTC)] - New log spam like above appeared. They happen every hour or so, which matches the time interval at which connections are forced to be refreshed. This is expected.

Per discussion offline with @edsiper and @leonardo-albertovich , we are proposing two changes:

  • [Temp workaround] A flag to mute timeout errors, or allow users to change it to debug. At least for our case, these timeout errors do not require any user actions, as connections will be retried.
  • [Medium term] A new feature to enforce IPV4 address usage (basically ignore any IPV6 one)

@edsiper edsiper changed the title Log spams when hitting IPV6 endpoints [ipv4/ipv6] Log spams when hitting IPV6 endpoints and is not supported Dec 16, 2021
@qingling128
Copy link
Collaborator Author

Also set up some log traffic to double confirm the impact. Looks like the logs are coming through just fine.

image

@qingling128 qingling128 changed the title [ipv4/ipv6] Log spams when hitting IPV6 endpoints and is not supported [ipv4/ipv6] Initial log spams right after startup when hitting IPV6 endpoints and IPV6 is not supported by the VM Dec 16, 2021
edsiper added a commit that referenced this issue Dec 16, 2021
When a connection times out, an error message is dispatched informing
the user about the event. In some cases the user would prefer to silent
those messages by making them available only as 'debug' messages instead
of an 'error'.

This patch adds a new option to the upstream configuration:

   net.connect_timeout_log_error  (default: true)

This boolean option is enabled by default to keep the usual behavior but
the user can turn it off in the OUTPUT section of the plugin so when
a connection timeout is faced, it will be send a 'debug' message.

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
edsiper added a commit that referenced this issue Dec 16, 2021
When a connection times out, an error message is dispatched informing
the user about the event. In some cases the user would prefer to silent
those messages by making them available only as 'debug' messages instead
of an 'error'.

This patch adds a new option to the upstream configuration:

   net.connect_timeout_log_error  (default: true)

This boolean option is enabled by default to keep the usual behavior but
the user can turn it off in the OUTPUT section of the plugin so when
a connection timeout is faced, it will be send a 'debug' message.

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
@qingling128
Copy link
Collaborator Author

Related PR: #4474

qingling128 pushed a commit that referenced this issue Dec 16, 2021
When a connection times out, an error message is dispatched informing
the user about the event. In some cases the user would prefer to silent
those messages by making them available only as 'debug' messages instead
of an 'error'.

This patch adds a new option to the upstream configuration:

   net.connect_timeout_log_error  (default: true)

This boolean option is enabled by default to keep the usual behavior but
the user can turn it off in the OUTPUT section of the plugin so when
a connection timeout is faced, it will be send a 'debug' message.

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
@qingling128 qingling128 changed the title [ipv4/ipv6] Initial log spams right after startup when hitting IPV6 endpoints and IPV6 is not supported by the VM [ipv4/ipv6] Periodical log spams when hitting IPV6 endpoints and IPV6 is not supported by the VM Dec 16, 2021
edsiper added a commit that referenced this issue Jan 6, 2022
When a connection times out, an error message is dispatched informing
the user about the event. In some cases the user would prefer to silent
those messages by making them available only as 'debug' messages instead
of an 'error'.

This patch adds a new option to the upstream configuration:

   net.connect_timeout_log_error  (default: true)

This boolean option is enabled by default to keep the usual behavior but
the user can turn it off in the OUTPUT section of the plugin so when
a connection timeout is faced, it will be send a 'debug' message.

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
edsiper added a commit that referenced this issue Jan 6, 2022
When a connection times out, an error message is dispatched informing
the user about the event. In some cases the user would prefer to silent
those messages by making them available only as 'debug' messages instead
of an 'error'.

This patch adds a new option to the upstream configuration:

   net.connect_timeout_log_error  (default: true)

This boolean option is enabled by default to keep the usual behavior but
the user can turn it off in the OUTPUT section of the plugin so when
a connection timeout is faced, it will be send a 'debug' message.

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
@qingling128
Copy link
Collaborator Author

Verified that the short-term workaround to mute those logs worked for us.

Leaving this open for the medium term solution, aka a new feature to enforce IPV4 address usage (basically ignore any IPV6 one)

0Delta pushed a commit to 0Delta/fluent-bit that referenced this issue Jan 20, 2022
When a connection times out, an error message is dispatched informing
the user about the event. In some cases the user would prefer to silent
those messages by making them available only as 'debug' messages instead
of an 'error'.

This patch adds a new option to the upstream configuration:

   net.connect_timeout_log_error  (default: true)

This boolean option is enabled by default to keep the usual behavior but
the user can turn it off in the OUTPUT section of the plugin so when
a connection timeout is faced, it will be send a 'debug' message.

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
@lecaros
Copy link
Contributor

lecaros commented Feb 2, 2022

closing this as resolved with #4501

@lecaros lecaros closed this as completed Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants