receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
      - /var/log/pods/*/*/*.log.*
    exclude:
      # Exclude zipped logs so we don't double ship them
      - /var/log/pods/*/*/*.gz
      # The .tmp log only appears briefly during rotation as part of the zipping process
      - /var/log/pods/*/*/*.tmp
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml exhibits this behavior with the following change:
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
      - /var/log/pods/*/*/*.log.*
    exclude:
      # Exclude zipped logs so we don't double ship them
      - /var/log/pods/*/*/*.gz
      - /var/log/pods/*/*/*.tmp
Component(s)
No response
What happened?
Description
When containerd rotates logs, it renames a log such as path/0.log to path/0.log.20240904-210753.
When this occurs, the collector tries to follow the renamed file. However, many filelog include examples no longer match the new name, so the file is not followed.
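To illustrate why the rotated file drops out of the include set, here is a minimal Go sketch that applies glob patterns to file names. It uses the standard library's `path.Match` as an approximation, so the receiver's own matcher may behave differently in edge cases. The helper names `matchesAny` and `shouldFollow` and the pod directory are hypothetical, made up for this example.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether file matches at least one glob pattern.
func matchesAny(patterns []string, file string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, file); err == nil && ok {
			return true
		}
	}
	return false
}

// shouldFollow applies the include patterns first, then the exclude patterns.
func shouldFollow(include, exclude []string, file string) bool {
	return matchesAny(include, file) && !matchesAny(exclude, file)
}

func main() {
	exclude := []string{"/var/log/pods/*/*/*.gz", "/var/log/pods/*/*/*.tmp"}
	// Include list as found in many examples: only the active log matches.
	oldInclude := []string{"/var/log/pods/*/*/*.log"}
	// Include list extended to also match rotated files.
	newInclude := []string{"/var/log/pods/*/*/*.log", "/var/log/pods/*/*/*.log.*"}

	// Hypothetical pod directory, for illustration only.
	dir := "/var/log/pods/default_demo_0123/app/"
	names := []string{"0.log", "0.log.20240904-210753", "0.log.20240904-210753.gz"}
	for _, name := range names {
		fmt.Printf("%-28s old=%v new=%v\n", name,
			shouldFollow(oldInclude, exclude, dir+name),
			shouldFollow(newInclude, exclude, dir+name))
	}
}
```

With only `*.log` in the include list, the timestamped name is rejected because it no longer ends in `.log`. Adding `*.log.*` accepts it, while the `.gz` exclude still filters out the compressed copy.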
If the include pattern is fixed, however, there are then errors determining the path, because the rotated timestamp suffix is not accounted for in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/parser/container/parser.go#L31
When not using the container parser and instead using https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml, the following error is found
However, it can be fixed by changing
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml#L56
to
regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<pod_id>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log.*$'
I think the container operator can be fixed by changing https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/parser/container/parser.go#L31 to
const logpathPattern = "^.*\\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\\-]+)\\/(?P<container_name>[^\\._]+)\\/(?P<restart_count>\\d+)\\.log.*$"
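The effect of the trailing `.*` can be checked in isolation with a small Go sketch. `proposedPattern` is the pattern suggested in this issue. `strictPattern` is an assumed variant that ends in `\.log$`, included only for comparison and not copied from the collector source. The helper `parseLogPath` and the sample pod path are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// proposedPattern is the pattern suggested in this issue; the trailing ".*"
// tolerates a rotation suffix after ".log".
const proposedPattern = "^.*\\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\\-]+)\\/(?P<container_name>[^\\._]+)\\/(?P<restart_count>\\d+)\\.log.*$"

// strictPattern is an ASSUMED variant that requires the path to end in ".log".
// It is for comparison only and is not taken from the collector source.
const strictPattern = "^.*\\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\\-]+)\\/(?P<container_name>[^\\._]+)\\/(?P<restart_count>\\d+)\\.log$"

var (
	proposedRe = regexp.MustCompile(proposedPattern)
	strictRe   = regexp.MustCompile(strictPattern)
)

// parseLogPath returns the named capture groups for p, or nil if p does not match.
func parseLogPath(re *regexp.Regexp, p string) map[string]string {
	m := re.FindStringSubmatch(p)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	// Hypothetical pod log paths, for illustration only.
	base := "/var/log/pods/default_demo-pod_0f1e2d3c-4b5a-6978-8a9b-0c1d2e3f4a5b/app/"
	for _, p := range []string{base + "0.log", base + "0.log.20240904-210753"} {
		fmt.Println(p)
		fmt.Println("  strict:  ", parseLogPath(strictRe, p))
		fmt.Println("  proposed:", parseLogPath(proposedRe, p))
	}
}
```

Under these assumptions, the strict variant yields no fields for the rotated path, while the proposed pattern still extracts the namespace, pod name, UID, container name, and restart count.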
With both of the fixes above, successful messages such as the following can be seen.
Steps to Reproduce
Start in an environment using containerd, e.g. GKE.
Configure container-operator or older file log parsers according to documentation.
Increase telemetry
Generate enough logs to trigger a file rotation. Likely 10MB or more.
Inspect the logs to see whether the file was followed after the move or, more likely, "lost".
Bonus: Check attributes["log.file.path"] to see if it ever included more than `(restart count).log`.
Expected Result
Actual Result
No logs are found during/after rotation.
See Also
Here is a small screen recording of the file rotation process for containerd.
https://github.com/user-attachments/assets/8b2ca45a-ffe5-41f9-a862-97a7da216442
Collector version
any
Environment information
Environment
Containerd
OpenTelemetry Collector configuration
Turning on extra logging helps to see the problem