otelcol-contrib file_storage does not recover gracefully upon potentially corrupted database #35899
Comments
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments.
@djaglowski looks like you are the codeowner, is this something you can help with?
Component(s)
extension/storage/filestorage
What happened?
Description
When there are issues with the storage underlying a Linux guest, mostly in scenarios where the hypervisor loses connectivity to the NAS/SAN or experiences extremely high latency with it, the database can be left in a corrupted state. Recovering then requires completely deleting the filestorage queue before otel can function (or even start running) again.
Steps to Reproduce
While a VM is running, remove its storage from beneath it, ideally while otel is in the middle of writing to the bolt filestorage.
This may also be caused by machines being shut down improperly (for example, by power loss).
Expected Result
Some data will be lost, but the filestorage extension will recover gracefully and be able to resume running.
Actual Result
The otelcol service attempts to start, but fails while loading the filestorage extension. Recovery requires manual intervention: deleting the specific filestorage file that was corrupted, and then starting the service again.
Collector version
0.106.1
Environment information
Environment
OS: Ubuntu 22.04
Compiler (if manually compiled): N/A, using https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.106.1/otelcol-contrib_0.106.1_linux_amd64.deb
OpenTelemetry Collector configuration
Log output
Additional context
No response