
[exporterhelper] fix deadlock when initializing persistent queue #7400

Merged

Conversation

swiatekm (Contributor)

Description:
Fixing a potential deadlock in persistent queue initialization.

The queue maintains a channel of dummy items whose length should equal the queue size at all times. This makes it easy for consumers to wait when the queue is empty. During initialization, we simply add one dummy item to the channel for each item in the queue.

However, before this step we also attempt to requeue items that were previously dispatched but not sent, and requeueing adds dummy items to the channel as well. As a result, we can end up with more dummy items in the channel than there are actual items in storage. This is normally not a big problem, as the queue quietly discards the extraneous dummy items once it's empty.

It is, however, a problem if this causes the dummy item channel to go over capacity during initialization. If number_of_dispatched_items + queue_size > queue_capacity, we try to put more dummy items into the channel than it has capacity for, resulting in a deadlock.
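
To make the ordering concrete, here is a minimal Go sketch of the scenario. All names (`persistentQueue`, `putChan`, `initFromStorage`) are hypothetical and do not match the actual exporterhelper code; the point is only that requeued dispatched items and stored items each push a dummy element, and once their sum exceeds the channel capacity the send blocks with nothing consuming yet.

```go
package main

// Hypothetical sketch of the dummy-item channel described above.
type persistentQueue struct {
	putChan chan struct{} // one element per item believed to be in the queue
}

func newPersistentQueue(capacity int) *persistentQueue {
	return &persistentQueue{putChan: make(chan struct{}, capacity)}
}

// initFromStorage mirrors the initialization order: previously dispatched
// items are requeued first (each adding a dummy element), then one dummy
// element is added for every item already stored in the queue.
func (q *persistentQueue) initFromStorage(dispatched, stored int) {
	for i := 0; i < dispatched; i++ {
		q.putChan <- struct{}{}
	}
	for i := 0; i < stored; i++ {
		q.putChan <- struct{}{}
	}
}

func main() {
	// Capacity 5, 2 dispatched items, 4 stored items: 2 + 4 > 5, so the
	// sixth send blocks forever, since no consumer runs during
	// initialization. The Go runtime reports the resulting deadlock.
	q := newPersistentQueue(5)
	q.initFromStorage(2, 4)
}
```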

The reason this problem doesn't appear in practice very often is that if the queue is full, the dispatched items are simply discarded. I have seen it happen in combination with other storage-related problems, where it muddies the waters and makes troubleshooting more difficult.

Testing: Added a test that deadlocks without the change.
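
The PR does not show the test itself here, but a deadlock-detecting test of this kind typically runs the initialization in a goroutine and fails on a timeout. The sketch below reuses the hypothetical `persistentQueue` types from the snippet above and is only an illustration of that shape, not the test added by this PR.

```go
package main

import (
	"testing"
	"time"
)

// Hypothetical test shape: initialization must finish within a timeout.
// Without the fix, initFromStorage blocks forever once
// dispatched + stored exceeds the channel capacity.
func TestInitDoesNotDeadlock(t *testing.T) {
	done := make(chan struct{})
	go func() {
		q := newPersistentQueue(5)
		q.initFromStorage(2, 4) // 2 + 4 > 5: blocks without the fix
		close(done)
	}()

	select {
	case <-done:
		// initialization completed; no deadlock
	case <-time.After(time.Second):
		t.Fatal("persistent queue initialization deadlocked")
	}
}
```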

@swiatekm swiatekm changed the title fix deadlock when initializing persistent queue [exporterhelper] fix deadlock when initializing persistent queue Mar 21, 2023
@swiatekm swiatekm force-pushed the fix/persistentstorage/deadlock branch from f64143b to a963201 Compare March 21, 2023 13:10
codecov bot commented Mar 21, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (2399471) 91.05% compared to head (209d3c9) 91.04%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7400      +/-   ##
==========================================
- Coverage   91.05%   91.04%   -0.01%     
==========================================
  Files         292      292              
  Lines       14234    14233       -1     
==========================================
- Hits        12961    12959       -2     
- Misses       1006     1007       +1     
  Partials      267      267              
Impacted Files Coverage Δ
...rter/exporterhelper/internal/persistent_storage.go 87.86% <100.00%> (-0.47%) ⬇️


@swiatekm swiatekm marked this pull request as ready for review March 21, 2023 13:27
@swiatekm swiatekm requested review from a team and bogdandrutu March 21, 2023 13:27
@swiatekm swiatekm force-pushed the fix/persistentstorage/deadlock branch from a963201 to 209d3c9 Compare March 21, 2023 13:49
@bogdandrutu bogdandrutu merged commit 53881c9 into open-telemetry:main Mar 21, 2023
@swiatekm swiatekm deleted the fix/persistentstorage/deadlock branch March 21, 2023 14:58
@codeboten codeboten added this to the v0.75.0 milestone Mar 22, 2023