Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition(s) while unloading/loading tracks #2305

Merged
merged 5 commits into from
Sep 30, 2019
Merged

Fix race condition(s) while unloading/loading tracks #2305

merged 5 commits into from
Sep 30, 2019

Conversation

uklotzde
Copy link
Contributor

@uklotzde uklotzde commented Sep 29, 2019

I'm not able to reproduce the bug. Awaiting confirmation that this PR fixes the bug!

https://bugs.launchpad.net/mixxx/+bug/1845695

Not a single, but multiple potential race conditions and wrong ordering of responses while unloading and loading tracks!

Pending read requests are discarded when unloading a track. But their response arrived when the new track had already been loaded. This would reset the readable frame index range to empty and the new track plays with silence.

Also, the FIFO between the worker and the reader needs to be drained entirely when unloading a track. No new read requests must be accepted until the new track has been loaded.

I've added many debug assertions to detect any inconsistencies and wrong assumptions.

Pending read requests are discarded when unloading a track. But their
response might arrive when the new track has already been loaded.
This would reset the readable frame index range to empty and the
new track plays with silence.

Also, the FIFO between the worker and the reader needs to be drained
entirely when unloading a track. No new read requests must be accepted
until the new track has been loaded.
@uklotzde
Copy link
Contributor Author

Naming is still a big mess. But I have extracted the state of the reader into its own enum class and thereby simplified the CachingReader::process() function. Debug assertions now verify that no discarded chunks are processed. According to my reasoning, this state machine should be safe now.

The main cause for the bug was the invalid ordering of updates by the worker thread, i.e. first inserting the track loaded message, followed by invalid, discarded chunks from pending read requests for the previous track.

@uklotzde uklotzde changed the title [WiP] Fix race condition(s) while unloading/loading tracks Fix race condition(s) while unloading/loading tracks Sep 29, 2019
Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for wrapping your brain around it. I have left some comments.

// Don't accept any new read requests until the current
// track has been unloaded and the new track has been
// loaded!
m_state = State::TrackLoading;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set this bit before touching the worker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed. The worker thread is not intercepting the reader. The reader polls for updates. But if it helps to understand what is happening I will reorder the instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...it is even desired to inform the worker asap. A new comment explains why.

// All chunks have been freed before loading the next track!
DEBUG_ASSERT(!m_mruCachingReaderChunk);
DEBUG_ASSERT(!m_lruCachingReaderChunk);
// Discard all pending read requests for the previous track
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Discard all results from pending read requests

m_readableFrameIndexRange = intersect(
m_readableFrameIndexRange,
update.readableFrameIndexRange());
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that we hit always the else branch? We need to receive status update without a chunk.
I think it would be better to have that decoupled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is caused by the coupled design. But I won't change that. The debug assertions clearly indicate that there are two kinds of updates:

  • Read result with a chunk
  • State change track loaded/unloaded without a chunk

update.init(TRACK_NOT_LOADED);
// Discard all pending read requests
CachingReaderChunkReadRequest request;
while (m_pChunkReadRequestFIFO->read(&request, 1) == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the fifo multi consumer aware? I think we need to move this into the run() method and the if (m_newTrackAvailable) { branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No and it doesn't have to be. loadTrack() is private and only invoked from run() within the same thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I misread and took it for newTrack()

@uklotzde
Copy link
Contributor Author

I have only added or modified comments, no code changes needed IMHO.

@daschuer
Copy link
Member

LGTM, thanks.

@daschuer daschuer merged commit cd871db into mixxxdj:2.2 Sep 30, 2019
@daschuer
Copy link
Member

I think this PR rectifies a point release NOW.
Thoughts?

@uklotzde
Copy link
Contributor Author

I don't think so. It has probably introduced after 2.2.2, I need to check.

@uklotzde uklotzde deleted the 2.2_cachingreader_reload_tracks branch September 30, 2019 07:03
@Be-ing
Copy link
Contributor

Be-ing commented Sep 30, 2019

Fortunately we have not done a bugfix release since the regression was introduced in #2265. The bug was only reported by people using master.

#2253 is now ready for review; I'd like to get that in 2.2.3.

Thank you for the speedy fix @uklotzde.

@uklotzde
Copy link
Contributor Author

I agree, let's first finish the 2 outstanding PRs.

@uklotzde
Copy link
Contributor Author

I got a debug asserting in CachingReader caused by the latest commit...

@uklotzde
Copy link
Contributor Author

OMG.

I didn't notice the enforced direct connections between worker and reader that both live on separate threads. These are wrong!! Or am I missing something???

Moreover, the reader should be responsible for emitting signals, but not the worker. Otherwise, the state of the reader and the emitted signals are inconsistent due to race conditions.

I should quit and stop working on this evil and cursed code, it doesn't get any better.

}

void CachingReader::process() {
ReaderStatusUpdate update;
while (m_readerStatusFIFO.read(&update, 1) == 1) {
while (m_stateFIFO.read(&update, 1) == 1) {
DEBUG_ASSERT(m_state != State::Idle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one always fails after ejecting a track and loading a new one.
This is normal for every update.status == TRACK_LOADED message.
So it can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • After TRACK_LOADED the state becomes State::TrackLoaded
  • After TRACK_UNLOADED the state becomes State::Idle and no new updates are expected.

While the reader is idle no update messages are expected!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug assertion is there for a reason. Idle means don't bother me with update messages or something is seriously wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be true after your recent changes in the other PR, but this assumption is here wrong.
Try it out. I am still wrapping my head around you new code but I want first understand the original issue.

Was this your assertion as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants