Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid deadlocks in the address book mutex #3244

Merged
merged 15 commits into from
Dec 20, 2021
Merged

Avoid deadlocks in the address book mutex #3244

merged 15 commits into from
Dec 20, 2021

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Dec 16, 2021

Motivation

zebra-network keeps on having hard-to-diagnose hangs. These hangs could be happening because the AddressBook uses a threaded mutex, which blocks the executor, and all async tasks scheduled on the same thread.

Solution

  • Spawn the address book updater on a blocking thread
  • Spawn CandidateSet address book operations on blocking threads
  • Replace the PeerSet address book with a metrics watch channel

Closes #3240

Review

@dconnolly can review this PR.

This is urgent because syncs are very slow, and #3191 or #3244 should help fix that.

It's based on PR #3191.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

@teor2345 teor2345 added P-Medium C-security Category: Security issues I-hang A Zebra component stops responding to requests I-slow Problems with performance or responsiveness A-network Area: Network protocol updates or fixes labels Dec 16, 2021
@teor2345 teor2345 requested a review from dconnolly December 16, 2021 08:13
@teor2345 teor2345 self-assigned this Dec 16, 2021
@teor2345 teor2345 changed the title Address book mutex Avoid deadlocks in the address book mutex Dec 16, 2021
Copy link
Contributor Author

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to make a few fixes to this, mainly to await spawned tasks

@teor2345
Copy link
Contributor Author

I've marked this PR and all its dependencies as a high priority, because this is a known bug that has actually happened before, and seems to still be happening.

@mpguerra
Copy link
Contributor

@teor2345 should this still be in draft and is it ready to be reviewed?

@conradoplg conradoplg force-pushed the crawler-timing-tweaks branch from 94b6f52 to 4f13056 Compare December 17, 2021 16:33
@conradoplg
Copy link
Collaborator

Rebased after the base #3191 was rebased 😅
Will need another rebase after that one is merged.

@teor2345
Copy link
Contributor Author

@teor2345 should this still be in draft and is it ready to be reviewed?

@mpguerra yes it needs to be in draft, and yes it is ready to be reviewed.

See my detailed answer here: #3191 (comment)

Base automatically changed from crawler-timing-tweaks to main December 19, 2021 23:02
@teor2345 teor2345 marked this pull request as ready for review December 19, 2021 23:03
@oxarbitrage oxarbitrage self-requested a review December 19, 2021 23:15
oxarbitrage
oxarbitrage previously approved these changes Dec 20, 2021
Copy link
Contributor

@oxarbitrage oxarbitrage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, i made a few comments but nothing really blocking imho.

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
@teor2345 teor2345 requested a review from oxarbitrage December 20, 2021 00:19
@teor2345 teor2345 enabled auto-merge (squash) December 20, 2021 00:32
@teor2345 teor2345 merged commit d0e6de8 into main Dec 20, 2021
@teor2345 teor2345 deleted the address-book-mutex branch December 20, 2021 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-network Area: Network protocol updates or fixes C-security Category: Security issues I-hang A Zebra component stops responding to requests I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run async tasks that access AddressBook threaded mutexes on dedicated threads
4 participants