Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

util/network-devp2p: fix possible deadlock caused by conflicting lock order #11769

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

BurtonQin
Copy link
Contributor

@BurtonQin BurtonQin commented Jun 6, 2020

Similar to #11766

This PR fixes possible deadlock caused by conflicting lock order:
Basically, I found three pairs of lock that may have varying lock orders:

reserved_nodes->handlers:
762-825,1118-1122,977-978 conflict with 851-865,851-876,1095-1098.

session->handlers:
763-825 conflicts with 851-850, 851-869.

sessions->handlers:
953-959 conflicts with 851-853.

Note that write lock has a higher priority than read lock. So the following code can lead to deadlock:

Thread-A Thread-B Thread-C Thread-D
a.read() b.read()
b.write() a.write()
b.read() a.read()

B holds a.read(), C holds b.read():
a.read() in C waits for a.write() (high priority) in D, which in turn waits for a.read() in B.
Similarly, b.read() in B waits for b.write() (high priority) in A, which in turn waits for b.read() in C.

The fix is to enforce the order.

@dvdplm dvdplm requested a review from vorot93 June 8, 2020 12:06
@sorpaas sorpaas added A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. labels Jul 29, 2020
@rakita
Copy link

rakita commented Aug 27, 2020

Hello, I didn't dive deep enough to approve this, but conflicting lock order on its own is a bad code smell. I wanted to just point out when this can happen and put it into context.

My reasoning:
1: It is already stated that for this to happens we need two RwLocks and both of them at some point need to lock both read and write and I totally agree. (And of course, those locks need to be intervened between themselves, and order of locks should be in question, and you already prove that order is conflicting but let's leave this for later)

2: Mutexes in question are: reserved_nodes, sessions, handlers

3: The approach that I used to check this is to find all write locks (do a search for some_mutex.write()) on those mutexes and see when they happen in execution.

  1. Result are:
  • handlers.write() is used only in:
    • fn message when we get NetworkIoMessage::AddHandler (This happens only once on application initialization)
  • reserved_nodes.write() is used in:
    • fn add_reserved_node is used in
      • Host::new called only once in initialization of app
      • fn add_reserved_peer and this is called from parity_addReservedPeer API
    • fn remove_reserved_node is called from
      • fn remove_reserved_peer is called from parity_removeReservedPeer API
  • session.write() is called multiple times
    • fn create_connection called from fn connect_peer and fn accept. And there is few more functions behind that, but we can say that this write is called on new peer.
    • fn deregister_stream called from fn kill_connection and fn session_writable and same as create_connection this can be called at any time.
  1. Summary:
  • handlers.write() is not called at all (except in initialiation).
  • reserved_nodes.write() is called only from API's (parity_addReservedPeer and parity_removeReservedPeer)
  • session.write() is called on peer connection/disconnection and this can be called at any time.
  1. Conclusion
    This deadlock can only happen when peer disconnects/connects and at the same time somebody calls API for reserved peers.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants