-
Notifications
You must be signed in to change notification settings - Fork 45
/
Copy pathsync.txt
76 lines (69 loc) · 4.6 KB
/
sync.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
This file describes the synchronization strategy used for Homa.
* In the Linux TCP/IP stack, the primary locking mechanism is a lock
per socket. However, per-socket locks aren't adequate for Homa, because
sockets are "larger" in Homa. In TCP, a socket corresponds to a single
connection between the source and destination; an application can have
hundreds or thousands of sockets open at once, so per-socket locks leave
lots of opportunities for concurrency. With Homa, a single socket can be
used for communicating with any number of peers, so there will typically
be no more than one socket per thread. As a result, a single Homa socket
must support many concurrent RPCs efficiently, and a per-socket lock would
create a bottleneck (Homa tried this approach initially).
* Thus, the primary lock used in Homa is a per-RPC spinlock. This allows operations
on different RPCs to proceed concurrently. RPC locks are actually stored in
the hash table buckets used to look them up. This is important because it
makes looking up RPCs and locking them atomic. Without this approach it
is possible that an RPC could get deleted after it was looked up but before
it was locked.
* Certain operations are not permitted while holding spinlocks, such as memory
allocation and copying data to/from user space (spinlocks disable
interrupts, so the holder must not block). RPC locks are spinlocks,
and that results in awkward code in several places to move prohibited
operations outside the locked regions. In particular, there is extra
complexity to make sure that RPCs are not garbage-collected while these
operations are occurring without a lock.
* There are several other locks in Homa besides RPC locks. When multiple
locks are held, they must always be acquired in a consistent order, in
order to prevent deadlock. For each lock, here are the other locks that
may be acquired while holding the given lock.
* RPC: socket, grantable, throttle, peer->ack_lock
* Socket: port_map.write_lock
Any lock not listed above must be a "leaf" lock: no other lock will be
acquired while holding the lock.
* Homa's approach means that socket shutdown and deletion can potentially
occur while operations are underway that hold RPC locks but not the socket
lock. This creates several potential problems:
* A socket might be deleted and its memory reclaimed while an RPC still
has access to it. Home assumes that Linux will prevent socket deletion
while the kernel call is executing. In situations outside kernel call
handling, Homa uses rcu_read_lock to prevent socket deletion.
* A socket might be shut down while there are active operations on
RPCs. For example, a new RPC creation might be underway when a socket
is shut down, which could add the new RPC after all of its RPCs
have supposedly been deleted. Handling this requires careful ordering
of operations during shutdown, plus the rest of Homa must be careful
never to add new RPCs to a socket that has been shut down.
* There are a few places where Homa needs to process RPCs on lists
associated with a socket, such as the timer. Such code must first lock
the socket (to synchronize access to the link pointers) then lock
individual RPCs on the list. However, this violates the rules for locking
order. It isn't safe to unlock the socket before locking the RPC, because
the RPC could be deleted and its memory recycled between the unlock of the
socket lock and the lock of the RPC; this could result in corruption. Homa
uses a few different ways to handle this situation:
* Use homa_protect_rpcs to prevent RPC reaping for a socket. RPCs can still
be deleted, but their memory won't go away until homa_unprotect_rpcs is
invoked. This allows the socket lock to be released before acquiring
the RPC lock; after acquiring the RPC lock check to see if it has been
deleted; if so, skip it. Note: the Linux RCU mechanism could have been
used to achieve the same effect, but it results in *very* long delays
before final reclamation (tens of ms), even without contention, which
means that a large number of dead RPCs could accumulate.
* Use spin_trylock_bh to acquire the RPC lock, while still holding the
socket lock. If this fails, then release the socket lock, then retry
both the socket lock and the RPC lock.
* There are also a few places where Homa is doing something related to an
RPC (such as copying message data to user space) and needs the RPC to stay
around, but it isn't holding the RPC lock. In this situations, Homa sets
a bit in rpc->flags and homa_rpc_reap will not reap RPCs with any of these
flags set.