
batch outgoing RRs better #4730

Closed
richvdh opened this issue Feb 24, 2019 · 29 comments

richvdh commented Feb 24, 2019

I made a heinous bodge which delays all outgoing EDUs (including typing notifs) by 5 seconds. We need to do something better.

richvdh added a commit that referenced this issue Mar 1, 2019
In order to reduce the number of outgoing federation transactions, we want to
aggregate read-receipts, since there is little harm if outgoing RRs are delayed
by a few seconds, and we can dramatically reduce the number of transactions by
doing so.

This change introduces the concept of EDU 'buckets'; currently we have the
'Instant' bucket and the 'Delayed' bucket.

Fixes #4730, #3951.

richvdh commented Mar 6, 2019

I've been doing some analysis on RRs to determine how we can best handle them. The following analysis is based on a period of a few hours today, which is a fairly average time of day, but still with enough RR peaks to let the federation sender get a couple of minutes behind at times.

Firstly: the number of RR EDUs increases as O(N^2) with the number of members of the room (the number of people reading messages on matrix.org, and hence sending RRs, increases linearly; the number of servers in the room to which we have to send each RR also increases linearly). That means that the vast majority of RR EDUs are attributable to a minority of rooms. In the sample period, 36% of RR EDUs were for Matrix HQ, another 20% were for 'Synapse admins', and 80% of all RR EDUs were for just 11 rooms.
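As a back-of-envelope illustration of that fan-out (a minimal sketch using the Matrix HQ figures quoted at the end of this comment; the snippet itself is purely illustrative):

```python
# Every local read receipt has to be copied to every remote server in the room,
# which is where the O(N^2) comes from. Figures are the Matrix HQ numbers from
# this comment: 399 RRs in the sample period, ~667 destination servers.
local_rrs = 399
remote_servers = 667
print(local_rrs * remote_servers)  # ~266,000 individual RR EDU copies if unbatched
```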


Some datapoints about RR age: 16% arrive within 1 second of their corresponding event, and 37% within 5 seconds.


RR EDUs tend to be somewhat older on average: only 7% happen within 1 second of their event, and 28% within 5 seconds.


Presumably this means that RRs happen later in large rooms like Matrix HQ. Probably makes sense: people are more likely to read messages in one-to-one rooms immediately. It does suggest that we might want to consider using a different bucketing scheme for larger rooms.

In the sample period, there were 399 RRs for Matrix HQ, each of which would have been sent to around 667 servers. (This will include "offline" servers, and in practice some EDUs will be grouped into transactions if the remote server did not respond immediately).


richvdh commented Mar 6, 2019

Some analysis on different bucketing techniques as they would affect Matrix HQ:

  • No bucketing: 399 transmissions
  • Delay each RR by a fixed period to see if any others turn up:
  • Delay each RR by a factor of the event's age, with a cap of 30 seconds:
    • age*0.5 (ie, if the RR is 10 seconds after the event, wait 5 seconds to see if there are any more RRs): 228 transmissions
    • age*1: 191 transmissions
  • Fixed delays of 0, 5 seconds and 30 seconds, for RRs received <5s, 5-30s, >30s after their events: 265 transmissions.
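For concreteness, here's a minimal sketch of the age-factor scheme (the function name and defaults are illustrative, not what Synapse actually does):

```python
# Hold an outgoing RR for a fraction of its event's age, capped at 30s, to see
# whether more RRs for the room turn up in the meantime. (Sketch only; the
# factor/cap values correspond to the variants measured above.)
def rr_flush_delay(event_age_seconds, factor=0.5, cap=30.0):
    return min(event_age_seconds * factor, cap)

assert rr_flush_delay(10) == 5.0    # an RR arriving 10s after its event waits 5s
assert rr_flush_delay(120) == 30.0  # RRs for old events are capped at 30s
```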

Any other suggestions?

@erikjohnston

Basing it on a factor of the age feels like a good one, and half feels about right. I might be tempted to squish the low-age RRs a bit more so that things feel a bit more instant (e.g. (x - 1 + 1/(x + 1))/2)


ara4n commented Mar 6, 2019

399->244 and 399->228 seem to be relatively modest improvements, although I much prefer the idea of only aggressively batching RRs for older events.

However, I'm finding it hard to visualise the circumstances when RRs for older events even happen. Are they ones generated by clients as they catch up on scrollback? I'm a bit confused as to why we'd be sending these, as all clients I know of load the room at the bottom of the page, where they'd send an RR for the most recent event... and so RRs for historical events are irrelevant (other than for updating the RM, which is specific to the local user).


richvdh commented Mar 6, 2019

erik's suggestion of (x - 1s + 1/(x + 1s))/2, with a cap of 30s: 245 transmissions.


richvdh commented Mar 6, 2019

However, I'm finding it hard to visualise the circumstances when RRs for older events even happen. Are they ones generated by clients as they catch up on scrollback? I'm a bit confused as to why we'd be sending these, as all clients I know of load the room at the bottom of the page, where they'd send an RR for the most recent event... and so RRs for historical events are irrelevant (other than for updating the RM, which is specific to the local user).

Unless somebody happens to be watching the current timeline of a given room at the point that an event is sent, there will be a delay before they open the app, switch to the room, etc.


ara4n commented Mar 6, 2019

oh, of course. so the "old event" could still be the most recent one in the room... it's just that we don't rapidly show RRs as people catch up on the message if that message happens to be old? I'm a bit worried that this could still feel weird (especially if the message is an hour old, and it takes 30 minutes for the RR to be sent?!), given that from a user's POV the expectation is that idle users coming back online should immediately send an RR as they catch up (especially given we lack presence).


richvdh commented Mar 6, 2019

it's just that we don't rapidly show RRs as people catch up on the message if that message happens to be old?

I don't quite understand the question. I think you're describing my proposed behaviour.

(especially if the message is an hour old, and it takes 30 minutes for the RR to be sent?!)

This is why I'm suggesting capping at 30 seconds.


ara4n commented Mar 6, 2019

I don't quite understand the question.

I was repeating my understanding back to check i had it right, which it seems like i do.

Would an alternative approach be to simply throttle the rate at which we send RRs in a room? So if things are slow (e.g. idle users returning and catching up on scrollback), the RRs come in rapidly, acting as a nice proxy for presence. But if a conversation is in full swing and RRs are flying around everywhere, we do something like "don't send RRs more rapidly than 1 batch every N seconds, per room"? This would then give us "only throttle in a crisis" behaviour. Or is that what the original PR did? :S

@erikjohnston

erik's suggestion of (x - 1s + 1/(x + 1s))/2, with a cap of 30s: 245 transmissions.

Ooh, nice. FTR that basically shakes out to adding a 50% delay for older events (e.g. if it's taken you 15s to send an RR then we delay it by ~7s), but much less for recent events (e.g. if we send an RR within 2s then we'll only delay it by ~600ms, and one at 1s by ~250ms).
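For reference, a quick sketch of that curve, with x being the age of the event in seconds and the 30s cap being the variant measured above (illustrative only, not Synapse's actual code):

```python
# Erik's delay curve: roughly half the event's age for old events, but squished
# towards zero for very recent ones.
def erik_delay(x, cap=30.0):
    return min((x - 1 + 1 / (x + 1)) / 2, cap)

for age in (1, 2, 15, 120):
    print(age, round(erik_delay(age), 2))  # -> 0.25, 0.67, 7.03, 30.0
```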


ara4n commented Mar 6, 2019

presumably if we are delaying by 7s, in practice this means that it'll get sent sooner anyway if a less-delayed RR needs to get sent in the interim?

@erikjohnston

presumably if we are delaying by 7s, in practice this means that it'll get sent sooner anyway if a less-delayed RR needs to get sent in the interim?

Yup, or something else needs to get sent like a typing notification, events, etc.


richvdh commented Mar 6, 2019

Or is that what the original PR did? :S

It's not quite, because (a) the throttling in #4777 was independent of rooms; (b) it doesn't send anything out immediately.

So: Matthew's suggestion of throttling RRs per room to one batch every N seconds: 190 transmissions with N=30.


ara4n commented Mar 6, 2019

i was thinking a much lower value of N ftr - e.g. 3-5s


richvdh commented Mar 6, 2019

erik's suggestion of (x - 1s + 1/(x + 1s))/2, with a cap of 30s: 245 transmissions.

Ooh, nice.

well, not that nice imho. 245 is a bit of a marginal reduction.

@erikjohnston

Or is that what the original PR did? :S

It's not quite, because (a) the throttling in #4777 was independent of rooms; (b) it doesn't send anything out immediately.

So: Matthew's suggestion of throttling RRs per room to one batch every N seconds: 190 transmissions with N=30.

Doesn't that mean a second receipt sent in a room from a given server will potentially be delayed by up to 30s?


richvdh commented Mar 6, 2019

i was thinking a much lower value of N ftr - e.g. 3-5s

At N=3: 339 transmissions
At N=5: 313 transmissions


richvdh commented Mar 6, 2019

Doesn't that mean a second receipt sent in a room from a given server will potentially be delayed by up to 30s?

well, yes.


richvdh commented Mar 6, 2019

At N=3: 339 transmissions
At N=5: 313 transmissions

The point being that currently it's unusual for two RRs to land within 3s of each other, so this doesn't change much.


ara4n commented Mar 6, 2019

hum. what are we even trying to optimise for here? i thought the problem was that CPU was spiking/saturating when there's a lot of traffic, and we end up sending RRs too frequently because we don't batch them. Are we trying to reduce the total number of RRs, or just smooth things out to not muller the CPU? In which case perhaps 399->339 is okay, as long as they're spread out nicely no more rapidly than every 3s?

@erikjohnston

We're trying to reduce the frequency of transactions, hence the idea of waiting a bit to see if any more RRs come in before sending the transaction.

as long as they're spread out nicely no more rapidly than every 3s?

The problem is that for large rooms we need to send a transaction to each host in the room, so every 3s could easily mean 100Hz of outbound transactions.


ara4n commented Mar 6, 2019

okay, could we pick the "send RRs no more rapidly than every N seconds" value such that N ensures less than X Hz of outbound transactions? So small rooms don't get crippled, but big ones do?


richvdh commented Mar 6, 2019

In which case perhaps 399->339 is okay, as long as they're spread out nicely no more rapidly than every 3s?

Having thought about this a bit harder: yes, you're right that this does make a difference, because although the reduction is only ~15%, it happens at the most beneficial time. However, as erik says, it's not enough, because empirically we top out around 150Hz of transactions, and one transaction every 3 seconds to each of 600 servers is 200Hz.

okay, could we pick the "send RRs no more rapidly than every N seconds" value such that N ensures less than X Hz of outbound transactions? So small rooms don't get crippled, but big ones do?

yes, this definitely sounds like a possible strategy. Let's say you're aiming for 100Hz of transactions: when you get an RR in HQ, you realise it needs to go to 667 servers, so you say that you won't forward any more RRs for 667/100Hz = 6.67s, and any RRs which arrive in the meantime get queued up.

We might need to be careful about what happens if there are two rooms with activity at once, but this is currently rare.
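To make the idea concrete, a minimal sketch of that per-room rate limit (all names and the 100Hz target are illustrative, not the eventual implementation):

```python
import time
from collections import defaultdict

TARGET_HZ = 100.0  # illustrative budget for outbound transactions per second
_next_flush = defaultdict(float)  # room_id -> earliest monotonic time we may flush again

def can_flush_room(room_id, num_destinations, now=None):
    """Return True if this room's queued RRs may be flushed now."""
    now = time.monotonic() if now is None else now
    if now < _next_flush[room_id]:
        return False  # hold the RR; it will ride along with the next flush
    # e.g. Matrix HQ with ~667 destinations at a 100Hz budget => hold further RRs for ~6.7s
    _next_flush[room_id] = now + num_destinations / TARGET_HZ
    return True
```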


erikjohnston commented Mar 7, 2019

This does mean that for large rooms RRs will be significantly delayed, as the first RR will go out to all servers, and then it waits 7s before sending any more. That sounds like it's gonna be quite noticeable in a conversation tbh.

I'm also slightly wary that instead of defining this in terms of "this is an acceptable delay" which is fairly static, we now have a magic parameter that we'd need to keep tweaking as our load/usage patterns change (e.g. how many large rooms have concurrent conversations).


richvdh commented Mar 7, 2019

This does mean that for large rooms RRs will be significantly delayed, as the first RR will go out to all servers, and then it waits 7s before sending any more. That sounds like it's gonna be quite noticeable in a conversation tbh.

well right, but empirically going any faster than that means we start getting behind.

I'm also slightly wary that instead of defining this in terms of "this is an acceptable delay" which is fairly static, we now have a magic parameter that we'd need to keep tweaking as our load/usage patterns change (e.g. how many large rooms have concurrent conversations).

I suspect we'd end up tweaking the "acceptable delay" just as much.


ara4n commented Mar 8, 2019

We might need to be careful about what happens if there are two rooms with activity at once, but this is currently rare.

It might just be sufficient to maintain a queue and stack the batches of RRs to go out in series?


richvdh commented Mar 8, 2019

It might just be sufficient to maintain a queue and stack the batches of RRs to go out in series?

well, that's what happens at the moment. The problem is that everything else gets caught up in it. It's difficult to come up with a mechanism where RRs are queued separately to events without RRs getting completely starved out.


richvdh commented Mar 12, 2019

I think we need to try something here rather than continue to theorise about what might or might not feel ok.

I'm going to try Matthew's suggestion of rate-limiting RRs by room size. As Erik points out, that might make RRs feel unacceptably laggy, so we might need to consider tweaking it in the future.

richvdh added a commit that referenced this issue Mar 13, 2019
This is mostly a prerequisite for #4730, but also fits with the general theme
of "move everything off the master that we possibly can".
richvdh added a commit that referenced this issue Mar 19, 2019
richvdh added a commit that referenced this issue Mar 20, 2019

richvdh commented Mar 25, 2019

this has been deployed on matrix.org since 2019-03-21 14:14:49, ftr.
