-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC4171 Omit service members from room summary #17866
Conversation
exclude_users_clause, args = make_in_list_sql_clause( | ||
self.database_engine, "state_key", exclude_members, negative=True | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be careful about there being too many exclude_members
and bailing to prevent DoS.
Alternatively, we could just fetch N extra members like we do in case one of them is the calling user and do the exclusion outside of the SQL. But we should probably have a limit on that as well. Perhaps something to clarify in the spec and then we can bail if the list is longer than the specc'ed max. Perhaps we just rely on max length of an event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to rely on the max length of an event, which I think even if you had a long list of very short userIds would still only be up to about 9k or so (once you've included the usual event padding).
I'm not quite sure on the performance cost here, but I'd assume that a 9k string list filter in postgres isn't terrible as it's not going to impact IO.
Alternatively we could do as you say and set a sensible max number of users in the spec (say 100 or so). I'm generally a bit allergic to limitations in the spec, as someones probably going to come up with a use case of 101 members however it might be justified in the case of performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and get_room_summary
has big fat cache on it so it's probably okay to do a slightly more expensive call here. Admittedly this does impact the hot path of sync, but I think the operation of pulling out excluded users is fairly fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tend to limit make_in_list_sql_clause(...)
to at-most 1000 (see usages of batch_iter(..., 1000)
) but it's a bit tough to do here with the negative=True
condition.
We should at-least add a comment here that we are assuming that there should be no more than 9.3k members to exclude based on the max length of an event (65535 bytes).
- There is nothing stopping someone from just using a single letter string over and over in the list so it could be practically ~65k things in the list.
- With actual MXIDs: 16.4k = 65535 / 4 (
@m:h
) maximally - In the most likely maximal scenario with public federation: 10.9k = 65535 / 6 (
@m:h.io
) - With enough unique combos in localpart: 9.3k = 65535 / 7 (
@mm:h.io
)
We could have a valid MXID check and add them to a Set
to deduplicate but I don't think it's worth the extra computation.
Perhaps we should just have a practical limit to re-evaluate if someone hits it. Have a 1k check and set exclude_members = []
in that case with a log warning (warning instead of assert because we don't want to break the whole /sync
response). This can always be increased in the future when someone has a practical use case but avoids rooms where there only goal is performance abuse.
if isinstance(functional_members_data, list) and all( | ||
isinstance(item, str) for item in functional_members_data | ||
): | ||
exclude_members = functional_members_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a bit dubious to change the behavior of /sync
(m.heroes
) without it being under an unstable endpoint or key in the response. It is under a feature flag (msc4171_enabled
) 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, unsure how best to do this one. I felt it would be better for the homeserver to uniformly provide the same result across all users / endpoints based on the configuration, than adding query parameters to each endpoint and filtering. It's a reasonable request though, we could do it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, I didn't manage to add a feature flag check here in the original PR. Have updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably would be prudent to separate this behavior, either via a new key in the /sync
response as io.element.msc4171.heroes
or conditional based on a new query parameter (any good prior art?)
New endpoint(s) is also a valid option but not worth how cumbersome that would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query parameter would be probably best I think. I'll just need to let the EX crew know that this needs adding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the response. I do think we need some review on the MSC as it's a naturally ripe-for-abuse feature. So from my view, this feature primarily helps with DM rooms where no room name nor alias is set and you will have around 3 participants (Alice, Bob, and a bot).
Deprioritise service users to the end of the join list (this doesn't need to be behind a flag)
We can do this, but unfortunately really only helps with larger groups with no other metadata to gather a name from.
Have clients remove service users from heroes manually
We can do this, but it does present a couple of problems. First, it means we need to fetch state as part of the sync loop which either means a small delay or a flicker as room names adjust. I am unsure how palatable that will be, but it's an option.
Otherwise, I'm happy adding a new query param flag to enable filtering of
m.heroes
.
I think this would be my preference. This avoids changing behaviors for existing clients, and gives clients the opportunity to opt out. It also prevents the flicker problem noted above which is my primary concern with the client approach. I'll proceed with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this, but it does present a couple of problems. First, it means we need to fetch state as part of the sync loop which either means a small delay or a flicker as room names adjust. I am unsure how palatable that will be, but it's an option.
I don't really follow this? Clients using sync v2 will get the service state event alongside the heroes, and for sliding sync the client can add it to the list of requested state events.
We can also server side annotate the hero membership events in sync v2 to indicate that they are service users (and do something similar in sliding sync), but given the client already has that information it feels a bit redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the point of m.heroes was to calculate a room name when you don't have the full state locally? Admittedly I didn't know that you could ask for specific state events to appear alongside. That seems like a more sensible approach to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, we always send down all the state to the clients, except for membership events if they have enabled "lazy-loading" for sync v2. In sliding sync the client can specify what state event types it wants the server to return (and gives an option for lazy loading members).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes things much easier. I think the optimal path here is probably to scrap this PR and have the clients pull the correct state and filter then.
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
ORDER BY | ||
CASE membership WHEN ? THEN 1 WHEN ? THEN 2 WHEN ? THEN 3 ELSE 4 END ASC, | ||
event_stream_ordering ASC | ||
LIMIT ? | ||
""" | ||
""" % (exclude_users_clause) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use f-strings
, or the more modern .format(..)
.
The old %
is a bit of a footgun, as e.g. its meant to take a collection like a tuple, but you've actually just passed it a plain string (since you forgot a comma), which is valid as a string is a collection of strings. This does work, as python does magic to detect the case, but its a bit ugh.
I think on balance and after talking to @erikjohnston, we have the ability to do this filtering in the client and it sounds trivial to do so from both v2 and sliding sync. Let's do that, and if that ends up being annoying then we can circle back to a backend implementation. |
For #17859
This applies the rules for MSC4171 to the
m.heroes
response in sync. For clarity, this PR is needed to fix behaviors in some modern clients such as EX which use heroes to determine room names rather than local calculation.MSC discussion for excluding service members from
m.heroes
: matrix-org/matrix-spec-proposals#4171 (comment)Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.(run the linters)