-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Check appservice user interest against the local users instead of all users (get_users_in_room
mis-use)
#13958
Check appservice user interest against the local users instead of all users (get_users_in_room
mis-use)
#13958
Conversation
… users `get_local_users_in_room` is way more performant since it looks at a single table (`local_current_membership`) and is looking through way less data since it only worries about the local users in the room instead of everyone in the room across the federation. Fix #13942 Related to: - #13605 - #13608 - #13606
Spawned while working on, #13958 (comment)
# In the future, we can consider re-using | ||
# `store.get_app_service_users_in_room` which is very similar to this | ||
# function but has a slightly worse performance than this because we | ||
# have an early escape-hatch if we find a single user that the | ||
# appservice is interested in. The juice would be worth the squeeze if | ||
# `store.get_app_service_users_in_room` was used in more places besides | ||
# an experimental MSC. But for now we can avoid doing more work and | ||
# barely using it later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more context on this comment, see #13958 (comment)
changelog.d/13958.bugfix
Outdated
@@ -0,0 +1 @@ | |||
Fix performance bottleneck with heavy appservice and bridged users in Synapse by checking appservice user interest against the local users instead of all users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For anyone curious, there might be a few more get_users_in_room
mis-uses in the sync
and presence
code for someone who has more domain knowledge on what those things should exactly do.
For example #13967 (feel free to pick it up)
# `store.get_app_service_users_in_room` was used in more places besides | ||
# an experimental MSC. But for now we can avoid doing more work and | ||
# barely using it later. | ||
local_user_ids = await store.get_local_users_in_room( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behavior of is_interested_in_room
slightly. Previously, we could match remote users when the appservice's user regex didn't include a hostname (all of our test cases and examples in the documentation don't). Was that just a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was that just a bug?
I don't see how an appservice could control remote users, so I think so...but might be worth asking someone with more info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really a question of controlling the users, but more of whether the application service wants to receive events and updates related to those users, which an application service may well want to do for events coming from non-local users.
So I think this is a change we wouldn't want to go ahead with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! I hadn't considered this use case.
For my own reference, the spec doesn't mention any restriction either: https://spec.matrix.org/v1.1/application-service-api/#registration
I've updated the comments to clarify local and remote so it's obvious if this is attempted again. We do have tests around this kinda but because they were just mocking get_users_in_room
so they happily passed when I updated it to get_local_users_in_room
as well 🤦♂️. I've added some actual homeserver integration tests and verified that they fail if we use get_local_users_in_room
here ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #14000 to introduce these clarification changes with so we can leave the diff in this PR for reference of what we didn't want to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite possibly, though the historical context (which may be lost to time) is still very prevelant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though the historical context (which may be lost to time) is still very prevelant here.
Is there something specific to point to here? Or just from your memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implied parts of "r0 was a nightmare" above :p (mostly my memory, but also the memory of others from around that time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a MSC for this: MSC3905
And prodding this point in case we can merge earlier,
That being said, the changes here are valid - simply only returning events targeted at local users covers the same use cases.
I tend to agree with @anoadragon453 here (I think we agree 😬). The use cases of seeing events from remote users (like moderation) is already supported by matching the entire room with the rooms
namespace regex so this PR should be ok to merge.
We have to rip the band-aid off at some point whether it before or after matrix-org/matrix-spec-proposals#3905 merges. Maybe there is a way to detect people relying on the old behavior? -> #13958 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSC3905 is merged and this behavior is ok to change now ✅
This reverts commit 4451998.
@parameterized.expand( | ||
[ | ||
("@local_as_user:test", True), | ||
# Defining remote users in an application service user namespace regex is a | ||
# footgun since the appservice might assume that it'll receive all events | ||
# sent by that remote user, but it will only receive events in rooms that | ||
# are shared with a local user. So we just remove this footgun possibility | ||
# entirely and we won't notify the application service based on remote | ||
# users. | ||
("@remote_as_user:remote", False), | ||
] | ||
) | ||
def test_match_interesting_room_members( | ||
self, interesting_user: str, should_notify: bool | ||
): | ||
""" | ||
Test to make sure that a interesting user (local or remote) in the room is | ||
notified as expected when someone else in the room sends a message. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note to previous reviewers, this parameterized test is new!
Originally from #14000 but this integration test actually passes and fails based on Synapse's behavior instead of mocked functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specifically wanted to make sure that an appservice registering interest in *
users would still receive an event that was sent from a local user to a remote user. As in, the appservice would still receive events sent from an interesting user.
And here I believe that is the case, though it may be nice to check send_mock
was called or not called both when a message is sent from alice and the parameterized user. Just to make sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…emods/13942-appservice-get_users_in_room-mis-uses
Thanks for the review @squahtx, @clokep, @anoadragon453, and @turt2live 🦨 |
Synapse 1.71.0 (2022-11-08) =========================== Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default. They will be removed altogether in Synapse 1.73.0. If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details. **Note:** in line with our [deprecation policy](https://matrix-org.github.io/synapse/latest/deprecation_policy.html) for platform dependencies, this will be the last release to support PostgreSQL 10, which reaches upstream end-of-life on November 10th, 2022. Future releases of Synapse will require PostgreSQL 11+. No significant changes since 1.71.0rc2. Synapse 1.71.0rc2 (2022-11-04) ============================== Improved Documentation ---------------------- - Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\matrix-org#14358](matrix-org#14358), [\matrix-org#14360](matrix-org#14360)) Deprecations and Removals ------------------------- - Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\matrix-org#14353](matrix-org#14353)) Internal Changes ---------------- - Run unit tests against Python 3.11. ([\matrix-org#13812](matrix-org#13812)) Synapse 1.71.0rc1 (2022-11-01) ============================== Features -------- - Support back-channel logouts from OpenID Connect providers. ([\matrix-org#11414](matrix-org#11414)) - Allow use of Postgres and SQLlite full-text search operators in search queries. ([\matrix-org#11635](matrix-org#11635), [\matrix-org#14310](matrix-org#14310), [\matrix-org#14311](matrix-org#14311)) - Implement [MSC3664](matrix-org/matrix-spec-proposals#3664), Pushrules for relations. Contributed by Nico. ([\matrix-org#11804](matrix-org#11804)) - Improve aesthetics of HTML templates. Note that these changes do not retroactively apply to templates which have been [customised](https://matrix-org.github.io/synapse/latest/templates.html#templates) by server admins. ([\matrix-org#13652](matrix-org#13652)) - Enable write-ahead logging for SQLite installations. Contributed by [@asymmetric](https://github.com/asymmetric). ([\matrix-org#13897](matrix-org#13897)) - Show erasure status when [listing users](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#query-user-account) in the Admin API. ([\matrix-org#14205](matrix-org#14205)) - Provide a specific error code when a `/sync` request provides a filter which doesn't represent a JSON object. ([\matrix-org#14262](matrix-org#14262)) Bugfixes -------- - Fix a long-standing bug where the `update_synapse_database` script could not be run with multiple databases. Contributed by @thefinn93 @ Beeper. ([\matrix-org#13422](matrix-org#13422)) - Fix a bug which prevented setting an avatar on homeservers which have an explicit port in their `server_name` and have `max_avatar_size` and/or `allowed_avatar_mimetypes` configuration. Contributed by @ashfame. ([\matrix-org#13927](matrix-org#13927)) - Check appservice user interest against the local users instead of all users in the room to align with [MSC3905](matrix-org/matrix-spec-proposals#3905). ([\matrix-org#13958](matrix-org#13958)) - Fix a long-standing bug where Synapse would accidentally include extra information in the response to [`PUT /_matrix/federation/v2/invite/{roomId}/{eventId}`](https://spec.matrix.org/v1.4/server-server-api/#put_matrixfederationv2inviteroomideventid). ([\matrix-org#14064](matrix-org#14064)) - Fix a bug introduced in Synapse 1.64.0 where presence updates could be missing from `/sync` responses. ([\matrix-org#14243](matrix-org#14243)) - Fix a bug introduced in Synapse 1.60.0 which caused an error to be logged when Synapse received a SIGHUP signal if debug logging was enabled. ([\matrix-org#14258](matrix-org#14258)) - Prevent history insertion ([MSC2716](matrix-org/matrix-spec-proposals#2716)) during an partial join ([MSC3706](matrix-org/matrix-spec-proposals#3706)). ([\matrix-org#14291](matrix-org#14291)) - Fix a bug introduced in Synapse 1.34.0 where device names would be returned via a federation user key query request when `allow_device_name_lookup_over_federation` was set to `false`. ([\matrix-org#14304](matrix-org#14304)) - Fix a bug introduced in Synapse 0.34.0 where logs could include error spam when background processes are measured as taking a negative amount of time. ([\matrix-org#14323](matrix-org#14323)) - Fix a bug introduced in Synapse 1.70.0 where clients were unable to PUT new [dehydrated devices](matrix-org/matrix-spec-proposals#2697). ([\matrix-org#14336](matrix-org#14336)) Improved Documentation ---------------------- - Explain how to disable the use of [`trusted_key_servers`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#trusted_key_servers). ([\matrix-org#13999](matrix-org#13999)) - Add workers settings to [configuration manual](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#individual-worker-configuration). ([\matrix-org#14086](matrix-org#14086)) - Correct the name of the config option [`encryption_enabled_by_default_for_room_type`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#encryption_enabled_by_default_for_room_type). ([\matrix-org#14110](matrix-org#14110)) - Update docstrings of `SynapseError` and `FederationError` to bettter describe what they are used for and the effects of using them are. ([\matrix-org#14191](matrix-org#14191)) Internal Changes ---------------- - Remove unused `@lru_cache` decorator. ([\matrix-org#13595](matrix-org#13595)) - Save login tokens in database and prevent login token reuse. ([\matrix-org#13844](matrix-org#13844)) - Refactor OIDC tests to better mimic an actual OIDC provider. ([\matrix-org#13910](matrix-org#13910)) - Fix type annotation causing import time error in the Complement forking launcher. ([\matrix-org#14084](matrix-org#14084)) - Refactor [MSC3030](matrix-org/matrix-spec-proposals#3030) `/timestamp_to_event` endpoint to loop over federation destinations with standard pattern and error handling. ([\matrix-org#14096](matrix-org#14096)) - Add initial power level event to batch of bulk persisted events when creating a new room. ([\matrix-org#14228](matrix-org#14228)) - Refactor `/key/` endpoints to use `RestServlet` classes. ([\matrix-org#14229](matrix-org#14229)) - Switch to using the `matrix-org/backend-meta` version of `triage-incoming` for new issues in CI. ([\matrix-org#14230](matrix-org#14230)) - Build wheels on macos 11, not 10.15. ([\matrix-org#14249](matrix-org#14249)) - Add debugging to help diagnose lost device list updates. ([\matrix-org#14268](matrix-org#14268)) - Add Rust cache to CI for `trial` runs. ([\matrix-org#14287](matrix-org#14287)) - Improve type hinting of `RawHeaders`. ([\matrix-org#14303](matrix-org#14303)) - Use Poetry 1.2.0 in the Twisted Trunk CI job. ([\matrix-org#14305](matrix-org#14305)) <details> <summary>Dependency updates</summary> Runtime: - Bump anyhow from 1.0.65 to 1.0.66. ([\matrix-org#14278](matrix-org#14278)) - Bump jinja2 from 3.0.3 to 3.1.2. ([\matrix-org#14271](matrix-org#14271)) - Bump prometheus-client from 0.14.0 to 0.15.0. ([\matrix-org#14274](matrix-org#14274)) - Bump psycopg2 from 2.9.4 to 2.9.5. ([\matrix-org#14331](matrix-org#14331)) - Bump pysaml2 from 7.1.2 to 7.2.1. ([\matrix-org#14270](matrix-org#14270)) - Bump sentry-sdk from 1.5.11 to 1.10.1. ([\matrix-org#14330](matrix-org#14330)) - Bump serde from 1.0.145 to 1.0.147. ([\matrix-org#14277](matrix-org#14277)) - Bump serde_json from 1.0.86 to 1.0.87. ([\matrix-org#14279](matrix-org#14279)) Tooling and CI: - Bump black from 22.3.0 to 22.10.0. ([\matrix-org#14328](matrix-org#14328)) - Bump flake8-bugbear from 21.3.2 to 22.9.23. ([\matrix-org#14042](matrix-org#14042)) - Bump peaceiris/actions-gh-pages from 3.8.0 to 3.9.0. ([\matrix-org#14276](matrix-org#14276)) - Bump peaceiris/actions-mdbook from 1.1.14 to 1.2.0. ([\matrix-org#14275](matrix-org#14275)) - Bump setuptools-rust from 1.5.1 to 1.5.2. ([\matrix-org#14273](matrix-org#14273)) - Bump twine from 3.8.0 to 4.0.1. ([\matrix-org#14332](matrix-org#14332)) - Bump types-opentracing from 2.4.7 to 2.4.10. ([\matrix-org#14133](matrix-org#14133)) - Bump types-requests from 2.28.11 to 2.28.11.2. ([\matrix-org#14272](matrix-org#14272)) </details>
Check appservice user interest against the local users instead of all users (
get_users_in_room
mis-use).get_local_users_in_room
is way more performant since it looks at a single table (local_current_membership
) and is looking through way less data since it only worries about the local users in the room instead of everyone in the room across the federation.Fix #13942
(probably fixes, we don't know for sure but it's our hunch atm)
Related to matrix-org/matrix-spec#1272
It has also become a spec-compliance issue and this PR will align our behavior to MSC3905
Related to the following PRs where we also cleaned up some
get_users_in_room
mis-uses:get_users_in_room
mis-use to get hosts with dedicatedget_current_hosts_in_room
#13605get_users_in_room
mis-use) #13608get_local_users_in_room
to find local users when calculatingjoin_authorised_via_users_server
of a/make_join
request (get_users_in_room
mis-use) #13606get_users_in_room
mis-use intransfer_room_state_on_room_upgrade
#13960See #13575 (comment) for the original exploration around finding
get_users_in_room
mis-uses.Problem
#13575 made the
get_users_in_room
SQL query a little more complicated by adding an extra join statement to the query which means it takes more time.This extra database time can show up as a hot-spot for homeservers that make heavy use of application services and bridged users (like Element One), see #13942
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.Pull request includes a sign off(run the linters)