-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some useful endpoints to Admin API #17948
Add some useful endpoints to Admin API #17948
Conversation
Thanks for the quick review! I've re-written to expand endpoints where possible and added an index on |
self.db_pool.updates.register_background_index_update( | ||
update_name="events_received_ts_index", | ||
index_name="received_ts_idx", | ||
table="events", | ||
columns=("received_ts",), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we okay with the extra disk space and extra CPU cycles dealing with this index on insert/update?
Seems necessary to the use case so the answer is probably "this is ok" but I'm not the one dealing with the database woes on a daily basis to know for sure. There are already 7 other indexes on the events
table.
$ psql synapse
> \d+ events
...
Indexes:
"event_contains_url_index" btree (room_id, topological_ordering, stream_ordering) WHERE contains_url = true AND outlier = false
"events_event_id_key" UNIQUE CONSTRAINT, btree (event_id)
"events_jump_to_date_idx" btree (room_id, origin_server_ts) WHERE NOT outlier
"events_order_room" btree (room_id, topological_ordering, stream_ordering)
"events_room_stream" btree (room_id, stream_ordering)
"events_stream_ordering" UNIQUE, btree (stream_ordering)
"events_ts" btree (origin_server_ts, stream_ordering)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I am unclear on this so I've consulted the backend team for their thoughts - will update once I hear back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two angles to the cost of adding an index.
As mentioned above, insert/update/delete performance. The most interesting one for us here is insert — we don't really want to make it take longer to persist new events if we don't have a good justification for doing so. But I can't really give you any numbers, so it's hard to comment on it from a technical perspective.
Update — I don't think we ever change the received_ts
of an event, so not applicable.
Delete — we purge events, but not really a problem if this goes a bit slower since it's already a 'background'-ish job.
The other angle is how much space this will take on disk. Looking at events_stream_ordering
(which is also an index on a BIGINT, so the same), this takes 247 GB on matrix.org currently. (See with \di+
in psql
BTW)
So adding this index will cost us another 247 GB roughly. That feels a bit on the chunky side unless it's very useful.
Do we have any alternatives? I can think of a few ideas that would help this.
BRIN
A neat trick that applies in this situation is that, because the values in received_ts
are timestamps of (roughly) when we insert the event, they will — excluding clock issues and so on — only increase.
We could therefore create a BRIN (Block-Range INdex) on this, which is a more lightweight type of index that stores, for each page of rows in the table, the minimum and maximum value of the column.
I'm not sure if this type of index might be upset by VACUUM FULL
, or restoring a backup, especially since we might be left with gaps in the tables due to purging rooms or events. So in that sense it's a little bit risky.
Using stream_ordering
instead
stream_ordering
also (excluding backfilled events) only increases. It's possible you could efficiently convert from received_ts
to stream_ordering
using a binary search or something like that, then perform your queries using stream_ordering
.
However if the server clock isn't consistently / monotonically increasing, then you can't really use stream_ordering
in its place and so this approach probably reaches its limit.
It'd be suitable for some rough approximation if you were poking around but probably no good for a feature built into Synapse without being able to bypass this problem.
Partial index
Both of your queries only care about m.room.member
events!
We can restrict the index to only indexing m.room.member
events and then make sure the query planner notices that's all we care about, when you come to query.
This is mostly good because it makes the index smaller (which should also provide a minor performance boost when reading, because there will be less extraneous index rows to have to consider)
self.db_pool.updates.register_background_index_update(
update_name="events_received_ts_index",
index_name="received_ts_idx",
table="events",
columns=("received_ts",),
where_clause = "type = 'm.room.member'"
)
-
SELECT COUNT(event_id) FROM events WHERE sender = ? AND type = 'm.room.member' AND received_ts > ? AND state_key != ?`
This query already should make it obvious enough to the query planner that we only care about
m.room.member
events. -
SELECT c.room_id FROM current_state_events c JOIN events e ON c.event_id = e.event_id WHERE c.state_key = ? AND c.membership = 'join' AND e.received_ts > ?
likely needs to be adapted to:
SELECT c.room_id FROM current_state_events c JOIN events e ON c.event_id = e.event_id WHERE c.state_key = ? AND c.membership = 'join' AND e.received_ts > ? AND e.type = 'm.room.member'
Look-up table
Not saying this is my favourite suggestion right now, but you could make Synapse build and maintain a lookup table that converts the received_ts
to earliest stream_ordering
, e.g. on a daily basis.
Sketch:
(please note you would need something symmetrical for negative stream orderings, which I won't write here because it's just a sketch)
CREATE TABLE events_received_ts_lookup_table(received_day_ts BIGINT NOT NULL PRIMARY KEY, min_stream_ordering BIGINT NOT NULL);
when inserting a new event:
received_day_ts = received_ts - (received_ts % 86400_000)
INSERT INTO events_received_ts_lookup_table (received_day_ts, min_stream_ordering)
VALUES (?, ?) -- the received_day_ts and stream_ordering of the event
ON CONFLICT (received_day_ts) DO UPDATE min_stream_ordering = LEAST(min_stream_ordering, EXCLUDED.min_stream_ordering); -- this is probably not entirely correct but you get the idea
When querying, you'd first query for the min_stream_ordering
for the day (falling back to earlier days if it's not present, so SELECT ... WHERE received_day_ts <= ? ORDER BY .. DESC LIMIT 1
) and use it as an extra filter in your existing query, just as a 'hint' to the query planner so it can use the stream_ordering
index.
SELECT c.room_id
FROM current_state_events c
JOIN events e ON c.event_id = e.event_id
WHERE c.state_key = ? AND c.membership = 'join' AND e.received_ts > ?
AND stream_ordering >= ? -- this `?` is the `min_stream_ordering` for the timestamp (rounded down to the day)
Honestly: this is probably too much effort for the circumstances. But it gives you the freedom to choose how much space to dedicate to the look-up table and is pretty space efficient by default. It does add work into the insertion path, but you could do it in a batch job in the background occasionally if needed (again, this is effort though. It's the kind of thing that could be useful in the right circumstances but I think this probably doesn't justify it yet).
All in all, I'd probably say the partial index is the right mix for now until proven otherwise. However I am a bit concerned that the queries may not be all that fast, even with the index, because the query still has to scan through a substantial number of rows, only to discard the ones with the wrong sender
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream_ordering
also (excluding backfilled events) only increases. It's possible you could efficiently convert fromreceived_ts
tostream_ordering
using a binary search or something like that, then perform your queries usingstream_ordering
.
It's worth noting that we already lookup the stream ordering for 24 hr ago (in store stream_ordering_day_ago
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@H-Shay any decision on this front?
The idea to use stream_ordering
from @reivilibre and the helper that @erikjohnston pointed out (_find_first_stream_ordering_after_ts_txn(...)
) seems very useful to avoid the extra index and create an efficient query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @reivilibre's comment "All in all, I'd probably say the partial index is the right mix for now until proven otherwise. ", I added a partial index here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking closer, it looks like you might have gone with the partial index approach, 4af0387#diff-28da6922f7ab91da0023fa1d9b4ba5d8fa3d5abe924e9160e0646c852b31f6faR342
Seems workable. My only concern is it might be a bit niche and it's just more to add to the database.
Any reason that made you stray away from the (answered by comment above)stream_ordering
approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For total completeness one might note that AFAICT finding the timestamp correlated with a stream ordering appears to rely on the function _find_first_stream_ordering_after_ts_txn
which searches using event.recevied_ts
and so without adding an index is also subject to the original concern of slowness re: that column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The database query for _find_first_stream_ordering_after_ts_txn(...)
searches by stream_ordering
which has an index. From my point of view, _find_first_stream_ordering_after_ts_txn(...)
is potentially slow because it takes multiple round-trips from database <-> application until we complete the binary search. I don't think the comment there is accurate on why it's slow.
Relevant query:
synapse/synapse/storage/databases/main/event_push_actions.py
Lines 1326 to 1331 in 657dd51
sql = """ | |
SELECT received_ts FROM events | |
WHERE stream_ordering <= ? | |
ORDER BY stream_ordering DESC | |
LIMIT 1 | |
""" |
There is potential to improve the binary search by using some sort of WITH RECURSIVE
query to avoid the round-trips but that's a separate thing to fix.
If we needed find_first_stream_ordering_after_ts(...)
in more scenarios, there's an argument to just add the new index for received_ts
directly for all events because it would improve all of these scenarios:
- History retention
purge_history_for_rooms_in_range(...)
/purge_history
Admin API- Removing push notifications that are older than a day ago (
synapse/storage/databases/main/event_push_actions.py
) - The new use case in this PR for some admin API's to find invites and join membership
Currently, the use cases for find_first_stream_ordering_after_ts(...)
seem like they apply to more one-shot and occasionally scheduled tasks which aren't being hammered by normal users. Since these tasks don't require immediate speed, we just want to make sure we're not destroying the database.
It feels like these new admin API's fall into the same camp so using find_first_stream_ordering_after_ts(...)
seems viable.
However I am a bit concerned that the queries may not be all that fast, even with the index, because the query still has to scan through a substantial number of rows, only to discard the ones with the wrong
sender
.
@reivilibre's comment about discarding a bunch of rows is still applicable but is harder to avoid unless we utilize/add more indexes for all of the conditions we're trying to filter by. Overall, it doesn't seem like we stress about this on many of the admin endpoints since they are run occasionally on demand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enlightening! Given this do you believe I should switch to using
find_first_stream_ordering_after_ts
rather than adding the partial index? I am learning here and while I can read about the tradeoffs I don't have an intuition as to which to choose, and the tradeoffs seem relatively equal.
@H-Shay I feel like I don't have a hard lean (or convincing/good reason) given you've already made the few changes for the partial index already.
I probably would have chosen the stream_ordering
approach given the prior art in Synapse for it 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding comments for database perf as summoned
self.db_pool.updates.register_background_index_update( | ||
update_name="events_received_ts_index", | ||
index_name="received_ts_idx", | ||
table="events", | ||
columns=("received_ts",), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two angles to the cost of adding an index.
As mentioned above, insert/update/delete performance. The most interesting one for us here is insert — we don't really want to make it take longer to persist new events if we don't have a good justification for doing so. But I can't really give you any numbers, so it's hard to comment on it from a technical perspective.
Update — I don't think we ever change the received_ts
of an event, so not applicable.
Delete — we purge events, but not really a problem if this goes a bit slower since it's already a 'background'-ish job.
The other angle is how much space this will take on disk. Looking at events_stream_ordering
(which is also an index on a BIGINT, so the same), this takes 247 GB on matrix.org currently. (See with \di+
in psql
BTW)
So adding this index will cost us another 247 GB roughly. That feels a bit on the chunky side unless it's very useful.
Do we have any alternatives? I can think of a few ideas that would help this.
BRIN
A neat trick that applies in this situation is that, because the values in received_ts
are timestamps of (roughly) when we insert the event, they will — excluding clock issues and so on — only increase.
We could therefore create a BRIN (Block-Range INdex) on this, which is a more lightweight type of index that stores, for each page of rows in the table, the minimum and maximum value of the column.
I'm not sure if this type of index might be upset by VACUUM FULL
, or restoring a backup, especially since we might be left with gaps in the tables due to purging rooms or events. So in that sense it's a little bit risky.
Using stream_ordering
instead
stream_ordering
also (excluding backfilled events) only increases. It's possible you could efficiently convert from received_ts
to stream_ordering
using a binary search or something like that, then perform your queries using stream_ordering
.
However if the server clock isn't consistently / monotonically increasing, then you can't really use stream_ordering
in its place and so this approach probably reaches its limit.
It'd be suitable for some rough approximation if you were poking around but probably no good for a feature built into Synapse without being able to bypass this problem.
Partial index
Both of your queries only care about m.room.member
events!
We can restrict the index to only indexing m.room.member
events and then make sure the query planner notices that's all we care about, when you come to query.
This is mostly good because it makes the index smaller (which should also provide a minor performance boost when reading, because there will be less extraneous index rows to have to consider)
self.db_pool.updates.register_background_index_update(
update_name="events_received_ts_index",
index_name="received_ts_idx",
table="events",
columns=("received_ts",),
where_clause = "type = 'm.room.member'"
)
-
SELECT COUNT(event_id) FROM events WHERE sender = ? AND type = 'm.room.member' AND received_ts > ? AND state_key != ?`
This query already should make it obvious enough to the query planner that we only care about
m.room.member
events. -
SELECT c.room_id FROM current_state_events c JOIN events e ON c.event_id = e.event_id WHERE c.state_key = ? AND c.membership = 'join' AND e.received_ts > ?
likely needs to be adapted to:
SELECT c.room_id FROM current_state_events c JOIN events e ON c.event_id = e.event_id WHERE c.state_key = ? AND c.membership = 'join' AND e.received_ts > ? AND e.type = 'm.room.member'
Look-up table
Not saying this is my favourite suggestion right now, but you could make Synapse build and maintain a lookup table that converts the received_ts
to earliest stream_ordering
, e.g. on a daily basis.
Sketch:
(please note you would need something symmetrical for negative stream orderings, which I won't write here because it's just a sketch)
CREATE TABLE events_received_ts_lookup_table(received_day_ts BIGINT NOT NULL PRIMARY KEY, min_stream_ordering BIGINT NOT NULL);
when inserting a new event:
received_day_ts = received_ts - (received_ts % 86400_000)
INSERT INTO events_received_ts_lookup_table (received_day_ts, min_stream_ordering)
VALUES (?, ?) -- the received_day_ts and stream_ordering of the event
ON CONFLICT (received_day_ts) DO UPDATE min_stream_ordering = LEAST(min_stream_ordering, EXCLUDED.min_stream_ordering); -- this is probably not entirely correct but you get the idea
When querying, you'd first query for the min_stream_ordering
for the day (falling back to earlier days if it's not present, so SELECT ... WHERE received_day_ts <= ? ORDER BY .. DESC LIMIT 1
) and use it as an extra filter in your existing query, just as a 'hint' to the query planner so it can use the stream_ordering
index.
SELECT c.room_id
FROM current_state_events c
JOIN events e ON c.event_id = e.event_id
WHERE c.state_key = ? AND c.membership = 'join' AND e.received_ts > ?
AND stream_ordering >= ? -- this `?` is the `min_stream_ordering` for the timestamp (rounded down to the day)
Honestly: this is probably too much effort for the circumstances. But it gives you the freedom to choose how much space to dedicate to the look-up table and is pretty space efficient by default. It does add work into the insertion path, but you could do it in a batch job in the background occasionally if needed (again, this is effort though. It's the kind of thing that could be useful in the right circumstances but I think this probably doesn't justify it yet).
All in all, I'd probably say the partial index is the right mix for now until proven otherwise. However I am a bit concerned that the queries may not be all that fast, even with the index, because the query still has to scan through a substantial number of rows, only to discard the ones with the wrong sender
.
I suspect complement test failures are flakes... |
…q_synapse into shay/admin_improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you're happy with how this works and performs. I haven't tested it out myself.
Awesome thanks! When you get a chance can you hit the merge button? I haven't the powers to do so myself. |
Adds endpoints to the Admin API to: