This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Backfill gets clogged up with fake prev_events when using the MSC2716 /batch_send endpoint #11091
Closed
Tracked by
#10737
Labels
Comments
MadLittleMods
added
A-Federation
T-Defect
Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
labels
Oct 14, 2021
This was referenced Oct 14, 2021
MadLittleMods
added a commit
that referenced
this issue
Oct 21, 2021
Fix #11091 We have to allow creation of events with no prev_events but do have auth_events. And since the historical member events are outliers with no prev_events to resolve them, we want to avoid putting them as backward extremeties.
MadLittleMods
added a commit
that referenced
this issue
Feb 7, 2022
…vers (MSC2716) (#11114) Fix #11091 Fix #10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`) 1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`) - Technically, it shouldn't really matter how `/backfill` returns things but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)). - Even now that we return backfilled messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_order` and totally throw off the sort. - This will be even more the case when we add the [MSC3030 jump to date API endpoint](matrix-org/matrix-spec-proposals#3030) so the static archives can navigate and jump to a certain date. - We're solving this in the future by switching to [online topological ordering](matrix-org/gomatrixserverlib#187) and [chunking](#3785) which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking 2. As we're navigating `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break based on the `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth` so it's best to be prescriptive about which ones we should process first. In reality, I think the code already looped over the historical messages as expected because the database is already in order. 3. Making the historical state chain and historical event chain float on their own by having no `prev_events` instead of a fake `prev_event` which caused backfill to get clogged with an unresolvable event. Fixes #11091 and #10764 4. We no longer find connected insertion events by finding a potential `prev_event` connection to the current event we're iterating over. We now solely rely on marker events which when processed, add the insertion event as an extremity and the federating homeserver can ask about it when time calls. - Related discussion, #11114 (comment) Before | After --- | --- ![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) | ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png) #### Why aren't we sorting topologically when receiving backfill events? > The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](matrix-org/matrix-spec-proposals#2716) which could change whatever implementation anyway. > > As mentioned, ideally we would do this but code necessary to make the fake edges but it gets confusing and gives an impression of “just whyyyy” (feels icky). This problem also dissolves with online topological ordering. > > -- #11114 (comment) See #11114 (comment) for the technical difficulties
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
Spawning from #11027 (comment)
When using the experimental MSC2716
/batch_send
endpoint, we're using a fakeprev_event
to make the chain of historical state and messages float off on their own in the DAG and nudges federated homeservers to ask for the state at the start of the chunk that we generated.Each time one of these fake
prev_events
is processed by the homeserver, it's seen as a backward extremity which we attempt to fetch when trying to do backfill federation things but will never resolve. This is problematic as we will have thousands of batches per room to backfill a big room for Gitter and will have to deal with thousands of backward extremities here. Since we useget_oldest_event_ids_with_depth_in_room
and oldest backward extremities will always be our historical events. Assuming we're paginating from the bottom (wherecurrent_depth
> historical extremity depth), we will always pick the 5 unresolvable fake events in the room to backfill from. This would stop us from backfilling any other events in the room.Potential solutions
prev_events
for a given event? How does the first event in the room work?The solution might be intertwined with #10764.
Reproduction case
Untested for sure (just theoretical): Send more than 5 batches of historical messages. Try to backfill messages from another server.
There is a Complement test which does many batches but doesn't test the backfill aspect, matrix-org/complement#212
The text was updated successfully, but these errors were encountered: