-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Scratch changes to fix have_seen_event
not being invalidated
#13861
Conversation
…vent Fix for #13856 Fixed by calling `_invalidate_caches_for_event` when we persist an event. And an additional fix in `_invalidate_caches_for_event` to make sure it uses the correct cache key. This seems like it would be an easy foot-gun for any `tree=True` cache. Wrong: ```py self.have_seen_event.invalidate((room_id, event_id)) ``` Correct: ```py self.have_seen_event.invalidate(((room_id, event_id),)) ```
self._invalidate_local_get_event_cache(event_id) | ||
self.have_seen_event.invalidate((room_id, event_id)) | ||
self.have_seen_event.invalidate(((room_id, event_id),)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This key lookup was wrong and we were never invalidating the have_seen_event
cache even over replication.
Yes, the cache key literally a set
wrapped in set
. Something weird with the TreeCache
I assume
ex. (('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)
LruCache cache values before pop {(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),): False}
We should probably check all other instances of this for the same problem. And ideally fix the cache so it uses the expected (room_id, event_id)
key tuple instead.
assert event.internal_metadata.stream_ordering | ||
|
||
relation = relation_from_event(event) | ||
self.store._invalidate_caches_for_event( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to move because we can't touch the cache objects from this background queue thread processing and persisting the events.
Otherwise we run into ValueError: Cache objects can only be accessed from the main thread
synapse_main | 2022-09-21 20:53:42,362 - synapse.http.server - 123 - ERROR - POST-3 - Failed handle request via 'RoomCreateRestServlet': <SynapseRequest at 0x7f0fde7ec160 method='POST' uri='/_matrix/client/v3/createRoom' clientproto='HTTP/1.0' site='8080'>
synapse_main | Traceback (most recent call last):
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 306, in _async_render_wrapper
synapse_main | callback_return = await self._async_render(request)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 512, in _async_render
synapse_main | callback_return = await raw_callback_return
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/rest/client/room.py", line 161, in on_POST
synapse_main | info, _ = await self._room_creation_handler.create_room(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/room.py", line 909, in create_room
synapse_main | ) = await self._send_events_for_new_room(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/room.py", line 1105, in _send_events_for_new_room
synapse_main | await send(etype=EventTypes.Create, content=creation_content)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/room.py", line 1081, in send
synapse_main | ) = await self.event_creation_handler.create_and_send_nonmember_event(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/message.py", line 990, in create_and_send_nonmember_event
synapse_main | ev = await self.handle_new_client_event(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/util/metrics.py", line 113, in measured_func
synapse_main | r = await func(self, *args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/message.py", line 1324, in handle_new_client_event
synapse_main | result, _ = await make_deferred_yieldable(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 287, in handle_queue_loop
synapse_main | ret = await self._per_item_callback(room_id, item.task)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 359, in _process_event_persist_queue_task
synapse_main | return await self._persist_event_batch(room_id, task)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 748, in _persist_event_batch
synapse_main | await self.persist_events_store._persist_events_and_state_updates(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 851, in _wrapper
synapse_main | return await func(*args, **kwargs) # type: ignore[misc]
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 214, in _persist_events_and_state_updates
synapse_main | await self.db_pool.runInteraction(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 881, in runInteraction
synapse_main | return await delay_cancellation(_runInteraction())
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1692, in _inlineCallbacks
synapse_main | result = context.run(
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
synapse_main | return g.throw(self.type, self.value, self.tb)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 848, in _runInteraction
synapse_main | result = await self.runWithConnection(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 972, in runWithConnection
synapse_main | return await make_deferred_yieldable(
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
synapse_main | result = inContext.theWork() # type: ignore[attr-defined]
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
synapse_main | inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
synapse_main | return self.currentContext().callWithContext(ctx, func, *args, **kw)
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
synapse_main | return func(*args, **kw)
synapse_main | File "/usr/local/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
synapse_main | result = func(conn, *args, **kw)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 965, in inner_func
synapse_main | return func(db_conn, *args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 710, in new_transaction
synapse_main | r = func(cursor, *args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 442, in _persist_events_txn
synapse_main | self.store._invalidate_caches_for_event(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/cache.py", line 237, in _invalidate_caches_for_event
synapse_main | self.get_latest_event_ids_in_room.invalidate((room_id,))
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/descriptors.py", line 370, in <lambda>
synapse_main | wrapped.invalidate = lambda key: cache.invalidate(key[0])
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/deferred_cache.py", line 391, in invalidate
synapse_main | self.check_thread()
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/deferred_cache.py", line 136, in check_thread
synapse_main | raise ValueError(
synapse_main | ValueError: Cache objects can only be accessed from the main thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could move the cache invalidation to the persist_events
and persist_event
layer. Is that problematic/flakey if something fails in the middle? 🤔
persist_events or persist_event or update_current_state (not really)
self._event_persist_queue
_process_event_persist_queue_task
_persist_event_batch
_persist_events_and_state_updates
_persist_events_txn
assert event.internal_metadata.stream_ordering | ||
# Invalidate related caches after we persist a new event | ||
relation = relation_from_event(event) | ||
self.main_store._invalidate_caches_for_event( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out the fixes to #13863
⏩ Moved to #13863
Scratch changes to fix
have_seen_event
not being invalidatedSee #13856
The problem
_invalidate_caches_for_event
doesn't run in monolith mode which means we never even tried to clear thehave_seen_event
and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).Additionally there is bug with the key being wrong so
_invalidate_caches_for_event
never invalidates thehave_seen_event
cache even when it does run.Dev notes
Persist events
Replication stuff
https://github.com/matrix-org/synapse/blob/1a1abdda42551dad3aadc04a169c25f4cc651a2c/docs/replication.md
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.(run the linters)