Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

"Error leaving room No create event in auth events" after migrating from sqlite to postgresql #12487

Closed
calvinjgs opened this issue Apr 17, 2022 · 5 comments
Assignees

Comments

@calvinjgs
Copy link

Description

I can't leave some federated rooms after migration from sqlite to postgresql14
I get a "Error leaving room No create event in auth events" error in Element Desktop.
(also other actions don't work, like sending, and probably receiving, messages)

Steps to reproduce

  • create a matrix synapse server using default sqlite
  • add a user and join public rooms on other servers
  • migrate the database according to the documentation
  • attempt to leave the previously joined public rooms

I expect to be able to leave the rooms without the error. I also expect that the database was ported/migrated over correctly.

Additional Information

I think each attempt to leave produces a psycopg2.errors.UniqueViolation error in the logs. Here is one of them with a couple log entries before and after in case they provide more context.

2022-04-16 14:35:21,745 - synapse.http.matrixfederationclient - 291 - INFO - GET-22138-$p6TLBanT7v-8pZDSYiRNPnf4hPbLyDHnC4EPcI3KdK4-$HwX9TsuaewHjPWHLeFtQ89MaaptIr2AikTzKVjCVqzM-$HpPR1JZJhiUy5st4Oa-fhDBgqgydUPSgIEJoI7YQw7U - {GET-O-171629} [maunium.net] Completed request: 200 OK in 0.26 secs, got 873 bytes - GET matrix://maunium.net/_matrix/federation/v1/event/%24HpPR1JZJhiUy5st4Oa-fhDBgqgydUPSgIEJoI7YQw7U
2022-04-16 14:35:21,747 - synapse.handlers.federation_event - 823 - WARNING - GET-463-$FCwuaOYV20S1JBlZKAKiThG83bciRZXekxxwxR1hkPo - Error attempting to resolve state at missing prev_events
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 788, in _resolve_state_at_missing_prevs
    remote_state = await self._get_state_after_missing_prev_event(
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 895, in _get_state_after_missing_prev_event
    await self._get_events_and_persist(
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 1231, in _get_events_and_persist
    await self._auth_and_persist_outliers(room_id, events)
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 1290, in _auth_and_persist_outliers
    await self._auth_and_persist_outliers_inner(room_id, roots)
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 1356, in _auth_and_persist_outliers_inner
    await self.persist_events_and_notify(
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 1890, in persist_events_and_notify
    events, max_stream_token = await self._storage.persistence.persist_events(
  File "/usr/local/lib/python3.9/dist-packages/synapse/logging/opentracing.py", line 794, in _trace_inner
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/persist_events.py", line 326, in persist_events
    ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
  File "/usr/local/lib/python3.9/dist-packages/synapse/util/async_helpers.py", line 275, in yieldable_gather_results
    raise dfe.subFailure.value from None
  File "/usr/local/lib/python3.9/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/usr/local/lib/python3.9/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/persist_events.py", line 322, in enqueue
    return await self._event_persist_queue.add_to_queue(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/persist_events.py", line 200, in add_to_queue
    res = await make_deferred_yieldable(end_item.deferred.observe())
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/persist_events.py", line 243, in handle_queue_loop
    ret = await self._per_item_callback(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/persist_events.py", line 580, in _persist_event_batch
    await self.persist_events_store._persist_events_and_state_updates(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/databases/main/events.py", line 190, in _persist_events_and_state_updates
    await self.db_pool.runInteraction(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 780, in runInteraction
    return await delay_cancellation(defer.ensureDeferred(_runInteraction()))
  File "/usr/local/lib/python3.9/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/usr/local/lib/python3.9/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 752, in _runInteraction
    result = await self.runWithConnection(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 875, in runWithConnection
    return await make_deferred_yieldable(
  File "/usr/local/lib/python3.9/dist-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/dist-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 868, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 616, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/databases/main/events.py", line 395, in _persist_events_txn
    self._store_event_txn(txn, events_and_contexts=events_and_contexts)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/databases/main/events.py", line 1367, in _store_event_txn
    self.db_pool.simple_insert_many_txn(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 1014, in simple_insert_many_txn
    txn.execute_values(sql, values, fetch=False)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 311, in execute_values
    return self._do_execute(
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 350, in _do_execute
    return func(sql, *args)
  File "/usr/local/lib/python3.9/dist-packages/synapse/storage/database.py", line 312, in <lambda>
    lambda the_sql: execute_values(self.txn, the_sql, values, fetch=fetch),
  File "/usr/local/lib/python3.9/dist-packages/psycopg2/extras.py", line 1270, in execute_values
    cur.execute(b''.join(parts))
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "event_json_event_id_key"
DETAIL:  Key (event_id)=($bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc) already exists.

2022-04-16 14:35:21,748 - synapse.handlers.federation_event - 714 - WARNING - GET-463-$FCwuaOYV20S1JBlZKAKiThG83bciRZXekxxwxR1hkPo - Pulled event $FCwuaOYV20S1JBlZKAKiThG83bciRZXekxxwxR1hkPo failed history check.
2022-04-16 14:35:21,759 - synapse.handlers.federation_event - 1283 - INFO - _process_incoming_pdus_in_room_inner-6-$Dw-5RQ67K5Nh4YlqQLxQJXpPtK3phr3VFNRPQxyJg2I-$pF54o-Fy6azIKg-o0M-fTmq0obKJyqvxDeb8uRbrmTY-$Jofm9YXx2Ocq7nrWWj-4wM-LbF8LBWsSoXzo0jCTLVM - Persisting 1 of 5300 remaining outliers: ['$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc']
2022-04-16 14:35:21,772 - synapse.handlers.federation_event - 823 - WARNING - _process_incoming_pdus_in_room_inner-6-$Dw-5RQ67K5Nh4YlqQLxQJXpPtK3phr3VFNRPQxyJg2I-$pF54o-Fy6azIKg-o0M-fTmq0obKJyqvxDeb8uRbrmTY - Error attempting to resolve state at missing prev_events

If you want I could provide the entire 300MB log snapshot if more info is needed.
I also have the homeserver.db that I migrated from. And my homeserver.yaml from before and after as well.

my take

So is it trying to add an event that already exists?

Looking at the state events in Synapse admin, one public room had an m.room.create entry, but the others did not. (all local rooms had one). However, I ran SELECT * FROM state_events WHERE Type = 'm.room.create'; in postresql which returned one create event for every room.

But I don't know what all the tables in the database do or how they interact so I don't know if a solution involves me inserting a create room event somewhere or not.

I read through issue #7642 and the other issues it references. But I don't know if they are really the same or not.

Version information

  • Homeserver: my own homeserver (if you need it I can provide it)

If not matrix.org:

  • Version: visiting https://[my_domain]/_synapse/admin/v1/server_version in a web browser returns {"server_version":"1.56.0","python_version":"3.9.12"}

  • Install method: I installed using this docker image.

  • Platform: So, synapse is running in docker like above on my Unraid server, Version: 6.9.2
@DMRobertson DMRobertson added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Discussion labels Apr 21, 2022
@H-Shay H-Shay self-assigned this Apr 21, 2022
@reivilibre reivilibre removed the T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. label Apr 22, 2022
@reivilibre
Copy link
Contributor

To confirm: this problem did not exist with SQLite, but started immediately after porting over to Postgres?

To confirm: how many times did you run the 'port database' tool? Was Synapse stopped on (before) the final run of the tool and not started again until it was switched over to Postgres?

This issues reminds me a bit of #12147, but also seems different enough.

The error you're showing in the logs sounds like your events and event_json tables are out of sync with one another. I can imagine this happening if you port a database whilst Synapse was running.

'No create event in auth events' is a different error; does that appear anywhere in the logs? (Look at the section of log before you see your request to leave the room.)

Please could you run both of these on both Postgres (psql) and SQLite (sqlite3 <old database>.db)?:

SELECT 1 FROM events WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';
SELECT 1 FROM event_json WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';

@calvinjgs
Copy link
Author

To confirm: this problem did not exist with SQLite, but started immediately after porting over to Postgres?

Correct. It was working just fine. Only after starting it up after porting it over did I notice this.

To confirm: how many times did you run the 'port database' tool? Was Synapse stopped on (before) the final run of the tool and not started again until it was switched over to Postgres?

The error you're showing in the logs sounds like your events and event_json tables are out of sync with one another. I can imagine this happening if you port a database whilst Synapse was running.

You know what, that was probably it then. The final run of the port db script is on the actual database and not a snapshot. So, I had synapse running. Because it is in a docker container, there is no way to shut down the server, but keep the container running to run the script. ... Is there? Docker Containers live and die on the service they run, right?

'No create event in auth events' is a different error; does that appear anywhere in the logs? (Look at the section of log before you see your request to leave the room.)

It looks like it appears just after the UniqueViolation error. I can see there is an AuthError and other references like Missing auth events for... and stuff.

here is a section of the log. The as you can see the timestamps show this error occurs a couple of seconds after the unique violation error

2022-04-16 14:35:23,645 - synapse.handlers.federation_event - 1698 - WARNING - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI - Failed to get auth chain for <FrozenEventV3 event_id=$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI, type=m.room.member, state_key=@_discord_701815929641631877:t2bot.io, outlier=False>: duplicate key value violates unique constraint "event_json_event_id_key"
DETAIL:  Key (event_id)=($bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc) already exists.

2022-04-16 14:35:23,649 - synapse.handlers.federation_event - 1712 - WARNING - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI - Missing auth events for <FrozenEventV3 event_id=$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI, type=m.room.member, state_key=@_discord_701815929641631877:t2bot.io, outlier=False>: ['$owb4ZQGJHa6KxTLyVwxzEb_wILtqXNctM2SV1k_T49w', '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc', '$U54HzTsMC-HFMjTJb6Q3hsvFclqUj-O1tbyoT1chsiE']
2022-04-16 14:35:23,649 - synapse.handlers.federation_event - 1004 - ERROR - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI - Unexpected AuthError from _check_event_auth
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 996, in _process_received_pdu
    context = await self._check_event_auth(
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 1407, in _check_event_auth
    claimed_auth_events = await self._load_or_fetch_auth_events_for_event(
  File "/usr/local/lib/python3.9/dist-packages/synapse/handlers/federation_event.py", line 1720, in _load_or_fetch_auth_events_for_event
    raise AuthError(code=HTTPStatus.FORBIDDEN, msg="Auth events could not be found")
synapse.api.errors.AuthError: 403: Auth events could not be found
2022-04-16 14:35:23,649 - synapse.handlers.federation_event - 714 - WARNING - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI - Pulled event $qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI failed history check.
2022-04-16 14:35:23,650 - synapse.handlers.federation_event - 678 - INFO - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$ZptgP0STwuPf0spImA4lJV_o8F4kWTIw9SK0IM1vR6s - Processing pulled event <FrozenEventV3 event_id=$ZptgP0STwuPf0spImA4lJV_o8F4kWTIw9SK0IM1vR6s, type=m.room.member, state_key=@_discord_185126606577205251:t2bot.io, outlier=False>
2022-04-16 14:35:23,653 - synapse.handlers.federation_event - 760 - INFO - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$ZptgP0STwuPf0spImA4lJV_o8F4kWTIw9SK0IM1vR6s - Event $ZptgP0STwuPf0spImA4lJV_o8F4kWTIw9SK0IM1vR6s is missing prev_events ['$qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI']: calculating state for a backwards extremity
2022-04-16 14:35:23,653 - synapse.handlers.federation_event - 782 - INFO - _process_incoming_pdus_in_room_inner-6-$nBFmFI5V7wPlpzMl_08Ziq4OMGJrqMlkjzu-il3pWIg-$ZptgP0STwuPf0spImA4lJV_o8F4kWTIw9SK0IM1vR6s - Requesting state after missing prev_event $qHXDEATJe3sm0Js7ANAoMoOWRJQ6sM-9bJPiiXawARI

Please could you run both of these on both Postgres (psql) and SQLite (sqlite3 <old database>.db)?:

SELECT 1 FROM events WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';
SELECT 1 FROM event_json WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';

From my sqlite database (not the snapshots but the actual most resent of the sqlite ones):

sqlite> SELECT 1 FROM events WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';
1
sqlite> SELECT 1 FROM event_json WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';
1

From the current postgres database as it exists now:

synapse=# SELECT 1 FROM events WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';
 ?column?
----------
(0 rows)

synapse=# SELECT 1 FROM event_json WHERE event_id = '$bThEJSE4NlAU4XnLaHsHSn92zla7A5-5Q0BCn5DCJgc';
 ?column?
----------
        1
(1 row)

So this is the smoking gun that actually shows a desync?
I'm guessing event_json stores the actual data/details of the event and event is like some sort of directory?

Well thanks for pointing this part out too. But I guess the whole "running the db port script with synapse on" thing was what did me in. I should have taken it more seriously and run the script outside of the synapse container.

@H-Shay
Copy link
Contributor

H-Shay commented Apr 26, 2022

I am closing this as it appears to be the result of running the script with the server on, rather than a bug in Synapse/the port_db script.

@H-Shay H-Shay closed this as completed Apr 26, 2022
@reivilibre
Copy link
Contributor

Because it is in a docker container, there is no way to shut down the server, but keep the container running to run the script. ... Is there? Docker Containers live and die on the service they run, right?

@calvinjgs Although this seems to be what's done you in, it'd be good to understand why you wanted to keep the container running to run the script?
How did this look like for you? Were you trying to use docker exec or something against the running container?

If I was, for some reason, insistent on staying within Docker Containers, I think I would have used something like docker run -it --rm --entrypoint bash matrixdotorg/synapse and got in to the container that way to run a manual command.

@calvinjgs
Copy link
Author

@reivilibre,
I didn't want to keep the contianer running, but the script is inside the container. I can't run docker exec on a container that isn't running, right? So I ran

docker exec -it -w /data matrix synapse_port_db --sqlite-database homeserver.db --postgres-config homeserver-postgres.yaml

Is there actually a way to run the script without the server up?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants