Presumably due to matrix-org/synapse#13632, the master process is unable to handle replication requests from workers because of the load from purge jobs. It happily logs updates on the purge job states while clients can no longer connect.
Steps to reproduce
enable message retention (see the config sketch below)
possibly be a large instance (unconfirmed)
wait for the scheduled purge job to execute
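For context, "enable message retention" means turning on the retention block in homeserver.yaml. A minimal sketch, following the option layout in the Synapse configuration docs; the lifetimes and purge job intervals below are illustrative, not the exact values from this deployment:

```yaml
retention:
  enabled: true
  # Fallback policy for rooms without an m.room.retention state event.
  default_policy:
    min_lifetime: 1d
    max_lifetime: 1y
  # The purge jobs scheduled here are what generate the load on the
  # master process described in this report.
  purge_jobs:
    - longest_max_lifetime: 3d
      interval: 12h
    - shortest_max_lifetime: 3d
      interval: 1d
```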
Homeserver
tchncs.de
Synapse Version
1.94.0
Installation Method
pip (from PyPI)
Database
PostgreSQL
Workers
Multiple workers
Platform
Debian GNU/Linux 12 (bookworm), dedicated
Configuration
draupnir module, presence, retention
Relevant log output
synapse.replication.tcp.client - 352 - INFO - _process_incoming_pdus_in_room_inner-124023-$fbrT_6mck678v_gNV527V0f5Jp4kvbDiQVSeHOmiN2E - Finished waiting for repl stream 'events' to reach 361593234 (event_persister1)
synapse.http.client - 923 - INFO - PUT-890470 - Received response to POST synapse-replication://master/_synapse/replication/fed_send_edu/m.receipt/IjFSBKBxIa: 200
synapse.replication.tcp.client - 332 - INFO - PUT-890470 - Waiting for repl stream 'caches' to reach 416737455 (master); currently at: 416710210
synapse.replication.tcp.client - 342 - WARNING - PUT-890464 - Timed out waiting for repl stream 'caches' to reach 416737417 (master); currently at: 416710510
synapse.replication.tcp.client - 342 - WARNING - PUT-890470 - Timed out waiting for repl stream 'caches' to reach 416737422 (master); currently at: 416710510
synapse.replication.tcp.client - 342 - WARNING - PUT-890470 - Timed out waiting for repl stream 'caches' to reach 416737422 (master); currently at: 416710510
synapse.replication.tcp.client - 342 - WARNING - PUT-890470 - Timed out waiting for repl stream 'caches' to reach 416737422 (master); currently at: 416710510
synapse.replication.tcp.client - 342 - WARNING - PUT-890470 - Timed out waiting for repl stream 'caches' to reach 416737422 (master); currently at: 416710510
synapse.replication.tcp.client - 342 - WARNING - PUT-890470 - Timed out waiting for repl stream 'caches' to reach 416737422 (master); currently at: 416710510
synapse.replication.http._base - 300 - WARNING - GET-2559861 - presence_set_state request timed out; retrying
synapse.replication.http._base - 312 - WARNING - PUT-899550 - fed_send_edu request connection failed; retrying in 1s: ConnectError(<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>)
synapse.http.client - 932 - INFO - PUT-901284 - Error sending request to POST synapse-replication://master/_synapse/replication/fed_send_edu/m.presence/WCoECfmCdH: ConnectError [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
]
Same here with a Docker + worker setup on Synapse v1.101.0.
Currently, deactivating retention does not seem to be enough; the problem reappears after a while.
Update:
I have set retention.enabled to false, and still, every night at 5 AM after a PostgreSQL backup, I get "Timed out waiting for repl stream" errors.
Update 2:
OK, I also get the error in the middle of the day.
Can someone please take care of this?
This issue has been migrated from #16489.