Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very long query times for get_auth_chain_difference_chains #17129

Open
poettig opened this issue Apr 26, 2024 · 14 comments · May be fixed by #17154 or #17169
Open

Very long query times for get_auth_chain_difference_chains #17129

poettig opened this issue Apr 26, 2024 · 14 comments · May be fixed by #17154 or #17169

Comments

@poettig
Copy link

poettig commented Apr 26, 2024

Description

Since upgrading to v1.105.1 because of the security patch, we are seeing very high transaction times specifically for the transaction "get_auth_chain_difference_chains". In our case, we directly upgraded from v1.102.0 to v1.105.1, but heard from others with the same problem after upgrading from v1.104.0.

This could be a regression caused by #17044 which modified code called through several indirections by "get_auth_chain_difference_chains" or a regression caused by the security patches.

image

Steps to reproduce

  • Run a large enough homseserver using Synapse v1.105.1 joined to many rooms
  • Observe regular long-running transactions "get_auth_chain_difference_chains"

Homeserver

kit.edu

Synapse Version

1.105.1

Installation Method

Debian packages from packages.matrix.org

Database

postgresql 13.14-0+deb11u1

Workers

Multiple workers

Platform

root@main-1:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

root@main-1:~# uname -a
Linux main-1 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux

Synapse Main, Workers and the database all run in virtual machines.

Configuration

Presence is enabled.

Relevant log output

I'm not sure which logs would be helpful here and I can't find any relevant log entries in the affected code. Can add logs if you give me a hint which contexts I have to set to debug :)

Anything else that would be useful to know?

No response

@rda0
Copy link
Contributor

rda0 commented Apr 26, 2024

I believe we have the same problem. We noticed increased CPU usage after the upgrade from 1.104.0 to 1.105.1. This occurs occasionally for a few minutes.

cpu_usage_matrix_1 105 1

2024-04-26 17_08_23-phd-matrix synapse prometheus 20231213 - linux - Dashboards - Grafana — Mozilla

Homeserver: phys.ethz.ch
Synapse Version: 1.105.1
Installation Method: pip
Database: postgresql-15 15.6-0+deb12u1
Workers: Multiple workers
Presence: disabled
Platform:

+phd-matrix:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
+phd-matrix:~# uname -a
Linux phd-matrix 6.1.0-12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.52-1 (2023-09-07) x86_64 GNU/Linux

@csett86
Copy link

csett86 commented Apr 26, 2024

Same here on a semi-large homeserver deployment. Issues started with 1.105.0, get_auth_chain_difference_chains running for minutes instead of seconds. It persistet through 1.105.1:

Bildschirmfoto 2024-04-26 um 23 59 07

This evening I took a chance and reverted #17044 locally to see if it would help, but then federation breaks, so dont do it:

  File "synapse/storage/databases/main/events.py", line 464, in _persist_events_txn
    self._persist_event_auth_chain_txn(txn, [e for e, _ in events_and_contexts])
  File "synapse/storage/databases/main/events.py", line 562, in _persist_event_auth_chain_txn
    self._add_chain_cover_index(
  File "synapse/storage/databases/main/events.py", line 779, in _add_chain_cover_index
    for links in EventFederationStore._get_chain_links(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'EventFederationStore' has no attribute '_get_chain_links'

@csett86
Copy link

csett86 commented Apr 27, 2024

One more update: By reverting #17044, but keeping the class method in place for the security fix 55b0aa8, I now have a synapse that is back to normal for get_auth_chain_difference_chains:

Bildschirmfoto 2024-04-27 um 17 21 04

From what I have spotted in #17044 so far:

The first usage of the query in the previous code:

  1. First builds the links: https://github.com/element-hq/synapse/pull/17044/files#diff-1f5d8bffadd3271e42c2f6a66474bbf5e3e6694b009aa616b5f5433506217ff1L320-L330
  2. Then runs some functions on it: https://github.com/element-hq/synapse/pull/17044/files#diff-1f5d8bffadd3271e42c2f6a66474bbf5e3e6694b009aa616b5f5433506217ff1L332-L336
  3. Then does the difference_update: https://github.com/element-hq/synapse/pull/17044/files#diff-1f5d8bffadd3271e42c2f6a66474bbf5e3e6694b009aa616b5f5433506217ff1L338

In the new code, as the yield is at the end, I think step 2 and step 3 are switched (so first building links, then running difference_update, then step 2 from above).

On the second usage, the difference_updateis inside the inner set_to_chain loop https://github.com/element-hq/synapse/pull/17044/files#diff-1f5d8bffadd3271e42c2f6a66474bbf5e3e6694b009aa616b5f5433506217ff1L621, while with the new code its running at the end of the outer loop and using links as its parameter.

I am not sure which of the two usages is causing the long running queries, but reverting both to the earlier version helped to get back to normal. From my limited understanding at least the yield and the difference_update in the new class method probably need to be in the reverse order to get closer to the old implementation.

@TheDidek
Copy link

TheDidek commented Apr 30, 2024

For me usage jumped crazily completely clogging the database lately after v1.105.1

swappy-20240430_141654

obraz

obraz

@heftig
Copy link

heftig commented May 2, 2024

It looks like the refactoring missed that the chains added by _materialize to the chains dict should be removed from chains_to_fetch, so I guess it keeps requesting the same chains over and over, leading to a quadratic explosion.

@heftig
Copy link

heftig commented May 3, 2024

I've also partially reverted #17044 with apparent success. This is the diff:

Diff
diff --git c/synapse/storage/databases/main/event_federation.py w/synapse/storage/databases/main/event_federation.py
index fb132ef09..97ac07c3a 100644
--- c/synapse/storage/databases/main/event_federation.py
+++ w/synapse/storage/databases/main/event_federation.py
@@ -280,16 +280,64 @@ class EventFederationWorkerStore(SignatureWorkerStore, EventsWorkerStore, SQLBas
 
         # Now we look up all links for the chains we have, adding chains that
         # are reachable from any event.
+        #
+        # This query is structured to first get all chain IDs reachable, and
+        # then pull out all links from those chains. This does pull out more
+        # rows than is strictly necessary, however there isn't a way of
+        # structuring the recursive part of query to pull out the links without
+        # also returning large quantities of redundant data (which can make it a
+        # lot slower).
+        sql = """
+            WITH RECURSIVE links(chain_id) AS (
+                SELECT
+                    DISTINCT origin_chain_id
+                FROM event_auth_chain_links WHERE %s
+                UNION
+                SELECT
+                    target_chain_id
+                FROM event_auth_chain_links
+                INNER JOIN links ON (chain_id = origin_chain_id)
+            )
+            SELECT
+                origin_chain_id, origin_sequence_number,
+                target_chain_id, target_sequence_number
+            FROM links
+            INNER JOIN event_auth_chain_links ON (chain_id = origin_chain_id)
+        """
 
         # A map from chain ID to max sequence number *reachable* from any event ID.
         chains: Dict[int, int] = {}
-        for links in self._get_chain_links(txn, set(event_chains.keys())):
+
+        # Add all linked chains reachable from initial set of chains.
+        chains_to_fetch = set(event_chains.keys())
+        while chains_to_fetch:
+            batch2 = tuple(itertools.islice(chains_to_fetch, 1000))
+            chains_to_fetch.difference_update(batch2)
+            clause, args = make_in_list_sql_clause(
+                txn.database_engine, "origin_chain_id", batch2
+            )
+            txn.execute(sql % (clause,), args)
+
+            links: Dict[int, List[Tuple[int, int, int]]] = {}
+
+            for (
+                origin_chain_id,
+                origin_sequence_number,
+                target_chain_id,
+                target_sequence_number,
+            ) in txn:
+                links.setdefault(origin_chain_id, []).append(
+                    (origin_sequence_number, target_chain_id, target_sequence_number)
+                )
+
             for chain_id in links:
                 if chain_id not in event_chains:
                     continue
 
                 _materialize(chain_id, event_chains[chain_id], links, chains)
 
+            chains_to_fetch.difference_update(chains)
+
         # Add the initial set of chains, excluding the sequence corresponding to
         # initial event.
         for chain_id, seq_no in event_chains.items():
@@ -579,16 +627,61 @@ class EventFederationWorkerStore(SignatureWorkerStore, EventsWorkerStore, SQLBas
 
         # Now we look up all links for the chains we have, adding chains that
         # are reachable from any event.
+        #
+        # This query is structured to first get all chain IDs reachable, and
+        # then pull out all links from those chains. This does pull out more
+        # rows than is strictly necessary, however there isn't a way of
+        # structuring the recursive part of query to pull out the links without
+        # also returning large quantities of redundant data (which can make it a
+        # lot slower).
+        sql = """
+            WITH RECURSIVE links(chain_id) AS (
+                SELECT
+                    DISTINCT origin_chain_id
+                FROM event_auth_chain_links WHERE %s
+                UNION
+                SELECT
+                    target_chain_id
+                FROM event_auth_chain_links
+                INNER JOIN links ON (chain_id = origin_chain_id)
+            )
+            SELECT
+                origin_chain_id, origin_sequence_number,
+                target_chain_id, target_sequence_number
+            FROM links
+            INNER JOIN event_auth_chain_links ON (chain_id = origin_chain_id)
+        """
+
+        # (We need to take a copy of `seen_chains` as we want to mutate it in
+        # the loop)
+        chains_to_fetch = set(seen_chains)
+        while chains_to_fetch:
+            batch2 = tuple(itertools.islice(chains_to_fetch, 1000))
+            clause, args = make_in_list_sql_clause(
+                txn.database_engine, "origin_chain_id", batch2
+            )
+            txn.execute(sql % (clause,), args)
+
+            links: Dict[int, List[Tuple[int, int, int]]] = {}
+
+            for (
+                origin_chain_id,
+                origin_sequence_number,
+                target_chain_id,
+                target_sequence_number,
+            ) in txn:
+                links.setdefault(origin_chain_id, []).append(
+                    (origin_sequence_number, target_chain_id, target_sequence_number)
+                )
 
-        # (We need to take a copy of `seen_chains` as the function mutates it)
-        for links in self._get_chain_links(txn, set(seen_chains)):
             for chains in set_to_chain:
                 for chain_id in links:
                     if chain_id not in chains:
                         continue
 
                     _materialize(chain_id, chains[chain_id], links, chains)
 
+                chains_to_fetch.difference_update(chains)
                 seen_chains.update(chains)
 
         # Now for each chain we figure out the maximum sequence number reachable

@jameskitt616
Copy link

I see similar behavior since v1.105.0 on my synapse grafana dashboard.
I've also noticed that the disk writes are now constantly at 100-200MB/s, which causes the postgres service to gather multiple GB's or disk writes in a few minutes (as you can see by this screenshot made in iotop).

image

@csett86 csett86 linked a pull request May 4, 2024 that will close this issue
3 tasks
csett86 added a commit to csett86/synapse that referenced this issue May 4, 2024
For details see element-hq#17129. Compared to a full revert, we
keep the class method instroduced in element-hq#17129, as its used
elsewhere by now

Fixes: element-hq#17129
Signed-off-by: Christoph Settgast <csett86_git@quicksands.de>
csett86 added a commit to csett86/synapse that referenced this issue May 4, 2024
For details see element-hq#17129. Compared to a full revert, we
keep the class method instroduced in element-hq#17129, as its used
elsewhere by now

Fixes: element-hq#17129
Signed-off-by: Christoph Settgast <csett86_git@quicksands.de>
csett86 added a commit to csett86/synapse that referenced this issue May 4, 2024
For details see element-hq#17129. Compared to a full revert, we
keep the class method instroduced in element-hq#17129, as its used
elsewhere by now

Fixes: element-hq#17129
Signed-off-by: Christoph Settgast <csett86_git@quicksands.de>
@reivilibre reivilibre linked a pull request May 29, 2024 that will close this issue
@strutztm
Copy link

This prevents us from updating our homeservers to v1.105.1 or later, since on those versions our synapse instances are entirely unusable. In consequence, our user experience will be severely degraded when matrix.org sunsets unauthenticated media on September 4th. What's the path going forward? Do we need to deploy a custom-built image with commits from #17154 or #17169, or is there any chance of a timely patch release with a fix?

@jameskitt616
Copy link

jameskitt616 commented Aug 29, 2024

I'd also would demand a timely fix, having a software causing use of 100% of the writes of drives is just a deathtrap for any SSD and just eats through the expensive drives by writing hundreds of GB or TB per day to disk (Causing expensive Damage to all users affected by this issue). It also causes incredibly high IO for the rest of the system. IMO this is absolutely irresponsible behavior and should be instantly addressed.

Leaving aside that it also makes the homeserver barely use-able at times.

@jameskitt616
Copy link

jameskitt616 commented Oct 1, 2024

image

Just a little reminder what what this bug means for some SSDs. This my my Synapse VM's Disk IO for the past year averaged at 100MB/s diskwrite. For the past few months since the boken Synapse release which introduced this Bug in (~ April 2024), Synapse-Postgres wrote at least 1.1PB to my SSDs causing absolutely unnecessary wear (i cant just replace my SSDs all few months because of this bug)

image

@jameskitt616
Copy link

Seems like i found the culprit. It was me all along.
I had 8GB RAM for synapse + postgres. And with all my testing and nights of troubleshooting i never tried to increase the RAM it seems.
So postgres started to swap and so on.
I changed the VMs swappiness and increased the VMs RAM and now postgres is back to normal writes.
I apologize for my non-professionalism.

@n-peugnet
Copy link

I also suspect that this issue might be worse when combined with #3364 which is, in my case, probably mainly caused by #9406. It would be very good if these issues could be taken seriously by Element, because this really threatens smaller deployment. We probably won't be able to continue providing Matrix hosting in these conditions.

@daedric7
Copy link

daedric7 commented Nov 8, 2024

I also suspect that this issue might be worse when combined with #3364 which is, in my case, probably mainly caused by #9406. It would be very good if these issues could be taken seriously by Element, because this really threatens smaller deployment. We probably won't be able to continue providing Matrix hosting in these conditions.

I don't use retention, but i've been using this for unreferenced state groups: https://github.com/erikjohnston/synapse-find-unreferenced-state-groups

@bonswouar
Copy link

bonswouar commented Dec 7, 2024

FYI, after upgrading from postgres 12 to 17 and re-indexing, it got a bit better on my instance:
image
Although I'd still expect it to always be less than 1min seeing my specs, a real fix would be greatly appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants