Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

we should run the 'are_all_users_on_domain' database sanity check before running schema migrations #6870

Closed
aspacca opened this issue Feb 7, 2020 · 6 comments
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@aspacca
Copy link

aspacca commented Feb 7, 2020

Description

When I run a search on my hosted matrix service from any client I see no results.
I can see the correct count but not the actual results

Steps to reproduce

  • Doing a search
  • No results showed
  • I should see the results

Version information

  • Homeserver:

If not matrix.org:

  • Version:
    {"server_version": "1.9.1", "python_version": "3.7.3"}
  • Install method:

Debian apt repo package

  • Platform:
    Debian 10

I found the query that's run on sqlite for search and I saw it doesn't work properly:
https://github.com/matrix-org/synapse/blob/v1.9.1/synapse/storage/data_stores/main/search.py#L424
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg71741.html

I then realised that the bultin sqlite3 function is overidden by https://github.com/matrix-org/synapse/blob/v1.9.1/synapse/storage/engines/sqlite.py#L76

my guess is that somehow the function overide is not applied and so the bultin failing rank function is used

@aspacca aspacca changed the title Search broken on sqlite DB Migration run with unvalid configuration Feb 9, 2020
@aspacca
Copy link
Author

aspacca commented Feb 9, 2020

I was finally able to identify the issue.
It's totally unrelated to the rank function but rather to the new local_current_membership table introduced in 1.9.1 relase and relative db migration.
I use debian sid repo to manage my synapse installation. I observed that since a few version the server name is not anymore defined in homeserver.yaml but rather from conf.d/server_name.yaml
The systemd unit ExecStart is the following
/usr/bin/python3 -m synapse.app.homeserver --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/
Over every update of the synapse package the content of conf.d/server_name.yaml is replaced with the name of the server coming from /etc/hostname and I have to manually change it back to the production server name and restart the systemd service to make it work otherwise synapse will fail to start due to mismatching between the server name defined in the conf and the one in the db.
Apparently this doesn't stop the migration to be run in background: for the specific case I guess the server name from the configuration was used to populate local_current_membership table ending in an empty one since where was no matching local user for the wrong server name in the db.
This would have been prevented if the migration (especially since it is relying on the server name conf) was not run until the invalid server name configuration was fixed.

@aspacca aspacca changed the title DB Migration run with unvalid configuration DB Migration run with invalid server name configuration Feb 9, 2020
@richvdh
Copy link
Member

richvdh commented Feb 9, 2020

I'm afraid I don't quite understand your explanation.

If the problem is that conf.d/server_name.yaml is being overwritten, please can you raise that with the maintainers of the debian package?

@richvdh richvdh changed the title DB Migration run with invalid server name configuration database updates run with invalid server name configuration Feb 9, 2020
@richvdh
Copy link
Member

richvdh commented Feb 9, 2020

(title updated for clarification: we use 'database migration' to mean moving from sqlite to postgres)

@aspacca
Copy link
Author

aspacca commented Feb 10, 2020

@richvdh I will clarify better
The problem with the conf.d/server_name.yaml overwritten has to be raised with the mantainers of the debian package, and I will do.

But there's another problem and you should be able to reproduce with the following steps:

  1. Install matrix 1.9.0 with sqlite defining server_name: whatever.domain.com
  2. Register some users for the server
  3. Shut down the server and change to server_name: another.domain.com
  4. Upgrade to matrix 1.9.1, it should fail to start because of mismatching between the server name in the db and the one in the config
  5. Check in the db the existance of the table local_current_membership. It should be there but empty
  6. Fix the server name and restart matrix. Now it is working.
  7. Use search feature in any client: no results

The problem is that at point 4 the database update for local_current_membership was applied even if the server name configuration was invalid
Relying on the config.server_name the query that populates the table results in inserting no data:

cur.execute(sql, ("%:" + config.server_name,))

Once you fix the server name and restart properly the server the migration is already run and local_current_membership will stay empty breaking the search

My fix was to manually delete the table, remove the migration from the db and restart the server with the proper server name

@richvdh
Copy link
Member

richvdh commented Feb 10, 2020

ah I see. Thanks

@richvdh richvdh changed the title database updates run with invalid server name configuration we should run the 'are_all_users_on_domain' database sanity check before running schema migrations Feb 10, 2020
@richvdh richvdh added z-bug (Deprecated Label) z-p2 (Deprecated Label) and removed info-needed labels Feb 10, 2020
richvdh added a commit that referenced this issue Feb 24, 2020
Some of the database deltas rely on `config.server_name` being set correctly,
so we should check that it is before running the deltas.

Fixes #6870.
@richvdh
Copy link
Member

richvdh commented Feb 25, 2020

done in #6982

@richvdh richvdh closed this as completed Feb 25, 2020
richvdh added a commit that referenced this issue Feb 25, 2020
Some of the database deltas rely on `config.server_name` being set correctly,
so we should check that it is before running the deltas.

Fixes #6870.
hawkowl pushed a commit that referenced this issue Feb 26, 2020
Some of the database deltas rely on `config.server_name` being set correctly,
so we should check that it is before running the deltas.

Fixes #6870.
phil-flex pushed a commit to phil-flex/synapse that referenced this issue Apr 15, 2020
Some of the database deltas rely on `config.server_name` being set correctly,
so we should check that it is before running the deltas.

Fixes matrix-org#6870.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

2 participants