[Bug] Automatic database migration from 0.23.0 to 0.24.0 does not work with postgres #2351

Closed
3 of 4 tasks
sysvinit opened this issue Jan 17, 2025 · 8 comments · Fixed by #2367
Labels
bug Something isn't working
Milestone

Comments

@sysvinit

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I tried to update my headscale instance, which uses postgres as the database, from 0.23.0 to 0.24.0 using the Debian packages provided as part of the releases on GitHub. However, after installing the new package, headscale failed to start because the database migration failed, with the following messages in the logs:

Jan 17 16:48:47 hostname headscale[130595]: 2025-01-17T16:48:47Z INF Opening database database=postgres path="host=/run/postgresql dbname=headscale user=headscale"
Jan 17 16:48:47 hostname headscale[130595]: 2025-01-17T16:48:47Z FTL Migration failed: ERROR: constraint "uni_users_name" of relation "users" does not exist (SQLSTATE 42704) error="ERROR: constraint \"uni_users_name\" of relation \"users\" does not exist (SQLSTATE 42704)"

I was also left unable to blindly downgrade to 0.23.0, as the new version had already executed a migration successfully before encountering the error, leaving my database in an inconsistent state that would not have been supported by the old version.

Expected Behavior

Headscale executes the database migration without errors, and then proceeds to function normally.

Steps To Reproduce

  1. Install headscale 0.23.0 using postgres as the database.
  2. Update the installed version of headscale to 0.24.0, and attempt to start the headscale process again.

Environment

- OS: Debian 12
- Headscale version: 0.23.0
- Tailscale version: N/A

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Anything else?

I manually inspected the migrations table in the database and compared this with the code in 0.24.0, which indicates that this was a problem with migration 202407191627. This is an automatic migration executed by gorm to update the schema of the users table -- I'm not sure if there have been changes in gorm, but I'm not familiar with the library so I didn't investigate that further.

As I mentioned above, my database was left in an inconsistent state which prevented me from downgrading (in fairness I should have backed up my database before performing the upgrade...). However, my database did have a uniqueness constraint for users.name as implied by the log message above, but in my database the constraint was called users_name_key instead of uni_users_name.

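I used the following SQL to recreate the constraint on the users table under the expected name. With this change, the migration that had been causing me problems ran correctly:

begin;
-- Add the unique constraint under the name the 0.24.0 migration expects.
alter table users add constraint uni_users_name unique (name);
-- Drop the old constraint, which carried PostgreSQL's default name.
alter table users drop constraint users_name_key;
commit;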

My headscale instance now appears to be working correctly, though I haven't tried adding new users or nodes to the Tailnet yet.
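
For anyone hitting the same error, the workaround above could be made safe to re-run by first checking which constraint names exist. The following is a minimal sketch in Python that only builds the SQL text and does not connect to a database. The function name `plan_constraint_fix` is hypothetical, the constraint names are the ones from this report, and the rename uses PostgreSQL's `ALTER TABLE ... RENAME CONSTRAINT` instead of the add-and-drop pair above. It is not the fix that was later merged for this issue.

```python
# Constraint names taken from the report above.
OLD_NAME = "users_name_key"   # name found in the reporter's database
NEW_NAME = "uni_users_name"   # name the 0.24.0 migration looks for

# Catalog query that lists the unique constraints on the users table.
LIST_UNIQUE_CONSTRAINTS = (
    "SELECT conname FROM pg_constraint "
    "WHERE conrelid = 'users'::regclass AND contype = 'u';"
)


def plan_constraint_fix(existing: set[str]) -> list[str]:
    """Return the SQL needed so that NEW_NAME exists on the users table.

    `existing` is the set of constraint names currently on the table,
    for example the result of LIST_UNIQUE_CONSTRAINTS.
    """
    if NEW_NAME in existing:
        # Already in the shape the migration expects; nothing to do.
        return []
    if OLD_NAME in existing:
        # Rename in place so the column never loses its uniqueness check.
        return [
            f'ALTER TABLE users RENAME CONSTRAINT "{OLD_NAME}" TO "{NEW_NAME}";'
        ]
    # Neither name is present: create the constraint under the expected name.
    return [f'ALTER TABLE users ADD CONSTRAINT "{NEW_NAME}" UNIQUE (name);']
```

Calling `plan_constraint_fix({"users_name_key"})` yields a single rename statement, which would then be run inside a transaction after taking a backup.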

@sysvinit sysvinit added the bug Something isn't working label Jan 17, 2025
@kradalby
Collaborator

This is unfortunate, I will not have time to look into this until next week, but in the meantime if someone has a Postgres backup that can reproduce this issue, I would appreciate getting it.

As often mentioned, we don't have the time to do extensive testing for Postgres, as we have so many other things to fix.

A personal rant:
I personally have less and less appreciation for Gorm, and if this turns out to be some sort of automagic footgun, that doesn't help.
I want to get rid of Gorm, but that means getting rid of Postgres and supporting only one database, which we are not doing. So it will likely continue like this: we try to test things, but if no one tests the betas with the setups we can't test, this will keep happening. In this case I suspect the beta testers didn't run Postgres, so we didn't discover it.

@kradalby kradalby pinned this issue Jan 19, 2025
@sysvinit
Author

> This is unfortunate, I will not have time to look into this until next week, but in the meantime if someone has a Postgres backup that can reproduce this issue, I would appreciate getting it.

I do have pg_dumpall backups of the entire postgres server on that machine as part of the daily system backups (useful for restoring after a storage failure, less so for restoring individual tables in the heat of the moment). I could get you a copy of the headscale DB from my server from the night before I attempted the upgrade, though I'll need to find some free time next week to spin up a test instance of postgres which I can load the backup into first in order to extract the headscale parts (and censor things like IP addresses). What would be the best way to send you the DB dump?

@kradalby
Collaborator

Great, sending it to the email address in my GitHub profile would be sufficient.

kradalby added a commit to kradalby/headscale that referenced this issue Jan 22, 2025
Fixes juanfont#2351

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
kradalby added a commit to kradalby/headscale that referenced this issue Jan 23, 2025
Fixes juanfont#2351

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
kradalby added a commit to kradalby/headscale that referenced this issue Jan 23, 2025
* fix postgres migration issue with 0.24

Fixes juanfont#2351

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* add postgres migration test for 2351

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* update changelog

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

---------

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
@sysvinit
Author

Given you've found a fix already -- do you still need a copy of my headscale database out of my backups?

@kradalby
Collaborator

No, thank you, I got one from another user and wrote a test based on that.

@nblock nblock added this to the v0.24.1 milestone Jan 27, 2025
socoldkiller pushed a commit to socoldkiller/headscale that referenced this issue Jan 30, 2025
* fix postgres migration issue with 0.24

Fixes juanfont#2351

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* add postgres migration test for 2351

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* update changelog

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

---------

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
@kradalby kradalby unpinned this issue Jan 30, 2025
@panteparak

panteparak commented Feb 1, 2025

Hi @kradalby

Just a heads up: somehow the migration in the fix PR has no effect and produces the same error.
I had to copy out the SQL statement as is and run it manually on the DB to fix it.

Upgrading from 0.22.3 -> 0.24.2 with Postgres.

If you want to investigate further, I could send you my SQL dump from v0.22.3.

@kradalby
Collaborator

kradalby commented Feb 1, 2025

Yes please. It worked from 0.23, but maybe the step from 0.22 was different. 0.23 is when we introduced migrations, so it would not surprise me. My email is in my profile.

@panteparak

@kradalby
Sorry for the late reply. Any tips on sanitising the DB dump? From what I can see, I should strip out IPs, nodekeys, and discokeys. Is there anything else that should be done, or is there a script to clean this all up?
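
One possible starting point for the sanitising step, offered as a sketch and not as a vetted tool: a small Python filter that redacts IPv4 addresses and key-like tokens from a plain-text SQL dump. The key prefixes (`nodekey:`, `discokey:`, `mkey:`) followed by a run of hex digits are an assumption about how keys appear in the dump, and the function names are hypothetical. IPv6 addresses, hostnames, user names, and pre-auth keys are not handled and would still need a manual review.

```python
import re
from typing import Iterable, Iterator

# IPv4 dotted quads. Octet ranges are not validated and IPv6 is ignored.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Assumed key format: a known prefix followed by at least 16 hex digits.
KEY = re.compile(r"\b(nodekey|discokey|mkey):[0-9a-fA-F]{16,}\b")


def sanitise_line(line: str) -> str:
    """Redact key-like tokens and IPv4 addresses in one line of a dump."""
    # Keep the prefix so the column type stays recognisable.
    line = KEY.sub(lambda m: f"{m.group(1)}:REDACTED", line)
    return IPV4.sub("0.0.0.0", line)


def sanitise_dump(lines: Iterable[str]) -> Iterator[str]:
    """Lazily sanitise every line of a dump, preserving line order."""
    for line in lines:
        yield sanitise_line(line)
```

The generator can be fed an open file object and its output written to a new file, so the original dump stays untouched. Replacing every address with the same placeholder may violate uniqueness constraints when the dump is restored, so a mapping to distinct placeholder addresses may be preferable in practice.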
