[bitnami/postgresql-repmgr] always does a full resync of the database on standby node when coming up #52213
Comments
Hi, we added an experimental flag to use pg_rewind instead. You can try it so that a full clone does not happen.
Per my comments, I have tried this flag and it has problems of its own: see the log I posted, which shows an error about the postgres config file not existing when pg_rewind tries to run. It also seems a bit strange to me that, even though nothing has happened during the restart, it always wants to do a full resync and/or rewind.
Hi @mzealey, I'm currently working on reproducing the issue you have mentioned. Could you kindly share the docker-compose file that you are using? Thanks.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
This is still an issue. The flag causes the error below. I think I found the bug. Here are the docs to pg_rewind.
postgresql-repmgr 04:00:49.65 INFO ==> Rejoining node...
postgresql-repmgr 04:00:49.65 INFO ==> Using pg_rewind to primary node...
postgresql-repmgr 04:00:49.65 INFO ==> Running pg_rewind data to primary node...
pg_rewind: executing "/opt/bitnami/postgresql/bin/postgres" for target server to complete crash recovery
pg_rewind: executing "/opt/bitnami/postgresql/bin/postgres" for target server to complete crash recovery
postgres: could not access the server configuration file "/bitnami/postgresql/data/postgresql.conf": No such file or directory
pg_rewind: error: postgres single-user mode in target cluster failed
pg_rewind: detail: Command was: /opt/bitnami/postgresql/bin/postgres --single -F -D /bitnami/postgresql/data template1 < /dev/null
postgresql-repmgr 04:00:49.72 WARN ==> pg_rewind failed, resorting to data cloning
postgresql-repmgr 04:00:49.72 INFO ==> Cloning data from primary node...
WARNING: following problems with command line parameters detected:
-D/--pgdata will be ignored if a repmgr configuration file is provided
NOTICE: destination directory "/bitnami/postgresql/data" provided
INFO: connecting to source node
The file location is here:
postgres=# SHOW config_file;
config_file
----------------------------------------------
/opt/bitnami/postgresql/conf/postgresql.conf
Am I missing something? After some more review, I found this in the lib script:
########################
# Execute pg_rewind to get data from the primary node
# Globals:
# REPMGR_*
# Arguments:
# None
# Returns:
# None
#########################
repmgr_pgrewind() {
    info "Running pg_rewind data to primary node..."
    local -r flags=("-D" "$POSTGRESQL_DATA_DIR" "--source-server" "host=${REPMGR_CURRENT_PRIMARY_HOST} port=${REPMGR_CURRENT_PRIMARY_PORT} user=${REPMGR_USERNAME} dbname=${REPMGR_DATABASE}")
    if [[ "$REPMGR_USE_PASSFILE" = "true" ]]; then
        PGPASSFILE="$REPMGR_PASSFILE_PATH" debug_execute "${POSTGRESQL_BIN_DIR}/pg_rewind" "${flags[@]}"
    else
        PGPASSWORD="$REPMGR_PASSWORD" debug_execute "${POSTGRESQL_BIN_DIR}/pg_rewind" "${flags[@]}"
    fi
}
The flags set here are just the default flags, with no option for the location of the conf file. I think that with the current flag settings pg_rewind looks for the conf file in the data directory and does not find the one that is actually in use.
local -r flags=("-D" "$POSTGRESQL_DATA_DIR" "--source-server" "host=${REPMGR_CURRENT_PRIMARY_HOST} port=${REPMGR_CURRENT_PRIMARY_PORT} user=${REPMGR_USERNAME} dbname=${REPMGR_DATABASE}")
There is an option in pg_rewind that would reference the correct conf:
--config-file=filename
Use the specified main server configuration file for the target cluster. This affects pg_rewind when it uses internally the postgres command for the rewind operation on this cluster (when retrieving restore_command with the option -c/--restore-target-wal and when forcing a completion of crash recovery).
The file name is here in the data directory:
I have no name!@alive-postgresql-ha-postgresql-1:/bitnami/postgresql/data$ cat post
postgresql.auto.conf postmaster.opts postmaster.pid
And this seems to be the one it wants:
I have no name!@xxxxx-postgresql-ha-postgresql-1:/bitnami/postgresql/data$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'host=''xxxxx-postgresql-ha-postgresql-0.alive-postgresql-ha-postgresql-headless.xxxxx-postgresql-ha.svc.cluster.local'' port=5432 user=repmgr application_name=''xxxxx-postgresql-ha-postgresql-1'' password=''xxxxxxxxxxxxx'' connect_timeout=5'
primary_slot_name = 'repmgr_slot_1001'
Also, the documentation says that pg_rewind requires full_page_writes, which is commented out by default in the Bitnami setup, whereas the doc assumes it is on by default. Is this a setting in values.yaml? Should we set it by default? It seems that even though it is commented out, it is still enabled by default:
postgresql psql (16.1)
Type "help" for help.
postgres=# SHOW full_page_writes;
full_page_writes
------------------
on
(1 row)
postgres=#
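For context on the full_page_writes question: the pg_rewind documentation requires full_page_writes to be on for the target server, together with either wal_log_hints = on or data checksums enabled at initdb time. The following is a minimal sketch of that check, assuming the three values are passed in by hand. The helper name `pgrewind_prereqs_met` is hypothetical and is not part of the Bitnami scripts.

```shell
#!/bin/sh
# Hypothetical helper (not from the Bitnami scripts): returns 0 when the
# documented pg_rewind prerequisites are satisfied by the given settings.
#   $1 = full_page_writes, $2 = wal_log_hints, $3 = data_checksums ("on"/"off")
pgrewind_prereqs_met() {
    # full_page_writes must be on in every case.
    [ "$1" = "on" ] || return 1
    # In addition, either wal_log_hints or data checksums must be enabled.
    [ "$2" = "on" ] || [ "$3" = "on" ]
}

# With a live server the values could be read with psql (not run here), e.g.:
#   psql -Atc 'SHOW full_page_writes'
#   psql -Atc 'SHOW wal_log_hints'
#   psql -Atc 'SHOW data_checksums'
if pgrewind_prereqs_met on off off; then
    echo "prerequisites met"
else
    echo "prerequisites not met"
fi
```

In this example full_page_writes is on, as in the psql output above, but wal_log_hints and data checksums are both assumed off, so the check reports that the prerequisites are not met. The values actually in effect in the chart were not shown in this thread.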
I believe this would be the fix in the flags:
local -r flags=("-D" "$POSTGRESQL_DATA_DIR" "--source-server" "host=${REPMGR_CURRENT_PRIMARY_HOST} port=${REPMGR_CURRENT_PRIMARY_PORT} user=${REPMGR_USERNAME} dbname=${REPMGR_DATABASE}" "--config-file=postgresql.auto.conf")
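A variant of the idea in the comment above, as a hedged sketch: `SHOW config_file` earlier in the thread reported `/opt/bitnami/postgresql/conf/postgresql.conf`, so one option is to pass that path to `--config-file` instead of a file name relative to the data directory. The helper name `pgrewind_flags`, the `POSTGRESQL_CONF_FILE` variable, and the placeholder connection values are assumptions for illustration. This is not the change made in the Bitnami scripts, and it has not been verified against the image.

```shell
#!/bin/sh
# Illustrative values only. The paths come from the logs and psql output in
# this thread; the connection values are placeholders.
POSTGRESQL_DATA_DIR="/bitnami/postgresql/data"
POSTGRESQL_CONF_FILE="/opt/bitnami/postgresql/conf/postgresql.conf"  # assumed variable name
REPMGR_CURRENT_PRIMARY_HOST="primary.example"
REPMGR_CURRENT_PRIMARY_PORT="5432"
REPMGR_USERNAME="repmgr"
REPMGR_DATABASE="repmgr"

# Hypothetical helper: prints the pg_rewind arguments, one per line, with
# --config-file pointing at the main server configuration file.
pgrewind_flags() {
    printf '%s\n' \
        "-D" "$POSTGRESQL_DATA_DIR" \
        "--source-server" "host=${REPMGR_CURRENT_PRIMARY_HOST} port=${REPMGR_CURRENT_PRIMARY_PORT} user=${REPMGR_USERNAME} dbname=${REPMGR_DATABASE}" \
        "--config-file=${POSTGRESQL_CONF_FILE}"
}

# Show the arguments that would be handed to pg_rewind.
pgrewind_flags
```

The sketch only builds and prints the argument list. It does not run pg_rewind, so whether this resolves the "could not access the server configuration file" error would still need to be confirmed on a real standby.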
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Name and Version
bitnami/postgresql-repmgr:16.0.0-debian-11-r11
What architecture are you using?
amd64
What steps will reproduce the bug?
We have a pair of postgresql-repmgr instances running with the following config:
The same on the other instance. I have tried setting REPMGR_USE_PGREWIND to fix this issue but to no avail. Restarting the standby instance causes the following logs:
What is the expected behavior?
After a brief restart I wouldn't have expected a full clone to need to happen.
It's also a bit strange that pg_rewind is not working, although I appreciate that this feature flag is not documented.
What do you see instead?
Per above, a full pg_basebackup happens each time the container restarts.
Additional information
No response