Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

old primary node rejoin after failover uncompatitable with pg_rewind #829

Open
Zhaoxun opened this issue Oct 11, 2023 · 2 comments
Open

Comments

@Zhaoxun
Copy link

Zhaoxun commented Oct 11, 2023

First I create role 'rep' for repmgr to use like these:

CREATE USER rep replication;
CREATE database repmgr WITH OWNER rep;
CREATE EXTENSION repmgr;
 GRANT pg_checkpoint TO rep;
 GRANT pg_read_all_stats TO rep;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rep;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rep;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rep;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rep;

So 'rep' is the least privileged user according to repmgr's official document and postgresql's pg_rewind documentation.

However, after encountering the strange error occurred on both pg15 and pg16 like these:

repmgr node rejoin --force-rewind -d 'host=192.168.1.2 port=5432 user=rep dbname=repmgr connect_timeout=5'
NOTICE: rejoin target is node "yzx2" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/81C6D80
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/pgsql-15/bin/pg_rewind -D '/pgdata' --source-server='host=192.168.1.2 port=5432 user=rep dbname=repmgr connect_timeout=5'"
ERROR: pg_rewind execution failed
DETAIL: pg_rewind: error: could not fetch remote file "global/pg_control": ERROR:  permission denied for function pg_read_binary_file

https://serverfault.com/questions/1144242/

I re-checked the documentation for pg_rewind again, and found a line:

-source-server=connstr
Specifies a libpq connection string to connect to the source PostgreSQL server to synchronize the target with. The connection must be a normal (non-replication) connection with a role having sufficient permissions to execute the functions used by pg_rewind on the source server (see Notes section for details) or a superuser role.

So I guess it is the 'replication role' that caused the issue, thus
ALTER ROLE rep NOREPLICATION;

But now the rewind encounters another problem - it won't process unless it is a replication role!

 repmgr node rejoin --force-rewind -d 'host=172.17.1.2 port=5432 user=rep dbname=repmgr connect_timeout=5'
NOTICE: rejoin target is node "yzx2" (ID: 2)
ERROR: connection to database failed
DETAIL:
connection to server at "172.17.1.2", port 5432 failed: FATAL:  must be superuser or replication role to start walsender

ERROR: unable to establish a replication connection to the rejoin target node

So either way it does not work!
repmgr =the latest version 5.4dev on both pg15 and pg16!

@Zhaoxun
Copy link
Author

Zhaoxun commented Oct 11, 2023

BTW , I tested pure pg_rewind with noreplication role rep and it worked:

pg_rewind -D '/pgdata' --source-server='host=172.17.1.2 port=5432 user=rep dbname=postgres connect_timeout=5'
pg_rewind: servers diverged at WAL location 0/81C4E60 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/81C4DE8 on timeline 1
pg_rewind: Done!

So for sheer pg_rewind it does not need a replication role, and on the contrary, it needs a noreplication role. But repmgr node rejoin requires to execute pg_rewind as well as using a replication role! Thus it would fail no matter the role is replication or noreplication!

@stephan-hahn
Copy link

Altering the user from replication to noreplication doesn't fix the problem of missing execution permissions.
The easy way is to make rep a superuser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants