PostgreSQL Stuck on Pre-Start
Description
Sometimes the primary goes down (during an update, for example) and the two standby nodes disagree about which of them should be promoted. This causes the pre-start
script to wait for a primary that is never promoted.
Identifying the Issue
When this happens, you can see the following log line repeated on the hanging node:
Log file: /var/vcap/sys/log/postgresql-ha/pre-start.stderr.log
1224: Tue Jun 11 14:43:00 UTC 2019 - [info][ctl_utils.sh:192][get_master] - curl result: ###{"message":"Master node unavailable","status":"waiting","data":"Waiting for a standby to be promoted"}###
Note that it is normal for this to happen a few times, but a node waiting on pre-start
should not stay in this state for more than a few minutes.
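To confirm a node is stuck, you can check how often the waiting message appears and whether it keeps repeating; a minimal sketch, run on the hanging node, using the log path from above:
# Count how many times the waiting message has been logged
grep -c "Waiting for a standby to be promoted" /var/vcap/sys/log/postgresql-ha/pre-start.stderr.log
# Follow the log live to see whether the message keeps repeating
tail -f /var/vcap/sys/log/postgresql-ha/pre-start.stderr.log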
When running the status debug script (/var/vcap/jobs/postgresql-ha/bin/debug/status.sh), this is what the cluster state looks like:
{
  "message": "Node status",
  "status": "succeeded",
  "data": {
    "id": "bc6fd987-a0b0-42fe-a0be-2180881ba5f4",
    "node_index": 0,
    "node_ip": "10.244.20.10",
    "bootstrap": true,
    "firstboot": false,
    "status": "no_response",
    "blocked": true,
    "latest_checkpoint": "2/5A000028",
    "mode": "shutdown",
    "master_in_majority_partition": false,
    "upstream_node_id": "bc6fd987-a0b0-42fe-a0be-2180881ba5f4",
    "failover_in_progress": false,
    "processes": {
      "postgresql": "unknown",
      "repmgrd": "unknown"
    },
    "pg_config": {
      "max_connections": "1024"
    }
  }
}
{
  "message": "Node status",
  "status": "succeeded",
  "data": {
    "id": "835493aa-9312-4792-b2a8-1d62b6e5f6f8",
    "node_index": 1,
    "node_ip": "10.244.21.10",
    "bootstrap": false,
    "firstboot": false,
    "status": "running",
    "blocked": false,
    "latest_checkpoint": "2/5A000028",
    "mode": "standby",
    "master_in_majority_partition": false,
    "upstream_node_id": "unknown",
    "failover_in_progress": false,
    "processes": {
      "postgresql": "running",
      "repmgrd": "starting"
    },
    "pg_config": {
      "max_connections": "1024"
    }
  }
}
{
  "message": "Node status",
  "status": "succeeded",
  "data": {
    "id": "ab3ff693-4645-4ddd-8f7c-294e0028871b",
    "node_index": 2,
    "node_ip": "10.244.22.10",
    "bootstrap": false,
    "firstboot": false,
    "status": "running",
    "blocked": false,
    "latest_checkpoint": "2/5A000028",
    "mode": "standby",
    "master_in_majority_partition": false,
    "upstream_node_id": "unknown",
    "failover_in_progress": false,
    "processes": {
      "postgresql": "running",
      "repmgrd": "starting"
    },
    "pg_config": {
      "max_connections": "1024"
    }
  }
}
From there, you can see that the primary node (the node being updated) has its main processes in the unknown state, while both standby nodes have the PostgreSQL process running but repmgrd keeps trying to start.
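To summarize the relevant fields without reading through the full JSON, here is a sketch that assumes jq is installed on the node (it is not part of the a9s tooling) and that the script prints plain JSON documents as shown above:
# Print node index, mode, and process states for every node entry
/var/vcap/jobs/postgresql-ha/bin/debug/status.sh \
  | jq -r '.data | "\(.node_index) \(.mode) postgresql=\(.processes.postgresql) repmgrd=\(.processes.repmgrd)"'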
When investigating the logs of the standby nodes, you should see the following lines:
Log File: /var/vcap/sys/log/postgresql-ha/repmgrd.err.log
On standby node 2:
[2019-06-11 09:26:43] [NOTICE] node 3 is the best candidate for new master, attempting to follow...
[2019-06-11 09:26:44] [NOTICE] node 2 now following new upstream node 3
[2019-06-11 09:26:44] [WARNING] Unable to create event record: ERROR: cannot execute INSERT in a read-only transaction
And on standby node 3:
[2019-06-11 09:26:44] [NOTICE] node 2 is the best candidate for new master, attempting to follow...
[2019-06-11 09:26:45] [NOTICE] node 3 now following new upstream node 2
[2019-06-11 09:26:45] [WARNING] Unable to create event record: ERROR: cannot execute INSERT in a read-only transaction
Note that node 2 thinks node 3 is the best candidate, and node 3 thinks node 2 is the best candidate.
They both try to follow each other, but they cannot update their repmgr records, since they are both standby nodes and therefore read-only.
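To check for this follow loop in your own deployment, you can grep the repmgrd log on each standby node for the candidate decision; a small sketch using the log path from above:
# Show the most recent candidate decisions of this standby
grep "best candidate" /var/vcap/sys/log/postgresql-ha/repmgrd.err.log | tail -n 5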
Recovering Cluster
Identifying Which Node to Promote to Primary
Check which standby node has the latest checkpoint. Connect to the local PostgreSQL on each standby node and execute the following query:
SELECT pg_last_wal_receive_lsn();
Note that because they are both standby nodes, we use pg_last_wal_receive_lsn; if you want to execute it on the primary, use pg_current_wal_lsn instead.
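A sketch for running this query locally on each standby node; the connection parameters (user, port, database) are placeholders and may differ in your deployment:
# Adjust user, port, and database to match your deployment
psql -U vcap -p 5432 -d postgres -c "SELECT pg_last_wal_receive_lsn();"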
Choose the node with the latest WAL LSN. Usually both nodes are at the same point; in this case you can choose either node for the following steps.
Be aware of standby nodes with broken replication. By promoting a standby node that is too far behind the current primary, you risk data loss. In this case, follow the procedure described in Replication Lag and, as always, make a copy of the current data directory of all nodes.
Promoting Node Manually
The promotion script checks for replication lag, and promotion will fail if the replication lag
is larger than the configured limit. You can check the replication lag as the vcap
user:
source /var/vcap/jobs/postgresql-ha/helpers/postgresql_utils.sh
check_replication_lag
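If you want to quantify how far apart the two standbys are, you can compare the LSNs they report with the standard pg_wal_lsn_diff function, which returns the difference in bytes; a sketch with placeholder connection parameters and the example LSN values from the status output above:
# Replace the two LSNs with the values reported by each standby; 0 means they are at the same point
psql -U vcap -p 5432 -d postgres -c "SELECT pg_wal_lsn_diff('2/5A000028', '2/5A000028');"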
The versions differ in the user that runs the main PostgreSQL processes:
- a9s PostgreSQL 13 and below use the vcap user.
- a9s PostgreSQL 15 and above use the postgres system user.
As the corresponding system user, execute:
/var/vcap/jobs/postgresql-ha/bin/promote_command.sh master
If promotion is failing because the replication lag is too big, force the promotion:
/var/vcap/jobs/postgresql-ha/bin/promote_command.sh force-master
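If you are not already logged in as the corresponding system user, a sketch assuming sudo is available on the VM:
# a9s PostgreSQL 13 and below (processes run as vcap)
sudo -u vcap /var/vcap/jobs/postgresql-ha/bin/promote_command.sh master
# a9s PostgreSQL 15 and above (processes run as postgres); use force-master instead to override the lag check
sudo -u postgres /var/vcap/jobs/postgresql-ha/bin/promote_command.sh master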
Keep an eye on the cluster status while executing the script; you can check it with /var/vcap/jobs/postgresql-ha/bin/debug/status.sh.
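One way to do that is to refresh the status output periodically; a sketch assuming the watch utility is available on the VM:
# Re-run the status script every 10 seconds until the promotion finishes
watch -n 10 /var/vcap/jobs/postgresql-ha/bin/debug/status.sh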
Wait until the script finishes; after this, your cluster should be able to go ahead with the update. If you have canceled the update task, you can trigger it again.