Cluster Status
Since anynines-deployment v4.0.0, a9s PostgreSQL includes a debug script that helps operators identify the cluster state and the events currently happening in the cluster.
This document explains how to use the script and how to read its output, as well as how to identify the valid primary of the cluster.
Status Script
a9s PostgreSQL collocates the a9s PostgreSQL Info process, which listens
on port 63145. The main purpose of this service is to provide read-only
information about the node state through an HTTP server.
There is a script under /var/vcap/jobs/postgresql-ha/bin/debug/status.sh
that iterates over all PostgreSQL nodes and requests the current state of
the node, by executing a GET against the /v1/status endpoint. This way,
executing this script from any node will give you the state of the cluster
as seen from that node (this is important especially if you need to debug a
network split).
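The requests described above can be sketched by hand as follows. This is only
an illustration, not the content of status.sh: the node addresses are
hypothetical placeholders, the helper names info_url and query_nodes are made
up for this example, and it assumes the endpoint is reachable over plain HTTP
without authentication.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual status.sh. Helper names are hypothetical.
INFO_PORT=63145  # port of the a9s PostgreSQL Info process (from this document)

# Build the URL for a node and an endpoint (defaults to /v1/status).
info_url() {
  local host="$1" endpoint="${2:-/v1/status}"
  printf 'http://%s:%s%s\n' "$host" "$INFO_PORT" "$endpoint"
}

# Send a GET to the given endpoint on every node passed as an argument
# and print the raw response per node.
query_nodes() {
  local endpoint="$1"; shift
  local host
  for host in "$@"; do
    echo "== ${host} =="
    # --max-time keeps an unreachable node from blocking the loop
    curl --silent --show-error --max-time 5 "$(info_url "$host" "$endpoint")" \
      || echo "request to ${host} failed"
    echo
  done
}

# Example with hypothetical addresses:
# query_nodes /v1/status 10.0.0.1 10.0.0.2 10.0.0.3
```

Because each request is sent from the node where the loop runs, an unreachable
peer shows up as a failed request for that node only, which is the behaviour
that matters when debugging a network split.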
You can execute the script without arguments:
$ /var/vcap/jobs/postgresql-ha/bin/debug/status.sh
The output contains information about the state of all nodes.
The script can also be executed with /v1/replication_status as an argument in
order to check the replication.
$ /var/vcap/jobs/postgresql-ha/bin/debug/status.sh /v1/replication_status
The script then retrieves the checkpoint information live from the PostgreSQL
process of each node. The primary is the exception: for this node, the output
shows when it was promoted. The most relevant fields are upstream_node_id and
repmgr_node_id.
This information is extracted from the local repmgr database of each node, so
it can be compared with the output of the default execution of the script.
If the two diverge, we strongly recommend looking into the logs in order to
understand what happened to the cluster.
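One way to make this comparison easier is to save both outputs to files and
review them side by side. The following is a minimal sketch under that
assumption: the function name capture_status, the output directory, and the
file names are hypothetical, and only the script path and the
/v1/replication_status argument come from this document.

```shell
#!/usr/bin/env bash
# Sketch only: helper name and file layout are hypothetical.
STATUS_SCRIPT="${STATUS_SCRIPT:-/var/vcap/jobs/postgresql-ha/bin/debug/status.sh}"

# Run the status script in both modes and store each output in out_dir,
# tagged with the name of the node the script was executed on.
capture_status() {
  local out_dir="$1"
  local tag
  tag="$(uname -n)"
  mkdir -p "$out_dir"
  "$STATUS_SCRIPT" > "${out_dir}/${tag}-status.txt"
  "$STATUS_SCRIPT" /v1/replication_status > "${out_dir}/${tag}-replication_status.txt"
  # Print the paths of the files that were written.
  echo "${out_dir}/${tag}-status.txt"
  echo "${out_dir}/${tag}-replication_status.txt"
}

# Example with a hypothetical directory:
# capture_status /tmp/pg-status
```

Running the same capture on every node gives one pair of files per node, which
helps to spot which node reports a different view of the cluster.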
Keep in mind that when the script is executed on a single instance, it falls
back to the default output, as replication status only makes sense in a
cluster setup.
Refer to the tables below for detailed descriptions of what each value represents, for both types of execution.