Replication
This document briefly describes the properties related to a9s PostgreSQL replication configuration.
Max Replication Lag
The postgresql-ha.cluster.replication.max_replication_lag property configures the maximum replication lag a standby node may have and still be eligible for promotion.
The value is an integer number of seconds. The default is 1800 (30 minutes).
This value can be adjusted according to the installation requirements.
The value can be configured via template-uploader.
For example:
# postgresql-service.yml
...
- name: templates-uploader
  jobs:
  - name: template-uploader
    ...
    properties:
      template-uploader:
        template-custom-ops: |
          - type: replace
            path: /instance_groups/name=pg/jobs/name=postgresql-ha/properties/postgresql-ha/cluster/replication?/max_replication_lag?
            value: 180
...
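The effect of max_replication_lag on promotion can be pictured with a minimal sketch. The function name is_promotable and the inclusive comparison are assumptions made for illustration; the actual check inside a9s PostgreSQL is not shown in this document and may differ.

```python
DEFAULT_MAX_REPLICATION_LAG = 1800  # seconds, the documented default (30 minutes)


def is_promotable(replication_lag_seconds: float,
                  max_replication_lag: int = DEFAULT_MAX_REPLICATION_LAG) -> bool:
    """Return True if a standby's lag is within the configured limit.

    Hypothetical helper: treating the limit as inclusive is an assumption.
    """
    return replication_lag_seconds <= max_replication_lag


if __name__ == "__main__":
    # With the example value of 180 seconds configured above:
    print(is_promotable(120, max_replication_lag=180))
    print(is_promotable(240, max_replication_lag=180))
```

Under these assumptions, a standby that is 120 seconds behind would be eligible for promotion with the example value of 180, while one that is 240 seconds behind would not.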
Replication Slots Cleanup
a9s PostgreSQL clusters use replication slots to stream changes from the primary to the standby nodes. Replication slots can also be used by other applications and can be created by any role that has the REPLICATION privilege.
There is one concern when using replication slots: PostgreSQL retains the WAL files that a slot still needs, so while a slot is inactive those files are not recycled. If the slot stays inactive, the retained WAL files eventually use up all storage space, causing the postmaster to crash unless it is stopped first by a9s Parachute.
a9s PostgreSQL ships with a replication slots cleanup routine that drops inactive replication slots based on configurable parameters, so that an inactive replication slot cannot cause the main process to crash.
Inactive replication slots are dropped by the replication slots cleanup routine at a configurable interval. Two different rules are applied to two different sets of replication slots. The first set contains the slots the cluster uses for administrative tasks (e.g., standby nodes replicating from the primary); the second set contains all other replication slots. For each set, you can configure an expiration time after which an inactive replication slot is dropped, and a storage usage limit at which an inactive replication slot is dropped regardless of the expiration time:
# postgresql-service.yml
...
- name: templates-uploader
  jobs:
  - name: template-uploader
    ...
    properties:
      template-uploader:
        template-custom-ops: |
          # Scheduler configuration valid for all sections below
          - type: replace
            path: /instance_groups/name=pg/jobs/name=postgresql-ha/properties/postgresql-info-webservice/replication_slots?/reaper?/schedule?
            value: "*/10 * * * *"
          # Start of the general slots section
          - type: replace
            path: /instance_groups/name=pg/jobs/name=postgresql-ha/properties/postgresql-info-webservice/replication_slots?/reaper?/general?/inactive_slot_expiration?
            value: 3h
          - type: replace
            path: /instance_groups/name=pg/jobs/name=postgresql-ha/properties/postgresql-info-webservice/replication_slots?/reaper?/general?/max_storage_limit?
            value: 65
          # Start of the administrative cluster section
          - type: replace
            path: /instance_groups/name=pg/jobs/name=postgresql-ha/properties/postgresql-info-webservice/replication_slots?/reaper?/cluster?/inactive_slot_expiration?
            value: 72h
          - type: replace
            path: /instance_groups/name=pg/jobs/name=postgresql-ha/properties/postgresql-info-webservice/replication_slots?/reaper?/cluster?/max_storage_limit?
            value: 75
...
max_storage_limit is specified as a percentage of the total storage of the persistent disk.
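The interaction between inactive_slot_expiration and max_storage_limit can be illustrated with a small sketch of the decision rule described above. The names ReaperRule, parse_duration, and should_drop are hypothetical, as are the inclusive comparisons and the supported duration suffixes; this is not the routine shipped with a9s PostgreSQL, only one reading of the documented behavior.

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumed duration suffixes; the document itself only shows values in hours ("3h", "72h").
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse a value such as '3h' into a timedelta (hypothetical helper)."""
    amount, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: amount})


@dataclass
class ReaperRule:
    inactive_slot_expiration: timedelta
    max_storage_limit: int  # percentage of the persistent disk


def should_drop(active: bool, inactive_for: timedelta,
                disk_usage_percent: float, rule: ReaperRule) -> bool:
    """Decide whether a slot would be dropped under the given rule."""
    if active:
        return False  # only inactive slots are candidates
    if disk_usage_percent >= rule.max_storage_limit:
        return True  # storage limit applies regardless of the expiration time
    return inactive_for >= rule.inactive_slot_expiration


# Rules mirroring the example values in the ops file above.
general = ReaperRule(parse_duration("3h"), 65)
cluster = ReaperRule(parse_duration("72h"), 75)

if __name__ == "__main__":
    # A slot inactive for 4 hours while the disk is 10% full.
    print(should_drop(False, timedelta(hours=4), 10, general))
    print(should_drop(False, timedelta(hours=4), 10, cluster))
```

With the example values, a general slot that has been inactive for four hours would be dropped while a cluster slot would be kept, and at 70% disk usage only the general rule's storage limit (65) would be exceeded.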
To recover a node whose replication slot has been reaped, see the Replication Lag documentation.