a9s-pg Manual Logical Backup Recovery
This document briefly describes how to manually restore a logical backup of an a9s-pg deployment in another site.
Requirements
There are some requirements to accomplish this:
- Most important: The backup encryption secret (`backup-encryption-secret`) and the backup manager service instance ID (`backup-manager-service-instance-id`) must be known to the platform operator. Have a careful look at the Retrieve Required Information section.
- Preferably, keep the same credentials in Credhub for the a9s-pg that is to be restored when deploying the new a9s-pg for the recovery.
- aws-cli or azure-cli, configured with the credentials necessary to download the backup.
- Access to the bucket where the backup is stored: the CLI must be configured with credentials that permit at least reading the files.
If you are not familiar with `pg_dump` and the concept of PostgreSQL dump files, please read the PostgreSQL documentation on the subject.
Retrieve Required Information
- Important: The platform operator must store the `backup-encryption-secret` and the `backup-manager-service-instance-id` safely, because in case of an a9s-pg disaster, the a9s-pg can only be recovered with this information. It is therefore strongly recommended to retrieve this information as soon as the a9s-pg and the a9s Backup Manager are up and running.
  - The `backup-manager-service-instance-id` is used to locate the backup dump at the blob storage provider. It can be retrieved by following the How to Discover the a9s Backup Manager Service Instance ID guidance.
  - The `backup-encryption-secret` is used to decrypt the backup dump. It can be retrieved by following the How to Discover an Existing Backup Encryption Secret guidance.
- Important: During the first Backup Service execution, the `backup-encryption-secret` is generated automatically. To retrieve it, use the How to Discover an Existing Backup Encryption Secret guidance.
  - Note that the `backup-encryption-secret` must be collected as soon as the a9s-pg and the Backup Service are healthy; otherwise you will not be able to gather this information.
- Important: In case the `backup-encryption-secret` is changed, the new `backup-encryption-secret` must also be stored safely.
- Important: Even though the `backup-encryption-secret` is generated automatically, it is a good idea to set your own secret during the initial installation of the a9s Data Service Framework. This can be done once the a9s Backup Manager is installed. See the Set the Backup Encryption Secret Manually section in order to achieve it.
Set the Backup Encryption Secret Manually via a9s API V1
You can set the encryption secret using the a9s API V1; see also Update Backup Configuration.
```shell
url="https://a9s-dashboard.example.com/v1/instances/a9s-pg"

curl -X PATCH --cookie test.cookie --cookie-jar test.cookie \
  --location --insecure --header "Authorization: ${bearer_token}" \
  --header "Content-Type: application/json" "${url}/backups-config" \
  --data '{"encryption_key":"<encryption_key>"}'
```
After the encryption is set, you should trigger a new backup.
```shell
$ curl -X POST --cookie test.cookie --cookie-jar test.cookie \
  --location --insecure --header "Authorization: ${bearer_token}" \
  --header "Content-Type: application/json" "${url}/backups" \
  --data '{"encryption_key":"<encryption_key>"}'
=> {"id":11,"message":"job to backup is queued"}

# check the status of the backup (can be queued, running, done, failed, deleted):
$ curl --cookie test.cookie --cookie-jar test.cookie \
  --location --insecure --header "Authorization: ${bearer_token}" "${url}/backups/11"
=> {"id":11,"size":272,"status":"done","triggered_at":"2022-04-12T12:00:31.047Z","finished_at":"2022-04-12T12:00:50.478Z","downloadable":true}
```
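When scripting against these endpoints, it can be handy to extract the `status` field from the response before deciding whether to proceed with the restore. A minimal sketch using only POSIX `sed` on a captured response (the response string below is copied from the example output above; in practice it would come from the `curl` call):

```shell
# Sample response as returned by GET ${url}/backups/<id> (copied from the example above).
response='{"id":11,"size":272,"status":"done","triggered_at":"2022-04-12T12:00:31.047Z","finished_at":"2022-04-12T12:00:50.478Z","downloadable":true}'

# Extract the "status" field (queued, running, done, failed, deleted).
status=$(printf '%s' "$response" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p')
echo "$status"
```

If `jq` is available (it is used elsewhere in this document), `jq -r '.status'` achieves the same.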
Set the Backup Encryption Secret Manually via Backup Manager API
In some cases, you may not have access to the a9s API. In that case you need to `curl` the a9s Backup Manager API directly. For this you need the `backup_manager_password`, which can be found in Credhub.
```shell
url="http://admin:<backup_manager_password>@<backup_manager_url>:3000/instances/a9s-pg"

$ curl -X PUT \
  --header 'Content-Type: application/json' "${url}" \
  --data '{"encryption_key":"my-encryption-key", "credentials_updated_by_user":true}'
=> {"message":"instance updated"}
```
After the encryption is set, you should trigger a new backup.
```shell
$ curl -X POST \
  --header 'Content-Type: application/json' "${url}/backups"
=> {"id":11,"message":"job to backup is queued"}

# check the status of the backup (can be queued, running, done, failed, deleted):
$ curl "${url}/backups/11"
=> {"id":11,"size":272,"status":"done","triggered_at":"2022-04-12T12:00:31.047Z","finished_at":"2022-04-12T12:00:50.478Z","downloadable":true}
```
How to Discover an Existing Backup Encryption Secret
The backup encryption secret is created automatically at the first Backup Service execution, or set by the platform operator using the Backup Service API or the a9s API V1.
Using a9s Backup Manager Ruby Console (IRB)
The a9s Backup Manager includes a script that runs the Interactive Ruby Shell already configured to access the current a9s Backup Manager database. To execute this script, access the a9s Backup Manager instance, become root and execute the following command:
```shell
/var/vcap/jobs/anynines-backup-manager/bin/rails_c
```
Inside the IRB shell, execute the following commands:
```ruby
irb(main):001:0> instance = Instance.where(instance_id: "a9s-pg").first
irb(main):002:0> instance.backup_encryption_key
```
How to Discover the a9s Backup Manager Service Instance ID
The backup ID has the following format: `<backup-manager-service-instance-id>-<timestamp>`.

The `<backup-manager-service-instance-id>` is generated by the a9s Backup Manager to identify the instance. In order to identify the correct base backup to use when recovering an instance, we need to find out the instance ID on the a9s Backup Manager.
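Since the backup ID is just the instance ID with a `-<timestamp>` suffix, the two parts can be separated with plain shell parameter expansion. A small sketch (the ID below is the example value used later in this document):

```shell
# Example backup ID: <backup-manager-service-instance-id>-<timestamp>
backup_id="b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279"

# Strip everything up to the last '-' to get the timestamp,
# and strip the final '-<timestamp>' to get the instance ID.
timestamp=${backup_id##*-}
instance_id=${backup_id%-*}

echo "$instance_id"
echo "$timestamp"
```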
Using a9s Backup Manager Ruby Console (IRB)
The a9s Backup Manager includes a script that runs the Interactive Ruby Shell already configured to access the current a9s Backup Manager database. To execute this script, access the a9s Backup Manager instance, become root and execute the following command:
```shell
/var/vcap/jobs/anynines-backup-manager/bin/rails_c
```
Inside the IRB shell, execute the following commands:
```ruby
irb(main):001:0> Instance.where(instance_id: "a9s-pg").first.guid
```
Download Files
The logical backup storage follows the structure below:
AWS S3:
- `<bucket>`: The bucket or container where the backups are stored.
  - `<backup-manager-service-instance-id>.json`: Files holding metadata about a backup.
  - `<backup-manager-service-instance-id>-<index>`: Encrypted and split backup files.

Azure:
- `<bucket>`: The bucket or container where the backups are stored.
  - `<backup-manager-service-instance-id>`: Encrypted backup file.
Download Backup
The first step is to identify which available backup to use in your storage backend.
AWS S3:

```shell
aws s3 ls s3://<bucket>/<backup-manager-service-instance-id>
```

This command should list the metadata files; each file name looks like `<backup-manager-service-instance-id>-<timestamp>.json`.

Each `.json` file holds the metadata about one backup. Since we are restoring the latest available backup, get the name of the backup with the most recent timestamp. The name of the backup (Backup ID) is the file name without `.json`.
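Because the millisecond timestamps in these names have equal width, the most recent backup can be picked by simply sorting the metadata file names. A minimal sketch over a hypothetical listing (real names would come from `aws s3 ls`):

```shell
# Hypothetical metadata file names, as listed in the bucket.
files='b6f4c071-ef44-4af2-9608-531b4ce4823f-1548967947154.json
b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.json'

# Lexicographic sort is sufficient here since the timestamps have equal length.
latest=$(printf '%s\n' "$files" | sort | tail -n 1)

# The Backup ID is the file name without the .json extension.
backup_id=${latest%.json}
echo "$backup_id"
```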
It is important to note that backup files are split into multiple files of 1GB each. This means that if your backup is larger than 1GB, you will need to join these files before restoring. For the moment, let's download all files belonging to the backup.
To list these files:

```shell
aws s3 ls s3://<bucket>/<backup-manager-service-instance-id>
```

And use the following command to download all files that belong to the backup:

```shell
aws s3 cp --exclude="*" --include="<backup-manager-service-instance-id>*" --recursive \
  s3://<bucket> <tmp-base-backup-dir>
```
Azure:

```shell
az storage blob list --container-name <container> --prefix <backup-manager-service-instance-id>- | jq '.[].name'
```

This command lists all the available backups, where each file name looks like `<backup-manager-service-instance-id>-<timestamp>`. Since we are restoring the latest available backup, get the name of the backup with the most recent timestamp. The name of the backup (Backup ID) is the file name.

To download the file, execute:

```shell
az storage blob download --container-name <container> --file <backup-manager-service-instance-id> --name <backup-manager-service-instance-id>
```
Prepare Files
The files are encrypted and, if you are using AWS S3, split. Before restoring, you will need to prepare these files: after downloading, decrypt them, joining the split parts in the correct order where necessary, before starting the recovery process.
Retrieve Backup Encryption Secret
Files are encrypted before being stored in your storage backend, so in order to extract these files we need to decrypt them. The backup encryption secret must be known to the platform operator.
Unfortunately, if you are not able to retrieve the secret, you will not be able to decrypt the files and continue with the restore process.
Decrypt All Files
AWS S3:
To decrypt the backup, execute the command below on all files that belong to the backup. All split files must be decrypted together, in the correct ascending order.
The following commands use the compound command `cat $(ls -1v ...)`, which assumes a Linux operating system. If you are using a different operating system for your environment, adjust this compound command accordingly.
```shell
cat $(ls -1v <backup-manager-service-instance-id>-*) \
  | openssl enc -aes256 -md md5 -d -pass 'pass:<backup-encryption-secret>' \
  | gunzip -c > <dst-dir>/<backup-manager-service-instance-id>.dump
```
For example:
```shell
cat $(ls -1v b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279-*) \
  | openssl enc -aes256 -md md5 -d -pass 'pass:NYHD8MVmA55HEqoaYHpQaxfwEMcQ1wkI' \
  | gunzip -c > /var/vcap/store/postgresql-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
```
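To sanity-check the decryption pipeline (and your secret) without touching real backups, you can reproduce the encrypt/split/decrypt round trip locally. This is a sketch with made-up data and a made-up secret; it mimics the pipeline by gzipping and encrypting a small file with the same `openssl` parameters, splitting it, and then restoring it:

```shell
# Work in a temporary directory with illustrative data and secret.
tmp=$(mktemp -d)
cd "$tmp"
secret="my-demo-secret"

printf 'hello logical backup\n' > original.txt

# Mimic the backup pipeline: gzip, encrypt, then split into parts
# (real backups use 1GB parts; 10 bytes here just to force several files).
gzip -c original.txt \
  | openssl enc -aes256 -md md5 -pass "pass:${secret}" \
  | split -b 10 - example-id-1569935509279-

# Restore: concatenate the parts in ascending order, decrypt, gunzip.
cat $(ls -1v example-id-1569935509279-*) \
  | openssl enc -aes256 -md md5 -d -pass "pass:${secret}" \
  | gunzip -c > restored.txt

cmp original.txt restored.txt && echo "round trip OK"
```

Note that newer OpenSSL versions may print a warning about the deprecated `-md md5` key derivation; the command still works, and `-md md5` must match what was used for encryption.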
Azure:
Execute the following command on the backup file:
```shell
cat <file-name> \
  | openssl enc -aes256 -md md5 -d -pass 'pass:<backup-encryption-secret>' \
  | gunzip -c > <dst-dir>/<file-name>
```
For example:
```shell
cat b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279 \
  | openssl enc -aes256 -md md5 -d -pass 'pass:NYHD8MVmA55HEqoaYHpQaxfwEMcQ1wkI' \
  | gunzip -c > /var/vcap/store/postgresql-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
```
Deploy a New a9s-pg
Deploy a new `a9s-pg` instance, or at least make sure there is a new, empty a9s-pg up and running and available to recover the data.
Find a9s-pg Master Node
Check the IP behind the master alias domain with the following command:

```shell
$ nslookup a9s-pg-psql-master-alias.node.dc1.<iaas.consul.domain>
```

Then, find the virtual machine (VM) index or VM ID:

```shell
$ bosh -d <a9s-pg-deployment-name> instances
```
Copy Files to the Recovery Instance
Before copying the files to your instance, make sure you have enough space to store both the backup file and the restored database, and make sure you are restoring this dump file on the current master node. The recovery process must be started on the master node; the data is then cloned by the standby nodes.
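A rough way to check the "enough space" requirement is to compare the dump size against the free space on the target filesystem. A sketch using only POSIX tools (the paths below are illustrative placeholders; on the node you would point at your real dump file and `/var/vcap/store`):

```shell
dump=/tmp/example.dump            # illustrative; use your real dump file
store=/tmp                        # illustrative; on the node this is /var/vcap/store

printf 'placeholder dump data\n' > "$dump"

# Size of the dump in KB, and available space on the target filesystem in KB.
dump_kb=$(du -k "$dump" | awk '{print $1}')
avail_kb=$(df -Pk "$store" | awk 'NR==2 {print $4}')

if [ "$avail_kb" -gt "$dump_kb" ]; then
  echo "enough space for the dump file"
else
  echo "not enough space"
fi
```

Keep in mind the restored database needs additional space beyond the dump file itself, so leave generous headroom.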
First, create a directory under `/var/vcap/store/`:

```shell
$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-node-index-or-id>
$ sudo su -
$ mkdir /var/vcap/store/postgresql-restore
$ chown root.vcap -R /var/vcap/store/postgresql-restore
$ chmod g+w /var/vcap/store/postgresql-restore
```
With the directory prepared, copy the backup file (dump file) to the VM. In the example below the file is transferred using `bosh scp`:

```shell
bosh -d <a9s-pg-deployment-name> scp <backup-manager-service-instance-id>-<timestamp>.dump pg/<master-node-index>:/var/vcap/store/postgresql-restore
```
Prepare a9s-pg
You can restore the dump file with the cluster running; the data must be restored on the master node and will be streamed to the standby nodes.
If you choose to stop the standby nodes before restoring, remember to drop their replication slots within the master node with:
```sql
SELECT pg_drop_replication_slot(pg_replication_slots.slot_name) FROM pg_replication_slots WHERE active <> 't';
```
Note that restoring a `.dump` file will generate new WAL files roughly as large as the backup file itself. If you have continuous archiving enabled, this can use a lot of space, since all the data in the backup file will be written to the database, generating new WAL files that can take some time to back up.
Make sure no application is connected. You might want to block new connections to port `5432` with iptables, and execute the following command to drop existing active connections:

```sql
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid();
```
Stop a9s-pg Recovery Cluster (Optional)
Discover which node is the master, then stop the standby nodes. Make sure you stop the master node last.
For each standby node, execute:
```shell
$ bosh -d <a9s-pg-deployment-name> ssh pg/<index-or-id>
$ sudo su -
$ monit stop postgresql-ha
```
Stop only the `postgresql-ha` process. The `repmgr` process depends on `postgresql-ha`, so it will also be stopped by this command.
Cleanup Startup Lock Directory (Optional)
A file may be left in the startup lock directory containing a PID that has since been recycled by the operating system. In this case, restarting the `postgresql-ha` process can fail because a startup process appears to already be running, when actually another process is reusing the PID.
To avoid this issue, after completely stopping the `postgresql-ha` process, check with `ps aux` whether any related process is still running. If no related process is running, remove the contents of the startup locks directory:

```shell
rm /tmp/postgresql-startup-locks/*
```
Recover Backup
The backup is a dump file generated with `pg_dumpall`, so in order to recover, you can execute:

```shell
$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-index-or-id>
$ sudo su - vcap
$ source /var/vcap/jobs/postgresql-ha/helpers/vars.sh
$ psql --quiet postgres < /var/vcap/store/postgresql-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
```
Update Database Roles (Optional)
This step is necessary when the a9s-pg is recovered in another environment, since the new environment has different credentials. In this case, the database roles must be updated accordingly.
```shell
$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-index-or-id>
$ sudo -i
$ source /var/vcap/jobs/postgresql-ha/helpers/vars.sh
$ chpst -u postgres:vcap psql postgres -f /var/vcap/jobs/postgresql-ha/data/create_or_update_roles_and_databses.sql
```
a9s PostgreSQL 13 and below
```shell
$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-index-or-id>
$ sudo su - vcap
$ source /var/vcap/jobs/postgresql-ha/helpers/vars.sh
$ psql postgres -f /var/vcap/jobs/postgresql-ha/data/create_or_update_roles_and_databses.sql
```
Start Standby Nodes (Optional)
Note: This step applies only if you decided to stop the standby nodes previously.
After configuring the primary node, clean up the data directory on the standby nodes:
```shell
$ rm -r /var/vcap/store/postgresqlXX/*
```
Then execute the `pre-start` script:

```shell
$ /var/vcap/jobs/postgresql-ha/bin/pre-start
```
At this point, the data should have been cloned from the primary, and you can run `monit start postgresql-ha` on the standby nodes, if running a cluster.
Now the cluster is ready to be used again.
Remember to clean up `/var/vcap/store/postgresql-restore` after the cluster is up and running.
Checking Cluster Health
You can learn more about checking the cluster status here.
It is also possible to get some idea of the cluster health by looking at the metrics.