Version: 31.1.0

a9s-pg Manual Logical Backup Recovery

This document briefly describes how to manually restore a logical backup to an a9s-pg deployment at another site.

Requirements

The following requirements must be met:

  • Most important: The backup encryption secret (backup-encryption-secret) and the Backup Manager service instance ID (backup-manager-service-instance-id) must be known to the platform operator. Read the Retrieve Required Information section carefully.
  • Preferably, deploy the new a9s-pg used for the recovery with the same Credhub credentials as the a9s-pg being restored.
  • aws-cli or azure-cli: Configured with the credentials necessary to download the backup.
  • Access to the bucket where the backup is stored: The CLI must be configured with credentials that permit at least reading the files.
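
The tooling requirements above can be checked up front with a small helper. This is a minimal sketch: the function name `missing_tools` is hypothetical, and the list of tools to check (for example `openssl`, `gunzip`, and either `aws` or `az`) is taken from the commands used later in this document.

```shell
# Print every tool from the argument list that is not found on the PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools openssl gunzip aws` lists whichever of the three tools still need to be installed on the workstation used for the recovery.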

If you are not familiar with pg_dump and the concept of PostgreSQL dump files, please read the PostgreSQL documentation on the subject.

Retrieve Required Information

  • Important: The platform operator must safely store the backup-encryption-secret and the backup-manager-service-instance-id, because after an a9s-pg disaster the a9s-pg can only be recovered with this information. Therefore, retrieve this information as soon as the a9s-pg and the Backup Manager are up and running.
  • Important: The backup-encryption-secret is generated automatically during the first Backup Service execution. To retrieve it, follow the section How to Discover an Existing Backup Encryption Secret.
    • Note that the backup-encryption-secret must be collected as soon as the a9s-pg and the Backup Service are healthy; otherwise, you will not be able to gather this information.
  • Important: If the backup-encryption-secret is changed, the new backup-encryption-secret must also be stored safely, in a location of your choice.
  • Important: Even though the backup-encryption-secret is generated automatically, it is a good idea to set your own secret during the initial installation of the a9s Data Service Framework. This can be done once the a9s Backup Manager is installed. See the Set the Backup Encryption Secret Manually sections for details.
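
If you decide to set your own secret, one way to produce a random value is sketched below. The function name `generate_backup_secret` is hypothetical, and the length and character set (32 hexadecimal characters) are assumptions; this document does not state any format requirements for the secret.

```shell
# Generate a random 32-character hexadecimal string (16 random bytes).
# Assumption: the Backup Manager accepts an arbitrary string as encryption key.
generate_backup_secret() {
  openssl rand -hex 16
}
```

Store the generated value safely before configuring it, since it cannot be recovered from the encrypted backups.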

Set the Backup Encryption Secret Manually via a9s API V1

You can set the encryption secret using the a9s API V1. See also Update Backup Configuration.

url="https://a9s-dashboard.example.com/v1/instances/a9s-pg"

curl -X PATCH --cookie test.cookie --cookie-jar test.cookie \
--location --insecure --header "Authorization: ${bearer_token}" \
--header "Content-Type: application/json" "${url}/backups-config" \
--data '{"encryption_key":"<encryption_key>" }'

After the encryption secret is set, trigger a new backup.

$ curl -X POST --cookie test.cookie --cookie-jar test.cookie \
--location --insecure --header "Authorization: ${bearer_token}" \
--header "Content-Type: application/json" "${url}/backups" \
--data '{"encryption_key":"<encryption_key>" }'
=>{"id": 11,"message": "job to backup is queued"}

# check the status of the backup (can be queued, running, done, failed, deleted):
$ curl --cookie test.cookie --cookie-jar test.cookie \
--location --insecure --header "Authorization: ${bearer_token}" "${url}/backups/11"
=> {"id":11,"size":272,"status":"done","triggered_at":"2022-04-12T12:00:31.047Z",
"finished_at":"2022-04-12T12:00:50.478Z", "downloadable":true}

Set the Backup Encryption Secret Manually via Backup Manager API

In some cases, you may not have access to the a9s API. In that case, you need to call the Backup Manager API directly with curl. For this, you need the backup_manager_password, which can be found in Credhub.

url="http://admin:<backup_manager_password>@<backup_manager_url>:3000/instances/a9s-pg"

$ curl -X PUT \
--header 'Content-Type: application/json' "${url}" \
--data '{"encryption_key":"my-encryption-key", "credentials_updated_by_user":true}'
=> {"message":"instance updated"}

After the encryption secret is set, trigger a new backup.

$ curl -X POST \
--header 'Content-Type: application/json' "${url}/backups"
=> {"id":11,"message":"job to backup is queued"}

# check the status of the backup (can be queued, running, done, failed, deleted):
$ curl "${url}/backups/11"
=> {"id":11,"size":272,"status":"done","triggered_at":"2022-04-12T12:00:31.047Z","finished_at":"2022-04-12T12:00:50.478Z","downloadable":true}
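
Instead of repeating the status request by hand, the check can be wrapped in a polling loop. The following is a minimal sketch: the function names `backup_status` and `wait_for_backup`, the 5 second interval, and the attempt limit are assumptions, and the status is extracted with `sed` from a response shaped like the example above.

```shell
# Read a backup status response on stdin and print the value of "status".
backup_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll a backup URL (for example "${url}/backups/11") until the backup is done.
# Returns 0 on "done", 1 on "failed", "deleted", or when attempts run out.
wait_for_backup() {
  backup_url=$1
  attempts=${2:-60}
  while [ "$attempts" -gt 0 ]; do
    status=$(curl --silent "$backup_url" | backup_status)
    case $status in
      "done") return 0 ;;
      failed|deleted) echo "backup ended with status: $status" >&2; return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "gave up waiting for backup" >&2
  return 1
}
```

For example, `wait_for_backup "${url}/backups/11"` returns successfully once the backup with ID 11 reports the status done.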

How to Discover an Existing Backup Encryption Secret

The backup encryption secret is either created automatically at the first Backup Service execution or set by the platform operator using the Backup Service API or the a9s API V1.

Using a9s Backup Manager Ruby Console (IRB)

The a9s Backup Manager includes a script that runs the Interactive Ruby Shell already configured to access the current a9s Backup Manager database. To execute this script, access the a9s Backup Manager instance, become root and execute the following command:

/var/vcap/jobs/anynines-backup-manager/bin/rails_c

Inside the IRB shell, execute the following commands:

irb(main):001:0> instance = Instance.where(instance_id: "a9s-pg").first
irb(main):002:0> instance.backup_encryption_key

How to Discover the a9s Backup Manager Service Instance ID

The backup ID has the following format: <backup-manager-service-instance-id>-<timestamp>.

The <backup-manager-service-instance-id> is generated by the a9s Backup Manager to identify the instance. To identify the correct backup when recovering an instance, we need to find out the instance ID used by the a9s Backup Manager.
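
Given the backup ID format described above, the two parts can be separated with plain shell parameter expansion. This is a small sketch; the function names `instance_id_of` and `timestamp_of` are hypothetical, and the sample ID is the one used in the examples later in this document.

```shell
# Strip the trailing "-<timestamp>" to get the Backup Manager service instance ID.
instance_id_of() {
  printf '%s\n' "${1%-*}"
}

# Keep only the part after the last "-", which is the timestamp.
timestamp_of() {
  printf '%s\n' "${1##*-}"
}

# Example with a sample backup ID:
backup_id="b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279"
instance_id_of "$backup_id"   # b6f4c071-ef44-4af2-9608-531b4ce4823f
timestamp_of "$backup_id"     # 1569935509279
```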

Using a9s Backup Manager Ruby Console (IRB)

The a9s Backup Manager includes a script that runs the Interactive Ruby Shell already configured to access the current a9s Backup Manager database. To execute this script, access the a9s Backup Manager instance, become root and execute the following command:

/var/vcap/jobs/anynines-backup-manager/bin/rails_c

Inside the IRB shell, execute the following commands:

irb(main):001:0> Instance.where(instance_id: "a9s-pg").first.guid

Download Files

The logical backup storage follows the structure below:

AWS S3:

  • <bucket>: Bucket (or container) where the backups are stored.
    • <backup-manager-service-instance-id>-<timestamp>.json: File holding metadata about the backup.
    • <backup-manager-service-instance-id>-<timestamp>-<index>: Encrypted and split backup file.

Azure:

  • <bucket>: Bucket (or container) where the backups are stored.
    • <backup-manager-service-instance-id>-<timestamp>: Encrypted backup file.

Download Backup

The first step is to identify which available backup to use in your storage backend.

AWS S3:

aws s3 ls s3://<bucket>/<backup-manager-service-instance-id>

This command should list the metadata files; each file name looks like <backup-manager-service-instance-id>-<timestamp>.json. Each .json file holds the metadata of one backup. Since we are restoring the latest available backup, take the name of the backup with the most recent timestamp.

The name of the backup (Backup ID) is the name of the file without .json.
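
Picking the most recent backup can be scripted. The sketch below assumes the usual `aws s3 ls` output layout (date, time, size, object name) and that the timestamp is the numeric suffix of the backup ID; the function name `latest_backup_id` is hypothetical.

```shell
# Read `aws s3 ls` output on stdin and print the backup ID with the
# highest timestamp. Only the .json metadata files are considered.
latest_backup_id() {
  awk '{ print $NF }' \
    | grep '\.json$' \
    | sed 's/\.json$//' \
    | awk -F- '{ print $NF, $0 }' \
    | sort -n \
    | tail -n 1 \
    | cut -d' ' -f2-
}
```

It would typically be used as `aws s3 ls s3://<bucket>/<backup-manager-service-instance-id> | latest_backup_id`.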

Note that backup files are split into multiple files of 1 GB each. If your backup is larger than 1 GB, you will need to join these files before restoring. For now, download all files belonging to the backup.

Use the following command to list these files:

aws s3 ls s3://<bucket>/<backup-manager-service-instance-id>

Then use the following command to download all files that belong to the backup:

aws s3 cp --exclude="*" --include="<backup-manager-service-instance-id>*" --recursive \
s3://<bucket> <tmp-base-backup-dir>

Azure:

az storage blob list --container-name <container> --prefix <backup-manager-service-instance-id>- | jq '.[].name'

This command lists all the available backups, where each file name looks like <backup-manager-service-instance-id>-<timestamp>. Since we are restoring the latest available backup, take the name of the backup with the most recent timestamp.

The name of the backup (Backup ID) is the name of the file.

To download the file, execute:

az storage blob download --container-name <container> --file <backup-manager-service-instance-id> --name <backup-manager-service-instance-id>

Prepare Files

The files are encrypted and, if you are using AWS S3, split. Before restoring, you need to prepare these files.

The first step after downloading is to decrypt them. Once the files are decrypted, you must join them, if necessary, before starting the recovery process.

Retrieve Backup Encryption Secret

Files are encrypted before being stored in your backend storage, so they must be decrypted before they can be used. The platform operator must know the backup encryption secret.

If you are not able to retrieve the secret, you will not be able to decrypt the files and continue with the restore process.

Decrypt All Files

AWS S3:

To decrypt the backup, run the command below over all files that belong to the backup. All split files must be decrypted together, in ascending order.

cat $(ls <backup-manager-service-instance-id>-<timestamp>-*) \
| openssl enc -aes256 -md md5 -d -pass 'pass:<backup-encryption-secret>' \
| gunzip -c > <dst-dir>/<backup-manager-service-instance-id>-<timestamp>.dump

For example:

cat $(ls b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279-*) \
| openssl enc -aes256 -md md5 -d -pass 'pass:NYHD8MVmA55HEqoaYHpQaxfwEMcQ1wkI' \
| gunzip -c > /var/vcap/store/postgresql-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
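
Because `ls` sorts names lexicographically, a part ending in `-10` would be listed before one ending in `-2`. If a backup has ten or more parts, the files can be ordered by their numeric suffix instead. This is a sketch under the assumption that the split index is a plain number after the last dash; the function name `sorted_parts` is hypothetical.

```shell
# List the split files of a backup in ascending numeric index order.
# $1 is the backup ID (<backup-manager-service-instance-id>-<timestamp>).
# Files whose suffix is not purely numeric are skipped.
sorted_parts() {
  prefix=$1
  for f in "$prefix"-*; do
    [ -e "$f" ] || continue
    case ${f##*-} in
      ''|*[!0-9]*) continue ;;
    esac
    printf '%s %s\n' "${f##*-}" "$f"
  done | sort -n | cut -d' ' -f2-
}
```

The result can replace the `ls` call in the command above, for example `cat $(sorted_parts b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279)`, run from the directory containing the downloaded files.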

Azure:

Execute the following command on the backup file:

cat <file-name> \
| openssl enc -aes256 -md md5 -d -pass 'pass:<backup-encryption-secret>' \
| gunzip -c > <dst-dir>/<file-name>.dump

For example:

cat b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279 \
| openssl enc -aes256 -md md5 -d -pass 'pass:NYHD8MVmA55HEqoaYHpQaxfwEMcQ1wkI' \
| gunzip -c > /var/vcap/store/postgresql-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
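
Passing the secret as `pass:<backup-encryption-secret>` on the command line leaves it in the shell history and visible in the process list. As an alternative sketch, `openssl` can read the passphrase from an environment variable with `-pass env:`. The function name `decrypt_backup` and the variable name `BACKUP_ENCRYPTION_SECRET` are hypothetical; the cipher and digest options are the same as in the commands above.

```shell
# Decrypt and decompress a backup stream from stdin to stdout.
# The secret is read from the BACKUP_ENCRYPTION_SECRET environment variable.
decrypt_backup() {
  openssl enc -aes256 -md md5 -d -pass env:BACKUP_ENCRYPTION_SECRET \
    | gunzip -c
}
```

For example, after exporting `BACKUP_ENCRYPTION_SECRET`, run `cat <file-name> | decrypt_backup > <dst-dir>/<file-name>.dump`.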

Deploy a New a9s-pg

Deploy a new a9s-pg instance, or at least make sure that a new, empty a9s-pg is up and running and available to receive the recovered data.

Find a9s-pg Master Node

Check the IP by the domain with the following command:

$ nslookup a9s-pg-psql-master-alias.node.dc1.<iaas.consul.domain>

Then, find the virtual machine (VM) index or VM ID:

$ bosh -d <a9s-pg-deployment-name> instances

Copy Files to the Recovery Instance

Before copying the files to your instance, make sure there is enough space to store both the backup file and the restored database, and make sure you restore the dump file on the current master node. The recovery must be started on the master node; the standby nodes then clone the data.

First, create a directory under /var/vcap/store/:
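
The free space check can be done with `df` on the target VM. The helper below is a minimal sketch: the function name `has_enough_space` is hypothetical, and how much space counts as enough (at least the size of the dump file plus the size of the restored database) is for the operator to estimate.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 KiB available.
has_enough_space() {
  dir=$1
  required_kb=$2
  # -P forces the portable one-line-per-filesystem format, -k reports KiB.
  available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$available_kb" -ge "$required_kb" ]
}
```

For example, `has_enough_space /var/vcap/store 20971520` checks for 20 GiB of free space on the persistent disk.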

$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-node-index-or-id>
$ sudo su -
$ mkdir /var/vcap/store/postgresql-restore
$ chown -R root:vcap /var/vcap/store/postgresql-restore
$ chmod g+w /var/vcap/store/postgresql-restore

With the directory prepared, copy the backup file (dump file) to the VM.

In the example below, the file is transferred using bosh scp:

bosh -d <a9s-pg-deployment-name> scp <backup-manager-service-instance-id>-<timestamp>.dump pg/<master-node-index>:/var/vcap/store/postgresql-restore

Prepare a9s-pg

You can restore the dump file into the currently running cluster. The data must be restored on the master node, from where it is streamed to the standby nodes.

If you choose to stop the standby nodes before restoring, remember to drop their replication slots on the master node with:

SELECT pg_drop_replication_slot(pg_replication_slots.slot_name) FROM pg_replication_slots WHERE active <> 't';

Note that restoring a .dump file generates roughly as much WAL as the size of the backup file, because all the data in the backup file is written to the database. If continuous archiving is enabled, this can use a lot of space, and the new WAL files can take some time to back up.

Make sure no application is connected. You might want to block new connections to port 5432 with iptables, and then execute the following command to drop the existing active connections:

SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid();

Stop a9s-pg Recovery Cluster (Optional)

Discover which node is the master, then stop the standby nodes. Make sure you stop the master node last.

For each standby node, execute:

$ bosh -d <a9s-pg-deployment-name> ssh pg/<index-or-id>
$ sudo su -
$ monit stop postgresql-ha

Stop only the postgresql-ha process. The repmgr process depends on postgresql-ha, so it will also be stopped by this command.

Cleanup Startup Lock Directory (Optional)

A file may be left in the startup lock directory containing a PID that has been recycled by the operating system. In this case, restarting the postgresql-ha process can fail because a startup process appears to be already running, when in fact another process is reusing the PID.

To avoid this issue, after completely stopping the postgresql-ha process, check if there is any related process running with ps aux. If no related process is running, remove the content of the startup locks directory:

rm /tmp/postgresql-startup-locks/*

Recover Backup

The backup is a dump file generated with pg_dumpall.
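
Before restoring, it can be worth confirming that the decrypted file really is a plain-text cluster dump, since a wrong secret or a missing split file produces unusable output. The sketch below relies on the comment header that pg_dumpall writes at the top of its output; the function name `looks_like_cluster_dump` is hypothetical.

```shell
# Succeed if the first lines of file $1 contain the pg_dumpall header comment.
looks_like_cluster_dump() {
  head -n 5 "$1" | grep -q 'PostgreSQL database cluster dump'
}
```

For example, `looks_like_cluster_dump /var/vcap/store/postgresql-restore/<backup-manager-service-instance-id>-<timestamp>.dump && echo "dump looks valid"`.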

To restore the dump, execute:

$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-index-or-id>
$ sudo su - vcap
$ source /var/vcap/jobs/postgresql-ha/helpers/vars.sh
$ psql --quiet postgres < /var/vcap/store/postgresql-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump

Update Database Roles (Optional)

This step is necessary when the a9s-pg is recovered in another environment, where it has different credentials. In that case, the credentials of the database roles must be updated.

$ bosh -d <a9s-pg-deployment-name> ssh pg/<master-index-or-id>
$ sudo su - vcap
$ source /var/vcap/jobs/postgresql-ha/helpers/vars.sh
$ psql postgres -f /var/vcap/jobs/postgresql-ha/data/create_or_update_roles_and_databses.sql

Start Standby Nodes (Optional)

Note: This step is only necessary if you stopped the standby nodes previously.

After configuring the primary node, clean up the data directory on the standby nodes:

$ rm -r /var/vcap/store/postgresqlXX/*

Then execute the pre-start script:

$ /var/vcap/jobs/postgresql-ha/bin/pre-start

At this point, the data should have been cloned from the primary, and you can run monit start postgresql-ha on the standby nodes, if running a cluster.

Now the cluster is ready to be used again.

Remember to clean up /var/vcap/store/postgresql-restore after the cluster is up and running.

Checking Cluster Health

You can learn more about checking the cluster status here.