a9s MongoDB Manual Logical Backup Recovery
This document briefly describes how to manually restore a logical backup from a MongoDB deployment to a second MongoDB deployment on another site.
Requirements
There are some requirements that must be met in order to accomplish this:
- Either the aws-cli or the azure-cli: configured with the credentials necessary to download the backup.
- Access to the bucket where the backup is stored: The CLI must be configured with credentials that permit at least reading the files.
- Access to the old database where the backup metadata is stored: This is necessary to recover the secret used to encrypt the backup files. You either need an available a9s Backup Manager able to read data from the database, or you must have noted the secret beforehand.
How to Discover the Service Instance GUID
In order to identify the Backup ID on the a9s Backup Manager, the Cloud Foundry service instance GUID must be identified. If the Cloud Foundry CLI cannot be used (cf service <service-name> --guid), follow the steps below.
Notes:
- We need the Service Instance GUID to find the Backup Manager Service Instance GUID.
- We need the Backup Manager Service Instance GUID to find the Backup ID.
- The Backup ID is formed by the Backup Manager Service Instance GUID + "-" + <timestamp>.
Using Cloud Foundry and Service Broker Databases
Access the ccdb (Cloud Controller Database) database. Once connected, find the GUID of the desired service instance with:
SELECT name, guid FROM service_instances WHERE name = '<service-instance-name>';
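For example, assuming a service instance named my-mongodb, the output might look like this (the GUID is a placeholder):
     name     |                 guid
--------------+--------------------------------------
 my-mongodb   | b6f4c071-ef44-4af2-9608-531b4ce4823f
(1 row)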
How to Discover the a9s Backup Manager Service Instance GUID
The Backup ID has the following format: <backup-manager-service-instance-guid>-<timestamp>.
The <backup-manager-service-instance-guid> is generated by the a9s Backup Manager to identify the instance. In order to identify the correct base backup to use when recovering an instance, we need to find out the instance ID on the a9s Backup Manager.
Note: The service instance ID generated by the a9s Backup Manager is not the same as the one used by Cloud Foundry to identify the instance.
Using the a9s Backup Manager API
The a9s Backup Manager API supports querying for the backups of a given instance. You can use this method if the a9s Backup Manager for the failing site is still available.
Retrieve the a9s Backup Manager password for the admin user:
credhub get -n /backup_manager_password
Then, trigger the following request against your Backup Manager:
curl -u <backup-manager-username>:<backup-manager-password> \
http://<backup-manager-host>:3000/instances/<service-instance-guid>
This call returns the backup and restore history. You can select which backup to use based on the created_at and updated_at fields, and use the backup_id field content.
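A minimal sketch of selecting the most recent backup with jq; the top-level backups array and the exact field names are assumptions about the response shape, so adjust the paths as needed:
curl -s -u <backup-manager-username>:<backup-manager-password> \
  http://<backup-manager-host>:3000/instances/<service-instance-guid> \
  | jq -r '.backups | sort_by(.created_at) | last | .backup_id'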
Using a9s Backup Manager Ruby Console (IRB)
The a9s Backup Manager includes a script that runs the Interactive Ruby Shell already configured to access the current a9s Backup Manager database. To execute this script, access the a9s Backup Manager instance, become root and execute the following command:
/var/vcap/jobs/anynines-backup-manager/bin/rails_c
Inside the IRB shell, execute the following command:
irb(main):001:0> Instance.where(instance_id: "<service-instance-guid>").first.guid
The <service-instance-guid> is the Cloud Foundry Service Instance GUID (see How to Discover the Service Instance GUID).
Using the a9s Backup Manager Database
Use this method if your a9s Backup Manager is not available.
First, you need to access the PostgreSQL CLI on the a9s-pg deployment and then access its internal Backup Service database. The name of the database is specified in the backup-service manifest configured during the a9s Framework deployment.
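A minimal sketch of one way to reach a psql prompt there; the deployment name, instance group, psql binary path, and database name below are assumptions, so verify them against your environment and the backup-service manifest:
bosh -d a9s-pg ssh pg/0
sudo su -
# Binary path and database name are assumptions; check your backup-service manifest.
/var/vcap/packages/postgresql/bin/psql -U postgres -d backup_manager
Once connected, run the query below to find the Backup Manager GUID of the instance: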
SELECT guid FROM instances WHERE instance_id='<service-instance-guid>';
The <service-instance-guid> is the Cloud Foundry Service Instance GUID (see How to Discover the Service Instance GUID).
Download Files
The logical backup storage follows the structure below:
AWS S3:
- <bucket>: The bucket or container where the backups are stored.
- <backup-manager-service-instance-id>.json: Files holding metadata about the backup.
- <backup-manager-service-instance-id>-<index>: Encrypted and split backup file.
Azure:
- <bucket>: The bucket or container where the backups are stored.
- <backup-manager-service-instance-id>: Encrypted backup file.
Download Backup
The first step is to identify which available backup from your storage backend will be used.
AWS S3:
aws s3 ls s3://<bucket>/<backup-manager-service-instance-guid>
This command should list the metadata files; each file looks like <backup-manager-service-instance-guid>-<timestamp>.json.
Each .json file holds the metadata about one backup. Since we are restoring the latest available backup, get the name of the backup with the most recent timestamp.
The name of the backup (Backup ID) is the name of the file without the .json extension.
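A minimal sketch of how the Backup ID could be derived from that listing; it assumes all timestamps have the same number of digits, so that lexicographic sorting matches chronological order:
backup_id=$(aws s3 ls s3://<bucket>/<backup-manager-service-instance-guid> \
  | awk '{print $NF}' | grep '\.json$' | sort | tail -n 1 | sed 's/\.json$//')
echo "${backup_id}"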
It is important to note that backup files are split into multiple files of 1GB each. This means that if your base backup is larger than 1GB, you will need to put these files together before restoring. For the moment, let's download all files belonging to the backup.
To list these files, execute the following:
aws s3 ls s3://<bucket>/<backup-manager-service-instance-guid>
And the following command to download all files that belong to the backup:
aws s3 cp --exclude="*" --include="<backup-manager-service-instance-guid>*" --recursive \
s3://<bucket> <tmp-base-backup-dir>
Azure:
az storage blob list --container-name <container> --prefix <backup-manager-service-instance-guid>- | jq '.[].name'
This command lists all the available backups, where each file name looks like <backup-manager-service-instance-guid>-<timestamp>. Since we are restoring the latest available backup, get the name of the backup with the most recent timestamp.
The name of the backup (Backup ID) is the name of the file.
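A minimal sketch of how the most recent backup name could be derived from that listing; it assumes all timestamps have the same number of digits, so that lexicographic sorting matches chronological order:
backup_id=$(az storage blob list --container-name <container> \
  --prefix <backup-manager-service-instance-guid>- | jq -r '.[].name' | sort | tail -n 1)
echo "${backup_id}"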
To download the file, execute:
az storage blob download --container-name <container> --file <backup-id> --name <backup-id>
Prepare Files
The files are split (if you are using AWS S3) and encrypted. Before restoring, you will need to prepare these files.
The first step, after downloading, is to decrypt them; if the files are split, they must also be joined before starting the recovery process (the AWS S3 decryption command below does both in a single pipeline).
Retrieve Backup Encryption Secret
Files are encrypted before being stored in your storage backend, so in order to extract them we first need to decrypt them.
Unless you have changed the secret used to encrypt the backup via the a9s Service Dashboard (in which case you need to remember your secret), you will have to access the a9s Backup Manager on the old site to retrieve the secret used to encrypt the backup.
$ bosh -d <backup-manager-deployment-name> ssh backup-manager # Sometimes called ancillary-services
$ sudo su -
$ /var/vcap/jobs/anynines-backup-manager/bin/rails_c
irb(main):001:0> Backup.where(backup_id: "<backup-id>").first.credentials[:filter_plugins] \
irb(main):002:0> .select{ |plugin| plugin[:name] == "encrypt" }.first[:password]
Unfortunately, if you are not able to retrieve this secret, you will not be able to decrypt the files and continue with the restore process.
Decrypt All Files
AWS S3:
To decrypt the backup, execute the command below for all files that belong to the backup. All split files must be decrypted together and in the correct ascending order.
The following commands make use of the compound command cat $(ls -1v ...), which assumes a Linux distribution. If you are using a different operating system in your environment, adjust this compound command accordingly.
cat $(ls -1v <backup-id>-*) \
| openssl enc -aes256 -d -pass 'pass:<secret>' \
| gunzip -c > <dst-dir>/<backup-id>.dump
For example:
cat $(ls -1v b6f4c071-ef44-4af2-9608-531b4ce4823f-1548967947154-*) \
| openssl enc -aes256 -d -pass 'pass:NYHD8MVmA55HEqoaYHpQaxfwEMcQ1wkI' \
| gunzip -c > /var/vcap/store/mongodb-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
Azure:
Execute the following command on the backup file:
cat <file-name> \
| openssl enc -aes256 -d -pass 'pass:<secret>' \
| gunzip -c > <dst-dir>/<backup-id>.dump
For example:
cat b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279 \
| openssl enc -aes256 -d -pass 'pass:NYHD8MVmA55HEqoaYHpQaxfwEMcQ1wkI' \
| gunzip -c > /var/vcap/store/mongodb-restore/b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
Copy Files to the Recovery Instance
Note: The new instance, where the old service instance's backup will be restored, must be clean and should not have any service keys associated with the service instance itself. Otherwise, problems might occur after the restoration.
Before copying the files to your instance, make sure you have enough space to store the backup file and the created database. Furthermore, make sure you are restoring this dump file in the current master node. The recovery process must be started in the master node, then cloned by the standby nodes.
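For example, you can check the free space on the node's persistent disk before copying:
df -h /var/vcap/store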
First, create a directory under /var/vcap/store/:
bosh -d <service-instance-deployment-name> ssh mongodb/<node-index-or-id>
sudo su -
mkdir /var/vcap/store/mongodb-restore
chown root.vcap -R /var/vcap/store/mongodb-restore
chmod g+w /var/vcap/store/mongodb-restore
With the directory prepared, copy the backup file (dump file) to the Virtual Machine.
In the examples below, the backup file is transferred using bosh scp:
Single Instance: Single dump file.
file_dump="<backup-id>.dump"
bosh -d <service-instance-deployment-name> scp ${file_dump} mongodb/<node-index>:/var/vcap/store/mongodb-restore
Cluster Instance: MongoDB data directory packed as a tar archive.
file_dump="<backup-id>.dump"
mongodb_dump_directory="${file_dump}.folder"
mkdir ${mongodb_dump_directory}
tar -xf ${file_dump} -C ${mongodb_dump_directory}
bosh -d <service-instance-deployment-name> scp -r ${mongodb_dump_directory} mongodb/<node-index>:/var/vcap/store/mongodb-restore
Backup Recovery
Single Instance Recovery
Parameters:
- username: Username of MongoDB's admin user.
bosh -d <deployment_name> manifest > <deployment_manifest>
bosh int <deployment_manifest> --path /instance_groups/name=mongodb/jobs/name=mongodb/properties/mongodb/admin_username
- password: Password of MongoDB's admin user.
bosh -d <deployment_name> manifest > <deployment_manifest>
bosh int <deployment_manifest> --path /instance_groups/name=mongodb/jobs/name=mongodb/properties/mongodb/admin_password
- host: Domain of the MongoDB virtual machine.
<deployment_name>-mongodb-<node-index>.node.dc1.<consul_domain>
- port: Port of the MongoDB instance; default: 27017.
- backup-manager-service-instance-id: The Backup Manager service instance ID. You can find it as described in the section How to Discover the a9s Backup Manager Service Instance GUID.
# Add the following options if necessary:
# --ssl --sslAllowInvalidCertificates - when the MongoDB instance uses TLS.
dump_path=<backup-id>.dump
/var/vcap/packages/mongodb/bin/mongorestore --quiet --host <host> --port <port> -u <username> -p <password> --gzip --archive=${dump_path}
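For example, a hypothetical invocation with some of the placeholders filled in (all values below are illustrative only):
dump_path=b6f4c071-ef44-4af2-9608-531b4ce4823f-1569935509279.dump
/var/vcap/packages/mongodb/bin/mongorestore --quiet --host <deployment_name>-mongodb-0.node.dc1.<consul_domain> --port 27017 -u admin -p '<password>' --gzip --archive=${dump_path}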
Known Issues
Updating to MongoDB v5.0.x
There is a bug in MongoDB version v5.0.x, as mentioned in MongoDB Considerations.
For the restore to work, the MongoDB backup needs to contain at least one MongoDB role.
The following workaround creates a dummy role to address this:
db = db.getSiblingDB('admin')
db.auth("<admin_username>", "<admin_password>");
db = db.getSiblingDB('dummy')
db.createRole(
{
role: "dummyRole",
privileges: [
{ resource: { db: "dummy", collection: "" }, actions: [ "find" ] }
],
roles: [
{ role: "read", db: "dummy" }
],
}
)
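You can verify that the workaround role exists with a quick check (a sketch):
db.getSiblingDB('dummy').getRoles()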
Restore Cluster Instance
Remove all service keys
Before starting the restoration, remove all service keys of the new deployment. We also recommend identifying the new cluster's primary node in order to avoid a scenario where your cluster ends with no discernible primary after restoring the data.
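For example, the keys can be listed and removed with the Cloud Foundry CLI:
cf service-keys <service-instance-name>
cf delete-service-key <service-instance-name> <service-key-name>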
Stop All MongoDB Processes on the Nodes
Notes:
- The a9s MongoDB cluster dump is a tar file and must be placed on all nodes in the /var/vcap/store/mongodb folder to ensure they have the same content.
- Currently, it is not possible to restore a MongoDB cluster instance from one MongoDB version to another. Hence, it is necessary to restore the dump content to the same version.
1 - Stop all nodes in the cluster (execute for each node)
bosh -d <service-instance-deployment-name> ssh mongodb/<node-index> -c "sudo monit stop mongodb"
2 - Check if all mongodb processes are stopped (execute for each node)
bosh -d <service-instance-deployment-name> ssh mongodb/<node-index> -c "sudo monit summary"
If all mongodb processes are stopped, you can proceed with the next step.
Applying the a9s MongoDB Cluster Dump
- Restore the nodes (execute for each node)
bosh -d <service-instance-deployment-name> ssh mongodb/<node-index>
sudo su -
# Assuming the extracted dump folder was copied to /var/vcap/store/mongodb-restore as above.
dump_folder_path=/var/vcap/store/mongodb-restore/<backup-id>.dump.folder
rm -rf /var/vcap/store/mongodb/*
mv -f ${dump_folder_path}/* /var/vcap/store/mongodb
rm /var/vcap/store/mongodb/mongod.lock
Start a9s MongoDB Processes on the Nodes
Execute the following for each node:
bosh -d <service-instance-deployment-name> ssh mongodb/<node-index> -c "sudo monit start mongodb"
Change MongoDB User Credentials
When restoring the cluster service instance, the MongoDB database comes with the old service instance's credentials, so we need to update them in accordance with the new service instance.
- Get MongoDB User credentials from the old service instance
Note: If it is not possible to get the <old_deployment_username> and the <old_deployment_password>, it is not possible to restore a cluster instance properly.
Find the <old_deployment_username>:
bosh -d <old_deployment_name> manifest > <old_deployment_manifest>
bosh int <old_deployment_manifest> --path /instance_groups/name=mongodb/jobs/name=mongodb/properties/mongodb/admin_username
Find the <old_deployment_password>:
bosh -d <old_deployment_name> manifest > <old_deployment_manifest>
bosh int <old_deployment_manifest> --path /instance_groups/name=mongodb/jobs/name=mongodb/properties/mongodb/admin_password
- Get MongoDB User credentials from the current service instance
Find the <username>:
bosh -d <current_deployment_name> manifest > <current_deployment_manifest>
bosh int <current_deployment_manifest> --path /instance_groups/name=mongodb/jobs/name=mongodb/properties/mongodb/admin_username
Find the <password>:
bosh -d <current_deployment_name> manifest > <current_deployment_manifest>
bosh int <current_deployment_manifest> --path /instance_groups/name=mongodb/jobs/name=mongodb/properties/mongodb/admin_password
- Log into the primary node. See the Find the Cluster's Primary Node section.
bosh -d <service-instance-deployment-name> ssh mongodb/<master_node_index>
sudo su -
- Access MongoDB via the MongoDB's shell client
a9s MongoDB 5.0
sudo su -
# When the Service Instance is SSL, use `--ssl` `--sslCAFile <your-ca-file-path>`.
/var/vcap/packages/mongodb/bin/mongo -u <old_deployment_username> -p <old_deployment_password> "admin"
a9s MongoDB 7.0
sudo su -
# When the Service Instance is SSL, use `--tls` `--tlsCAFile <your-ca-file-path>`.
/var/vcap/packages/mongosh/mongosh -u <old_deployment_username> -p <old_deployment_password> "admin"
- Update MongoDB's username
db.system.users.update({"user":"<old_deployment_username>"}, {$set:{"user":"<username>"}})
- Access MongoDB via the MongoDB's shell client with the new username
a9s MongoDB 5.0
sudo su -
# When the Service Instance is SSL, use `--ssl` `--sslCAFile <your-ca-file-path>`.
/var/vcap/packages/mongodb/bin/mongo -u <username> -p <old_deployment_password> "admin"
a9s MongoDB 7.0
sudo su -
# When the Service Instance is SSL, use `--tls` `--tlsCAFile <your-ca-file-path>`.
/var/vcap/packages/mongosh/mongosh -u <username> -p <old_deployment_password> "admin"
- Update MongoDB's password
db.changeUserPassword("<username>", "<password>")
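You can verify the new credentials by re-authenticating from the shell; db.auth returns a success indicator:
db.getSiblingDB("admin").auth("<username>", "<password>")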
Reconfigure Replication
- Log into the primary node. See the Find the Cluster's Primary Node section.
bosh -d <service-instance-deployment-name> ssh mongodb/<master_node_index>
sudo su -
- Access MongoDB via the MongoDB's shell client
a9s MongoDB 5.0
sudo su -
# When the Service Instance is SSL, use `--ssl` `--sslCAFile <your-ca-file-path>`.
/var/vcap/packages/mongodb/bin/mongo -u <username> -p <password> "admin"
a9s MongoDB 7.0
sudo su -
# When the Service Instance is SSL, use `--tls` `--tlsCAFile <your-ca-file-path>`.
/var/vcap/packages/mongosh/mongosh -u <username> -p <password> "admin"
- Reconfigure the replication configuration
It is necessary to configure the hosts with the node domains. These domains are structured in the format <deployment_name>-mongodb-<node_index>.node.dc1.<consul_domain>:<port>.
cfg = rs.conf()
cfg.members[0].host = "<deployment_name>-mongodb-0.node.dc1.<consul_domain>:<port>"
cfg.members[1].host = "<deployment_name>-mongodb-1.node.dc1.<consul_domain>:<port>"
cfg.members[2].host = "<deployment_name>-mongodb-2.node.dc1.<consul_domain>:<port>"
rs.reconfig(cfg, {"force": true})
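To confirm that the new hosts took effect, you can inspect the configuration again (a quick check):
rs.conf().members.map(function(m) { return m.host })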
Check MongoDB Health
Find the Cluster's Primary Node
Check which node is the PRIMARY: in the rs.isMaster() output, the primary node's me field matches the primary field (and secondary is false).
bosh -d <deployment_name> ssh mongodb/0 -c 'echo "rs.isMaster()" | /var/vcap/packages/mongodb/bin/mongo' --results
Example: In this case, node 0 is the PRIMARY. Have a look at the output:
(...)
"secondary" : false,
"primary" : "<deployment_name>-mongodb-<node-index>.node.dc1.<consul_domain>:<port>",
"me" : "<deployment_name>-mongodb-<node-index>.node.dc1.<consul_domain>:<port>",
(...)
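A variant of the same check that prints only the primary's address (a sketch using the shell's --quiet flag):
bosh -d <deployment_name> ssh mongodb/0 -c 'echo "rs.isMaster().primary" | /var/vcap/packages/mongodb/bin/mongo --quiet' --results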
Check MongoDB status
- Log into the primary node.
bosh -d <service-instance-deployment-name> ssh mongodb/<master_node_index>
sudo su -
- Execute db.stats() in MongoDB's shell client.
a9s MongoDB 5.0
# When the Service Instance is SSL, use `--ssl` `--sslCAFile <your-ca-file-path>`.
/var/vcap/packages/mongodb/bin/mongo -u <username> -p <password> "admin"
db.stats()
a9s MongoDB 7.0
# When the Service Instance is SSL, use `--tls` `--tlsCAFile <your-ca-file-path>`.
/var/vcap/packages/mongosh/mongosh -u <username> -p <password> "admin"
db.stats()
- Check if the database information matches the backed-up database. See the db.stats() output explanation in the MongoDB documentation.
Check Replication Status
- Log into the primary node.
bosh -d <service-instance-deployment-name> ssh mongodb/<master_node_index>
sudo su -
- Execute rs.status() in MongoDB's shell client.
a9s MongoDB 5.0
# When the Service Instance is SSL, use `--ssl` `--sslCAFile <your-ca-file-path>`.
/var/vcap/packages/mongodb/bin/mongo -u <username> -p <password> "admin"
rs.status()
a9s MongoDB 7.0
# When the Service Instance is SSL, use `--tls` `--tlsCAFile <your-ca-file-path>`.
/var/vcap/packages/mongosh/mongosh -u <username> -p <password> "admin"
rs.status()
- Check if there is 1 PRIMARY node and 2 SECONDARY nodes. Output:
(...)
"members" : [
{
"name" : "<deployment_name>-mongodb-0.node.dc1.<consul_domain>:<port>",
"health" : 1,
"stateStr" : "PRIMARY",
},
{
"name" : "<deployment_name>-mongodb-1.node.dc1.<consul_domain>:<port>",
"health" : 1,
"stateStr" : "SECONDARY",
},
{
"name" : "<deployment_name>-mongodb-2.node.dc1.<consul_domain>:<port>",
"health" : 1,
"stateStr" : "SECONDARY",
"uptime" : 52519,
}
],
(...)