anynines Data Services Administrative Tasks

This document explains the most common tasks an anynines Data Service Operator should know.

Map a service instance guid to a BOSH deployment name

In order to find out which BOSH deployment belongs to which service instance, you can use the following command (jq must be installed):

$ curl --user admin:[deployer-api-password] [deployer-api-endpoint]/deployments.json | jq '.[] | select(.deployment_attributes.instance_guid == "[service-instance-guid]") | .name'
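The lookup above can be wrapped in a small shell function so the jq filter does not have to be edited by hand for every guid. This is a minimal sketch, not part of the anynines tooling: the function name deployment_name_for_guid is hypothetical, and it assumes the deployments.json response has the shape implied by the filter above. The guid is passed in with jq's --arg option, and -r prints the name without surrounding quotes.

```shell
# Hypothetical helper: reads the JSON returned by /deployments.json on stdin
# and prints the name of the deployment that belongs to the given guid.
deployment_name_for_guid() {
  jq -r --arg guid "$1" \
    '.[] | select(.deployment_attributes.instance_guid == $guid) | .name'
}

# Usage (placeholders as in the command above):
#   curl --user admin:[deployer-api-password] [deployer-api-endpoint]/deployments.json \
#     | deployment_name_for_guid [service-instance-guid]
```

If no deployment matches the guid, the function prints nothing.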

Update All Service Instances

There are three scenarios where you might want to update the existing service instance deployments:

  1. A new version of this anynines deployments repo is available and contains new BOSH releases or new configurations
  2. You uploaded a new stemcell to the BOSH director
  3. You changed the cloud-config in your setup

In the first two scenarios, first run the templates-uploader errand so that the templates in the a9s Deployer are updated, then run the deployment_updater errand to update all outdated deployments. In the third scenario, running the deployment_updater errand is sufficient to update all outdated deployments.

You can simply run the deployment_updater errand by executing the following command:

$ bosh -d <deployment_name> run-errand deployment_updater
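The mapping from update scenario to errands described above can be sketched as a small shell function. This is an illustration only: the function name and the scenario names (new-repo-version, new-stemcell, cloud-config) are made up for this sketch, while the errand names are the ones used in this document.

```shell
# Hypothetical helper: prints the errands to run, in order, for one of the
# three update scenarios. The scenario names are invented for this sketch.
errands_for_scenario() {
  case "$1" in
    new-repo-version|new-stemcell)
      # Templates must be refreshed before the instances are updated.
      printf '%s\n' templates-uploader deployment_updater ;;
    cloud-config)
      printf '%s\n' deployment_updater ;;
    *)
      echo "unknown scenario: $1" >&2
      return 1 ;;
  esac
}

# Usage (replace <deployment_name> with your deployment):
#   for errand in $(errands_for_scenario new-stemcell); do
#     bosh -d <deployment_name> run-errand "$errand"
#   done
```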

There are three configuration options in the deployment_updater to specify which deployments should be updated:

  • outdated: Only outdated instances are updated. This is the default. An instance is outdated if
    • the template of the underlying manifest is outdated.
    • one of the applied configs is outdated.
    • one of the used releases is outdated.
    • one of the used stemcells is outdated.
  • provisioned: Only provisioned instances are updated.
  • failed: Only failed instances are updated.

To change this value in a deployment manifest, set the property deployment_updater.update_type to one of the values above. The following Ops file applies such a change:

- type: replace
  path: /properties/deployment_updater/update_type?
  value: provisioned
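Since only three values are accepted, it can help to generate the Ops file with a small script that rejects anything else. The following is a minimal sketch under that assumption; the function name and the file name update-type.yml are hypothetical.

```shell
# Hypothetical helper: writes an Ops file that sets
# deployment_updater.update_type, rejecting values other than the three
# documented ones (outdated, provisioned, failed).
write_update_type_ops_file() {
  update_type=$1
  ops_file=$2
  case "$update_type" in
    outdated|provisioned|failed) ;;
    *)
      echo "invalid update_type: $update_type" >&2
      return 1 ;;
  esac
  cat > "$ops_file" <<EOF
- type: replace
  path: /properties/deployment_updater/update_type?
  value: $update_type
EOF
}

# Usage (manifest path is a placeholder; -o is the BOSH CLI ops-file flag):
#   write_update_type_ops_file failed update-type.yml
#   bosh -d <deployment_name> deploy <manifest.yml> -o update-type.yml
```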

Update a specific Service Instance

Instead of updating all service instances of a service, it is also possible to update only one service instance. To do so, you must first find out the guid of the service instance. When using Cloud Foundry you can simply run:

$ cf service service-instance-name-in-cf --guid
34e68cdf-62cc-4ea6-a3e8-714026dba1f8

In this example, the service instance guid is 34e68cdf-62cc-4ea6-a3e8-714026dba1f8. Next you have to find out the endpoint of the Service Broker and the Service Broker admin password. Once you have this information you can trigger an update of the service instance by executing:

$ curl --user admin:[service-broker-password] -X PATCH [service-broker-hostname]:3000/v2/service_instances/[service-instance-guid] -d '{"plan_id":"6b1973db-e057-4a71-9832-a4b3f27a0d8f", "service_id": "7ee52a02-8839-43c2-a550-728ad736bbda"}'

The service_id and plan_id parameters can be fetched from the service catalog of the broker.

Interact with the Backup Manager

As an a9s Data Service operator, you can interact with the Backup Manager API in order to trigger backups and restores.

To trigger a backup of all service instances, execute:

$ curl [user]:[password]@[backup manager endpoint]/backup_agent/backup_all -d '{}'

To trigger a backup for a specific service instance, execute:

$ curl [user]:[password]@[backup manager endpoint]/backup_agent/backup -d "instance_guid=[service-instance-guid]" -H "Accept: application/json"

List all backups:

$ curl [user]:[password]@[backup manager endpoint]/instances

Trigger a restore:

To trigger a restore, you need the internal ID that the service instance has in the Backup Manager database (this is not the service instance guid). You can find this ID by calling the /instances endpoint. You also need the ID of the backup you want to restore, which is returned by the /instances endpoint as well. Once you have this information, you can trigger the restore of the backup by running:

$ curl [user]:[password]@[backup manager endpoint]/backup_agent/restores -H "Accept: application/json; charset=UTF-8" -d "backup_id=[backup-id]" -d "instance_id=[backup-manager-internal-service-instance-id]"

Decrypt a backup whose encryption key is unknown

To decrypt an existing backup whose encryption key is unknown, access to the a9s Backup Manager is required. If you have access to the a9s Backup Manager, follow these steps to get the encryption key and decrypt the backup.

  1. Download the appropriate backup directly from the Backup Store. The backup name should have the format <deployment-name>-<unix-timestamp>. To find the backup in the database later, you need the Created at date of the corresponding backup. You can either look it up in the a9s Service Dashboard or convert the unix-timestamp of the backup file to a date.
  2. Connect to the a9s Backup Manager: bosh -d backup-service ssh backup-manager
  3. Become root: sudo -i
  4. Open the Rails console of the a9s Backup Manager: /var/vcap/jobs/anynines-backup-manager/bin/rails_c
  5. Find the correct encryption key for the backup. You need the date of the backup from the first step. The date must be in the format Year-Month-Day Hour:Minute:Second, and the time must be in UTC. As an example we use the date 2018-11-26 13:45:53:
     Backup.where("created_at >= ?", "2018-11-26 13:45:53").first.credentials[:filter_plugins][0][:password]
     Please note that Backup.where("created_at >= ?", "2018-11-26 13:45:53").first.credentials[:filter_plugins] returns an Array. The encryption plugin should be the first element (index 0), but it may be at another position, depending on your Backup Manager configuration.
  6. Decrypt the backup. As an example we use the backup file ~/Downloads/d70a4d9-1543239953810 and the password 12345678 from the previous step:
     cat ~/Downloads/d70a4d9-1543239953810 | openssl enc -aes256 -d -pass 'pass:12345678' | gunzip -c > ~/Downloads/d70a4d9-1543239953810.decrypted
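Converting the timestamp in the backup file name to a date, as mentioned in step 1, can be sketched as follows. The function name is hypothetical. The sketch assumes that the timestamp is given in milliseconds, which is what the 13-digit value in the example file name suggests, and it relies on GNU date.

```shell
# Hypothetical helper: prints the UTC creation time encoded in a backup
# file name of the form <deployment-name>-<unix-timestamp>.
# Assumption: the timestamp is in milliseconds (13 digits in the example).
# Requires GNU date for the "-d @SECONDS" syntax.
backup_created_at() {
  backup_name=$(basename "$1")
  millis=${backup_name##*-}        # everything after the last "-"
  seconds=$((millis / 1000))       # drop the millisecond part
  date -u -d "@${seconds}" '+%Y-%m-%d %H:%M:%S'
}

# Usage:
#   backup_created_at ~/Downloads/d70a4d9-1543239953810
#   prints 2018-11-26 13:45:53
```

The output is already in the Year-Month-Day Hour:Minute:Second UTC format that step 5 expects.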

NOTE

Backups are currently encrypted using 'OpenSSL 1.0.2g 1 Mar 2016'. Since OpenSSL 1.1, the default digest has changed from md5 to sha256. As a consequence, if you are using OpenSSL 1.1 or higher, you have to pass the digest explicitly:

$ cat BACKUP_NAME | openssl enc -aes256 -d -pass 'pass:12345678' -md md5 | gunzip -c > decrypted_file
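Because the -md option is also accepted by older OpenSSL versions, passing it explicitly gives one command that behaves the same regardless of the installed version. The following is a minimal sketch of such a wrapper; the function name is hypothetical, and md5 is used as the default digest only because of the note above.

```shell
# Hypothetical helper: decrypts and decompresses a backup file to stdout.
# The optional third argument selects the key-derivation digest; md5 matches
# backups written with OpenSSL 1.0.2, as described in the note above.
decrypt_backup() {
  backup_file=$1
  password=$2
  digest=${3:-md5}
  openssl enc -aes256 -d -pass "pass:${password}" -md "$digest" < "$backup_file" \
    | gunzip -c
}

# Usage:
#   decrypt_backup ~/Downloads/d70a4d9-1543239953810 12345678 \
#     > ~/Downloads/d70a4d9-1543239953810.decrypted
```

Newer OpenSSL versions may print a warning about deprecated key derivation on stderr; the decrypted data on stdout is not affected by it.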

Get the error backtrace from a backup or restore

If a backup or restore fails, the backtrace of the error is saved in the database. The following steps show how to read it.

  1. Connect to the a9s Backup Manager: bosh -d backup-service ssh backup-manager
  2. Become root: sudo -i
  3. Open the Rails console of the a9s Backup Manager: /var/vcap/jobs/anynines-backup-manager/bin/rails_c
  4. Get the instance where the error happened: instance = Instance.where(instance_id: "instance_guid").first
  5. Get the backup that failed. There are several options:
    • If it's the last backup that failed: backup = instance.backups.last
    • If you know the backup name, e.g. d25ed99-1543410104023: backup = instance.backups.where(backup_id: "d25ed99-1543410104023").first
    • If you don't know the name, look for the creation date of the backup or restore. The date must be in the format Year-Month-Day Hour:Minute:Second, and the time must be in UTC. As an example we use the date 2018-11-26 13:45:53: backup = Backup.where("created_at >= ?", "2018-11-26 13:45:53").first
  6. Finally, load the message and decode it: Base64.decode64(backup.backup_agent_task.msg)

Backups of a9s-pg

The backup of the a9s-pg can now be handled with the a9s Backup Manager. See a9s_pg_backup for details.

Rotate database encryption salts

To rotate the database encryption salts of the a9s Service Broker, the a9s Deployer or the Backup Manager, execute the following steps. The example below uses the elasticsearch-service:

  1. Duplicate the current encryption salt
OLD_SALT=`credhub get -n "/<BOSH director name>/elasticsearch-service/elasticsearch_service_broker_db_salt"`
credhub set -n "/<BOSH director name>/elasticsearch-service/elasticsearch_service_broker_db_salt_old" -t password -w "${OLD_SALT}"

OLD_SALT=`credhub get -n "/<BOSH director name>/elasticsearch-service/elasticsearch_service_deployer_db_salt32"`
credhub set -n "/<BOSH director name>/elasticsearch-service/elasticsearch_service_deployer_db_salt32_old" -t password -w "${OLD_SALT}"
  2. Regenerate the encryption salt
credhub generate -n "/<BOSH director name>/elasticsearch-service/elasticsearch_service_broker_db_salt" -t password -l 32

credhub generate -n "/<BOSH director name>/elasticsearch-service/elasticsearch_service_deployer_db_salt32" -t password -l 32
  3. Redeploy the service
bosh -d elasticsearch-service deploy elasticsearch-service/elasticsearch-service.yml
  4. Execute the errands
bosh -d elasticsearch-service run-errand migrate-deployer-api-encrypted-database-fields

bosh -d elasticsearch-service run-errand migrate-service-broker-encrypted-database-fields

Delete obsolete backup metadata files

To delete obsolete metadata files of already deleted backups, you can use the delete_metadata_files script on the Backup Manager VM. To do so, execute the following steps:

  1. Connect to the a9s Backup Manager: bosh -d backup-service ssh backup-manager
  2. Become root: sudo -i
  3. Execute the script: /var/vcap/jobs/anynines-backup-manager/bin/delete_metadata_files

Rotate Consul certificates

Prerequisites

Find out BOSH director name

BOSH_NAME=`bosh env --json | jq '.Tables[0].Rows[0].name' -r`

Ensure the current CredHub CA entry is complete

You have to ensure the current CA value for the CredHub entry is not empty:

credhub get -n "/${BOSH_NAME}/consul-dns/cdns_ca" --output-json | jq .value.ca

If the previous command returns null you have to execute the following commands to copy the value of the current certificate into the value for the current CA:

credhub get -k private_key -n "/${BOSH_NAME}/consul-dns/cdns_ca" > /tmp/cdns_ca.private.pem
credhub get -k certificate -n "/${BOSH_NAME}/consul-dns/cdns_ca" > /tmp/cdns_ca.cert.pem

credhub set -n "/${BOSH_NAME}/consul-dns/cdns_ca" -t certificate -c /tmp/cdns_ca.cert.pem -p /tmp/cdns_ca.private.pem -r /tmp/cdns_ca.cert.pem

Rotate an expiring Consul CA and certificate

To rotate an expiring Consul CA and certificate you have to follow these steps:

Duplicate current CA

credhub get -k private_key -n "/${BOSH_NAME}/consul-dns/cdns_ca" > /tmp/cdns_ca.private.pem
credhub get -k certificate -n "/${BOSH_NAME}/consul-dns/cdns_ca" > /tmp/cdns_ca.cert.pem
credhub get -k ca -n "/${BOSH_NAME}/consul-dns/cdns_ca" > /tmp/cdns_ca.ca.pem

credhub set -n "/${BOSH_NAME}/consul-dns/cdns_ca_old" -t certificate -c /tmp/cdns_ca.cert.pem -p /tmp/cdns_ca.private.pem -r /tmp/cdns_ca.ca.pem

Regenerate current CA

To avoid having to rotate the CA every year, increase the duration parameter (given in days).

credhub generate --duration=365 -n "/${BOSH_NAME}/consul-dns/cdns_ca" -c a9sConsulCA --is-ca -t certificate

Redeploy Environment (with old CA, new CA and old certificate)

consul-dns

Apply the following Ops file to the consul-dns deployment and redeploy the consul-dns deployment.

To avoid having to rotate the SSL certificate every year, increase the duration parameter (given in days) in the following Ops file.

IMPORTANT: Replace <bosh-director-name> with the director name determined in the Prerequisites section in all following Ops files.

- type: replace
  path: /instance_groups/name=consul/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"
- type: replace
  path: /variables/name=~1cdns_ssl/options/duration?
  value: 365
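Replacing the <bosh-director-name> placeholder in each Ops file by hand is error-prone, so the substitution can be scripted. The following is a minimal sketch: the function name and the file names in the usage comment are hypothetical, and it assumes the director name contains no "|", "&" or backslash characters, since these have a special meaning in the sed expression.

```shell
# Hypothetical helper: reads an Ops file on stdin and writes it to stdout
# with every <bosh-director-name> placeholder replaced by the given name.
# Assumption: the name contains no "|", "&" or "\" characters.
render_ops_file() {
  director_name=$1
  sed "s|<bosh-director-name>|${director_name}|g"
}

# Usage (file names are placeholders; BOSH_NAME is set in the Prerequisites):
#   render_ops_file "$BOSH_NAME" < consul-ca-rotation.yml > consul-ca-rotation.rendered.yml
```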

data-services

Apply the following Ops file to the x-service deployments and redeploy the x-service deployments. Run the templates-uploader errand and the force_deployment_updater errand after you redeployed the deployments.

- type: replace
  path: /instance_groups/name=spi/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"
- type: replace
  path: /instance_groups/name=broker/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"
- type: replace
  path: /instance_groups/name=deployer-api/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"

- type: replace
  path: /instance_groups/name=templates-uploader/jobs/name=template-uploader/properties/template-uploader/template-vars/~1cdns_ssl.ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"

# delete for a9s Kubernetes, a9s Prometheus and a9s LogMe
- type: replace
  path: /instance_groups/name=service-dashboard/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"

The a9s Kubernetes, a9s Prometheus and a9s LogMe deployments don't contain a service dashboard with a running Consul job. For these deployments, the Ops file entry with the following path must be deleted: /instance_groups/name=service-dashboard/jobs/name=consul/properties/consul/ssl_ca

a9s-pg

Apply the following Ops file to the a9s-pg deployment and redeploy the deployment.

- type: replace
  path: /instance_groups/name=pg/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"

backup-service

Apply the following Ops file to the backup-service deployment and redeploy the deployment.

- type: replace
  path: /instance_groups/name=backup-manager/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"
- type: replace
  path: /instance_groups/name=backup-monit/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"

service-guard

Apply the following Ops file to the service-guard deployment and redeploy the deployment.

- type: replace
  path: /instance_groups/name=guard/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"
- type: replace
  path: /instance_groups/name=guard-worker/jobs/name=consul/properties/consul/ssl_ca
  value: "((/<bosh-director-name>/consul-dns/cdns_ca_old.ca))((/<bosh-director-name>/consul-dns/cdns_ca.ca))"

Delete Consul certificate

credhub delete -n /cdns_ssl

Redeploy environment (with old CA, new CA and new certificate)

Redeploy the environment after the Consul certificate has been deleted.

IMPORTANT: Remember to apply the appropriate Ops file from step Redeploy Environment (with old CA, new CA and old certificate) to the corresponding deployment.

IMPORTANT: Do not forget to update the service instances using the deployment_updater errand.

Redeploy Environment (without old CA)

Redeploy the environment after the Consul certificate has been deleted.

IMPORTANT: This time it is important to NOT apply the Ops file from step Redeploy Environment (with old CA, new CA and old certificate).

IMPORTANT: Do not forget to upload the service templates with the templates-uploader errand and update the service instances using the force_deployment_updater errand.