Safely Evaluating RabbitMQ 4.x Minor Updates

RabbitMQ minor updates can contain breaking changes, especially for the RabbitMQ client libraries used by applications. It may be necessary to update or reconfigure these client libraries to stay compatible with the new RabbitMQ minor version.

As each new version of anynines-deployment could introduce minor updates for RabbitMQ in a9s Messaging, any such update should be tested carefully with the applications that use it.

For any potential breaking changes, see the Changelog or Upgrade file for each new anynines-deployment release.

This guide explains how to safely test a new RabbitMQ version for a9s Messaging to ensure that the client applications work as expected.

Key Facts

Maintenance platform updates: When the Platform Operator installs a new version of anynines-deployment, the a9s Messaging Service Instances will automatically receive rolling, in-place upgrades of RabbitMQ for minor versions.

Block maintenance updates (per Service Instance): It is possible to temporarily block the maintenance updates on Service Instances to avoid unintended version upgrades while the new version is tested with the client applications. For more information, see Using a9s Service Dashboard.

No in-place rollback self-service: Once an instance has been upgraded in-place, it can only be downgraded by the Platform Operator under certain circumstances.

Blue-Green Testing

Using strategies similar to Blue-Green Deployments to test the new minor version of RabbitMQ helps keep the production applications stable, while the new RabbitMQ version can be safely validated on a separate Service Instance.

The following steps explain how a new minor version of RabbitMQ can be tested with the client applications:

1. Block Maintenance Updates

As the first step, the maintenance updates need to be blocked on the production Service Instance. This will keep the production Service Instance stable while the tests are conducted. For more information, see Using a9s Service Dashboard.

caution

If the maintenance updates for a9s Messaging are disabled for specific Service Instances, then not only the updates for the RabbitMQ version are paused, but also the updates to colocated framework components (e.g. a9s Backup Agent) are delayed. For more information, see Using a9s Service Dashboard.

2. Apply New RabbitMQ Minor Update

If a new anynines-deployment release that includes a minor version update for RabbitMQ is available, it must only be rolled out to the environment after the maintenance updates have been blocked for all a9s Messaging Service Instances.

3. Create Testing Service Instance

Create a new a9s Messaging Service Instance after the new anynines-deployment release is rolled out to the environment. The new Service Instance will automatically include the new minor RabbitMQ version.
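Before continuing, it can be worth confirming that the testing Service Instance really runs the expected minor version. The following is a minimal sketch, not an a9s-provided tool. It assumes that the RabbitMQ HTTP API is reachable on localhost:15672 (for example through the SSH tunnel described later in this guide), that `jq` is installed, and that the hypothetical variables `RABBITMQ_USER` and `RABBITMQ_PASSWORD` hold the credentials of a user who may read `/api/overview`. The version `4.1.0` in the usage comment is only an example value.

```shell
# minor_of VERSION: prints "MAJOR.MINOR" of a dotted version string.
minor_of() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# same_minor A B: succeeds when both versions share MAJOR.MINOR.
same_minor() {
  [ "$(minor_of "$1")" = "$(minor_of "$2")" ]
}

# rabbitmq_version: asks the HTTP API for the broker version.
# Assumption: the API is reachable on localhost:15672 and the
# RABBITMQ_USER / RABBITMQ_PASSWORD variables are set.
rabbitmq_version() {
  curl -s --fail --insecure -u "${RABBITMQ_USER}:${RABBITMQ_PASSWORD}" \
    "https://localhost:15672/api/overview" | jq -r '.rabbitmq_version'
}

# Usage (example version only):
# same_minor "$(rabbitmq_version)" "4.1.0" && echo "expected minor version"
```

Comparing only the major and minor parts keeps the check valid when the release also carries a newer patch version.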

4. Migrate Queue Definitions

The queue definitions need to be migrated to the new Service Instance with the latest RabbitMQ version. The migration of the queue definitions can be conducted with one of the methods that are described in the Forking and Migration page.

5. Test Application - Service Instance Compliance

This step helps ensure that production applications are compatible with the newly deployed Service Instance.

Depending on the environment and on how the applications are implemented, the newly deployed Service Instance can be used with staging applications to verify compatibility between the clients and the new RabbitMQ version. Alternatively, some existing production applications can be connected to the new Service Instance in blue-green fashion.

6. Cut-over Production Traffic

After the compatibility between the applications and the new Service Instance has been verified, there are two approaches to switch traffic entirely to Service Instances that use the newer RabbitMQ version:

  • Approach 1: Perform an in-place update of the existing production Service Instance and delete the newer Service Instance that was created for testing.
  • Approach 2: Bind the production applications to the newer, testing Service Instance.
    • In this case, it might be necessary to remove any leftovers from testing.
    • In addition, the existing Service Instance can be kept alive to have a fallback if any problems are observed later.

info

The rest of this guide explains the second approach in more detail. If the first approach is chosen, there are no further considerations.

Migrate Applications to the Newer Service Instance

The traffic can be redirected from the existing production Service Instance to the new Service Instance by switching the Service Bindings of both Instances, or updating the application configuration with the credentials of a Service Key from the new Service Instance.
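For the Service Key variant, the credentials have to be read from the CF CLI output, which prints status lines before the JSON document. The sketch below shows one way to isolate the JSON. The Service Instance name `messaging-test` and the key name `app-key` are hypothetical, and the exact JSON layout depends on the CF CLI version, so it should be inspected before scripting against it.

```shell
# json_body: reads `cf service-key` output on stdin and prints everything
# from the first line that starts with "{", dropping the CLI status lines.
json_body() {
  sed -n '/^{/,$p'
}

# Usage (hypothetical instance and key names):
# cf create-service-key messaging-test app-key
# cf service-key messaging-test app-key | json_body
```

The resulting JSON can then be copied into the application configuration or passed to a tool such as `jq`.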

info

In general, the producer applications need to be connected first to the new Service Instance to stop publishing new messages to the existing production Service Instance. The consumer applications are then connected only after verifying that all in-flight messages are processed, to ensure that no messages are lost during the traffic redirection.

caution

The required steps to migrate applications depend on your environment and the applications themselves. This guide aims to provide a general idea, but the steps must be adapted to your situation.

Migrate Producer Applications

First, connect all producer applications to the new Service Instance. This can be done either by binding the producer applications to the new Service Instance using the CF CLI or by updating the configuration of the applications to use the new Service Instance.
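With the CF CLI, moving a single application from one Service Instance to another could look like the sketch below. It is an illustration under assumptions, not a prescribed procedure: the application and Service Instance names are hypothetical, and the application is assumed to read its credentials from the binding, so a restage is needed before it uses the new ones. The `CF_CMD` variable is an addition of this sketch that allows a dry run.

```shell
# CF_CMD lets the cf binary be swapped out, e.g. CF_CMD=echo for a dry run.
CF_CMD="${CF_CMD:-cf}"

# rebind_app APP OLD_INSTANCE NEW_INSTANCE
# Binds APP to NEW_INSTANCE, removes the binding to OLD_INSTANCE and
# restages APP so that it picks up the new credentials.
rebind_app() {
  app="$1"
  old_instance="$2"
  new_instance="$3"
  "$CF_CMD" bind-service "$app" "$new_instance" &&
    "$CF_CMD" unbind-service "$app" "$old_instance" &&
    "$CF_CMD" restage "$app"
}

# Usage (hypothetical names):
# rebind_app my-producer-app messaging-prod messaging-test
```

Calling the function with the two Service Instance names swapped points the application back to the previous Service Instance, which is the same operation a rollback requires.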

Migrate Consumer Applications

After all producer applications are connected to the new Service Instance, it needs to be checked if the consumer applications have processed all messages on the old Service Instance. This can be done by either querying the RabbitMQ HTTP API or accessing the Management UI. For more details, see the next section.

After all messages in the old Service Instance have been processed, the consumer applications can be redirected to the new Service Instance, either by binding them to it using the CF CLI or by updating their configuration to use the new Service Instance.

Use the Management UI

If the Management UI is enabled for the a9s Messaging Service Instances, it can be used to inspect all queues for unprocessed messages. For more information on how to connect to the UI, see Use RabbitMQ Management Dashboard.

In the Overview tab of the Management UI, there is a section called Queued Messages that provides information about the number of messages in all queues.

Query the RabbitMQ HTTP API (via an SSH tunnel)

The RabbitMQ HTTP API can be used to verify that all messages were processed. To access the HTTP API, an SSH tunnel and a user with the administrator role need to be created.

The admin user can be created by executing the command:

cf create-service-key my-messaging-service my-key -c '{"roles": ["administrator", "management"]}'

An application then needs to be bound to the new Service Instance, and an SSH tunnel to the Service Instance can be opened through that application with:

cf ssh a9s-messaging-app -L 15672:<service-instance-url>:15672

With the tunnel open, the HTTP API can be queried through the forwarded local port to list all queues that still contain unprocessed messages:

curl -s --insecure -u <user>:<password> https://localhost:15672/api/queues | jq -r '.[] | select(.messages > 0) | "\(.vhost) \(.name): \(.messages)"'
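When the check has to be repeated until the queues are empty, the query above can be wrapped in a small polling loop. The following is a minimal sketch, assuming the tunnel from the previous step is open, `jq` is installed, and the hypothetical variables `RABBITMQ_USER` and `RABBITMQ_PASSWORD` hold the credentials from the Service Key. The `messages` field counts both ready and unacknowledged messages, so a result of zero means that nothing is waiting or in flight.

```shell
# pending_messages: prints the total number of messages across all queues.
# Assumption: the HTTP API is reachable on localhost:15672 and the
# RABBITMQ_USER / RABBITMQ_PASSWORD variables are set.
pending_messages() {
  curl -s --fail --insecure -u "${RABBITMQ_USER}:${RABBITMQ_PASSWORD}" \
    "https://localhost:15672/api/queues" |
    jq -r '[.[] | .messages // 0] | add // 0'
}

# wait_until_drained [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 once no messages are left, 1 on timeout, 2 if the count
# could not be read.
wait_until_drained() {
  max_attempts="${1:-30}"
  interval="${2:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    pending="$(pending_messages)"
    [ -n "$pending" ] || return 2
    if [ "$pending" -eq 0 ]; then
      echo "all queues drained after $attempt check(s)"
      return 0
    fi
    echo "check $attempt: $pending message(s) still queued" >&2
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage: check up to 60 times, 5 seconds apart.
# wait_until_drained 60 5 && echo "safe to move the consumer applications"
```

The distinct return codes let a surrounding script tell a timeout apart from a failed query, so consumers are not moved on the basis of an unreadable result.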

7. Verify Stability

The production applications should be monitored for some time to ensure there are no further issues with the new RabbitMQ minor version. If the applications are operating as expected, the existing production Service Instance can be decommissioned if necessary, or kept as a fallback.

Rollback: If issues appear after the traffic cut-over to the new Service Instance, the applications can be pointed back to the existing production Service Instance while the issues are investigated. During a rollback, all producer applications need to be redirected before the consumer applications to ensure that no messages are lost.