Why We Use the Consul DNS System
The a9s Consul component provides an internal DNS system to the other a9s Data Services, such as a9s PostgreSQL. The a9s Data Services use this internal DNS system to do the following:
- Make service bindings independent from IP addresses: When developers bind an a9s service instance to their Cloud Foundry app, they can provide a hostname instead of an IP address. Developers then do not need to recreate the service binding if the IP address of a node in the service cluster changes.
- Provide an infrastructure-independent failover mechanism: Consider apps that use a9s PostgreSQL. They connect to a9s PostgreSQL clusters using the hostname of a single master node that accepts write requests. The node that is currently the master node changes often. With a9s Consul DNS, the cluster can use a hostname that always resolves to the IP of the current master mode. This means the service binding requires no change when the master node changes.
- Act as a service registry: The a9s Data Services use a9s Consul DNS as a service registry for their internal components. The a9s Data Service Framework (DSF) also uses Consul watches to realize a choreography pattern within the microservice architecture.
- Provide infrastructure-independent load balancing using a9s Consul DNS round robin capabilities.
Why a9s Services Do Not Use NGINX/HAProxy
Although proxies can solve the issue of IP propagation, they also introduce a layer of complexity. For a service such as PostgreSQL, drivers expect only one node to contact, which makes a proxy a single point of failure and removes most of the advantages of high availability. Another idea is to deploy multiple proxies within a hot-standby setup, which adds even more complexity, as an IaaS specific failover needs to be implemented.
Why a9s Data Services Do Not Use Static IPs
Although it sounds tempting to use the IPs of service instances in service bindings, there is a major downside to that. Depending on the IaaS, it might not be possible to resurrect VMs with static IPs if the compute node containing it is currently down. The a9s DSF aims to support all network types. This includes dynamic networks in which crashed VMs get new IP address when they are restarted. In this case it would be necessary to recreate all service bindings.
Why we are not using Cloud Foundry's TCP router
Cloud Foundry's TCP router is a great component, which came up after we developed and used the a9s DSF. This framework is the core of all a9s Data Services, so we already had a solution in place. However, we had some internal discussions on whether to use the TCP router within the a9s DSF or not. As we see advantages and disadvantages in both solutions (using the a9s Consul DNS vs. using the TCP router) we consider the provisioning of both solutions in the future.
A realization of this could provide two different types of hostnames in the service bindings. One filed ("consul_hostname" or "consul_hostnames") that contains the hostnames which are resolved via the a9s Consul DNS and one field that contains the hostnames where the traffic is routed through the TCP router ("tcp_route" or "tcp_routes"). The platform operator could choose which hostname type is passed in the default "host" or "hosts" field of the service binding credentials. The applications could then pick the type of hostname(s) which is most appropriate.
We did not investigated in a prototype yet but we want to share our first thoughts on that.
- Using the TCP Router for an infrastructure independent load balancing would also be possible.
- Using the TCP Router for an infrastructure independent failover would also be possible.
- It introduces an additional hop in the network since the traffic is first routed to the TCP router and from there to the service instance. Using the a9s Consul DNS solution the traffic is directly routed to the service instance without any other component in between.
- A high available TCP Router setup requires a high available load balancer as an additional component. If you don't want all service instances to become available on the internet you need a high available load balancer that is not available on the internet.
- Since we want to provide dedicated service instances the TCP router would be a shared component through which the traffic of all service instances is going through. Using a dedicated TCP routers for each service instance would also be possible. This means there could be a service plan that provisions dedicated service instances but use a shared TCP router. Another plan could provisions a dedicated a TCP router for each service instance. For the last option it is required to dynamically setup a load balancer.