Businesses expect their systems and applications to be enterprise-grade and always available. Enterprise readiness spans zero-downtime maintenance, quality and operations guided by industry standards, and a platform that provides the capabilities to build resilient applications and ensure business continuity through high availability and disaster recovery. In practice, it may not be possible to keep applications available all the time: planned outages occur for upgrades and maintenance, and unplanned outages occur, for example, when an application fails due to a bug. To meet availability goals, services and applications need to be deployed and configured across multiple instances and availability zones, and resilience principles need to be applied during cloud-native application development.
SAP’s offering for High Availability & Resiliency
There are a few ways to reduce these downtimes, and one approach is to set up a multi-region architecture. Let’s first look at SAP’s offerings for SAP BTP services, then dive into why a multi-region architecture is needed, followed by links to step-by-step reference tutorials.
Platform level – Availability Zones
SAP BTP services support Availability Zones (AZs) for single-region high availability. AZs are single failure domains within a single geographical region: separate physical locations with independent power, network, and cooling.
As shown above, multiple AZs exist in one region and are connected to each other through a low-latency network. SAP BTP service/application instances are deployed in multiple AZs in a single region, and if there is an issue within one of the AZs, another AZ serves the users. This comes by default for most SAP BTP services, without any need for additional infrastructure, development, or configuration during setup.
Application level – Scaling of the application
You can increase the availability of your application deployed in the SAP BTP Cloud Foundry environment by running multiple instances, either scaled manually or by using the SAP Application Autoscaler service. This SAP BTP service automatically increases and decreases the number of application instances based on the metrics you maintain. Cloud Foundry then distributes these instances across different AZs, continuously monitors their health, and restarts the ones considered unhealthy.
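The scaling behavior described above can be illustrated with a simple threshold rule. This is only a minimal sketch and not the SAP Application Autoscaler's actual algorithm or policy format; the function name, the CPU thresholds, and the instance limits are all assumptions chosen for illustration.

```python
def desired_instances(
    current: int,
    cpu_percent: float,
    scale_out_above: float = 80.0,  # hypothetical threshold
    scale_in_below: float = 30.0,   # hypothetical threshold
    min_instances: int = 2,         # keep at least two instances for availability
    max_instances: int = 6,         # hypothetical upper bound
) -> int:
    """Return the instance count after one threshold-based scaling step."""
    if cpu_percent > scale_out_above:
        target = current + 1   # scale out by one instance
    elif cpu_percent < scale_in_below:
        target = current - 1   # scale in by one instance
    else:
        target = current       # metric is within the accepted band
    # Clamp the result to the configured minimum and maximum.
    return max(min_instances, min(max_instances, target))
```

Keeping the minimum at two or more instances is what allows the platform to place instances in different AZs in the first place; a single instance cannot be spread across zones.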
Learn more about these features from SAP Help.
The high availability achieved using the AZ concept is limited to a single region. This is not sufficient for mission-critical applications if there is an unplanned region-wide outage, or during major upgrades and maintenance. It is also possible that the customer has a global presence, with users located across different regions. Latency increases if the application’s data center and its users are not located in the same region.
This can be addressed by implementing the multi-region architecture, and the key components needed to achieve it are:
- Geographic Redundancy – Underlying critical elements of the applications/systems are deployed in different regions, so another region’s system can take over in case of a regional outage.
- Health Monitor – Continuous monitoring of the critical components to react to failures.
- Failover Mechanism/Load Balancer – Intelligent system to switch to the active component in case of failure.
In this setup, we create a common URL from a custom domain instead of using the URLs provided by the SAP BTP services/applications. We then deploy the application across multiple regions (subaccounts) and use load balancer configurations to route requests from the custom domain URL to a healthy application instance based on health checks. In case of a failover, the switch to the healthy application is seamless because the URL accessed by the user remains the same. In the background, the requests are routed to the healthy service based on the maintained configuration.
With the help of geographic redundancy, even if there is an outage across the whole region, the other region’s application will be served to the users, eliminating the single point of failure. The same architecture can also address global users’ latency issues or divide the load on the services equally between different regions.
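The routing decision described above can be sketched as priority-based failover: requests go to the preferred region while it is healthy and to the next region otherwise. This is an illustrative sketch only, not how Azure Traffic Manager or AWS Route 53 are implemented; the `Endpoint` type, the `select_endpoint` function, and the example host names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str       # e.g. the region or subaccount (hypothetical label)
    url: str        # service URL behind the custom domain (hypothetical)
    priority: int   # lower value = preferred region
    healthy: bool   # result of the most recent health check


def select_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority value, if any."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # no region can currently serve requests
    return min(healthy, key=lambda e: e.priority)


# Example: the primary region is down, so the secondary region is selected.
regions = [
    Endpoint("primary", "https://app-eu.example.com", priority=1, healthy=False),
    Endpoint("secondary", "https://app-us.example.com", priority=2, healthy=True),
]
active = select_endpoint(regions)
```

Because users only ever see the custom domain, changing which endpoint is selected does not change the URL they access.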
High-Level Implementation Steps:
The key design considerations for implementing this use case are:
- Applications/services configured in two subaccounts.
- A custom domain URL as the single point of access to the service.
- Configuring and mapping the custom domain to the application/service routes using the SAP BTP Custom Domain service.
- Using a DNS-based traffic manager such as Azure Traffic Manager or AWS Route 53 as the intelligent component that monitors the health of the services and, on failover, routes user requests to the available service in the other region.
- (Recommended) Keeping tenants in sync using SAP Cloud Transport Management service
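The health-monitoring part of these considerations can be sketched as a probe that tolerates a configurable number of consecutive failures before an endpoint is treated as unhealthy. This is a simplified illustration under stated assumptions, not the probing logic of any specific traffic manager; `HealthMonitor`, `http_probe`, and the `tolerated_failures` parameter are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, connection failures, timeouts, and malformed URLs
        # all count as a failed probe.
        return False


class HealthMonitor:
    """Tracks consecutive probe failures per endpoint URL."""

    def __init__(self, probe: Callable[[str], bool] = http_probe,
                 tolerated_failures: int = 3) -> None:
        self._probe = probe
        self._tolerated = tolerated_failures
        self._failures: Dict[str, int] = {}

    def check(self, url: str) -> bool:
        """Probe once; return True while the endpoint still counts as healthy."""
        if self._probe(url):
            self._failures[url] = 0  # any success resets the failure counter
        else:
            self._failures[url] = self._failures.get(url, 0) + 1
        # Unhealthy only once failures exceed the tolerated number.
        return self._failures[url] <= self._tolerated
```

Tolerating a few failures avoids switching regions because of a single dropped request, at the cost of a slightly slower reaction to a real outage.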
While implementing this architecture, consider the subscription costs for the services set up in the different subaccounts, and ensure that the applications are always in sync between the two subaccounts. For a seamless switch in failover scenarios, SSO must be configured for each subaccount using SAP Identity Authentication service (IAS).
We have created multi-region architectures with detailed step-by-step documentation (GitHub) for different SAP BTP services and applications using Azure Traffic Manager and AWS Route 53.
- Multi-region HA for SAP Launchpad service using Azure Traffic Manager.
- Multi-region HA for SAP Launchpad service using AWS Route 53.
- Distributed Resiliency of CAP applications using SAP HANA Cloud Database & Azure Traffic Manager.
  - GitHub: (WIP)
  - Blogpost: (WIP)
- Distributed Resiliency of CAP applications using Amazon Aurora Database & Amazon Route 53.
  - GitHub: (WIP)
  - Blogpost: (WIP)
- Multi-region HA for SAP Cloud Integration using Traffic Manager.
Although SAP BTP Cloud Foundry application-level and platform-level capabilities help provide high availability and scaling, cloud applications deployed this way still depend on a single cloud region. It is therefore crucial to build mission-critical applications on multi-region architectures to handle issues such as region-wide failures and geographic latency.
Follow this blog post for future updates on the multi-region reference architectures, blogposts & tutorials. Let us know if you have tried a similar setup in your landscape or would like to learn more about this to implement in your landscape.