Sometimes an unexpected load hits and your application starts to hiccup. Yes, sure, somebody should always be monitoring and operating it. But what if you'd rather have people focus on other tasks, with the peace of mind that your cloud application is elastic?
So, let me describe in a few easy steps how you can create your own automatic horizontal scaling system for applications. We'll do it with a set of APIs. Oh, what's this? You're impatient? Then scroll down, grab the code directly from GitHub as a Maven project, have a look at it, and deploy it on your account. Good luck!
Now, for the rest of you still reading: what we're going to do is very simple. We'll create a tiny application that runs on SAP BTP, monitors an app of your choice (provided it belongs to you and is deployed on SAP BTP), takes a set of rules, evaluates them against your defined thresholds, and decides whether additional instances of the application need to be started or stopped.
An architectural sketch of the described application looks something like this:
Let’s go through each component.
Front-end Management Console
You could design your app so that the application to scale is specified from the front-end. Of course, it's up to you to decide what the front-end should actually do. You could imagine the elastic scaler looking much more like a service than an app, but for starters this is just fine.
The UI we're creating will use the jQuery and Twitter Bootstrap libraries and will be very simple: two fields for specifying the subaccount and the application to scale, plus a number of controls for updating the scaler's back-end and for starting or stopping the monitoring. The UI looks like the picture below, and in GitHub you'll find the sources for the HTML and JS files that go under the webapp folder.
We need at least a servlet for our scaler. This servlet will receive the front-end requests and talk directly to the Scaling Setup Manager, the heart of our application. We've designed the ScaleCentral class as a singleton: it holds data about the application being monitored while acting as a central proxy for the scaling actions and commands.
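To make the singleton idea concrete, here is a minimal sketch of what such a class could look like. The field and method names are illustrative assumptions, not the actual ScaleCentral sources from GitHub:

```java
// Hypothetical sketch of a ScaleCentral-style singleton: it remembers which
// account/application is being watched and whether monitoring is active.
public final class ScaleCentral {

    private static final ScaleCentral INSTANCE = new ScaleCentral();

    private String accountName;
    private String applicationName;
    private volatile boolean monitoring;

    private ScaleCentral() { }

    public static ScaleCentral getInstance() {
        return INSTANCE;
    }

    // Called for action=query: fix the account and application to monitor.
    public synchronized void configure(String account, String application) {
        this.accountName = account;
        this.applicationName = application;
    }

    public synchronized String getAccountName() { return accountName; }

    public synchronized String getApplicationName() { return applicationName; }

    public boolean isMonitoring() { return monitoring; }

    public void setMonitoring(boolean monitoring) { this.monitoring = monitoring; }
}
```

Because every servlet shares the same instance, the monitored target set by one request is visible to all subsequent requests.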
The rest of the mechanism works as described in the picture. For triggering the various actions we use HTTP request parameters. A parameter named action may take the following values:
- “query” – this value must always be set on the initial back-end call, because it fixes the account and application that will be monitored (we enforce this from the front-end). The parameters accountName and applicationName must therefore also be set.
- “start” – starts the monitoring of the specified application. A separate thread checks every 5 seconds (the timer can be changed) whether the application needs to be scaled up, scaled down, or simply left running as it is. During the monitoring we constantly ping the back-end for the application's status in order to display it on our web page (MonitorServlet). You could also use WebSockets for this.
- “stop” – stops the monitoring.
- “startApp” – starts a new process for the application.
- “stopApp” – stops one of the application's processes.
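The dispatch on the action parameter, including the 5-second monitoring loop spawned by “start”, can be sketched like this. The class and method names are illustrative and may differ from the GitHub sources; in the real servlet, the account and application names would come from the request parameters:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the servlet-side dispatch for the "action" request parameter.
public class ScalerActions {

    // Daemon thread so the monitor never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "scaler-monitor");
                t.setDaemon(true);
                return t;
            });
    private ScheduledFuture<?> monitorTask;

    public synchronized String handle(String action) {
        switch (action) {
            case "query":
                // Real code would also store accountName/applicationName here.
                return "monitoring target set";
            case "start":
                if (monitorTask == null || monitorTask.isCancelled()) {
                    // Re-evaluate the scaling rules every 5 seconds.
                    monitorTask = scheduler.scheduleAtFixedRate(
                            this::checkAndScale, 0, 5, TimeUnit.SECONDS);
                }
                return "monitoring started";
            case "stop":
                if (monitorTask != null) {
                    monitorTask.cancel(false);
                }
                return "monitoring stopped";
            case "startApp":
                return "starting a new application process";
            case "stopApp":
                return "stopping an application process";
            default:
                return "unknown action: " + action;
        }
    }

    private void checkAndScale() {
        // Fetch metrics, run the rules, and start or stop a process as needed.
    }
}
```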
There are a few interesting aspects I'd like to mention further. The first is the rule engine. The rule engine here is very rudimentary, but it does the job, at least for the applications we've tested. It always checks whether scaling up or scaling down is possible. The monitored parameters it checks, along with their thresholds (for down- and up-scaling respectively), are defined in the params.properties file and are combined in conjunction (logical AND). For instance,
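an entry along these lines (the exact key format in the published project may differ; this is reconstructed from the explanation that follows):

```properties
# metric name = <down-scale threshold>,<up-scale threshold>
cpu_utilization=25,50
```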
means that the rule engine will check the metric named cpu_utilization and it will return true for the down-scaling rule if the value is below 25 and true for the up-scaling rule if the value is above 50.
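A minimal sketch of that conjunctive check, assuming the thresholds have already been parsed out of params.properties (the real rule engine in the project may be structured differently):

```java
// values[i] holds the current reading of metric i; down[i] and up[i] hold
// its down-scaling and up-scaling thresholds from params.properties.
public class RuleEngine {

    // True only if EVERY metric is below its down threshold (logical AND).
    public static boolean canScaleDown(double[] values, double[] down) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] >= down[i]) {
                return false;
            }
        }
        return values.length > 0;
    }

    // True only if EVERY metric is above its up threshold (logical AND).
    public static boolean canScaleUp(double[] values, double[] up) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] <= up[i]) {
                return false;
            }
        }
        return values.length > 0;
    }
}
```

With cpu_utilization=25,50 and a reading of 20, canScaleDown returns true and canScaleUp returns false.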
But where does the rule engine get the values of these parameters from? That brings me to the second point.
First, in order to get the current application metrics, we use the Metrics API (https://api.hana.ondemand.com/monitoring/v1/documentation). This API is one of the platform-embedded APIs that prove very helpful for developers. It is REST-based and returns the status and metric details of a Java application and its processes. It does so for all the running applications without affecting their performance. Pretty neat, right?
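A call to the Metrics API could look roughly like the sketch below. The resource path is an assumption based on the account/application addressing scheme; verify it against the documentation linked above before relying on it:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

// Hedged sketch of fetching metrics for one application over REST.
public class MetricsClient {

    // Assumed URL pattern; check the Metrics API documentation for the real one.
    static String metricsUrl(String account, String application) {
        return "https://api.hana.ondemand.com/monitoring/v1/accounts/"
                + account + "/apps/" + application + "/metrics";
    }

    static String fetchMetrics(String account, String application,
                               String user, String password)
            throws IOException, InterruptedException {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(metricsUrl(account, application)))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();
        // The response body is JSON describing the processes and their metric
        // values, which the rule engine then compares against the thresholds.
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```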
While we're speaking of platform APIs: whenever you need to scale the monitored application up or down, you need to trigger an action such as a process start or a process stop. For that, you use an API that lets you get information about an application and start or stop its processes (although it can do several things more, such as deploying, un-deploying, or changing applications). This is the Java Application Lifecycle Management (Java ALM) API.
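The shape of such a call is sketched below. Both the resource path and the payload are illustrative assumptions, not the documented Java ALM contract; consult the Java ALM API documentation for the actual endpoints before using this:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

// Hedged sketch of asking the lifecycle API to change an application's state.
public class LifecycleClient {

    // Assumed URL pattern; check the Java ALM documentation for the real one.
    static String appStateUrl(String account, String application) {
        return "https://api.hana.ondemand.com/lifecycle/v1/accounts/"
                + account + "/apps/" + application + "/state";
    }

    static int requestStart(String account, String application,
                            String user, String password)
            throws IOException, InterruptedException {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(appStateUrl(account, application)))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                // Hypothetical payload asking the platform to start a process.
                .PUT(HttpRequest.BodyPublishers.ofString("{\"state\":\"STARTED\"}"))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .statusCode();
    }
}
```

In the scaler, the monitoring thread would call something like requestStart when the up-scaling rule fires, and the corresponding stop request when the down-scaling rule fires.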
The code is available in GitHub and pretty much speaks for itself as it is not very complicated. You can use it as you wish, play around with it, or enrich it.
My hope is that you've learned how to create a very lightweight elastic scaler for applications with little code, with the help of the platform-embedded APIs.