Machine Learning in SAP HANA is a great thing. With two embedded Machine Learning libraries (PAL and APL) to choose from, which together support all the core Machine Learning tasks, you’re well equipped to build data-driven, intelligent solutions in SAP HANA.
Machine Learning in SAP HANA becomes even better when it is coupled with an application. Only then can data-driven insights be integrated and leveraged where they are most needed: as part of business processes. While SAP S/4HANA offers a large selection of predefined intelligent scenarios and provides a standard tool kit for custom Machine Learning enhancements (LINK), SAP BW/4HANA customers sometimes struggle to start Machine Learning initiatives on top of their existing Data Warehouse.
My colleague Tobias Wohkittel and I joined forces to help overcome these struggles. With this blog post, we intend to provide specific guidance on how Machine Learning can be integrated with core SAP BW artifacts, leveraging the standard tools provided with SAP BW/4HANA. Integration is the focus of this article, so we’ll keep the Machine Learning part very simple and based on default values.
You can find all our development artifacts, including the sample data, our Jupyter notebook and the ABAP code snippets, in our samples GitHub repository. Everything is based on an SAP BW 7.53 system running on SAP HANA SP05, with version 2.11.22010700 of the hana-ml Python API.
Let’s back up for a moment and see what we are about to do specifically.
Machine Learning has two core processes: training a model, and applying a model to receive forecasts or predictions (often referred to as inference). We’ll bring both to life here. In doing so, our goal is to stay as close to SAP BW standards as possible, with minimal custom coding. SAP BW has several great modeling objects and tools that we will use for that:
The Advanced DataStore Object (aDSO) is the standard data-holding object, comparable to a database table. A Transformation defines rules to be executed against this data while moving it between two aDSOs. A Data Transfer Process (DTP) triggers these Transformations. Lastly, there is ABAP as our native scripting language, and specifically ABAP Managed Database Procedures (AMDPs), which build the bridge between ABAP and SAP HANA by wrapping SQLScript in ABAP code. You will find all of them in the overview of our target architecture below.
Looking at the picture, you will see that training and inference of the model both look very similar and follow the same scheme:
- An aDSO storing our input data
- A Transformation that handles either training or inference of the model
- A DTP to trigger the data load between our source and target aDSOs
- An aDSO that receives the results, either the trained model or the derived predictions
Now, if you have a bit of SAP BW background, this scheme might look odd to you. In typical Data Warehousing scenarios and ETL workloads, a Transformation applies only minor changes to the data, such as calculations, string operations, conversions, and the like. Here, input and output of the Transformation seem to be completely disconnected. In the end, though, Machine Learning is just one very complex transformation of data, and the good news is that it works well and helps us stay within standard tooling.
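To make the pattern a bit more tangible, here is a minimal, purely illustrative Python sketch of the scheme. It has nothing to do with real SAP BW objects: plain lists stand in for aDSOs, the hypothetical `run_dtp` function stands in for a DTP, and the "model" is just the most frequent label rather than a real algorithm. All names are assumptions made for this illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, object]
Transformation = Callable[[List[Row]], List[Row]]


def run_dtp(source: List[Row], transformation: Transformation, target: List[Row]) -> int:
    """Toy stand-in for a DTP: push all source rows through one transformation."""
    result = transformation(list(source))
    target.extend(result)
    return len(result)


def train(rows: List[Row]) -> List[Row]:
    """Toy 'training': remember the most frequent label (not a real model)."""
    most_common_label = Counter(row["label"] for row in rows).most_common(1)[0][0]
    return [{"model_content": most_common_label}]


def make_predict(model_rows: List[Row]) -> Transformation:
    """Build an inference transformation that reads the newest stored model row."""
    predicted_label = model_rows[-1]["model_content"]

    def predict(rows: List[Row]) -> List[Row]:
        return [{"id": row["id"], "prediction": predicted_label} for row in rows]

    return predict


# Lists playing the role of the four aDSOs.
training_adso: List[Row] = [
    {"id": 1, "label": "T2"},
    {"id": 2, "label": "T2"},
    {"id": 3, "label": "T4"},
]
model_adso: List[Row] = []
new_data_adso: List[Row] = [{"id": 10}, {"id": 11}]
result_adso: List[Row] = []

run_dtp(training_adso, train, model_adso)                      # training flow
run_dtp(new_data_adso, make_predict(model_adso), result_adso)  # inference flow
```

Note how the training flow turns three input rows into a single model row: the output has a completely different shape than the input, which is exactly what makes the scheme look unusual from an ETL point of view.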
With this plan all laid out, we are ready to take a look at the actual system now.
For this demo, we will use some made-up salary data and predict the job level an employee should be associated with, based on demographic and employment data. The picture below shows an extract of this data.
From here on, we will leverage the tight integration between SAP HANA and SAP BW/4HANA, which allows our Data Scientist to work with native features in SAP HANA as well as Python, while our SAP BW expert keeps maintaining all structures with standard SAP BW tools.
To mimic a standard reporting scenario, the Data Scientist is only given access to a query based on the training data instead of the raw data table. This query automatically translates into an SAP HANA Calculation View that can be accessed using the Python client for SAP HANA Machine Learning. So, experimenting and working with the data from SAP BW feels natural to the Data Scientist. Since our target variable “T-Level” holds a few distinct values, this is a multi-class classification problem. The Hybrid Gradient Boosting Trees algorithm from the Predictive Analysis Library is well suited to this task. As said, the Machine Learning part is not our focus here, so please refer to our other posts on how to best work with the libraries.
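As a rough idea of what this experimentation step could look like, here is a minimal sketch using the hana-ml Python client with default parameters. It is not the code from our notebook. The key column name `ID`, the view name passed in, and the assumption that the generated view is reachable in the `_SYS_BIC` schema are all placeholders that you would need to adapt to your system.

```python
def train_job_level_model(conn, view_name, key="ID", label="T-Level"):
    """Train a default-parameter HGBT classifier on a BW-generated view.

    `conn` is expected to be a hana_ml.dataframe.ConnectionContext.
    `key` and `view_name` are hypothetical and depend on your data model.
    """
    # Imported here so the sketch can be loaded without hana-ml installed.
    from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

    # Assumption: the external SAP HANA view of the query lives in _SYS_BIC.
    training_df = conn.table(view_name, schema="_SYS_BIC")

    # Unified Classification interface with Hybrid Gradient Boosting Trees,
    # leaving every algorithm parameter at its default value.
    classifier = UnifiedClassification(func="HybridGradientBoostingTree")
    classifier.fit(data=training_df, key=key, label=label)
    return classifier
```

A connection for this function would typically be created with `hana_ml.dataframe.ConnectionContext(address=..., port=..., user=..., password=...)`, using the credentials of the Data Scientist’s database user.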
Once the best-fitting parameters for the case are determined, these need to be converted into SQLScript code and corresponding ABAP structures. This currently requires manual development effort. You can find some basic guidelines in this blog post.
You may also take our code example as a reference for starting your own project. We took a bit of a shortcut by leveraging the AMDP generator from the hana-ml Python API (LINK), which is intended to support integration with SAP S/4HANA but provides us with an automatically generated AMDP code snippet that is close to what we need. It currently only supports Unified Classification calls, so it works well for us. For other algorithms or interfaces, it can serve as a starting point to get a feeling for the required ABAP structures, which you can then adapt as needed.
Let’s quickly return to our architecture plan to see what we actually need.
Our BW expert has already created the target aDSO for the model data.
Our training code is stored in an ABAP class that we derived from the generator, but could also have developed ourselves. The next step is to set up the Transformation. It has an expert routine (that is, custom coding), which holds the specific parameters for the model training and calls the training method of our training class. We currently just hardcoded the parameters in the expert routine, but storing them in a separate table would be a suitable option as well.
The results of the training are then written into the model aDSO. To trigger execution of the Transformation, we use the DTP we created. For training, this DTP always needs to run in “Full” mode, meaning that all records are considered for the training and not only new ones, since we typically want to train with all data. Keep in mind that the package size should be set to the maximum value (which is 2,147,483,647 records), or to any value higher than the number of training records, to make sure all data arrives in one package. Otherwise, the Transformation would run once per package and each training run would see only a fraction of the training data.
This is how our objects look in the system. You will see the two aDSOs as well as the Transformation and DTP.
While the structure of the training data aDSO is defined by the respective data, the model aDSO follows the structure defined in the reference for the PAL library (LINK).
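The effect of the package size can be illustrated with a few lines of plain Python. This is only a toy model of how records get split into packages, not how a DTP is implemented; the function name `split_into_packages` is made up for this sketch.

```python
from typing import Iterator, List, Sequence

# Maximum package size mentioned above; it equals 2**31 - 1.
MAX_PACKAGE_SIZE = 2_147_483_647


def split_into_packages(records: Sequence, package_size: int) -> Iterator[List]:
    """Yield consecutive chunks of at most `package_size` records."""
    if package_size < 1:
        raise ValueError("package_size must be at least 1")
    for start in range(0, len(records), package_size):
        yield list(records[start:start + package_size])


records = list(range(10))  # stand-in for ten training records

# A small package size splits the data, so a training routine that runs
# once per package would only ever see a part of the records.
small_packages = list(split_into_packages(records, 4))

# With the maximum package size, everything arrives in a single package.
single_package = list(split_into_packages(records, MAX_PACKAGE_SIZE))
```

In the first case, a training routine executed per package would produce three separate models, each trained on at most four records. In the second case, it runs exactly once on the complete data set.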
Please note that we have added a timestamp to the table. This enables us to perform basic lifecycle management of our model, since we can easily identify the most recent version as well as its predecessors. Once the DTP has been executed, the aDSO is populated with the model contents, and all subsequent runs add new model versions with current timestamps.
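Picking the most recent model version then boils down to filtering on the latest timestamp. The following Python sketch shows the idea on in-memory rows; in the real scenario this selection happens in ABAP or SQLScript. The field names `trained_at`, `row_index`, `part_index` and `model_content` are assumptions chosen for this illustration, so please check them against your actual aDSO structure.

```python
from typing import Dict, List

ModelRow = Dict[str, object]


def latest_model_rows(model_rows: List[ModelRow]) -> List[ModelRow]:
    """Return all rows of the newest model version, in their original order."""
    if not model_rows:
        raise ValueError("no model has been trained yet")
    newest = max(row["trained_at"] for row in model_rows)
    latest = [row for row in model_rows if row["trained_at"] == newest]
    # A model can span several rows, so restore the order of its parts.
    return sorted(latest, key=lambda row: (row["row_index"], row["part_index"]))


# Two model versions, identified by hypothetical ISO timestamps.
stored_models: List[ModelRow] = [
    {"trained_at": "2022-03-01T08:00:00", "row_index": 0, "part_index": 0, "model_content": "old"},
    {"trained_at": "2022-04-01T08:00:00", "row_index": 1, "part_index": 0, "model_content": "new-b"},
    {"trained_at": "2022-04-01T08:00:00", "row_index": 0, "part_index": 0, "model_content": "new-a"},
]

current_model = latest_model_rows(stored_models)
```

Because older versions stay in the table, falling back to a predecessor is just a matter of selecting an earlier timestamp instead of the maximum.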
To keep things easy, this is all we persist from the model training, but of course PAL provides us with all the standard statistics and metrics for such a model training. You will find respective structures in our ABAP code that would allow you to store this information alongside the core model table. In doing so, you could easily set up monitoring for long-term model performance and identify potential model degradation.
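As a hedged illustration of such monitoring, the following Python sketch flags a model version whose quality metric dropped noticeably compared to earlier versions. It assumes you have persisted one metric value per training timestamp; the choice of accuracy as the metric, the tolerance of 0.05 and the function name `has_degraded` are all arbitrary choices for this example.

```python
from typing import List, Tuple


def has_degraded(history: List[Tuple[str, float]], tolerance: float = 0.05) -> bool:
    """Check whether the newest metric fell below the best earlier one.

    `history` holds (timestamp, metric) pairs, one per training run.
    """
    if len(history) < 2:
        return False  # nothing to compare against yet
    *previous, (_, latest_metric) = sorted(history)
    best_previous = max(metric for _, metric in previous)
    return latest_metric < best_previous - tolerance


# Hypothetical accuracy values of three training runs.
accuracy_history = [
    ("2022-02-01", 0.90),
    ("2022-03-01", 0.91),
    ("2022-04-01", 0.84),
]
degraded = has_degraded(accuracy_history)
```

A check like this could run after each training DTP and, for example, prevent a clearly worse model version from being used for inference.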
For now, we’ll assume our model is all good and we can move on to applying it to new data. As discussed before, this Inference process follows the same scheme as Training:
Our SAP BW expert has set up an aDSO for the newly incoming data that we want to derive predictions for. It has the same structure as the training aDSO but, of course, doesn’t contain our target column “T-Level”. The expert also created a result aDSO to store our predictions. It holds the predicted class, as well as a confidence score and reasoning information for that decision.
The Transformation between these two aDSOs calls the prediction method of our external ABAP class, and in this case also reads the model content from the model aDSO we populated before. The DTP for this Transformation runs in “Delta” mode, since we only want to get predictions for newly entered data. We could set it to “Full” mode as well, to generate fresh predictions for all records with every run.
The picture below shows how our project structure looks now.
With that, we have everything together for the core Machine Learning workflow. Since the results are again stored in an aDSO, we have full flexibility to continue working with them. They could be integrated in a Query or Composite Provider in SAP BW to become part of the Data Warehousing data flows and reporting tools. Since the aDSO, like most SAP BW/4HANA objects, translates into a native object in the SAP HANA database, we could also use it in any other reporting tool that works directly with SAP HANA, such as SAP Analytics Cloud.
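The difference between the two DTP modes can be sketched in a few lines of Python. This is a simplified illustration only: a real DTP keeps track of already processed requests by itself, whereas here a hypothetical `request_id` field and a `last_loaded_request` marker play that role.

```python
from typing import Dict, List

Row = Dict[str, object]


def select_for_dtp(rows: List[Row], mode: str, last_loaded_request: int = 0) -> List[Row]:
    """Return the rows a load would pick up in "full" or "delta" mode."""
    if mode == "full":
        return list(rows)
    if mode == "delta":
        # Only rows from requests that have not been processed yet.
        return [row for row in rows if row["request_id"] > last_loaded_request]
    raise ValueError(f"unknown mode: {mode!r}")


new_data = [
    {"id": 1, "request_id": 1},
    {"id": 2, "request_id": 1},
    {"id": 3, "request_id": 2},
]

# Request 1 was already scored in an earlier run.
delta_rows = select_for_dtp(new_data, "delta", last_loaded_request=1)
full_rows = select_for_dtp(new_data, "full")
```

With “Delta”, only the record from request 2 receives a prediction in this run. With “Full”, all three records are scored again, which can be useful after a new model version has been trained.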
If you would like to try out our scenario on your own, you can find all our components in our hana-ml samples GitHub repository. It holds the Jupyter notebook used for experimenting, the ABAP code for both expert routines and for the training and prediction classes, as well as our sample data.
If you would like to learn more about the advanced analytical capabilities of SAP HANA, take a look at our community page for the SAP HANA smart multi-model capabilities.
Feel free to share your thoughts or questions in the comments below, or reach out to Tobias Wohkittel or me directly.