SAP Data Intelligence to Train, Export, Serve & Inference Machine Learning Models

In this blog post, you will learn how to train, validate, export, serve, and run inference on a simple machine learning model using SAP Data Intelligence. The primary objective is to explore the features of SAP Data Intelligence, not to build the best possible model, so we work with a very simple machine learning use case.

1. Manage DataSet

In this section you will learn how to upload and manage a dataset that will be used in the training phase later.

Dataset for this tutorial can be downloaded here

2. Train Model

In this section, you will learn how to create a Machine Learning Scenario and spin up a Jupyter Notebook instance to train a machine learning model.

Install Required Python Libraries

!pip install scikit-learn==0.22.2
!pip install seaborn==0.10.0

Imports

import os
import re
import csv
import random
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import joblib
from sklearn.model_selection import train_test_split
import pandas as pd
import sapdi
import shutil
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import json
from sapdi import tracking
%matplotlib inline

Load Data from DataLake via DataSet Manager

ws = sapdi.get_workspace(name='suresh-ws')
dc = ws.get_datacollection(name='gender-collection')
with dc.open('gender.csv').get_reader() as reader:
    df = pd.read_csv(reader)

Split Male and Female Data for Visualizing

is_male = df['male(1-male 0-female)'] == 1
is_female = df['male(1-male 0-female)'] == 0
male = df[is_male]
female = df[is_female]
# Drop the last 13 female rows
female = female.head(-13)
print(male.sample(3))
print(male.shape)
print(female.sample(3))
print(female.shape)
male_weight = male[['weight(kg)']]
male_height = male[['height(cm)']]
print("{} {}".format(male_weight.shape, male_height.shape))
female_weight = female[['weight(kg)']]
female_height = female[['height(cm)']]
print("{} {}".format(female_weight.shape, female_height.shape))

Visualize Male and Female Weights(Kg)

plt.figure(figsize=(18, 6))
x_range = [range(0, 247)]
plt.scatter(x_range, male_weight, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_weight, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Weight')
plt.show()

Visualize Male & Female Height(Cm)

plt.figure(figsize=(18, 6))
x_range = [range(0, 247)]
plt.scatter(x_range, male_height, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_height, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Height')
plt.show()

Extract Features and Target

# pop() removes the target column from df, so the remaining columns are the features
y = df.pop('male(1-male 0-female)')
print(y.sample(5))
X = df
print(X.sample(5))

Visualize Feature Correlation

sns.heatmap(X.corr(), annot=True)
plt.show()

Split Train & Test Data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

Training – SVM

model = SVC()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
# Log the metrics and the algorithm name as one tracked run
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "SVC"})
tracking.end_run()

Training – DecisionTreeClassifier

model = DecisionTreeClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "DecisionTreeClassifier"})
tracking.end_run()

Training – RandomForestClassifier

model = RandomForestClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "RandomForestClassifier"})
tracking.end_run()

Training – AdaBoostClassifier

model = AdaBoostClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "AdaBoostClassifier"})
tracking.end_run()
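Each training cell above computes a confusion matrix `cm` but only reports accuracy. As a minimal sketch of how the matrix could be read, the helper below derives accuracy, precision, and recall from a 2x2 matrix, assuming scikit-learn's usual layout for labels 0 and 1 (rows are true labels, columns are predicted labels). The function name `binary_metrics` and the counts in `example_cm` are made up for illustration and are not results from the gender dataset.

```python
def binary_metrics(cm):
    """Derive accuracy, precision and recall from a 2x2 confusion matrix.

    Assumes the layout [[TN, FP], [FN, TP]]: rows are true labels,
    columns are predicted labels, with 1 (male) as the positive class.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented counts for illustration only.
example_cm = [[40, 10], [5, 45]]
print(binary_metrics(example_cm))
```

In the notebook, something like `binary_metrics(cm.tolist())` should work on the array returned by `confusion_matrix`, and the resulting values could be added to the `metrics` dictionary before it is logged.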

Explore Training & Test Accuracy Captured via the Tracking API

sc = sapdi.get_current_scenario()
run_data = tracking.get_runs(scenario=sc, notebook=sapdi.scenario.Notebook.get(notebook_id="gender.ipynb"))
lst = list()
for r in run_data:
    # One row per run: algorithm tag followed by the logged metric values
    lst_data = list()
    lst_data.append(r.tags.get("algo"))
    for m in r.metrics:
        lst_data.append(m.get("value"))
    lst.append(lst_data)
mdf = pd.DataFrame(lst, columns=['algo', 'train_accuracy', 'test_accuracy'])
mdf
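Once the runs are collected into rows of `[algo, train_accuracy, test_accuracy]`, a natural next step is to rank them. The sketch below is one possible way to do that in plain Python: it sorts by test accuracy and, on ties, prefers the run with the smaller gap between training and test accuracy. The function name `rank_runs` and all numbers in `rows` are hypothetical and only mirror the shape of the table built above.

```python
def rank_runs(rows):
    """Sort runs by test accuracy (highest first).

    Ties are broken by the smaller gap between training and test accuracy,
    on the assumption that a smaller gap suggests less overfitting.
    Each row is [algo, train_accuracy, test_accuracy].
    """
    return sorted(rows, key=lambda r: (-r[2], abs(r[1] - r[2])))


# Invented accuracies for illustration only.
rows = [
    ["SVC", 90.0, 88.0],
    ["DecisionTreeClassifier", 100.0, 85.0],
    ["RandomForestClassifier", 100.0, 88.0],
    ["AdaBoostClassifier", 92.0, 89.0],
]

for algo, train_acc, test_acc in rank_runs(rows):
    print("{:<24} train={:.1f} test={:.1f}".format(algo, train_acc, test_acc))
```

With a pandas DataFrame such as `mdf`, `mdf.sort_values('test_accuracy', ascending=False)` gives a comparable ordering without the tie-break.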

Use Metrics Explorer to Visualize Training & Test Accuracy Captured via the Tracking API

3. Save & Export Model

In this section, you will learn how to save the trained model as a pickle file and export it as a ZIP archive, which serves as the artifact used to deploy and serve the model in the next phase.

Prepare Required Directory for Save & Export

curr_dir = os.getcwd()
exporter_content = os.path.join(curr_dir, "exporter_content")
exported_content = os.path.join(curr_dir, "exported_content")
zip_file_path = os.path.join(curr_dir, "exported_content/gender_1.zip")
unzip_folder_path = os.path.join(curr_dir, "exported_unzip_content")
print(exporter_content)
print(exported_content)
print(zip_file_path)
print(unzip_folder_path)

Save Model

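# Start from an empty exporter_content directory
if os.path.exists(exporter_content) and os.path.isdir(exporter_content):
    shutil.rmtree(exporter_content)
os.makedirs(exporter_content)
# model is the classifier trained last in the notebook
joblib.dump(model, 'exporter_content/gender.pkl')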
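The delete-then-recreate pattern used here appears again when the model is exported. As a small sketch, it could be wrapped in a helper like the one below; `reset_dir` is a hypothetical name, not part of the SAP DI SDK, and the demo runs in a temporary directory so it does not touch the notebook's working folder.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(path):
    """Delete `path` if it is an existing directory, then recreate it empty."""
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)
    # Raises FileExistsError if `path` exists but is a regular file.
    target.mkdir(parents=True)
    return target


# Demo in a throwaway location.
with tempfile.TemporaryDirectory() as tmp:
    work = reset_dir(Path(tmp) / "exporter_content")
    (work / "stale.txt").write_text("left over from a previous run")
    work = reset_dir(work)
    print(list(work.iterdir()))  # prints []
```

In the notebook this would be called as `reset_dir(exporter_content)` before `joblib.dump`.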

Create Required Dependencies for Serving the Model

%%writefile exporter_content/pip_dependencies.txt
scikit-learn==0.22.2
joblib==0.14.1

Create Predictor Class for Serving the Model

%%writefile predictor.py
from sapdi.serving.pymodel.predictor import AbstractPyModelPredictor
import joblib
import json

class GenderPredictor(AbstractPyModelPredictor):
    def initialize(self, asset_files_path):
        # Load the pickled classifier from the exported assets
        self.classifier = joblib.load(asset_files_path + '/gender.pkl')

    def predict(self, input_dict):
        age = input_dict.get("age")
        weight = input_dict.get("weight")
        height = input_dict.get("height")
        # Single-row feature matrix in the order age, weight, height
        real_value = list([[float(age), float(weight), int(height)]])
        predicted = self.classifier.predict(real_value)
        res = int(predicted[0])
        return {'result': {'gender': res}}

Test the Predictor Class

from predictor import GenderPredictor
predictor = GenderPredictor()
predictor.initialize('exporter_content/')
payload = {"age": 37, "weight": 75, "height": 167}
predictor.predict(payload)
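The `predict` method above passes the payload values straight to `float()` and `int()`, so a request with a missing field fails with a `TypeError` that says little about the cause. The sketch below shows one way the conversion could be guarded. `payload_to_features` and `REQUIRED_FIELDS` are hypothetical names, and unlike the original it converts all three values with `float()`, which is an assumption about what the classifier accepts.

```python
REQUIRED_FIELDS = ("age", "weight", "height")


def payload_to_features(input_dict):
    """Turn a request payload into a single-row feature matrix.

    Raises ValueError with a readable message when a field is missing
    or cannot be converted to a number.
    """
    missing = [name for name in REQUIRED_FIELDS if input_dict.get(name) is None]
    if missing:
        raise ValueError("missing field(s): " + ", ".join(missing))
    try:
        row = [float(input_dict[name]) for name in REQUIRED_FIELDS]
    except (TypeError, ValueError) as exc:
        raise ValueError("all fields must be numeric") from exc
    return [row]


print(payload_to_features({"age": 37, "weight": 75, "height": 167}))
# prints [[37.0, 75.0, 167.0]]
```

Inside `GenderPredictor.predict`, the returned list could be passed to `self.classifier.predict` in place of `real_value`.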

Export Model for Serving using SAP DI SDK

if os.path.exists(exported_content) and os.path.isdir(exported_content):
    shutil.rmtree(exported_content)
os.makedirs(exported_content)
from sapdi.serving.pymodel.exporter import PyExporter
from predictor import GenderPredictor
exporter = PyExporter()
exporter.save_model(
    name="gender",
    model_dir_path=exported_content,
    func=GenderPredictor(),
    source_path_list=[os.path.join(curr_dir, "predictor.py")],
    asset_path_list=[os.path.join(curr_dir, "exporter_content/gender.pkl")],
    pip_dependency_file_path=os.path.join(curr_dir, "exporter_content/pip_dependencies.txt"))
# Unpack the exported ZIP into a separate folder to inspect its contents
if os.path.exists(unzip_folder_path) and os.path.isdir(unzip_folder_path):
    shutil.rmtree(unzip_folder_path)
os.makedirs(unzip_folder_path)
shutil.unpack_archive(zip_file_path, extract_dir=unzip_folder_path)
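If the goal is only to check what ended up in the exported archive, the member names can also be listed without unpacking it. The following is a minimal sketch using the standard `zipfile` module. `list_archive` is a hypothetical helper, and the demo archive with its two member names is a placeholder, not the actual layout that `PyExporter` produces.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return the sorted member names of a ZIP archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Demo on a throwaway archive with placeholder members.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("predictor.py", "# placeholder")
        archive.writestr("gender.pkl", b"placeholder")
    print(list_archive(demo_zip))  # prints ['gender.pkl', 'predictor.py']
```

In the notebook, `list_archive(zip_file_path)` would show the files inside the exported ZIP.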

Create Artifact

from sapdi.artifact.artifact import Artifact, ArtifactKind, ArtifactFileType
artifact = sapdi.create_artifact(
    file_type=ArtifactFileType.FILE,
    artifact_kind=ArtifactKind.MODEL,
    description="Gender Model",
    artifact_name="gender",
    file_name=os.path.basename(zip_file_path),
    upload_content=zip_file_path
)
print('Model artifact id {}, file {} registered successfully at {} \n'.format(artifact.artifact_id, zip_file_path, artifact.get_uri()))

4. Serve Model

In this section, you will learn how to deploy the exported model artifact to the SAP Data Intelligence platform, which exposes a REST endpoint for real-time inference requests.

Here we use the Model Serving Operator in the graph, which is designed to serve machine learning models in a scalable way.

  • In the Model Serving Operator configuration, use “mlserving-1.1” as the value for the field Model Runtime and leave the rest at their defaults.

5. Inference Model

In this section, you will learn how to send inference requests to the deployed model through the exposed REST endpoint.

Inference Model

  • Replace “<REST API URL>” with the REST endpoint URL you obtained in the previous section after a successful deployment.
  • Replace “<XXXXX>” with the Base64-encoded user credentials used to access the deployment.
payload = {"age": 37, "weight": 75, "height": 167}
url = "<REST API URL>"
headers = {
    'Content-Type': 'application/json',
    'X-Requested-With': 'Fetch',
    'Authorization': 'Basic <XXXXX>'
}
response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
interpret = response.json().get("result").get("gender")
print("MALE" if interpret == 1 else "FEMALE")
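The `<XXXXX>` placeholder in the `Authorization` header is the Base64 encoding of `username:password`, as defined for HTTP Basic authentication. The sketch below shows how that value could be produced and how the response body could be interpreted a little more defensively. `basic_auth_header` and `interpret_gender` are hypothetical helper names, the credentials are dummies, and how the user name must be composed for your deployment (for example, whether a tenant prefix is required) is something to confirm for your installation.

```python
import base64
import json


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def interpret_gender(response_body):
    """Map the JSON body returned by the predictor to a readable label."""
    gender = json.loads(response_body).get("result", {}).get("gender")
    if gender is None:
        raise ValueError("response does not contain result.gender")
    return "MALE" if gender == 1 else "FEMALE"


# Dummy credentials and a hand-written response body, for illustration only.
print(basic_auth_header("user", "pass"))  # prints Basic dXNlcjpwYXNz
print(interpret_gender('{"result": {"gender": 1}}'))  # prints MALE
```

The header value would replace `'Basic <XXXXX>'` in `headers`, and `interpret_gender(response.text)` could stand in for the chained `.get()` calls, which fail with an `AttributeError` when the endpoint returns an error payload without a `result` key.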

Conclusion:

We have built an end-to-end machine learning pipeline covering data upload, data visualization, model training, model metrics, model export, model serving, and model inference.