The previous blog provided an overview of how to use AI Core & Launchpad to perform the necessary configuration and train ML models. This blog covers the serving and management aspects of the ML lifecycle.
The process for serving is similar to training. Again, we have ML code that resides in a docker image that we will push to our cloud repository. Again, we have a yaml file containing the configuration details for our AI workflow. This time, instead of an execution, we will set up a deployment, which makes the model available for inference from external client applications. We will demonstrate this with Postman.
Step 3.1 Publish serving docker
Again, the docker image contains three files:
- The ML code, where we use Flask as the serving engine. We read the model, set up the steps to be followed for the POST request, read the data from the Flask request JSON and send the prediction as output.
- The packages the ML code depends on, in a file called requirements.txt.
- A Dockerfile with instructions on how the image should be created (create folders for the ML code, copy the code to the relevant folders, install dependencies in the docker image and provide permissions to execute the script in the folder).
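The serving script described above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's exact code: the model path, the `/v2/predict` route and the `"data"` field name are assumptions based on the flow described in this blog.

```python
# Minimal sketch of a Flask serving script: load the model, handle POST
# requests, read the request JSON and return the prediction.
import pickle

from flask import Flask, jsonify, request


def create_app(model):
    """Build the Flask app around a model object exposing .predict(rows)."""
    app = Flask(__name__)

    @app.route("/v2/predict", methods=["POST"])
    def predict():
        payload = request.get_json()   # read the data from the Flask request JSON
        rows = payload["data"]         # assumed input field name
        return jsonify({"prediction": model.predict(rows)})

    return app


def load_model(path="/mnt/models/model.pkl"):  # assumed artifact mount path
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    create_app(load_model()).run(host="0.0.0.0", port=9001)
```

Separating `create_app` from model loading keeps the script easy to test locally with a stub model before the image is pushed to the cloud repository.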
Step 3.2 Create serve workflow
Much like the yaml file in the training, the serving yaml is a configuration file for the sequence of steps AI Core needs to follow at the time of deployment.
- In metadata,
- we provide a name for this workflow, which must be unique across all workflows referenced in your GitHub repository.
- we provide a scenario name & executable name that you will reference later during configuration to identify this AI workflow.
- In spec,
- template/metadata/labels: We specify the infrastructure to be used in the resource plan. The starter plan provides 1 CPU and 3 GB of memory. Refer to the full list of supported values to understand the infrastructure options, from CPU to GPU. If you do not specify a resource plan (as in the training template), the “Starter” plan is used by default.
- template/spec/predictor/imagePullSecrets/name: We provide the name of the docker registry you added in the Configure step.
- template/spec/predictor/containers/image: We provide the path reference to the docker image in our private repository.
- template/spec/predictor/containers/env/value: We reference the model artifact, which is provided during configuration and passed at runtime to the docker image.
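Put together, the serving yaml has roughly the shape below. Treat this as a sketch: all names and values are placeholders, and the field names follow the SAP AI Core serving template conventions described above, so verify them against the official template reference before use.

```yaml
apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: my-serving-workflow               # placeholder; must be unique across the repo
  labels:
    scenarios.ai.sap.com/id: "my-scenario"   # referenced later during configuration
    ai.sap.com/version: "1.0"
spec:
  template:
    metadata:
      labels:
        ai.sap.com/resourcePlan: starter     # 1 CPU, 3 GB memory
    spec:
      predictor:
        imagePullSecrets:
          - name: my-docker-registry-secret  # registry added in the Configure step
        containers:
          - name: kserve-container
            image: docker.io/<user>/serve:0.0.1   # path to the docker image
            env:
              - name: STORAGE_URI                 # model mapped at configuration time
                value: "{{inputs.artifacts.modelArtifact}}"
```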
Step 3.3 Deploy using configuration
In this step, as in training, we create a configuration. However, instead of an execution, we create a deployment.
- We provide the following information to setup configuration:
- Scenario name, template name and version number defined in the yaml file (aka serving AI Workflow)
- We do not have input parameters in this use case, so we simply click Next
- Map the model artifact that will be passed to the docker image based on information in the yaml file
- Once the configuration is created, we create a deployment. Once the deployment is running, the URL is available for inferencing by client applications.
Step 3.4 Test inference
Time to test whether an application external to AI Core can access the model via the deployed URL! We use Postman here, but you could use Swagger, Insomnia or any other tool of choice.
- We copy over the deployment URL from AI Core and suffix it with “/v2/predict”.
- Ensure the headers have AI-Resource-Group set to “default” and Authorization set to “Bearer <token>”
- Add a payload in JSON format to the body, with data in the format the ML code is expecting
- Hit send to receive the prediction response from AI Core
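The same call can be made from any client. As a sketch, here is how the request Postman sends could be constructed with only the Python standard library; the deployment URL, token and `"data"` field are placeholders you would replace with your own values.

```python
# Build the inference POST request described above: deployment URL suffixed
# with /v2/predict, AI-Resource-Group and Authorization headers, JSON body.
import json
import urllib.request


def build_inference_request(deployment_url, token, records):
    """Construct the same POST request that Postman sends to the deployment."""
    return urllib.request.Request(
        url=deployment_url.rstrip("/") + "/v2/predict",   # suffix the deployment URL
        data=json.dumps({"data": records}).encode("utf-8"),  # assumed payload shape
        headers={
            "AI-Resource-Group": "default",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_inference_request("https://<deployment-url>", "<token>", [[5.1, 3.5]])
    # urllib.request.urlopen(req)  # uncomment with a real URL and token to send
```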
There are currently two handy features for monitoring deployed models, described briefly below.
Step 4.1 Monitoring metrics
We can add a few lines of code to capture metrics in the ML code. This makes monitoring these metrics possible in AI Launchpad.
- To the ML code, we add the following snippets:
- We import some packages from the AI Core SDK that allow tracking metrics
- We initialize the aic_connection variable. The parameters required by this function are passed in at runtime.
- We use the metrics logging method to capture the number of observations
- We insert some code to perform k-fold validation with 5 folds. This is essentially a way of breaking the training dataset into 5 chunks: each time, we train on 4 folds and test on the 5th fold that has been held out. We eventually discard these models, but the mean of the evaluation scores is indicative of the quality of the model; if the scores fluctuate a lot, the model is not consistent. In this code, we use R2 (aka the coefficient of determination) as the evaluation score.
- We fit the final model on the training data and also log the R2 score on the test data into metrics.
- We log the feature importance values into metrics.
- Lastly, we save some tags for the execution into metrics, to indicate that “k fold validation” was performed and “R2” scores were calculated.
- The rest of the steps are the same as with training the model. We build the docker image & push it to the cloud. The yaml file does not require any edits, so we leave it untouched. We create a new configuration and corresponding execution.
- Once the execution is completed, you can see the Metrics Resource tab populated with values from the execution.
- You can also click on all executions and click on view metrics to visualise the metrics with simple charts.
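The fold mechanics behind the k-fold validation described above can be illustrated in plain Python. The blog's actual code uses an ML library and logs R2 scores via the AI Core SDK; this sketch only shows how the data is split into folds and how the per-fold scores are averaged, with `score_fn` standing in for train-and-evaluate.

```python
# Illustration of 5-fold validation: split the data into 5 chunks, hold out
# each chunk once, train on the remaining 4, and average the evaluation scores.
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def cross_validate(n_samples, score_fn, k=5):
    """Hold out each fold once; score_fn(train_idx, test_idx) trains and evaluates."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(score_fn(train_idx, test_idx))
    return sum(scores) / len(scores)   # mean score indicates model quality
```

A large spread between the individual fold scores, not just a low mean, is the signal that the model is inconsistent.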
Step 4.2 Updating deployments
It often happens that a deployed model needs to be updated because a better model is available. In such cases, if we stop the old deployment and create a new one, the deployment URL changes, and all client applications need to update their calls with the latest deployment URL. To avoid this, AI Launchpad allows a deployment to be updated: the configuration is swapped for a new one while the deployment URL remains unchanged. To do this, we follow the steps below:
- Create the updated configuration using the new yaml file or new model
- Go to the current deployment and click Update to map the latest configuration. Clicking Update again re-deploys the model while retaining the old deployment URL.
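Under the hood, the Launchpad Update action corresponds to patching the deployment with a new configuration ID via the SAP AI API. The sketch below builds such a request with the standard library; the endpoint path and `configurationId` field follow the AI API conventions, but verify them against the current API reference before relying on them.

```python
# Sketch: update an existing deployment to point at a new configuration,
# keeping the deployment URL stable (placeholder values throughout).
import json
import urllib.request


def build_update_request(api_base, deployment_id, new_configuration_id, token):
    """PATCH the deployment so it serves the model from the new configuration."""
    return urllib.request.Request(
        url=f"{api_base}/v2/lm/deployments/{deployment_id}",  # assumed AI API path
        data=json.dumps({"configurationId": new_configuration_id}).encode("utf-8"),
        headers={
            "AI-Resource-Group": "default",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```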
This blog is a summarised version of our detailed tutorials, intended as a visual glimpse of what the end-to-end MLOps process looks like on SAP AI Core & AI Launchpad.
In essence, SAP AI Core and SAP AI Launchpad are SAP BTP services to confidently deploy and manage AI models that natively integrate with SAP applications. They help centralise AI lifecycle management with support for training, deploying and monitoring AI models in production. There are some particularly interesting features on the roadmap, so stay tuned and watch out for the next updates from our product team!
Thanks to:
- Suresh Kumar Raju for reviewing the videos and suggesting very useful edits.
- Priyanshu Srivastava for help with understanding concepts of the product.
- Dhrubajyoti Paul for help with understanding the serving tutorials.