Project “Sailor” – Enabling Machine Learning Extensibility

Project ‘Sailor’ is a brand-new offering developed in close collaboration with customers and seasoned data scientists. In the past, we repeatedly encountered shortcomings during data exploration that made this initial part of our projects cumbersome and time-consuming. Our goal is to provide maximum flexibility in accessing and combining data. Data scientists can formulate their information needs while staying within their own frame of reference.
Project ‘Sailor’ is suitable for data scientists at SAP, at partner organizations, and in advanced customer settings. The software enables them to develop their own machine learning flows based on data from SAP systems. In the first release, the focus is on use cases in the scope of SAP Predictive Asset Insights (using SAP Internet of Things and Asset Central as backends). We plan to extend the scope to other applications going forward.
We have developed a Python SDK to make data exploration easier and more fun. Project ‘Sailor’ hides the complexity of data access, authentication, and API consumption from data science users. In our experience, few data scientists enjoy working on low-level technical details such as UAA (the Cloud Foundry User Account and Authentication server), token exchange (with an HTTP- and JSON-based Security Token Service), and the like. In a realistic environment, even small, exploratory data requests can easily require ten lines of code. To make matters worse, the APIs are routinely designed with developers in mind rather than data scientists. Project ‘Sailor’ hides those details, along with the complexity of building API requests, so that a typical equipment query shrinks to a few lines:

from sailor.assetcentral import find_equipment

equipment_set = find_equipment(model_name='my_model_name', location_name='PaloAlto')
equipment_set.as_df()

Without ‘Sailor’, the data scientist would have to deal with the API access herself:

from rauth import OAuth2Service
import json
import pandas as pd

# Set up the OAuth2 client for Asset Central
service = OAuth2Service(name='AssetCentral',
                        client_id='client_id',
                        client_secret='client_secret',
                        access_token_url='access_token_url')

# Fetch an access token via the client credentials grant
access_token = service.get_access_token(
    method='POST',
    decoder=json.loads,
    key='access_token',
    data={'grant_type': 'client_credentials'})

# Open an authenticated session
session = service.get_auth_session(
    method='POST',
    decoder=json.loads,
    data={'grant_type': 'client_credentials',
          'access_token': access_token})
session.headers = {'Accept': 'application/json'}

# Request the equipment matching the filter, using a hand-encoded OData query
result = session.request(
    'GET',
    'https://<endpoint>/services/api/v1/equipment?%24filter=modelName'
    '%20eq%20%27my_model_name%27%20and%20location%20eq%20%27PaloAlto%27&'
    '%24format=json')
equipments_df = pd.DataFrame(result.json())
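
The hand-encoded query string in the request above is one of the details that is easy to get wrong. As a minimal sketch using only the Python standard library, the same string could be produced as follows. The helper name build_equipment_query is hypothetical and not part of ‘Sailor’ or of the Asset Central API; the placeholder endpoint is kept exactly as in the snippet above.

```python
from urllib.parse import quote, urlencode

# Placeholder host, exactly as in the snippet above; replace with a real endpoint.
BASE_URL = "https://<endpoint>/services/api/v1/equipment"


def build_equipment_query(model_name, location):
    """Return the percent-encoded OData query string for an equipment filter.

    Hypothetical helper for illustration; values containing single quotes
    would need additional escaping before being placed into the filter.
    """
    odata_filter = f"modelName eq '{model_name}' and location eq '{location}'"
    params = {"$filter": odata_filter, "$format": "json"}
    # quote_via=quote encodes spaces as %20 (the default quote_plus would emit '+').
    return urlencode(params, quote_via=quote)


query = build_equipment_query("my_model_name", "PaloAlto")
url = f"{BASE_URL}?{query}"
print(url)
```

Building the query from readable parts avoids typing escape sequences such as %20 and %27 by hand, which is the kind of low-level work the SDK is meant to take off the data scientist's plate.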

Doing data science on real-life raw business data is a highly creative process. Previously available tools have imposed unnecessary limitations, for example by providing only pre-aggregated data along with preconceived analytics. ‘Sailor’ moves beyond those boundaries by building a bridge to the raw data, facilitating exploratory data analysis and promoting improved machine learning models.

‘Sailor’ provides a set of tools which can be used to read data seamlessly from SAP Asset Central and SAP IoT. The data scientist can then proceed with exploring the data, visualizing relevant content, and building models for the use case at hand.

Architecture of project ‘Sailor’

In order to use ‘Sailor’, you will need access to SAP Asset Central and SAP IoT Services along with your own installation of Python. To get started with ‘Sailor’, you can just use ‘pip install sailor’ in your Python environment. The required Python packages will be automatically installed as part of the ‘Sailor’ installation. Pandas DataFrames are our standard format, so the pandas library should be imported as well. In your subsequent configuration of SAP backends and their respective APIs, you have the option of specifying either a JSON string or a YAML file.
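
As a rough sketch of the JSON-string option, a configuration could be assembled as a Python dictionary and serialized before the SDK is used. Everything specific here is an assumption for illustration: the key names, the placeholder values, and the environment variable name SAILOR_CONFIG_JSON should all be checked against the ‘Sailor’ documentation before use.

```python
import json
import os

# All values are placeholders. The key names are assumptions for illustration
# and may differ from what the SDK actually expects.
config = {
    "asset_central": {
        "client_id": "client_id",
        "client_secret": "client_secret",
        "access_token_url": "access_token_url",
    },
    "sap_iot": {
        "client_id": "client_id",
        "client_secret": "client_secret",
        "access_token_url": "access_token_url",
    },
}

# Serialize the configuration to a single JSON string.
config_json = json.dumps(config)

# Assumed variable name; expose the JSON string to the current process.
os.environ["SAILOR_CONFIG_JSON"] = config_json
```

Keeping credentials in an environment variable or in a separate YAML file, rather than in the notebook itself, makes it easier to share analysis code without sharing secrets.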

You can then identify the equipment of interest and apply various filters, e.g. by location. You can take advantage of several convenience functions which provide representations of objects as pandas DataFrames, and which facilitate plotting and charting of your data. ‘Sailor’ contains numerous options for visualizations. Our software also allows you to investigate sets of equipment which are modelled jointly as systems, and equipment events pertaining to your use case, e.g. work orders or notifications. To give an example, you can easily visualize the distribution of (maintenance) notifications across equipment and time:

notification_set.plot_overview()

Visualizing the distribution of notifications
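
To make concrete what "distribution across equipment and time" means, the following sketch counts notifications per equipment and month using only the standard library. It is not how plot_overview() is implemented; the record layout, the field names equipment and start_date, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical notification records; field names are assumptions for illustration.
notifications = [
    {"equipment": "pump-01", "start_date": date(2021, 1, 14)},
    {"equipment": "pump-01", "start_date": date(2021, 1, 29)},
    {"equipment": "pump-01", "start_date": date(2021, 3, 2)},
    {"equipment": "pump-02", "start_date": date(2021, 1, 5)},
    {"equipment": "pump-02", "start_date": date(2021, 2, 17)},
]


def count_by_equipment_and_month(records):
    """Count notifications per (equipment, 'YYYY-MM') pair."""
    counts = Counter()
    for record in records:
        month = record["start_date"].strftime("%Y-%m")
        counts[(record["equipment"], month)] += 1
    return counts


counts = count_by_equipment_and_month(notifications)
for (equipment, month), n in sorted(counts.items()):
    print(f"{equipment}  {month}  {n}")
```

A table of this shape (equipment on one axis, time buckets on the other) is the kind of aggregate that an overview plot of notifications typically visualizes.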

For many use cases, such as anomaly detection, failure prediction, or remaining-useful-life prediction, you may want to investigate a machine’s sensor data, customarily stored in the form of indicators (i.e. descriptions of measured values). These indicators can also be used to conduct analyses across multiple pieces of equipment. ‘Sailor’ enables you to retrieve time series data for the indicators applicable to your use case in a straightforward manner. You can then transform the data into pandas DataFrames in order to visualize and plot it. For your convenience, ‘Sailor’ provides functions for typical plots as part of the package. Moreover, the very same method serves as the starting point both for visualizations and for building machine learning models. For instance, you can first gain a preliminary understanding of the time series data in question and then continue with training an isolation forest for detecting anomalies.
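
To illustrate the idea behind an isolation forest, here is a deliberately simplified, pure-Python sketch for a single indicator (one-dimensional readings). It is neither the ‘Sailor’ API nor a production implementation; in practice one would typically use an established library. The function names and the synthetic readings are hypothetical. The principle: points that can be separated from the rest with very few random splits receive a high anomaly score.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively partition values at uniformly random split points."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf node
    split = rng.uniform(min(values), max(values))
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return {
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, value, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf size."""
    if "split" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if value < tree["split"] else "right"
    return path_length(tree[branch], value, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=64, seed=0):
    """Return a score in (0, 1) per value; higher means more anomalous.

    Requires at least two values.
    """
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2 ** (-sum(path_length(t, v) for t in trees) / (n_trees * norm))
        for v in values
    ]


# Synthetic indicator readings: 59 values near 20, plus one obvious outlier.
readings = [20.0 + 0.1 * (i % 7) for i in range(59)] + [80.0]
scores = anomaly_scores(readings)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, readings[most_anomalous])
```

The fixed seed keeps the example reproducible; with real sensor data, the readings would come from the time series retrieved for the indicators of interest.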

Project ‘Sailor’ allows you to easily access data from your SAP Digital Supply Chain software products for data science projects like predictive maintenance or master data analysis. Once your data is available, you are free to choose how to work with it. ‘Sailor’ comes with several predefined functions to support you in exploring your data. Adding to that, you can create custom plots or even build your own machine learning models. You can learn more about the specific steps on our tutorial page.

‘Sailor’ provides you with a set of functions out of the box, but most importantly it facilitates flexibility and extensibility. Our goal is to make it as convenient as possible for you (as a professional data scientist) to focus on the use case, the algorithms and the machine learning models. 

For more information on ‘Sailor’, please refer to:

We welcome all contributions, whether in the form of issues, code contributions, questions, or any other format. To contribute, please reach out or refer to the Contributing page in the documentation.