Manufacturing analytics: The case for integrated data management and upfront IoT data capture cost estimation

IoT data capture in production lines takes considerable effort. Not all production and test machines are equipped with sensors or open APIs to publish machine data. Additional sensors might need to be installed and existing production machines ‘upgraded’. The network in the factory might not be fast enough to extract all the data. As a result, extracting IoT data normally requires high investments, and an IoT data project should therefore be based on a well-founded business case.

Looking at the broader picture, IoT data needs to be combined with data from business systems such as Enterprise Resource Planning (ERP) systems in order to gain new insights and enable overall performance optimization of the business.

Before spending money to implement an IoT-solution you should run pilot projects on data from pilot machines to build the business case to invest in IoT-data capture.

Further, I propose that for use cases which require the combination of data from IT systems (e.g. SAP ERP, SAP S/4 HANA) and OT (Operational Technology data from the shop floor), an integrated data (lake) approach is to be preferred to a data lake-only approach. (The data lake is still leveraged to store raw data, time series, and unstructured data like pictures.)

SAP delivers the components to enable the integrated data (lake) approach, and customers have applied it.

Disclaimer: SAP offers out-of-the-box manufacturing analytics capabilities in standard solutions (e.g. SAP Digital Manufacturing Cloud) and also predictive/machine learning capabilities (e.g. SAP Predictive Asset Insights). This blog is focused on use cases which are not covered by the standard offerings.

In my role as solution advisor for platform and data management I have been involved in discussions with customers implementing Industry 4.0 scenarios.

Most scenarios I discussed with customers required the combination of ERP and IoT-data. These are two important data sources in manufacturing:

  • data from the shop floor managed by Operational Technology and OT teams in factories
  • business data managed by IT departments, e.g. ERP

From my point of view, these are the scenarios which promise the highest value for the business, because they enable the optimization of business processes across the silos. Use cases like predictive maintenance and predictive quality, for example, require data from both sources. The ERP system typically stores information on customer orders, maintenance operations, and materials, while OT systems provide all sensor and machine IoT data. Based on this data, data scientists can, for example, build models to predict machine failure. These models can be leveraged to develop better maintenance plans and could even be used to automatically create replacement orders for spare parts in the ERP system. Machine errors during production can be avoided, less money is spent on maintenance, and machines produce fewer defects. Further business benefits can be achieved on top of this.

Most customers are not yet at the point of applying machine learning. They focus instead on enabling production workers and production line managers to display, for example, correlations between production quality data and IoT data, so that they can optimize machine set points for the best quality output.
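To make the set point idea more tangible, here is a minimal Python sketch of the kind of analysis a dashboard would show: it groups quality results by machine set point and picks the set point with the lowest defect rate. The records, the set point values, and the function name `defect_rate_by_set_point` are purely illustrative assumptions, not data or code from any customer project.

```python
from collections import defaultdict

# Hypothetical pilot records (assumption for illustration only):
# (oven temperature set point in °C, part passed the quality check?)
records = [
    (180, True), (180, False), (180, True), (180, True),
    (190, True), (190, True), (190, True), (190, True),
    (200, True), (200, False), (200, False), (200, True),
]


def defect_rate_by_set_point(rows):
    """Return {set_point: share of parts that failed the quality check}."""
    totals = defaultdict(int)
    defects = defaultdict(int)
    for set_point, passed in rows:
        totals[set_point] += 1
        if not passed:
            defects[set_point] += 1
    return {sp: defects[sp] / totals[sp] for sp in totals}


rates = defect_rate_by_set_point(records)
# The set point with the lowest observed defect rate in the pilot data.
best_set_point = min(rates, key=rates.get)
print(rates, best_set_point)
```

With so few records per set point such a result is only a hint, which is exactly why a pilot on a relevant data set comes before any larger investment.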

Most of the customers I spoke with were planning, as a first step, to implement an IoT solution and store the machine data in a data lake provided by a hyperscaler. They were investing in pilot programs leveraging hyperscaler technology.

Storing as much of an organization's data as possible in a data lake is an established approach to enable data scientists to run machine learning experiments, discover insights for the business, and detect golden nuggets.

This makes total sense when planning to store raw data, time series, and unstructured data like pictures for running machine learning models. However, there are some caveats along the road from implementation to value with this data lake-only approach, especially when the combination of IT and OT data (see above) is required.

  • Capturing data from a production line requires high investments (hardware, network, IoT software, implementation services, …). A lot of money is spent to get all production and quality check machines in the production line equipped with sensors and to get this sensor data and the machine data uploaded to the data lake, only for the data scientists to find out that most of the data is irrelevant as input for machine learning models.
  • Besides the OT data, IT data needs to be loaded into the data lake and kept in sync, where it is stored out of context of the transactional system and business process.
  • Data lake data can’t be leveraged directly by business users like production line managers. Aligned data models of the source systems (OT and IT) need to be developed by experts and made available in business analytics software. With hyperscalers this often requires implementing another set of services.
  • The same is true for data scientists leveraging data in a data lake: they spend most of their effort on data preparation, aligning the different data models of the source systems (OT and IT), and data cleansing. Only on the basis of the cleaned and aligned data can golden nuggets be detected and machine learning models with a high confidence level be developed.
  • The real value for the business can only be realized when business insights and/or the developed machine learning models can be applied in the source systems. Expressed less technically, they must be integrated into the business process, either to automate processes or to enable business users to make better decisions. For this to work, a loop back from the data lake to the source system must be built into the system architecture. This is often not supported by the adopted data lake solution.

Especially the last point, value realization, is hard to achieve in a data lake-only approach.

The main amendments to a data lake (only) approach I propose are the following:

  1. Develop analytic dashboards and/or run machine learning on a relevant set of data from pilot installations. For storing raw data, time series, and unstructured data like pictures, a data lake can be leveraged. Build the business case for investment in IoT data capture in production lines.
  2. Develop an aligned data model combining business data and ‘other’ data, e.g. machine data.
  3. Store the data based on the aligned data model and leverage virtualization technologies to span the aligned data models across the data silos in business systems, avoiding data duplication. The data lake can be leveraged to store raw data, time series, and unstructured data like pictures.
  4. Provide analytical dashboards and/or apply machine learning models on the aligned data model and build the integration to the business systems.

Rather than discussing the approach in a generic way, I show how customers running SAP solutions for ERP (SAP ECC, S/4HANA), and optionally for MES (SAP MES), can implement the discussed approach with products from the SAP portfolio.

1. Develop analytic dashboards and/or run machine learning on a relevant set of data from pilot installations and build the business case for data capture

For machine learning, SAP offers a wide portfolio of technologies. For an overview, follow the SAP community on machine learning: Machine Learning and Artificial Intelligence | SAP Community. Here is a detailed blog on what SAP HANA Cloud has to offer in the machine learning context: SAP HANA Machine Learning Resources | SAP Blogs

The SAP Internet of Things portfolio can be leveraged to acquire shop floor data.

It is not required to use SAP tools to train machine learning models, although doing so can make a lot of sense when data from SAP systems plays a dominant role in model development. The main point I want to make is that training should first be done on test data from pilot installations in order to build the business case for investments in data capture. For storing raw data, time series, and unstructured data like pictures, a data lake can be leveraged.
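As one possible way to approach this on pilot data, the following Python sketch ranks candidate signals by the strength of their correlation with observed machine failures and keeps only those above a threshold. Everything in it is an assumption for illustration: the signal names, the values, the 0.5 threshold, and the choice of Pearson correlation as the relevance measure. A real pilot would likely use more data and more robust methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal says nothing about the target
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pilot data: one value per production run and signal.
signals = {
    "vibration_mm_s": [1.0, 1.2, 3.1, 3.4, 1.1, 3.0],
    "ambient_humidity_pct": [40, 42, 41, 41, 43, 42],
    "spindle_temp_c": [60, 61, 75, 78, 62, 74],
}
failed = [0, 0, 1, 1, 0, 1]  # 1 = machine failure observed in that run


def rank_signals(candidates, target, threshold=0.5):
    """Return (name, |correlation|) pairs at or above the threshold, strongest first."""
    scored = {name: abs(pearson(vals, target)) for name, vals in candidates.items()}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


relevant = rank_signals(signals, failed)
for name, score in relevant:
    print(f"{name}: {score:.2f}")
```

The signals that survive such a screening are the ones worth costing out for full-scale capture; the rest do not need to enter the business case.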

2. Develop an aligned data model combining business and ‘other’ data

The aligned data models are developed in the pilot phase. For business data, the data model in SAP S/4 HANA and/or SAP MES can be reused. Customers leveraging SAP Internet of Things can reuse the data model provided in this area.
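To illustrate what "aligned" can mean in practice, here is a small Python sketch that attaches each sensor reading to the production order that was running on the same machine at that time. The classes `ProductionOrder`, `SensorReading`, and `AlignedRecord`, their fields, and the sample values are hypothetical and do not reflect the actual SAP S/4 HANA, SAP MES, or SAP IoT data models.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProductionOrder:  # business data, e.g. from ERP or MES (hypothetical shape)
    order_id: str
    material: str
    machine_id: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class SensorReading:  # machine data from the shop floor (hypothetical shape)
    machine_id: str
    timestamp: datetime
    temperature_c: float


@dataclass(frozen=True)
class AlignedRecord:  # one row of the aligned data model
    order_id: str
    material: str
    machine_id: str
    timestamp: datetime
    temperature_c: float


def align(orders, readings):
    """Attach each reading to the order running on that machine at that time."""
    aligned = []
    for r in readings:
        for o in orders:
            if o.machine_id == r.machine_id and o.start <= r.timestamp < o.end:
                aligned.append(
                    AlignedRecord(o.order_id, o.material, r.machine_id,
                                  r.timestamp, r.temperature_c)
                )
                break  # at most one order per machine and point in time
    return aligned


orders = [
    ProductionOrder("PO-1", "MAT-A", "M1", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 12)),
    ProductionOrder("PO-2", "MAT-B", "M1", datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 16)),
]
readings = [
    SensorReading("M1", datetime(2024, 1, 1, 9, 30), 181.5),
    SensorReading("M1", datetime(2024, 1, 1, 13, 0), 195.0),
    SensorReading("M2", datetime(2024, 1, 1, 9, 30), 170.0),  # no matching order, dropped
]

aligned = align(orders, readings)
for row in aligned:
    print(row.order_id, row.material, row.temperature_c)
```

The shared keys (here machine ID and time window) are the core design decision of an aligned model; agreeing on them in the pilot phase is what later lets business users and data scientists work on the same records.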

3. Store the data based on the aligned data model and leverage virtualization technologies

SAP HANA Cloud can be leveraged to implement the cross-domain models, since it supports out-of-the-box integration with SAP systems (S/4HANA, SAP Business Warehouse, SAP MES, …) and offers connections to non-SAP cloud systems. The data models of the SAP source systems can easily be reused, which avoids loading data into a data lake and remodeling it.

Besides this, SAP HANA Cloud offers an integrated relational data lake to store structured and unstructured data, and an integration with hyperscaler storage solutions. Here you find an overview: SAP HANA Cloud, data lake Overview. The SAP HANA Cloud data lake is especially suited to store the high-value data defined by the business case discussed above. Hyperscaler storage solutions can be leveraged in this context to store the raw sensor data and the images produced by quality machines in the production line.

SAP HANA Cloud offers strong virtualization capabilities (Smart Data Integration and Smart Data Access), with which data models with virtualized access to SAP and non-SAP systems can be built.

Smart Data Access (SDA) | SAP Blogs

Implementing SDI with SAP HANA Cloud… concept and approach…. | SAP Blogs
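The idea behind virtualized access, querying data where it lives instead of copying it first, can be sketched with a simple analogy in Python. The example below uses two separate in-memory SQLite databases as stand-ins for a shop floor store and a business system and joins them at query time. This is only an analogy under that assumption: it does not use SAP HANA Smart Data Access or Smart Data Integration, and the table and column names are made up.

```python
import sqlite3

# Two separate in-memory databases stand in for two source systems.
conn = sqlite3.connect(":memory:")                 # "main": shop floor store
conn.execute("ATTACH DATABASE ':memory:' AS erp")  # "erp": business system

conn.execute("CREATE TABLE main.readings (order_id TEXT, temperature_c REAL)")
conn.execute("CREATE TABLE erp.orders (order_id TEXT PRIMARY KEY, material TEXT)")

conn.executemany(
    "INSERT INTO main.readings VALUES (?, ?)",
    [("PO-1", 181.5), ("PO-1", 183.5), ("PO-2", 195.0)],
)
conn.executemany(
    "INSERT INTO erp.orders VALUES (?, ?)",
    [("PO-1", "MAT-A"), ("PO-2", "MAT-B")],
)

# The join runs across both databases at query time; no table is copied
# from one store into the other beforehand.
rows = conn.execute(
    """
    SELECT o.material, AVG(r.temperature_c)
    FROM main.readings AS r
    JOIN erp.orders AS o ON o.order_id = r.order_id
    GROUP BY o.material
    ORDER BY o.material
    """
).fetchall()
conn.close()

print(rows)
```

The business data stays in its own store and keeps its context, while the analytic query still sees it together with the machine data.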

4. Provide analytical dashboards and/or run machine learning models on aligned data model

The aligned data model can be connected directly to SAP Analytics Cloud, which gives business users easy access to the data and enables, for example, manufacturing analytics based on a powerful frontend. SAP Analytics Cloud is a comprehensive, feature-rich, business-user-oriented analytics solution that brings together analytics and planning in a single solution in the cloud. Business users are enabled to customize their own stories and analyses. More information on SAP Analytics Cloud can be found here: SAP Analytic Cloud

Machine learning models can be run on the aligned data model either directly in SAP HANA Cloud or by leveraging other parts of the SAP machine learning portfolio: SAP HANA Machine Learning Resources | SAP Blogs / Machine Learning and Artificial Intelligence | SAP Community

Scores from machine learning models can be stored in SAP HANA Cloud and added to reports and analytic stories in SAP Analytics Cloud.

Since SAP HANA Cloud is part of the SAP Business Technology Platform (SAP BTP), all services in SAP BTP can be leveraged to act on data in SAP HANA Cloud. Applications and extensions can be built in SAP BTP.


(Diagram provided by the SAP Customer Advisory / SAP IoT Architect team.)

The system architecture above shows an exemplary architecture with SAP components.

The various colors depict different data types and where each is best kept:

  • device raw data (best stored in hyperscaler storage)
  • pictures from quality test machines (best stored in hyperscaler storage)
  • production data (managed in a Manufacturing Execution System)
  • structured data, i.e. device data relevant for analytics (stored in SAP HANA Cloud)
  • master data (maintained in SAP S/4 HANA, best accessed via virtualization in SAP HANA Cloud)
  • picture metadata with references to the pictures (stored in SAP HANA Cloud)

The aligned data model is maintained in SAP HANA Cloud (in the middle), which connects via the connection service to the IT/business systems (S/4 HANA, on the right), leveraging the virtualization and replication capabilities of SAP HANA Smart Data Access (SDA) and SAP HANA Smart Data Integration (SDI). Data in SAP Manufacturing Execution and SAP Manufacturing Integration and Intelligence can be accessed as well through the connection service in the SAP Business Technology Platform.

On the left, the systems running at a production site are displayed. The SAP Plant Connectivity (SAP PCo) client and/or other SAP IoT edge components can be leveraged to push structured data (light blue) to SAP HANA Cloud and, in this example, to upload picture data and the device raw data to a hyperscaler object store in parallel.

SAP Analytics Cloud (in the middle) is leveraged by business users and business analysts to access the data with rich visualization capabilities. A business user can drill down to display the pictures taken by the quality machines.

Customer Example

OSRAM Continental is leveraging this setup to support their intelligent plant initiative.

Here is a video where the customer talks about the project: Learn How OSRAM Continental Created an Intelligent Plant – YouTube

Details on the implementation of the aligned data model are explained from 16:44 onwards (https://youtu.be/S4GtUBGnd10?t=1004)

Before spending money to implement an IoT-solution

  • to capture IoT-data
  • and to store these in a data lake,

in order

  • to make these available for data scientists
  • or for enabling manufacturing analytics,

you should run pilot projects on data from pilot machines,

  • to find out which data adds the biggest value to the business
  • and/or is relevant as input for machine learning models,

in order to build the business case to invest in

  • IoT devices
  • an IoT-Solution
  • data cleaning efforts for relevant data.

Further, I propose that for use cases which require the combination of data from IT systems (e.g. SAP ERP, SAP S/4 HANA) and OT (Operational Technology data from the shop floor), an integrated data management approach is to be preferred to a data lake-only approach. A combination of both approaches might be considered.

SAP Solutions support the best practices I proposed in this blog.

  • Develop analytic dashboards and/or run machine learning on a relevant set of data from pilot machines and build the business case for data capture
  • Develop an aligned data model combining business and ‘other’ data
  • Store the data based on the aligned data model and leverage virtualization technologies
  • Provide analytical dashboards and/or run machine learning models on aligned data model

Overall, when starting or restarting an Industry 4.0 initiative, it is important to begin with a high-level determination of the most promising use cases and then run pilot projects in line with what I outlined above.

For further questions, I encourage you to lean on the SAP community for support. For example, here you can find questions and answers raised in the context of big data: Big Data

I would love to get your feedback and insights in a comment!!