What is a good Machine Learning problem?

Are you exploring whether you should use machine learning to solve your business problem? Are you wondering whether automation is a machine learning problem? Are you looking for a place to start with SAP’s Artificial Intelligence products?

What is Machine Learning?

By now, you have probably heard the terms Artificial Intelligence (AI) and Machine Learning (ML) a lot. If you are not deep in the AI/ML technical world, chances are you hear people use AI and ML interchangeably. You might also hear people talk about algorithms and models as if they were the same thing. All these fancy terms can be confusing.

If you watch Netflix, you will likely find that the Netflix application or website recommends lots of videos that match your taste, and that it does the job better and better over time. If you use a smartphone, you can ask Siri or Google Assistant questions. If you drive, you have most likely heard about self-driving cars. These are all examples of AI: the simulation of human intelligence in machines that are programmed to “think” like humans and mimic their actions.

ML is a subfield of AI and is one way to use AI. The key word here is learning, which happens with lots of data. Text, images and numbers are all data. In business, sales orders, invoices, and customers’ written or oral feedback about products and services are all data. When you load lots of data into a computer program and choose a model to fit the data, the program produces predictions. A prediction can answer questions such as: What will my sales look like next month? How many of my customers are likely to churn? Is this service ticket an IT ticket or an incoming invoice that needs to be processed?

We mentioned models in the context of ML. To understand what a model is, we also need to understand what an algorithm is. An algorithm is a procedure for solving a problem, for example, how you sort 10 books on your bookshelf, or how you quickly locate the book you want to read. In machine learning, algorithms “learn” from data, meaning they perform pattern recognition on data. A model is the output of a machine learning algorithm run on data. Put very simply, an algorithm applied to data generates a model that can be saved as a “thing”.
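To make the distinction concrete, here is a minimal sketch in plain Python. It assumes a deliberately simple algorithm (a least-squares straight-line fit) and made-up sales numbers; the function names fit_line and predict are hypothetical and chosen only for illustration. The algorithm is the fitting procedure, and the model is the small set of learned numbers that can be saved and reused.

```python
import json


def fit_line(xs, ys):
    """The algorithm: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # The model: what the algorithm learned from the data.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Use a trained model to predict a value for a new input."""
    return model["slope"] * x + model["intercept"]


# Made-up data for illustration: month number and units sold.
months = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

model = fit_line(months, units_sold)

# The model is a "thing" that can be saved and loaded again later.
saved = json.dumps(model)
restored = json.loads(saved)

print(predict(restored, 6))  # prediction for month 6
```

Real ML libraries use far richer algorithms and model formats, but the split is the same: the training procedure runs once on the data, and the resulting model is what you keep and use for predictions.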

Figure 1. What is a machine learning model

Rules vs Machine Learning

You may still wonder what exactly learning means here. Comparing rules with learning helps to understand both.

Here is the scenario. Your company buys materials from vendors. On a daily basis, you get many purchase orders (POs) and invoices to process. You want your software system to scale up the processing in a smart way. To outsiders, it can be challenging to tell POs from invoices, because both include basic order details, shipping information, vendor details and price. However, you as a business insider know they are different. You know that “If the document is sent by a buyer to a vendor for tracking and controlling the purchasing, then it is a PO. If the document is sent by a vendor to a buyer as an official payment request, then it is an invoice.” When you use “if…then…” logic to make decisions, you are using rules.
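The two “if…then…” rules quoted above can be written down directly as code. This is only a sketch: the function name classify_by_rule and the field values ("buyer", "vendor", "purchase_tracking", "payment_request") are hypothetical and assume the system already knows who sent the document and why.

```python
def classify_by_rule(sender_role, recipient_role, purpose):
    """Apply the two business rules for telling POs from invoices.

    All argument values are hypothetical labels used for illustration.
    """
    # Rule 1: buyer -> vendor, for tracking and controlling the purchase.
    if (sender_role == "buyer" and recipient_role == "vendor"
            and purpose == "purchase_tracking"):
        return "PO"
    # Rule 2: vendor -> buyer, as an official payment request.
    if (sender_role == "vendor" and recipient_role == "buyer"
            and purpose == "payment_request"):
        return "invoice"
    # No rule matched: a rule-based system cannot decide.
    return "unknown"


print(classify_by_rule("buyer", "vendor", "purchase_tracking"))
print(classify_by_rule("vendor", "buyer", "payment_request"))
```

Note that every decision path has to be spelled out by a person who knows the domain, and anything the rules do not cover falls through to "unknown".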

Machine learning doesn’t use these rules at all. Instead, you feed a machine learning model with data. In this example, the data consists of examples of POs and invoices. For simplicity, assume you have no document types other than POs and invoices. To differentiate them, you label document 1 as PO, document 2 as invoice, document 3 as PO, etc. Because you are looking for a categorical result, you should use a model built on a classification algorithm. First, the model learns by “looking at” many labeled examples. Then, when you feed the model a piece of new data (a new PO or invoice it hasn’t seen before), it should be able to tell whether it is a PO or an invoice.
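As an illustration of learning from labeled examples, the sketch below trains a tiny text classifier in plain Python. It assumes a simple word-counting approach (multinomial naive Bayes with add-one smoothing), which is just one of many possible classification algorithms and not necessarily what any SAP product uses. The four training documents are made up, and a real model would need far more examples.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Learn word counts per label from (text, label) examples."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in labeled_docs:
        words = tokenize(text)
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return {"word_counts": dict(word_counts),
            "label_counts": label_counts,
            "vocabulary": vocabulary}


def classify(model, text):
    """Return the label whose learned word pattern best matches the text."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up labeled examples: document 1 is a PO, document 3 is an invoice, ...
training_data = [
    ("please deliver 20 bike frames by friday", "PO"),
    ("order request for 50 helmets deliver to warehouse", "PO"),
    ("amount due 1200 eur payment within 30 days", "invoice"),
    ("payment due for delivered helmets total 800 eur", "invoice"),
]

model = train(training_data)
print(classify(model, "please deliver 10 helmets to warehouse"))
print(classify(model, "payment due 500 eur"))
```

No one wrote a rule saying that “deliver” points to a PO or that “due” points to an invoice. The model picked up those patterns from the labels.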

In some cases, a problem can be solved by either a rule-based system or a machine learning based system. In many other cases, you will find it very hard to describe things with rules, and then you need to consider machine learning. We will come back to this point in more detail.

A rule-based system is a simple kind of AI. It requires a set of facts and a set of rules, and it has an inference engine that applies the rules to the facts. Using the same example we discussed:

A PO meets the following rules…these are facts I have about this new document …therefore, this new document is (or is not) a PO.

Figure 2. Example of rule-based problem-solving

A machine learning based system is more advanced and more dynamic. It also requires a set of facts (a much larger set), but it doesn’t require a set of rules (domain expertise). Instead, it runs an algorithm on the facts (data) to train a model. You can then use this model to make predictions.

There is a pile of mixed business documents (POs and invoices) on my desk. I learned from their labels, so I know POs have this pattern, and invoices have a different pattern…now I am looking at a new business document, and I decide to put it in the PO bucket.

Figure 3. Example of ML-based problem-solving

So, when should you use rule-based problem-solving vs ML-based problem-solving?  Here is a summarized comparison.

Figure 4. When to use what problem-solving method

What is a good ML problem?

Before jumping into the actual problem solving, it is good practice to slow down a bit and ask yourself: “What is the problem I am trying to solve?” Thinking this through can save you a lot of time at a later stage.

For example, are you looking for a recommendation of products and services that are relevant to your personal preference or your own business?  Do you want to automate service ticket distribution and processing?

Here are the 3 steps you may want to go through to determine your ML approach:

  1. Distinguish automation vs learning

In business scenarios, we talk about automation a lot because we want to scale our business. If a person can process 50 invoices a day, automation may increase that number tenfold or more. ML can help automate your business process, but not all automation problems are learning problems. Choose ML only when your problem cannot be solved by a rule-based approach.

  2. Evaluate if you have the right data

If you believe ML is the right approach, do you have the right data to solve the problem? “Right” data here means data that contains the knowledge needed to solve the problem. If you want your system to distinguish POs from invoices, it won’t help if all you have is sales-related emails or conversation records.

  3. Choose the right ML solution

When you have determined that ML is the right choice and you know you have the right data, then you can start building a workable ML solution.

Firstly, you need a clear goal, e.g., “I want a model that predicts the sales volume of my bikes next month.” Secondly, you need a business outcome in mind, e.g., “I can better stock my product inventory”, and you should define metrics that make “better” measurable. Thirdly, choose the right ML solution by looking at the type of output you want. Is it a categorical value (good vs bad), a number (such as a price) or a grouping pattern?
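The third step can be sketched as a simple lookup from the kind of output you want to the usual family of ML task. The names below are assumptions for illustration: the function suggest_ml_task and its keys are hypothetical, and “regression” and “clustering” are the commonly used terms for number and grouping outputs rather than terms taken from this blog.

```python
def suggest_ml_task(desired_output):
    """Map the kind of output you want to a common ML task family."""
    task_by_output = {
        "category": "classification",  # e.g. good vs bad, PO vs invoice
        "number": "regression",        # e.g. a price or a sales volume
        "grouping": "clustering",      # e.g. segments of similar customers
    }
    if desired_output not in task_by_output:
        raise ValueError(f"Unknown output type: {desired_output!r}")
    return task_by_output[desired_output]


# Goal from the text: predict next month's bike sales volume (a number).
print(suggest_ml_task("number"))
```

This is only a starting point. The task family narrows down which algorithms to consider, and the metrics you defined for the business outcome decide which candidate is good enough.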

There are a lot more details regarding how to build ML solutions. We can address them in separate blogs or through customer projects.

What can SAP offer you to solve ML problems?

In this blog, we discussed what machine learning is, the difference between rule-based problem-solving and ML-based problem-solving, and how to determine if you should use ML to solve your problems.

Now, how can SAP help you solve machine learning problems?

SAP has a large AI product portfolio, and ML capabilities are included in these products. They can be consumed as an embedded service, an API endpoint, a low-code or no-code ML pipeline, etc. For example, SAP Data Intelligence is a data management tool that also supports machine learning scenarios. You can create graphical pipelines to run ML algorithms on your data or write your own Python code in the embedded JupyterLab environment.

Under the AI portfolio, there are a variety of AI Business Services and Conversational AI that can solve your ML problems.  Here is a good overview of SAP’s AI portfolio that introduces what kind of problems each AI product can solve for you.

With all the above in mind, if you are an existing BTP customer and you’d like to know how to get started with AI/ML, hear more about AI/ML best practices, review your solution design, or run experimental projects with us, feel free to reach out to me and my team.