Predicting bad SAP performance (Part 2)

Having decided to predict bad SAP performance, I need to define more precisely what I mean by “bad SAP performance”.

End users frequently complain about specific transactions or even whole systems being unresponsive, and it is often difficult to pinpoint the root cause. In this project, I want to focus on global performance issues: issues that virtually all users are likely to experience, in any transaction or workflow.

Maybe it helps to identify the extreme cases SAP systems can be in. A SAP system could be:

  • completely idle
  • perfectly busy
  • completely stuck

In reality, it will be somewhere in between these extremes, so I can draw a triangle:

There are two dimensions here: the workload/throughput and the system health. Let’s take a closer look at how SAP systems often behave if you continue to increase the workload well beyond what the SAP system sizing anticipated. This gives an idealized path of how the state of a SAP system will change:

When a SAP system is started, and no batch jobs are scheduled and no users are logged on, it is at the bottom left, or completely idle. With users logging on or batch jobs starting to run, the workload increases. At the beginning, there is linear scaling: each additional task simply gets executed like the ones before it. However, if the workload increases too much, the SAP system will inevitably hit one or more bottlenecks. Somewhere around this point the throughput reaches its maximum, ideally close to the peak of the triangle. If the workload increases beyond that point, the performance degrades and the overhead mounts quickly. The SAP system becomes more and more busy with unproductive self-administration. This can get so bad that the system gets completely stuck and we reach the point at the bottom right.
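
To make this idealized path a bit more tangible, here is a minimal Python sketch of a toy model. The function names, the capacity of 100 and the linear overhead penalty are all assumptions made up for illustration; none of them are measured from a real SAP system.

```python
def throughput(workload, capacity=100.0, overhead_factor=1.0):
    """Toy model: productive throughput for a given offered workload.

    capacity and overhead_factor are illustrative assumptions,
    not values taken from any SAP sizing.
    """
    if workload <= capacity:
        # Linear scaling: each additional task is simply executed.
        return workload
    # Beyond the bottleneck, overhead eats a growing share of the capacity.
    excess = workload - capacity
    return max(0.0, capacity - overhead_factor * excess)


def state(workload, capacity=100.0, overhead_factor=1.0):
    """Name the position on the idealized path for a given workload."""
    if workload == 0:
        return "idle"
    if workload < capacity:
        return "scaling"
    if workload == capacity:
        return "perfectly busy"
    if throughput(workload, capacity, overhead_factor) == 0.0:
        return "stuck"
    return "degrading"


if __name__ == "__main__":
    for w in (0, 50, 100, 150, 200):
        print(w, throughput(w), state(w))
```

In this sketch the throughput rises linearly up to the assumed capacity and then falls again as the workload keeps growing, which traces the left and right edges of the triangle. A real system will of course not degrade in such a neat straight line.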

Of course, the extreme cases will hardly ever be reached, and any SAP system is located somewhere inside the triangle. There will also be mixtures: for example, the batch work processes are idle, the dialog work processes are all hanging due to some deadlock, and the ICM tasks reach an optimal throughput.

So coming back to my question on predicting bad performance: I want to train a machine learning model that can predict if the SAP performance will move from the “green” area to the “red” area in the foreseeable future. A system being busy with self-administration will show a general performance issue, which gets noticed by (almost) all of its users.
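
One way to turn this goal into a training target could look like the following sketch. It assumes that I already have a time series of flags telling me whether the system was in the “red” area at each sampling point; how to derive those flags is not covered here. The function name `label_future_red` and the `horizon` parameter are hypothetical names chosen for illustration.

```python
def label_future_red(is_red, horizon):
    """Build a prediction target from a series of red/green flags.

    For each time step t the label is 1 if the system is green at t
    and turns red within the next `horizon` steps, otherwise 0.
    Both the input format and the horizon are assumptions.
    """
    labels = []
    for t in range(len(is_red)):
        # Look only at the future samples inside the horizon.
        window = is_red[t + 1 : t + 1 + horizon]
        labels.append(1 if (not is_red[t]) and any(window) else 0)
    return labels


if __name__ == "__main__":
    history = [False, False, False, True, True, False]
    print(label_future_red(history, horizon=2))
```

Samples where the system is already red get the label 0 here, because the interesting event is the transition from green to red, not the red state itself. That is a design choice for this sketch and could be handled differently.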