How to notice and handle faulty CPI iFlow states automatically?

Dear community,

Some of you are already using my proposed architecture to monitor iFlow executions and messages externally with Azure Workbooks by forwarding the Message Processing Log. But what if your iFlow or CPI tenant fails altogether? If you rely on regular CPI Cockpit check-ins by your admins, it can take days before anyone notices, or it takes a customer call because the integration no longer sends any data.

SAP recovers your BTP and CPI tenant after an outage, but your iFlows can remain in a stopped state.

How can we do better?

  1. Implement high availability with multiple CPI tenants using my other CPI blog series on this topic
  2. Monitor your CPI iFlow state externally using the CPI REST API. Multiple tools offer such an integration. Have a look at the list here to get an overview. For an SAP-native option you might want to look at SAP Solution Manager.

Since I have talked about the first entry at great length already and some of you don’t want the extra effort and cost of having additional instances of CPI, let’s have a look at a simpler solution. The orchestrator for my example will be a low-code implementation with Azure LogicApps.

Fig. 1 Architecture overview

This way you get notified proactively, rather than counting on lucky “catches” during admin Cockpit check-ins.

Option 2a shows a very basic notification path that posts an error message to your shared CPI admin mailbox or shared Teams channel. This way you make sure you reach someone as quickly as possible. If you want to go all in, you may even provide an adaptive card in Teams or an Actionable Message in Outlook that allows an immediate re-deploy of the problematic iFlow.

I wouldn’t recommend that without additional checks to avoid integration mismatches though. At least make sure that your integration target can deal with duplicate messages and “broken” delivery chains.

Option 2b logs all iFlow states, so you can spot patterns over time and get a detailed view of the state of individual iFlows. This goes beyond simple outages. My provided Workbook does not contain a visual for the iFlow state yet. Would you like me to add it? Or would you choose Option 2a in any case?

The flow kicks off with a simple scheduling trigger that runs every hour. Based on your individual needs you might lower or increase that time window. The scope boundary groups a set of Actions to enable me to act upon connection errors (e.g., when BTP, CPI or the REST API is down completely).

Fig.2 Screenshot of LogicApp start config

To be able to leverage the CPI REST API, we need to create a CPI runtime instance with plan “api”. We finish the configuration by creating a service key and supplying the token URL and credentials to the HTTP call in the LogicApp.
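For those who prefer to see what the HTTP action does behind the scenes, here is a minimal Python sketch of an OAuth2 client-credentials token request. The function name and all placeholder values are assumptions for illustration only; the real token URL, client ID and client secret come from your service key.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    # Client ID and secret travel as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    # The grant type goes into a form-encoded body.
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Hypothetical placeholder values; take the real ones from the service key.
request = build_token_request(
    "https://example.invalid/oauth/token", "my-client-id", "my-client-secret"
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON document that carries the bearer token in its `access_token` field, as is standard for OAuth2.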

Fig.3 Screenshot of CPI runtime and service key config

With that OAuth2 bearer token we can finally call the CPI REST API “IntegrationRuntimeArtifacts”.

Fig.4 Screenshot of CPI REST API call

I force the response format to be JSON so that it can be processed efficiently in subsequent LogicApp actions. The response contains an array with all your deployed iFlows and their state. Have a look at the mentioned API spec for details.
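To illustrate what the subsequent actions work with, here is a small Python sketch that parses such a JSON response. The sample body is made up, and the `d` / `results` wrapper as well as the `Id`, `Name` and `Status` field names are assumptions based on the usual OData V2 JSON layout, so please verify them against the API spec.

```python
import json

# Hypothetical, shortened response body. The "d" -> "results" wrapper and the
# field names are assumptions; check the API spec for the real structure.
SAMPLE_BODY = """
{"d": {"results": [
  {"Id": "OrderReplication", "Name": "Order Replication", "Status": "STARTED"},
  {"Id": "InvoiceSync", "Name": "Invoice Sync", "Status": "ERROR"}
]}}
"""


def parse_artifacts(body):
    """Return a list of (id, status) tuples from the JSON response body."""
    payload = json.loads(body)
    results = payload.get("d", {}).get("results", [])
    return [(item.get("Id"), item.get("Status")) for item in results]


for artifact_id, status in parse_artifacts(SAMPLE_BODY):
    print(f"{artifact_id}: {status}")
```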

Next, you need to decide whether you want to act on all states or only on STOPPED, ERROR and the like. For my test I flagged everything except the normal operating state “STARTED”. If you don’t have any experience with your states yet, I would recommend doing the same.
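Expressed in Python, the condition could look like the sketch below. The `Status` field name and the helper name are assumptions for illustration; the idea is simply to flag everything that is not explicitly known to be healthy.

```python
# States treated as normal operation; everything else gets flagged.
HEALTHY_STATES = {"STARTED"}


def select_unhealthy(artifacts):
    """Return all artifacts whose status is not a known healthy state."""
    return [
        artifact
        for artifact in artifacts
        # A missing status counts as unhealthy, so nothing slips through.
        if str(artifact.get("Status", "")).upper() not in HEALTHY_STATES
    ]


# Hypothetical sample data for illustration.
sample = [
    {"Id": "OrderReplication", "Status": "STARTED"},
    {"Id": "InvoiceSync", "Status": "ERROR"},
    {"Id": "LegacyFeed"},
]
for artifact in select_unhealthy(sample):
    print(artifact["Id"], artifact.get("Status", "<no status>"))
```

Filtering by exclusion rather than by a fixed list of bad states means that states you have not seen before still raise a notification.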

Fig.5 Screenshot of condition

The for-each loop iterates over the response array and feeds the JSON payload of each iFlow into the “Send Data” action (Azure Log Analytics Data Collector).
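The connector hides the authentication details. If you ever need to post the records without it, the HTTP Data Collector API expects a SharedKey signature in the Authorization header. The following Python sketch shows how such a header can be built, following the scheme in Microsoft's documentation as far as I know it. The function name and the dummy workspace values are assumptions for illustration.

```python
import base64
import hashlib
import hmac


def build_authorization(workspace_id, shared_key, rfc1123_date, content_length):
    """Build the SharedKey Authorization header value for a POST to /api/logs."""
    # The string to sign covers method, body length, content type,
    # the x-ms-date header and the resource path.
    string_to_sign = (
        f"POST\n{content_length}\napplication/json\n"
        f"x-ms-date:{rfc1123_date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(shared_key),  # the workspace key is base64 encoded
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"


# Dummy values for illustration; use your workspace ID and key instead.
dummy_key = base64.b64encode(b"not-a-real-key").decode()
header = build_authorization(
    "my-workspace-id", dummy_key, "Mon, 01 Jan 2024 00:00:00 GMT", 128
)
print(header.split(":", 1)[0])
```

The same date string has to be sent in the `x-ms-date` header of the request, and the content length must match the actual body, otherwise the signature check fails.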

Fig.6 Screenshot of CPI API example response

Finally, the result is forwarded to the CPI admin channel on Teams. Add an Outlook action in case you prefer shared mailboxes. I would recommend a messaging client like Teams for the push notifications, because such channels are often used and monitored more actively than email.

As mentioned before, the scope boundary ensures that any API connection errors get handled as well and forwarded to the same receivers.
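In code, the scope boundary roughly corresponds to a try/except around the API call. The Python sketch below shows one monitoring cycle under that assumption. `fetch_states` and `notify` are hypothetical callables standing in for the REST call and the Teams or Outlook action.

```python
def run_check(fetch_states, notify):
    """Run one monitoring cycle; notify on unhealthy iFlows and on API failures."""
    try:
        states = fetch_states()
    except Exception as exc:
        # Comparable to the branch that runs when the scope has failed,
        # e.g. because BTP, CPI or the REST API is unreachable.
        notify(f"CPI API check failed: {exc}")
        return False
    for artifact_id, status in states.items():
        if status != "STARTED":
            notify(f"iFlow {artifact_id} is in state {status}")
    return True


def unreachable_tenant():
    """Hypothetical stand-in for a REST call that fails."""
    raise ConnectionError("tenant unreachable")


messages = []
run_check(unreachable_tenant, messages.append)
run_check(lambda: {"OrderReplication": "STARTED", "InvoiceSync": "ERROR"}, messages.append)
for message in messages:
    print(message)
```

Both kinds of problems end up at the same receiver, so the admins only have to watch one channel.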

Fig.7 Screenshot of Teams action to handle API errors

That’s it! You just enabled external monitoring for your iFlows and improved your integration uptime, for instance through shorter windows of “stopped” iFlows. 😊

Not too bad, huh? Today we saw why external monitoring of your CPI tenant and iFlows is worth considering and how to build simple logic to handle outages. If you feel brave and understand your iFlows very well, you might even offer an integration that restarts stopped iFlows after an outage directly from within Teams or Outlook.

Any further inputs from you @Developers, @Admins and @Architects?

Find the related GitHub repos here.

As always feel free to ask lots of follow-up questions.

Best Regards

Martin