EDUCAÇÃO E TECNOLOGIA

SAP BTP Integration Suite – Scrape and visualize data as you please

This pandemic is getting annoying: no more social contacts, no more shops, no more restaurants, no more sport. Wait: actually our local government (Hessen) has allowed Fitness club to stay open under strict rules, one of them being that only one person per 40m2 is allowed in their premises. That actually means, that the fitness club is regularly “full”. Because I don’t like to wait, I wanted to see when exactly the most “free slots” are available. The information about free slots is available online, over a standard HTML page.

But after all, this is just another REST API, just in HTML format, so I decided to scrape that information in order to persist and analyse it. I could have chosen a Hana Database, Grafana, Azure PostgreSQL, etc., but I wanted to build something “quick&dirty” since the pandemic should be over soon (or we will soon be forbidden entrance to fitness clubs).

So I decided to use the Cloud Integration service to regularly scrape the number of free slots from the HTML page and the push it to a Google sheet using Open Connectors.

In this blog, I will describe this process in detail, assuming that you know the SAP Integration Suite a little bit already. Also, note that you can implement this use case using the Integration Suite Trial.

Note that this is just a “quick and dirty” demo, but shows off how easy it is to use the Integration Suite to also support the needs of new apps built in an agile way.

Create a Google sheet

First of all, please create a spreadsheet using the account of your choice in Google sheets.

Two things need to be written down after you have created the Google spreadsheet: it’s ID and (optionally) the name of the worksheet. You can get this over the URL and through the Google UI.

Your Google spreadsheet gets saved automatically, so nothing more needs to be done here.

Create the Google Open Connector instance

In regards to the SAP BTP Integration Suite, let’s start with the easiest part: creating an Open Connector instance for Google Sheets.

To do so, open up your Integration Suite splash-screen and click on “Open Connectors” (if it’s not available yet, activate it in the “capabilities”).

In your Open Connectors, create a new authenticated instance of Google Sheet v4:

  1. Go to Connectors
  2. Search for Google Open Connectors
  3. Hover with your mouse over the Google Sheets V4 Open Connector and click on the “Authenticate” link

Name your Open Connector as you want and start the creation of the instance by logging in to Google with your preferred user.

At the end of the instance creation, navigate to the API docs of the Open Connector instance.

Here you can discover and test all the resources of the Open Connector instance.

I personally like to use that central documentation because you can test und understand the API calls before you implement them in Cloud Integration.

What we actually want to do here, is to insert rows into a spreadsheet, containing the current date and time, as well as the amount of “free spaces” (grabbed from the HTML page). So the API Call we will do, is a POST of a new row.

Locate the “/spreadsheets/{id}/worksheets/{worksheetName}/rows” resource.

  1. Open it by clicking on the “POST” method and click on the “try out” button
  2. Insert the ID of your spreadsheet in “ID” parameter.
  3. Insert the name of your worksheet (if you have not renamed it, it should be “Sheet1”) in the “worksheetName parameter.
  4. Insert the following text in the “RowBody” parameter:
{"values": ["22.04-12:50", 0]}

All this should like like that on your screen:

Now scroll down and click the “Execute” button.

As you can see, we have a “200” response and if you check in your spreadsheet, a row was added.

We will now move to the Cloud Integration part.

Before we do so, either copy the CURL request from the Open Connectors documentation, or leave the tab open for the coming steps.

Create the Google Sheet Open Connector credentials

As you may know, you must centrally define credentials in the Cloud Integration service. The same applies to Open Connector credentials.

To do so, navigate to the Monitor and Security Material section of your Cloud Integration tenant.

Now click on Create and User Credentials.

This is where you need to be a little careful.

  1. In the credentials creation window, select OpenConnectors as Type.
  2. Enter the name and the description as you want. In my case I have used GoogleSheet_CREDS and Credentials for Google sheet open connector.
  3. Now get ready to copy&paste: use the CURL command from before to copy and paste the three values in the fields User, Organisation and Element.
  4. Deploy the credentials.

It may be a good idea to remember the credential name for later.

Create and configure the integration flow

Now comes the fun part: dragging and dropping predefined flow steps 🙂

Navigate to your Cloud Integration service and create a new integration flow in a package of your choice.

Open the integration flow and go into the Edit mode.

Now, delete the Sender object as well as the Start element.

We will now insert a start timer to start the integration flow on a regular basis.

  1. Click on the Event icon in the menu.
  2. Select the Timer start event and drop it in the flow.

Now configure the timer to start the integration flow on a regular basis, eg. every 10 minutes from 8:00 until 22:00.

Once this is done, add a “Request-Reply” flow step between the start and end of your iflow.

  1. Click on the Call icon in the menu and select External Call.
  2. Click on Request-Reply and drop it on the flow.

With the connector tool, connect the start with the request-reply step, and the request-reply step to the end.

Add a Receiver to your iflow, under the integration flow.

Again, using the connector tool, connect the request-reply to the receiver. Chose the HTTP adapter when doing so since this is the call to the HTML page.

In the properties define the HTTP call as follows:

  1. Address: https://www.venicebeach-fitness.de/clubs/lifestyle-fitness-plus/heppenheim.html
  2. Method: GET
  3. Authentication: None

Note that this is where you enter the URL of any website you want to get information from – in this case it’s the URL of a local fitness club.

Also: you may need to import the SSL certificate from the website into your Cloud Integration tenant, which is done by downloading it from your browser and importing it in the security artefacts of Cloud Integration.

Let’s now get the information we are interested in from the HTML page. To do so, I’ll simply use standard string functionalities of Groovy. No doubt that you can do better than that: my purpose is just to illustrate a simple use case of the integration suite, not to explain Groovy best-practices in detail 😉

Add a Groovy Script step into your integration flow from the Transformation / Script menu.

Position it after the request-reply step and beautify the flow.

Now add a new script using the contextual menu of your script flow step.

Paste the following code in the script you just generated.

In this script, we are looking a specific string (“sind noch <strong>”) from the HTML. Once we found it, we know that the amount of free spaces is located just after it so we store the location (“index”) of the character.

Then, we create the JSON payload (we could have used JsonBuilder but creating a string was quicker) that the Google Sheet Open Connector is expecting, using the current day and time, as well as the substring located between the indexes calculated upfront.

I also logged the information just to make sure everything runs fine – but this is optional.

At the end of the script, we write the JSON payload to the “message” of the integration flow, which will then be sent to the Receiver (Google Sheet).

import com.sap.gateway.ip.core.customdev.util.Message;
import java.util.HashMap; def Message processData(Message message) { // String to find: "In diesem Studio sind noch <strong>27 Plätze</strong> frei." def String payload = message.getBody(java.lang.String); def indexOfPlaetze = payload.indexOf('sind noch <strong>'); indexOfPlaetze = indexOfPlaetze + 18; def indexOfPlaetzeEnd = indexOfPlaetze + 1; def String currentDateTime = new Date().format( 'dd.MM-HH:mm',TimeZone.getTimeZone('CET')); def String plaetze = payload[indexOfPlaetze..indexOfPlaetzeEnd]; // Build payload for Google Sheets: {"values": ["22042021-0954","22"]} def OCNBody = "{\"values\": [\"" + currentDateTime + "\", " + plaetze + "]}"; message.setBody(OCNBody); //Logging def messageLog = messageLogFactory.getMessageLog(message); messageLog.addAttachmentAsString("OCN body", OCNBody, "text/html"); return message;
}

Note that we may have used other libraries to parse the HTML file based on the DOM: but again, we are doing it quick and dirty.

Also: if you want to be a bit more proficient with writing your code, you can use the GroovyIDE tool. It lets you write and test your groovy code online, hence you don’t need to deploy your integration flow to test and debug. Very useful.

Last step: send the payload to Google Sheet. To do so, create a connection between the end event flow step and the Receiver system.

When prompted, select “OpenConnector” as adapter type.

In the properties of the adapter, enter the following details:

  1. The API URL from your CURL command until “api-v2”, eg.: https://api.openconnectors.trial.eu10.ext.hana.ondemand.com/elements/api-v2
  2. Credential Name: is the name of the credentials you have created at the very beginning, eg.: GoogleSheet_CREDS
  3. Resource: from the API URL from your CURL command, copy the part of the path starting with “/spreadsheets”, eg.: /spreadsheets/1yG7sh_P886BpPlCH8T3cJwwgGnHTcLV27s_HapDfh3o/worksheets/Data/rows
  4. Method: POST

One last technical thing: we are sending 2 HTTP headers twice which is confusing the Open Connector Adapter. Hence we need to remove these before hitting the OpenConnectot instance.

This is easily done through a Content Modifier. In the integration flow menu, click on the Transform and Content Modifier icon.

Position the content modifier just after the script step and beautify your flow.

In the content modifier properties, click on the Message Header tab.

Now add two headers to be deleted: Content-length and Transfer-encoding.

Your integration flow should look something like this:

That’s it! Now Deploy your integration flow. As soon as it is deployed, it will magically fill in your sheet!

Now it is up to you to build your graphical visualisation and to take care of emptying/extending the Google Spreadsheet, but I hope you got the main idea!

Stay safe!