Importing multiple files into SAP HANA Cloud from cloud storage (Amazon S3) using SAP Data Intelligence 3.1

I hope you haven’t missed the announcement of SAP Data Intelligence, trial edition 3.1, posted by Dimitri Vorobiev last week! Please also do not miss the exercises from the SAP TechEd hands-on sessions, which were published as well. One of them describes the steps to read and load data into SAP HANA.

In this post, I would like to share a slightly different approach: one where no data flows between the operators, only control. I will use the same setup and scenario as in my last post, where I scripted the import of multiple files into SAP HANA Cloud from S3 cloud storage.

In a nutshell, I want to automate loading multiple files stored in a single Amazon S3 bucket into the corresponding tables in the SAP HANA Cloud database. In my exercise, I work with 25 files generated for TPC-DS. Some of these files are quite large, but optimizing their load is out of scope for now.

The setup…

…is taken from the previous post. It assumes the files are generated and stored in the S3 bucket, the PSE in the database is fully configured, and we can use the same S3Reader user credentials (but do not try to copy/paste that key/secret, as I regenerated them after publication 🙂).

The graph…

… needs at least 3 operators:

  1. One to list all required files in the S3 bucket,
  2. A custom operator to build the SQL statements: an IMPORT for each data file, plus a TRUNCATE in case this initial step of populating the tables is rerun,
  3. HANA Client, to execute the SQL statements.
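To make step 2 more concrete, here is a minimal Python sketch of the kind of statements such a custom operator could emit for each listed file. It is not the operator from my graph. Several details are assumptions: the bucket name, region, credentials placeholders, and the `TPCDS` schema are hypothetical; the table name is derived from the file name (`store_sales.dat` becomes `STORE_SALES`); and the delimiter and `THREADS` values are illustrative. The `s3-<region>://<key>:<secret>@<bucket>/<object>` path follows the SAP HANA Cloud `IMPORT FROM` form for Amazon S3, so please double-check it against the SQL reference for your version.

```python
import posixpath
from typing import List

# Hypothetical placeholders: use your own region, bucket, credentials and schema.
S3_REGION = "eu-central-1"
S3_BUCKET = "my-tpcds-bucket"
ACCESS_KEY = "<access_key>"
SECRET_KEY = "<secret_key>"
SCHEMA = "TPCDS"


def table_for_file(path: str) -> str:
    """Derive the target table name from the file name, e.g. 'tpcds/store_sales.dat' -> 'STORE_SALES'."""
    stem, _ext = posixpath.splitext(posixpath.basename(path))
    return stem.upper()


def build_statements(path: str) -> List[str]:
    """Return a TRUNCATE (so a rerun starts from an empty table) and an IMPORT for one file."""
    table = '"{}"."{}"'.format(SCHEMA, table_for_file(path))
    s3_url = "s3-{}://{}:{}@{}/{}".format(S3_REGION, ACCESS_KEY, SECRET_KEY, S3_BUCKET, path)
    return [
        "TRUNCATE TABLE {}".format(table),
        # Assumed options: TPC-DS files are '|'-delimited; tune THREADS to your needs.
        "IMPORT FROM CSV FILE '{}' INTO {} WITH FIELD DELIMITED BY '|' THREADS 4".format(s3_url, table),
    ]


if __name__ == "__main__":
    for stmt in build_statements("tpcds/store_sales.dat"):
        print(stmt)
```

Sending the TRUNCATE before each IMPORT keeps the load idempotent: running the graph again replaces the table content instead of appending duplicates.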

But I included two more operators to detect when the last IMPORT has finished and then terminate the graph's execution.
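One way to detect "the last IMPORT is done" is to count finished IMPORT statements against the number that were sent. The sketch below shows only that counting logic as a plain Python class; the class and method names are hypothetical, and it assumes the expected count is known (for example, the number of listed files) and that each executed statement is reported back once. In a graph, `on_result()` would be called for each result coming out of HANA Client, and a `True` return would be the moment to send a signal to an operator that stops the graph.

```python
from typing import Optional


class ImportCompletionTracker:
    """Hypothetical helper: counts finished IMPORT statements and reports when the last one is done."""

    def __init__(self, expected_imports: Optional[int] = None) -> None:
        self.expected_imports = expected_imports
        self.finished_imports = 0

    def set_expected(self, count: int) -> None:
        """Set the expected number of IMPORTs once it is known, e.g. after listing the files."""
        self.expected_imports = count

    def on_result(self, statement: str) -> bool:
        """Record one executed statement; only IMPORTs count (TRUNCATEs are ignored)."""
        if statement.lstrip().upper().startswith("IMPORT"):
            self.finished_imports += 1
        return self.is_done()

    def is_done(self) -> bool:
        """True once the expected count is known and that many IMPORTs have finished."""
        return (self.expected_imports is not None
                and self.finished_imports >= self.expected_imports)


if __name__ == "__main__":
    tracker = ImportCompletionTracker(expected_imports=2)
    tracker.on_result('TRUNCATE TABLE "TPCDS"."ITEM"')
    tracker.on_result("IMPORT FROM CSV FILE '...' INTO \"TPCDS\".\"ITEM\"")
    if tracker.on_result("IMPORT FROM CSV FILE '...' INTO \"TPCDS\".\"STORE\""):
        print("last IMPORT finished, terminate the graph")
```

Keeping the expected count settable after construction covers the case where the number of files is only known once the listing step has run.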

Btw, I used two different programming languages for the custom operators, JavaScript and Python, for no other reason than to show what is possible. In a real solution, the logic could (and normally should) be implemented without mixing several languages in one graph.

List files

This operator works only with connections defined in the Connection Manager, so I had to configure the connection to the S3 bucket there first.

Please note the RegEx-based pattern in the filter. It allows you to reload only selected files.
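To illustrate how such a filter narrows down a reload, the sketch below applies a regular expression to a list of object paths using Python's `re` module. The paths and the pattern are hypothetical, and whether the List Files operator matches against the full path or only the file name (and with which regex dialect) is an assumption to verify in its documentation.

```python
import re
from typing import Iterable, List


def select_files(paths: Iterable[str], regex: str) -> List[str]:
    """Keep only the paths that match the whole regular expression."""
    pattern = re.compile(regex)
    return [p for p in paths if pattern.fullmatch(p)]


if __name__ == "__main__":
    # Hypothetical object paths in the bucket.
    all_files = ["tpcds/call_center.dat", "tpcds/item.dat",
                 "tpcds/store_sales.dat", "tpcds/store.dat"]
    # Reload only the item and store_sales tables.
    print(select_files(all_files, r"tpcds/(store_sales|item)\.dat"))
    # prints ['tpcds/item.dat', 'tpcds/store_sales.dat']
```

Anchoring the pattern to the whole path (here via `fullmatch`) matters: without it, a pattern for `store` would also pick up `store_sales` and `store_returns`.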

Enjoy the exploration of these trials!