SAP Data Warehouse Cloud and Google Big Query with large datasets

For customers that have an existing (or starting a) footprint of Google BigQuery for data analytics and requiring to blend these datasets with SAP data – the SAP Data Warehouse Cloud provides an ability to cater for the large BigQuery datasets and also maintaining the SAP data within the SAP Data Warehouse cloud without having to move the data.

The process we went through to test this, using a public dataset was in summary:

  1. In Google Cloud Cockpit, created a large Data Set and generated a key for external access
  2. In SAP Data Warehouse Cloud, connected to the BigQuery dataset, and accessed the objects
  3. In SAP Data Warehouse Cloud created a view over BigQuery datasets with a parameter to connect larger datasets in BigQuery

In some more details…

  1. Access BigQuery, and found a decent public dataset as an example. I have had some good history with GDELT as a source, and found some data that was of reasonable size. So copying it to our local project using the BigQuery Data Transfer:

Google%20Big%20Query%20Data%20Transfer

Google Big Query Data Transfer

The dataset I used, had the data set size of 170.39Gb:

With a schema:

As some preparation work for accessing the data in DWC;

  • Created the  BigQuery Service Account and created a key

  • Generated the Service Account Key:

2. In Data Warehouse Cloud noting the BigQuery Key, BigQuery Project Name

Firstly, Import the Google Certificate (under System/Configuration)Google%20Big%20Query%20Cert

Create the  BigQuery connection, using the key generated previously:

Data%20Warehouse%20Cloud%20Connection

Data Warehouse Cloud Connection

Now the fun begins. Given we have roughly 170Gb of data in BigQuery, there is some learnings that can be shared about things we have tried:

  • BigQuery datasets can be fully replicated into DWC if the dataset is smaller

For a dataset of 2.6Gb in GBQ, here it is replicated in DWC at 1GB on Disk in DWC. The full replication took 16 mins to copy.

DWC%20Replication%20with%20Big%20Query

DWC Replication with BigQuery

  • Trying to fully replicate datasets larger than 10Gb fails with the error: Error:

{“origin”:”history”,”code”:”71000129″,”message”:”SQLException; REST API: requestType=BigQuery_FetchQuery, statusCode=403, msg=Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors”,”timestamp”:”2022-09-01 11:04:21.6113190″}

3. Applying a filter to access more manageable datasets

  • For large data set, it was also taking too long to use the BigQuery remote table for querying and accessing the data. So a simple workaround is to apply the filters within Data Warehouse Cloud:

DWC%20View%20with%20Parameter

DWC View with Parameter

In our example, we used the location as the Parameter, as it was the easiest to sort by, and to test the results.

So some results:

Running the table scan limiting with 1 parameter directly in BigQuery gives:

Google%20Big%20Query%20results

BigQuery results

Initial Testing in SAP Data Warehouse Cloud:

The DWC resultset for the Data Preview (1000 record) in 11 seconds. This is with a dataset which hasn’t been cached and the total records is about 900,000 records.

A really quick SAP Analytics Cloud Story based on the SAP Data Warehouse Cloud live connection:

SAP%20Analytics%20Cloud%20Story

SAP Analytics Cloud Story based on BigQuery data

Running the same query in SAP Analytics Cloud via SAP Data Warehouse Cloud :

SAC%20Performance%20Analysis

SAC Performance Analysis

There is a lot more different enhancements that can be done, like dynamic filters based on the story context and actual integration with SAP data. But I just wanted to demonstrate that it’s quite easy to make BigQuery data accessible in SAP Data Warehouse Cloud without moving the data.