Often I see confusion because people in IT and business discuss about tools and projects but not having a common understanding about what the data architecture in place is used for. Today we find a lot of different data architectures available like:
- Data Warehouse
- Data Lake
- Data Lakehouse
- Data Fabric
- Data Mesh
These are the basic design patterns which can be combined, multiplied and adapted in different ways. Furthermore around these basic patterns we have tools and solutions enhancing this architectures like Data Catalogs, Data Integration and Transformation, Data Governance and a lot more.
The Data Warehouse is a design pattern for managing enterprise data. Data architectures are getting more and more complex and it can be helpful to concentrate on the basics to better understand the value they deliver for enterprises and business users.
As the Data Warehouse is a concept with more than 30 years, it is still evolving and especially the cloud DWH providers like Snowflake, Databricks or the hyperscalers show dynamic innovations and many further software services like dbt or LookML are available to build modern Data Warehouses.
To simplify the discussion I drawed the following overview:
Picture 1: Generalized view of a Data Warehouse and it’s advantages
What we see is that a Data Warehouse is a central data platform consolidating structured data from different domains to deliver an harmonized and integrated view of data for historical data analysis. Data Warehouses evolved in many ways, so that today it is more than just a database delivering data in a performant way. Over the years we have seen better flexibility through features like virtual modeling, processing, streaming data processing or handling of semi-structured data (e. g. JSON) and serving as an virtual layer for federated analytical data sources.
As new technologies and design pattern arose, several times the dead of data warehouse was declared. But it is still here and I don’t see the end of need for data warehouses in the next time.
Coming back to SAP, I see it as the classical strong domain here in data management by SAP. This may be because SAP itself delivers solutions around business data, which creates typically structured data. But I remember well that in my own experience I alread implemented a combined reporting with unstructured text data and structured data based on a SAP BW 3.5 using SAP TREX (Text Retrival and Extraction) in 2006 and would say it worked well for this time. Today we see SAP HANA including multi-modal data processing capabilities like graph, text and documents (JSON) to extend the range of data types to create value with data.
Picture 2: Basic Data Warehouse approaches by SAP
Today we see four approaches from SAP to deliver a Data Warehouse-like experience:
- S/4HANA Embedded Analytics – Analytical Apps can make use of two different approaches within SAP S/4HANA.
- Virtual Data Models with ABAP Core Data Services – A way to create virtual analytical models by stacking views with dimensions and facts to be used by Fiori apps, SAP Analytics Cloud, BW query-like or in ABAP reports. As business logic and semantic is modeled virtual, this works like a virtual data warehouse for operational data. This approach is the successor of HANA Live which was based on HANA Information Views (Calculation Views nowaday).
- Embedded BW – An approach using the BW application as part of SAP NetWeaver beeing part of SAP ERP for years. The Embedded BW is still a BW 7.5 and content is in some cases based on HANA Calculation Views for actual data (e.g. based on ACDOCA), planning data (e.g. based on ACDOCP) or master data, too. Furthermore external data can be loaded (even if not recommended to a greater extend) and BW logic can be used to historize changes in data (slowly changing dimensions, bi-temporal modeling).
- SAP Data Warehouse Cloud – Just focusing on the application layer of SAP DWC we see two general layers.These modeling layers are enhanced by functionalities for governance like authorization, object search, data lineage and impact analysis and the capabilities to share data between Spaces (containers with own resources and users) and external data. Furthermore it is possible to extend this application-based modeling via an Open SQL Schema and accessing HANA Deployment Infrastructure (HDI) containers enabling to work on HANA database and Data Lake level.
- Data Layer – Focus on integration of different data sources and technical transformation work of data.
- Business Layer – Focus on business-oriented modeling, perspectives and semantic
- SAP BW/4HANA – As a rather technical solution it provides a packaged Data Warehouse solution, highly optimized for SAP data sources but open for every data. Leaveraging the LSA++ (Layered Scalable Architecture) it enables everything from a simplified Data Warehouse via Logical Data Warehouse approach to a full blown complex Data Warehouse solution delivering a lot of Services for modeling and operating an Enterprise Data Warehouse. BW is well now for it’s BI-Content delivering a lot of predefined metadata to quickstart implementation in a lot of different business domains. The range of capabilities can be extended by making use of mixed modeling with the underlying HANA database itself. Furthermore planning and consolidation capabilities can be added through the use of SAP Business Planning and Consolidation Embedded or BCS – Business Consolidation System (BCS/4HANA).
- SAP HANA SQL Data Warehouse – a complete open, database-based approach with freedom of modeling. In this context we often see modern modeling approaches and processes like Data Vault modeling and DevOps supporting an more agile style of modeling, operating the system and adapting to change in a faster way. This make sense as this approach is recommended for a broader range of non-SAP data sources like IoT and cloud data sources.
So we see a freedom of choice for the best fitting Data Warehouse approach with SAP – operational, cloud or on-premises, more or less SAP specific. We also see possibilites to extend these approaches to comprehensive data platforms by data tiering, extending with data lake technologies and make use of the multi-model capabilities based on the high performance of SAP HANA database which is in every case today the core technological component.
An interesting thing is, that in the SAP Data Warehouse world you can find same components enabling same or similar capabilities. So is every approach here based on SAP HANA which means also that Smart Data Access (SDA) and Smart Data Integration (SDI) is typically in place for the data integration part and HANA Calculation Views can be used in every of this approaches. SAP made even ABAP available for the cloud and make use of it for SAP Data Warehouse Cloud at least within BW Bridge. OData can be consumed or exposed on most of the solutions and if you want to extend your data management for mass data somehow you will find SAP IQ technology. Last but not least SAP Analytics Cloud shows naturally a good integration to all of them as the analytics frontend of this time in the SAP world.
As introduced, there is more to come as a Data Warehouse, highly optimized for Business Intelligence use cases, but possibly not solving every challenge your company face with data. But it has still his unremovable place within the companies system and data landscape.
For sure there would be more to say about Data Warehousing. I hope this perspective on Data Warehouse in the context of SAP helps to bring your conversations with other data guys on a better level and make it a little bit independent from the last trends (have you heard about Metrics Layer or Reverse ETL?), or product marketing, or that this UI is better than that, or this solution is the death of this solution and so on. Because all this is nice but typical what you need is creating value with your data and this works only if you focus on the basics from time to time.