As mentioned in my Data Fabric blog, the data and analytics landscape in companies is becoming more and more distributed, and decentralized concepts such as Data Mesh are on the rise. If we see data as an asset, as the new oil, or want to make our company data-driven, tools that provide an overview become more and more important.
Data Catalogs could be the kind of software delivering what is needed here.
From my perspective, a good definition of a Data Catalog comes from Gartner:
“A data catalog maintains an inventory of data assets through the discovery, description and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.”
– Gartner, 2017
Now if you screen the market, you find a lot of terms, concepts and meanings here, many of them marketing-driven. Some speak about Data Discovery as a main task for gaining value from (distributed) data. Lately I have seen a study challenging the term Data Intelligence, which some vendors see as an advancement of Data Catalogs.
In practice, introducing a data catalog in a company can take considerable effort, depending on the sources, the complexity of the data assets, the need to upskill people and bring them into new roles (Data Owner, Data Steward, …), the processes for data validation and data protection, and so on.
A lot of companies start with just a spreadsheet or tools like MS OneNote to provide information about their data and analytics assets. And this can be sufficient, depending on your complexity and organization.
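To make the spreadsheet approach a bit more concrete, here is a minimal sketch in Python of what such a lightweight inventory with a simple keyword search could look like. The column names (`name`, `system`, `owner`, `description`) and the sample rows are purely illustrative assumptions and are not taken from any SAP product or real landscape.

```python
import csv
import io

# Hypothetical spreadsheet export; columns and rows are illustrative only.
SHEET = """name,system,owner,description
Sales Orders,ERP,Jane Doe,Order headers and items per customer
Customer Master,CRM,John Roe,Addresses and contact data
Revenue Dashboard,BI,Jane Doe,Monthly revenue by region
"""


def load_assets(text):
    """Read the spreadsheet export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def find_assets(assets, keyword):
    """Return assets whose name or description contains the keyword."""
    needle = keyword.lower()
    return [
        asset for asset in assets
        if needle in asset["name"].lower()
        or needle in asset["description"].lower()
    ]


assets = load_assets(SHEET)
matches = [asset["name"] for asset in find_assets(assets, "customer")]
print(matches)
```

Even this small example covers the core of the Gartner definition: an inventory, a description per asset, and a way to find a relevant dataset. What it does not cover is automated discovery, ownership processes or data protection, which is where dedicated tools come in.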
On the other hand, we have solution-specific metadata handling and search capabilities as well as specialized solutions. If we have a look at the SAP-specific portfolio, we find different approaches from a (non-technical) user perspective:
Fig. 1: Overview of end user metadata usage in SAP’s Data & Analytics solutions
Of course, there are further solutions that work heavily with metadata, such as SAP Information Steward or SAP PowerDesigner. They are relevant in this context but more tied to a specific group of users such as Data Stewards or Data Modelers.
You may know that SAP has communicated new developments integrated into its cloud-based Unified Data & Analytics solution portfolio. Currently there is some discussion about the project Data Suite, which is meant to redesign the full portfolio and break it down into a more flexible, service-based offering. One effect we may see is the use of Data Flow capabilities from SAP Data Intelligence in SAP Data Warehouse Cloud. However, not much further information is publicly available here. If we have a look at
- SAP Analytics Cloud
- SAP Data Warehouse Cloud
- SAP HANA Cloud
- SAP Data Intelligence Cloud
all these solutions have their own approaches, terms and concepts for handling data assets today. Bringing them together for unified management with a streamlined Data Catalog approach could therefore be interesting. The latest information I found is:
Fig. 2: ONE Catalog Vision of SAP
From time to time I have a look at the roadmaps for the solutions. You can find that for the current Data Catalog offering within SAP Data Intelligence (DI) there is not much on the roadmap apart from tighter integration of capabilities with DWC, SAC and S/4HANA. My understanding is that the DI Data Catalog capabilities will be transferred to the new Data Catalog solution over time. The roadmap announces the Business Data Catalog based on SAP Data Warehouse Cloud:
Fig. 3: SAP road map for Data Cataloging capabilities in SAP Data Warehouse Cloud
This means that for 2023 we can look forward to new features and functionalities and the chance to make better use of our data assets in an overarching approach with a new Data Catalog solution. This approach seems to focus on SAP solutions, especially those in the cloud. But let's see what comes.
Companies do not only have SAP systems, and there are a lot of dedicated Data Catalog products out there that also harvest data from many non-SAP systems, so SAP is not entering a blue ocean. What SAP has always done best is integrating SAP with SAP. The question is, will this be enough? Are there approaches like leveraging add-ons and partnerships, as SAP already does with BigID, or is a Catalog of Catalogs concept via open interfaces possibly a feasible way?
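To illustrate what I mean by a Catalog of Catalogs, here is a rough sketch in Python: a thin federation layer that sends one query to several catalogs through a shared interface and merges the results. Everything here is hypothetical, including the `Catalog` interface, the `InMemoryCatalog` stand-in and the sample asset names; it does not reflect any real SAP or BigID API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Asset:
    name: str
    source: str  # label of the catalog the asset came from


class Catalog(Protocol):
    """Hypothetical open interface each participating catalog would expose."""

    def search(self, keyword: str) -> list[Asset]: ...


class InMemoryCatalog:
    """Stand-in for a real catalog; holds a fixed list of asset names."""

    def __init__(self, label: str, names: list[str]) -> None:
        self.label = label
        self.names = names

    def search(self, keyword: str) -> list[Asset]:
        needle = keyword.lower()
        return [Asset(n, self.label) for n in self.names if needle in n.lower()]


class CatalogOfCatalogs:
    """Fans a query out to all registered catalogs and merges the results."""

    def __init__(self, catalogs: list[Catalog]) -> None:
        self.catalogs = catalogs

    def search(self, keyword: str) -> list[Asset]:
        results: list[Asset] = []
        for catalog in self.catalogs:
            results.extend(catalog.search(keyword))
        return results


sap_side = InMemoryCatalog("sap", ["Sales Orders", "Customer Master"])
other_side = InMemoryCatalog("non-sap", ["Web Customer Events", "Sensor Readings"])
federated = CatalogOfCatalogs([sap_side, other_side])
hits = federated.search("customer")
print([(asset.name, asset.source) for asset in hits])
```

The point of the sketch is that the federation layer only depends on the shared interface, not on who built the catalog behind it. Whether vendors would agree on such an interface is exactly the open question.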
What is your perspective? What is your approach to keeping an overview of the data and analytics assets in your company in order to create value from data?