Summary. In this article of the series “Give Data Purpose Weekly”, I explore whether “Data Mesh” is just another buzzword or a data strategy trend that needs to be taken seriously. I also outline SAP’s position on the topic, embedded in the larger context of creating value from data.
Companies of all sizes and industries are currently showing keen interest in the topic of “Distributed Data Mesh”. Established brands, backed by global corporations with centuries-old company histories, are just as much a part of this as start-ups. The common assumption that Data Mesh only appeals to companies that were founded with a different understanding of the value creation potential of data is therefore, in my opinion, incorrect. Representative surveys on the acceptance of Data Mesh are not yet available, so this assessment reflects my experience from many customer conversations about data strategy.
A deeper examination of the concepts behind the Data Mesh approach quickly leads to the insight that the topic is primarily to be classified as a socio-technological one. A significant part of implementation success depends on appropriate organizational measures such as ownership of data domains or the existence of an enterprise-wide data strategy. Above all, the lack of agility and the lack of coordination between producers and consumers in the data value chain should be addressed. According to the theory of Data Mesh, this is mainly due to inherent design errors in centralized approaches such as Data Lake, Data Warehouse or Data Lakehouse.
The following list contains some of these weaknesses, but does not claim to be complete:
- Lack of domain knowledge and resource bottlenecks in centrally organized data management / engineering teams
- A lack of communication between producers and consumers, which has the direct consequence that data artifacts are not only produced too late, but also of questionable quality and often do not meet expectations
Whether Data Mesh can compensate for the deficits mentioned will become apparent when companies that have implemented their first data products can take stock. Not every company has the level of maturity that is needed to jump into “Data Mesh.” In this respect, a traditional centralized approach can be optimal to take the first steps on the way to becoming a data-driven intelligent sustainable enterprise. From my point of view, skipping this first stage of evolution is not recommended, since the elementary requirements of a deeply rooted data culture are missing. It is therefore advisable to start with a determination of the degree of maturity with regard to the ability to derive added business value from data in order to set the right course. In addition, a thorough check should be carried out to determine whether the complexity in data provisioning for which Data Mesh is intended is present at all.
When it comes to giving data purpose, SAP has always closely aligned technology with data strategy aspects such as data democratization, data competency, data culture and data literacy. From our point of view, data management can only make a significant contribution if it is viewed in relation to an enterprise’s visions, business objectives, and initiatives, which in turn generates the necessary interest from management.
In our #GiveDataPurpose Trusted Data workshops, we therefore work closely with customers to emphasize this relationship, leveraging “Design Thinking” elements to derive a modern data management architecture. We are guided by the award-winning “SAP Outcome Driven Data Strategy” methodology, which leads specialist departments and IT to a deeper understanding of the interaction between responsibilities, ownership, accountability, cultural aspects and technology.
Based on the experience from many workshops, we see that there is still a great need for information when it comes to the organizational and technical design of roles, responsibilities and data management capabilities in companies of all sizes and industries. Even at companies with a very high level of maturity that are already on their way to managing data as a product, I notice great uncertainty about the right strategy. One customer, for example, reported isolated efforts to describe data products in great detail to make them more consumable. This is undoubtedly a piece of the puzzle and a fundamental part of the Data Mesh approach, but metadata is far from the only one. In my opinion, it is important to keep all elements in balance with respect to technology, resources and objectives, and to support all roles involved in the data value creation process with suitable, integrated and intelligent tools.
Our modern Business Technology Platform for the intelligent sustainable enterprise brings all the capabilities to turn data into business value. Data Mesh and other current trends in data management such as Data Fabric, DataOps or Cloud Data Ecosystems etc. are supported as well as already established approaches.
We summarize the platform services that are particularly relevant for the data value chain as SAP Intelligent data mesh, as illustrated in the following graphic:
The four basic principles of Data Mesh and their benefits are briefly described below in order to map the capabilities shown in Figure 1 to them. Since a full discussion would go beyond the scope of this article, I can only cite what I consider to be the most important points as examples. If you would like a more complete technology mapping, please contact us directly.
Domain-oriented decentralised data ownership and architecture
The postulated decentralization of responsibilities and data ownership roughly follows the lines of business, such as sales, purchasing or finance. This creates teams that work independently and take full end-to-end ownership of their domain data. They are responsible for both the operational source data and the analytical endpoints, as well as all measures for their creation. The underlying architecture must keep domains clearly separated from one another to enable high agility, while at the same time making it possible to share data products and ensuring the necessary scalability.
The “SPACES” concept of SAP Data Warehouse Cloud supports this principle by assigning domain teams to self-contained, separate areas in which they can work independently on their data products. The resulting artefacts can be shared with other teams to enable cross-domain collaboration.
The principle of “domain ownership” also includes the requirement of coordinating semantic domain knowledge with data management expertise. Domain knowledge about customers, suppliers, employees, financial frameworks, materials, supply chains, etc. has long been managed by our customers in operational SAP business systems and ensures smooth processes. We developed a universal data model called “One Domain Model”, which helps our customers to provide trusted master data and to integrate non-SAP and SAP applications.
Self-service modeling and the semantic layer of SAP Data Warehouse Cloud make this knowledge usable with regard to the later data product. In SAP business systems such as S/4HANA, domains such as customer, supplier or material consist of a referential network of many individual tables. Reusability of the inherent business semantics in SAP Data Warehouse Cloud reduces the effort required to generate data products and to achieve the desired agility.
Data as a Product
Among other things, this principle describes properties of endpoints such as findability, comprehensibility, addressability, security, interoperability, trustworthiness, accessibility and value on their own. In this context, the Data Marketplace of SAP Data Warehouse Cloud should be mentioned as an example of how data products can be made available for both internal and external data exchange.
Since, in theory, a data product should contain both the code for the purpose of creation and of access together with metadata and data plus the infrastructure, further capabilities are needed. Pipelines that are used to integrate and transform data from operational systems are part of SAP Data Intelligence Cloud Orchestration and Processing. Another is the Data Quality functionality of SAP Data Intelligence Cloud, which can be used to measure and improve data quality.
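To make the idea of a data product as a bundle of data, metadata and quality information more tangible, here is a minimal sketch in Python. It is not an SAP schema or API: the `DataProduct` class, its field names and the two helper functions are hypothetical and only illustrate how some of the properties named above (findability, addressability, comprehensibility, trustworthiness) could be captured and checked.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProduct:
    """Hypothetical descriptor; field names are illustrative assumptions."""
    name: str                  # addressability: a stable, unique identifier
    domain: str                # owning domain team, e.g. "sales"
    description: str           # comprehensibility
    owner: str                 # trustworthiness: an accountable contact
    schema: Dict[str, str]     # interoperability: column name -> type
    tags: List[str] = field(default_factory=list)  # findability


def missing_metadata(product: DataProduct) -> List[str]:
    """Return the names of required descriptor fields that are still empty."""
    required = ("name", "domain", "description", "owner", "schema")
    return [name for name in required if not getattr(product, name)]


def completeness(rows: List[dict], column: str) -> float:
    """Share of rows with a non-empty value in `column` (a simple quality metric)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)
```

A domain team could run checks like these before publishing a product, so that consumers never see a product without an owner or a description. Real data quality tooling covers far more than a single completeness ratio.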
Self-serve data infrastructure as a platform
In order to achieve the underlying goal of minimizing the development time of data products, the infrastructure needs to be highly abstracted from its use. Technical details such as different interfaces and protocols should be hidden from the data specialists of the domain teams. Conversely, platform specialists should not need any domain knowledge, since they would first have to acquire it laboriously. Data products are both developed and run on the same platform. The optimal toolbox for data product developers and managers therefore includes a comprehensive list of capabilities. As an example, I would like to single out the already mentioned ability to implement and orchestrate data pipelines, without forgetting others such as metadata management, data cataloging, business glossary and data lineage.
If artificial intelligence is to be used while building data products, the required models can either be used from AI solutions on SAP Business Technology Platform or developed in the preferred programming language and embedded. To reduce the time required for data and feature engineering, the individual steps of every preparation are recorded for reuse.
Federated computational governance
In order to enable the interoperability of self-sufficient data products, there is a need for comprehensive standardization based on general rules that apply to both data products and their interfaces. These are created through collaboration of the decentralized domain and central platform teams with regard to a uniform and cost-effective use of the data management infrastructure. The interoperability of the entire ecosystem must be ensured in order to be able to combine data products.
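The “computational” part of this principle means that the agreed rules are expressed as code and checked automatically rather than reviewed by hand. The following Python sketch illustrates the idea under simplifying assumptions: a data product is described by a plain dictionary, and the rule names and the `violations` function are hypothetical, not part of any SAP product.

```python
from typing import Callable, Dict, List

# A rule takes a data product descriptor and returns True if it is satisfied.
Rule = Callable[[dict], bool]

# Hypothetical global rules agreed between domain teams and the platform team.
GLOBAL_RULES: Dict[str, Rule] = {
    "has_owner": lambda p: bool(p.get("owner")),
    "has_schema": lambda p: bool(p.get("schema")),
    "pii_columns_declared": lambda p: "pii_columns" in p,
    "pii_columns_in_schema": lambda p: set(p.get("pii_columns", []))
    <= set(p.get("schema", {})),
}


def violations(product: dict, rules: Dict[str, Rule] = GLOBAL_RULES) -> List[str]:
    """Return the names of all rules the product fails, in rule order."""
    return [name for name, rule in rules.items() if not rule(product)]
```

The rules are defined once, centrally, while each domain team runs them against its own products. This keeps the standard uniform without moving ownership back to a central team.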
Following this principle, topics such as data privacy, protection and governance, which were traditionally managed centrally, move into the domain teams. They now have overall responsibility for their data products, including the protection of personally identifiable information. SAP HANA Cloud provides state-of-the-art technology for domain-specific protective measures such as anonymization and thus also ensures legally compliant use of data products. One example is generalization (k-anonymity): quasi-identifying attributes such as age, gender or address are generalized so that customers are grouped together and an individual can no longer be singled out.
Differential privacy adds mathematical noise to obfuscate personally identifiable information such as salary or sales proceeds at the level of individual records. Statistical functions applied to the perturbed data produce approximately the same result as when they are applied to the original data. This functionality is a prerequisite for training mathematical models on personal data in compliance with data protection rules. With approximately the same predictive accuracy, the models show higher resilience to targeted attacks.
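The generalization idea can be illustrated with a few lines of Python. This is a minimal sketch and not the SAP HANA Cloud implementation: the record layout (`age`, `gender`, `postal_code`), the ten-year age bands and the two-digit postal code prefix are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def generalize(record: Dict) -> Tuple[str, str, str]:
    """Coarsen quasi-identifiers: age -> 10-year band, postal code -> 2-digit prefix."""
    low = (record["age"] // 10) * 10
    age_band = f"{low}-{low + 9}"
    region = record["postal_code"][:2] + "***"
    return (age_band, record["gender"], region)


def is_k_anonymous(records: List[Dict], k: int) -> bool:
    """True if every generalized quasi-identifier combination occurs at least k times."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())
```

If the check fails for the desired k, the attributes have to be generalized further (for example, wider age bands) or the affected records suppressed. This is the usual trade-off between privacy and analytical detail.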
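A common way to add such noise is the Laplace mechanism. The sketch below is a simplified illustration using only the Python standard library, not the SAP HANA Cloud implementation. The function names, the parameters `sensitivity` (assumed here to be the value range of the attribute) and `epsilon` (the privacy budget) are assumptions for this example.

```python
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def add_noise(
    values: List[float],
    sensitivity: float,
    epsilon: float,
    seed: Optional[int] = None,
) -> List[float]:
    """Perturb each record with Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]
```

Each individual value is heavily distorted, but because the noise is centered on zero it largely cancels out in aggregates such as the mean over many records. A smaller `epsilon` means more noise and stronger privacy at the cost of larger deviations in the statistics.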
In summary, a technology platform that meets all of the requirements that arise from Data Mesh offers significantly better prospects for a successful implementation than a patchwork of specialized products. Uniform architecture and technology standards that apply to data products can only be implemented, adhered to and monitored with a platform that is integrated from the operational source to the analytical layer. Otherwise, the resulting fragmentation will lead back to exactly the deficits that Data Mesh is meant to eliminate.
Data Mesh projects, too, must undergo economic feasibility studies in order to assert themselves against other promising projects in the enterprise. Reusing semantic business knowledge for the construction of data products makes a significant difference here, both in terms of costs and with regard to the desired higher agility and scalability.
Due to the high popularity and demand, we have expanded the GiveDataPurpose Trusted Data Workshop mentioned above and are pleased to be able to offer it with a focus on Data Mesh with immediate effect.
Interested? Then contact us right away.