Data Mesh: A Decentralized Approach to Data Architecture
Data mesh has sparked a lot of interest among data, IT, and business professionals, and its emergence isn’t just a passing trend but a significant change in how organizations handle large volumes of data.
Traditional data setups, characterized by centralized data lakes and isolated data systems, often fall short in the face of increasing data volumes and evolving business demands.
Here’s where the data mesh paradigm steps in: a decentralized approach that promises to revolutionize data management.
However, as with any fundamental change, many questions arise. The key one is whether transitioning to a data mesh is worth it.
In this article, we explore why businesses are embracing the data mesh architecture and how it has the potential to transform our approach to data.
Unpacking Data Mesh
In the past, data was a relatively scarce resource, which justified the investment of time and money into centralizing it. Companies paid millions of dollars to consolidate all the data required for IT operations into a single, large-scale database in order to have a single source of truth.
With the big data boom, companies faced the challenge of managing and harmonizing exponentially growing data volumes. Their capacity to adapt swiftly and effectively to the rising influx of both internal and external data sources, and integrate them with their existing data, became limited.
As a result, companies struggled to extract meaningful insights from their data and identify new potential uses, such as developing innovative products or services for their customers. And they still do.
The data mesh concept attempts to address these challenges with its four main principles:
Decentralized data ownership: Individual business units or domains own the data they produce and use.
Data as a product: Data is treated as a product and is recognized as having its own lifecycle, quality standards, and consumer base.
Self-serve data infrastructure: All teams have access to a self-service infrastructure for data processing and storage. The team generating the data is considered the data owner and must prepare its data in such a way that other data consumers in the company can use it easily via self-service options.
Federated governance: While data ownership is decentralized, data governance remains a shared responsibility. Federated governance highlights the need to balance autonomy with a coherent set of standards and practices.
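To make the "data as a product" and ownership principles a little more tangible, here is a minimal Python sketch of how a domain team might describe a data product. Everything in it is an assumption for illustration: the `DataProduct` class, its fields, and the `orders_daily` example are hypothetical and not part of any data mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product published by a domain team."""
    name: str
    owner_domain: str            # the domain team that produces and owns the data
    schema: dict[str, str]       # column name -> type: the contract for consumers
    freshness_sla_hours: int     # a quality standard the owning team commits to
    version: str = "1.0.0"       # products have a lifecycle, so they are versioned
    consumers: set[str] = field(default_factory=set)

    def subscribe(self, consumer_team: str) -> None:
        """Record a consuming team, so the owner knows its consumer base."""
        self.consumers.add(consumer_team)

    def missing_columns(self, record: dict) -> list[str]:
        """Return the contract columns that a record fails to provide."""
        return sorted(col for col in self.schema if col not in record)


# Illustrative usage: the sales domain publishes a product, marketing consumes it.
orders = DataProduct(
    name="orders_daily",
    owner_domain="sales",
    schema={"order_id": "string", "amount_eur": "float", "order_date": "date"},
    freshness_sla_hours=24,
)
orders.subscribe("marketing")
print(orders.missing_columns({"order_id": "A-1", "amount_eur": 19.9}))  # ['order_date']
```

The point of the sketch is that the owner, the contract, the quality commitment, and the consumers are all explicit, which is what distinguishes a data product from a table that merely happens to exist.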
The biggest advantage of this architecture is that the data-producing departments naturally know their data best. Hence, it is easier for them to derive insights from it and develop new use cases.
The role of data scientists and engineers also changes: they no longer act as a go-between for the data-producing and data-consuming teams. Instead, they become part of the data-producing team.
They learn the domain knowledge necessary to support their team in the best possible way when preparing data products. This simplifies and speeds up the entire data preparation process.
What makes the data mesh so appealing is its decentralized approach, which plays a key role in promoting data democratization within organizations.
Imagine it as breaking down the barriers around data, allowing it to flow freely to all corners of a business. By distributing data ownership and expertise across different teams, data mesh enables individuals within an organization to access, understand, and use data effectively, regardless of their technical background. It transforms data from an exclusive resource reserved for experts into a shared asset accessible to everyone, creating a more informed, agile, and empowered workforce.
What You Need to Know About a Successful Data Mesh Implementation
Understanding the concept of decentralized data ownership lays the foundation for an organization to truly become data-driven. However, this transformation doesn't occur overnight. Adopting the principles of the data mesh paradigm at an organizational scale is a journey that may span months, or even years, and might appear overwhelming initially. Here, we share a few key considerations if you're contemplating a shift to a data mesh.
Understanding the Data Mesh Philosophy
The originator of the data mesh concept, Zhamak Dehghani, defined data mesh as a “decentralized sociotechnical approach to share, access, and manage analytical data.”
Data mesh is not just a technology, an architecture, an operating model, or a data governance model. It’s all of these things at once, spanning both the social and technological aspects of an organization. There is no off-the-shelf software solution that you can purchase and deploy to instantly achieve a successful data mesh implementation. The implementation process is unique to each organization.
Data mesh is a foundational shift in how organizations perceive and manage data, and it is centered around humans. Changing the way people think about working with data is a gradual process. The best advice for initiating this change would be to start small and build from there.
Navigating Data Governance
The more decentralized the architecture, the more attention you need to pay to data governance. Without proper governance, there's a risk of data being poorly curated, inconsistent, or inaccurate. Data governance policies and procedures help ensure that data is of high quality, reliable, and trustworthy. Implementing a federated governance model can help ensure that individual domain guidelines are still aligned with broader organizational data principles, striking a balance between autonomy and standardization.
As data mesh promotes the idea of a catalog of available datasets, effective data governance is required to ensure this catalog is updated and maintained, so data consumers can find the right data when they need it.
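As a rough illustration of how federated governance and a catalog can work together, the Python sketch below lets domain teams register their own entries while a small set of organization-wide policies is checked on every registration. The `Catalog` and `CatalogEntry` classes and both policies are hypothetical examples we made up for this article, not features of a particular catalog product.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record, written by the owning domain team."""
    name: str
    owner_domain: str
    description: str
    contains_personal_data: bool
    retention_days: Optional[int] = None


# A policy returns an error message, or None when the entry complies.
Policy = Callable[[CatalogEntry], Optional[str]]


def requires_description(entry: CatalogEntry) -> Optional[str]:
    return None if entry.description.strip() else "description is empty"


def personal_data_needs_retention(entry: CatalogEntry) -> Optional[str]:
    if entry.contains_personal_data and entry.retention_days is None:
        return "personal data requires a retention period"
    return None


@dataclass
class Catalog:
    policies: list[Policy]  # shared, organization-wide rules
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> list[str]:
        """Add the entry only if it passes every policy; return any violations."""
        violations = [msg for policy in self.policies
                      if (msg := policy(entry)) is not None]
        if not violations:
            self.entries[entry.name] = entry
        return violations

    def search(self, keyword: str) -> list[str]:
        """Find registered entries by name or description."""
        kw = keyword.lower()
        return sorted(name for name, e in self.entries.items()
                      if kw in name.lower() or kw in e.description.lower())


catalog = Catalog(policies=[requires_description, personal_data_needs_retention])
ok = catalog.register(
    CatalogEntry("orders_daily", "sales", "Daily order totals per country", False))
rejected = catalog.register(
    CatalogEntry("customer_profiles", "crm", "Customer contact details", True))
print(ok)                       # []
print(rejected)                 # ['personal data requires a retention period']
print(catalog.search("order"))  # ['orders_daily']
```

In this sketch the domains keep full control over what they publish and how they describe it, while the shared policies stay few and mechanical. That is one way to picture the balance between autonomy and standardization.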
Scaling the Architecture
When we talk about scalability in the context of a data mesh, we're primarily referring to the ability of the architecture to handle increasing volumes of data, users, and data domains without compromising performance, data quality, or manageability.
As mentioned earlier, each domain team in a data mesh is responsible for its own data domain. As more teams and domains join, the architecture must be able to accommodate them. This requires a flexible and extensible design that can absorb new data domains without disrupting existing ones.
And as data sources multiply, the data mesh must have scalable mechanisms for ingesting data from various sources. This might involve scaling data connectors, ETL processes, or data streaming pipelines to handle the increased data ingestion rate.
Does Adopting Data Mesh Make Sense For My Organization?
With the key considerations of data mesh in mind, let's determine if making the switch to a data mesh approach is the right fit for your organization.
In her book on data mesh, Dehghani provides a framework with eight distinct areas for evaluating an organization's readiness to embrace data mesh: organizational complexity, data-oriented strategy, executive support, core data technology, early adoption, modern engineering, domain-oriented organization, and long-term commitment.
While all of these aspects are important, we will focus on the areas where organizations tend to fall short and potentially disqualify themselves.
Organizational Complexity
Moving towards a completely decentralized model requires operating at a scale where decentralization makes sense. Large organizations, with thousands of employees spread across multiple countries or legal entities, will inevitably reach scaling bottlenecks. In these organizations, decentralization will naturally emerge, often accompanied by central business functions responsible for coordinating and aligning distributed teams.
If you do not have the size or complexity to benefit from the data mesh, you might find greater value in a centralized approach, such as using a data warehouse or data lakehouse. Transitioning to a data mesh represents a significant leap in maturity that might be unattainable given your current resources and skill sets.
Executive Support
Buy-in from top management is crucial for any initiative, and data mesh is no different. Because it calls for a cultural shift and demands a fundamental change in how the organization perceives data, it cannot be forced upon the organization by the technical team alone. This change relies on executive commitment to maintain discipline and to treat data mesh as a strategic organizational transformation.
Domain-Oriented Organization
Data mesh operates on the assumption that well-structured domains within an organization already have their own technical teams. On the other hand, the traditional operating model of centralized data engineering relies on a shared pool of technical resources, which are distributed among different business teams. If your organization follows this centralized shared model for IT, you might face challenges in terms of both the availability of skilled people and the necessary leadership to successfully implement data mesh at this point.
Finally, data mesh is not a silver bullet for all data management problems. It comes with its own share of challenges and considerations. Before you embark on this journey, you should fully assess your organization’s needs and its readiness to adopt a data mesh.