Denodo, a specialist in data virtualization and integration, has examined how data warehouses (DWH) and data lakes (DL) can be unified. According to the company, this unified approach makes it possible to process complex enterprise data.
Making the best possible use of all company data is central to competitiveness. Data warehouses or data lakes are often used to store this information. As company data grows more complex, mainly because of a rising share of unstructured or semi-structured information, these architectures are increasingly reaching their limits. Denodo advises connecting DWH and DL systems in a unified platform (“Unified DWH/DL”) using data virtualization.
Data warehouses are specialized for analysis: they structure, cleanse, and curate data from specific sources. Data lakes, by contrast, are primarily used for the central storage of raw data that is later used for advanced analytics or machine learning. Both architectures can be operated on-premises or in the cloud. According to the TDWI report “Building the Unified Data Warehouse and Data Lake,” about half of the companies (53 percent) have a data warehouse on-premises, while 36 percent use a data warehouse or a data lake in the cloud.
A good third (36 percent) of all companies already use a data lake to supplement the existing data warehouse. This enables, for example, the processing of multi-structured data or the analysis of IoT information and the integration of the results into reports or visualizations. According to Denodo, the next step is the “Unified DWH/DL”: here, the two concepts converge on several levels and share functional services that are identical for all data types, workloads, applications, and use cases.
This approach is also known as Enterprise Data Architecture, Hybrid Data Architecture, Modern Warehouse Architecture, or Multiplatform Data Architecture. Eighty-four percent of the companies surveyed already consider the topic essential. They use such systems, for example, to break down data silos and create a single source of truth. This allows, among other things, AI-supported analyses and data-based insights in real time.
Integration can be approached in several ways, Denodo points out. One option is physical consolidation, in which the data is copied or moved to a new repository. According to Denodo, however, this leads to redundant data storage, and the physical data movements do not make economic sense. Virtualization using a logical data layer is more advantageous: the data stays at its place of origin, and a unified semantic model provides access to all information in the data warehouse or data lake. The storage location of the data no longer plays a role, and, in Denodo's view, the data infrastructure remains future-proof and protects existing investments.
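The idea of a logical data layer can be illustrated with a small Python sketch. It is not Denodo's implementation or API. The class LogicalLayer, the view names, and the sample data are all hypothetical, and an in-memory SQLite table and a list of JSON strings merely stand in for a data warehouse and a data lake. The point is only that consumers query logical views with one shared vocabulary while the data stays in its source.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Stand-in for a data warehouse: curated, structured rows in a SQL table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# Stand-in for a data lake: raw, semi-structured JSON events.
lake_events = [
    '{"cust": 1, "sensor": "t-17", "reading": 21.5}',
    '{"cust": 2, "sensor": "t-04", "reading": 19.0}',
    '{"cust": 1, "sensor": "t-17", "reading": 22.1}',
]


class LogicalLayer:
    """Hypothetical logical layer: maps view names to on-demand readers."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Row]]) -> None:
        self._views[name] = reader

    def read(self, name: str) -> List[Row]:
        # Data is fetched from the source at query time, never copied ahead.
        return list(self._views[name]())


def read_customers() -> Iterable[Row]:
    # Translate warehouse columns into the shared (semantic) field names.
    query = "SELECT id, name FROM customers ORDER BY id"
    for cid, name in warehouse.execute(query):
        yield {"customer_id": cid, "customer_name": name}


def read_readings() -> Iterable[Row]:
    # Translate raw lake records into the same shared field names.
    for line in lake_events:
        raw = json.loads(line)
        yield {"customer_id": raw["cust"], "reading": raw["reading"]}


layer = LogicalLayer()
layer.register("customers", read_customers)
layer.register("readings", read_readings)


def average_reading_per_customer(src: LogicalLayer) -> Dict[str, float]:
    # The consumer joins two views without knowing where each one lives.
    names = {r["customer_id"]: r["customer_name"] for r in src.read("customers")}
    grouped: Dict[str, List[float]] = {}
    for r in src.read("readings"):
        grouped.setdefault(names[r["customer_id"]], []).append(r["reading"])
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


result = average_reading_per_customer(layer)
```

In this sketch, moving the readings to a different store would only require a new reader function for the "readings" view. The consuming function would stay unchanged, which is the sense in which the storage location no longer plays a role.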
Denodo emphasizes that not every company necessarily has to switch to a “Unified DWH/DL.” However, the step can hardly be avoided once existing systems no longer cover the company's needs.
It is important not to rush through the process. A step-by-step implementation allows the optimization of processes and data and the hiring or further training of employees.