Updated March 3, 2023

Introduction to Data Virtualization

Data virtualization provides a platform for managing heterogeneous data sources without needing to know the format in which the data is stored or where it resides. It virtually combines all the relevant data into meaningful insight, without copying or moving it, and presents it to users in real time in the form they require.

This model replaces the Extract/Transform/Load (ETL) layer of the data warehousing process and has no storage layer of its own. It delivers only the relevant data to users from any of the sources it virtually combines, which helps meet security standards and governance requirements.

In a virtualization landscape, multiple data sources such as traditional databases, data warehouses, data lakes, and data marts continue to function as they are. The data remains stored in one place and is consumed by users through the virtualization layer, which supports a single-source-of-truth policy in the organization.

In this article, let’s study how data virtualization works and look at its applications and uses.

How does Data Virtualization work?

Data virtualization acts as an interface between an organization's many data sources and its users by integrating those sources virtually. Its functionality can be explained in five parts:

Data Abstraction layer

The complexity of the data, such as its protocols, formats, dependencies, and location, is hidden from users.
Provides a single virtual view of all the connected data sources
Access is provided to data in any source as per the authentication policy
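
The abstraction layer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the `VirtualLayer` class, the reader functions, and the sample data are all hypothetical, with an in-memory SQLite table and a CSV string standing in for real sources.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Source 1: a relational table (in-memory SQLite stands in for a real database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])

# Source 2: a CSV export (a string stands in for a file or object store).
ORDERS_CSV = "order_id,customer_id,amount\n10,1,250\n11,2,90\n"


def read_customers() -> Iterator[Row]:
    """Adapter for the SQL source: yields one dict per row."""
    cur = db.execute("SELECT id, name FROM customers")
    cols = [c[0] for c in cur.description]
    for rec in cur:
        yield dict(zip(cols, rec))


def read_orders() -> Iterator[Row]:
    """Adapter for the CSV source: parses text and converts field types."""
    for rec in csv.DictReader(io.StringIO(ORDERS_CSV)):
        yield {
            "order_id": int(rec["order_id"]),
            "customer_id": int(rec["customer_id"]),
            "amount": float(rec["amount"]),
        }


class VirtualLayer:
    """Hypothetical abstraction layer: maps a dataset name to a reader."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterator[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterator[Row]]) -> None:
        self._readers[name] = reader

    def read(self, name: str) -> List[Row]:
        # Data is fetched from the source only at this point, on demand.
        return list(self._readers[name]())


layer = VirtualLayer()
layer.register("customers", read_customers)
layer.register("orders", read_orders)

customers = layer.read("customers")
orders = layer.read("orders")
```

The caller asks for `"customers"` or `"orders"` by name and gets rows in the same shape either way; whether the data came from SQL or CSV stays hidden behind the reader functions.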

No replication, no relocation

No physical storage is needed, since no data is replicated or relocated
Data is extracted only on demand, so the dynamic requirements of users can be met
Without a time-consuming ETL process, overall development time is reduced

Information in real-time

Logical views are created for the data present in all the connected data sources
Data is made available to the users in real time using the logical views
Data transformation and the required quality checks happen flexibly, without added latency
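
A minimal sketch of a logical view, assuming two SQLite in-memory databases stand in for separate source systems (the `sales` and `crm` names, tables, and data are hypothetical). The view stores only a query definition, so a change in a source shows up the next time the view is read, with no copy to refresh. A real platform would federate across the network; this only illustrates the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two attached databases play the role of two independent source systems.
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)", [(1, 250.0), (2, 90.0)])

# A TEMP view may reference tables in any attached database. It holds no
# rows of its own; the join and aggregation run each time it is queried.
conn.execute(
    """
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.amount) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    """
)


def totals() -> dict:
    """Read the logical view and return {customer name: order total}."""
    rows = conn.execute("SELECT name, total FROM customer_totals ORDER BY name")
    return dict(rows.fetchall())


before = totals()
conn.execute("INSERT INTO sales.orders VALUES (1, 50.0)")  # the source changes
after = totals()  # the view reflects the change without any reload step
```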

Data services in self-service mode

Allows users to connect all the internal and external data by themselves through the virtualization layer
Enables users to configure their own data model as per business models
Connects any type of data and provides access to all the data present in the IT landscape

Security and Governance

Data access is provided only to authenticated users
Security and governance procedures are followed for both cloud and on-premises access
Manages metadata of multiple data points and provides clear visibility of data
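
As a rough illustration of access control at the virtualization layer, the sketch below filters columns by role before any data reaches the user. The roles, the `POLICY` table, and the sample rows are all hypothetical; real platforms typically integrate with an identity provider and support much richer policies.

```python
from typing import Dict, List, Set

# Hypothetical policy: which columns each role is allowed to see.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"id", "region"},
    "finance": {"id", "region", "salary"},
}

# Sample rows standing in for data fetched from a source.
ROWS: List[dict] = [
    {"id": 1, "region": "EU", "salary": 52000},
    {"id": 2, "region": "US", "salary": 61000},
]


def read_view(role: str) -> List[dict]:
    """Return the rows with only the columns the given role may see."""
    allowed = POLICY.get(role)
    if allowed is None:
        # Unknown roles are rejected rather than given a default view.
        raise PermissionError(f"role {role!r} is not authorized")
    return [{k: v for k, v in row.items() if k in allowed} for row in ROWS]


analyst_rows = read_view("analyst")
finance_rows = read_view("finance")
```

Because every request passes through one layer, the policy is applied in one place instead of being repeated in each source system.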

Applications of Data Virtualization

Data virtualization is a modern asset to any organization because it addresses growing data management challenges, including the pain points of handling multiple data sources, with capabilities such as caching, query pushdown, and a data catalog. Its applications include:

Logical Data Warehouse

Data virtualization provides a well-suited framework for this kind of warehouse. Conventional data sources, as well as big data sources such as Hadoop, data lakes, and NoSQL databases, are logically connected through the virtualization layer to build a logical data warehouse. In this logical structure, any user query is treated as a query on a single database. Multiple interfaces such as ODBC, JDBC, and REST APIs are used to exchange data with the input sources.
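
One technique that keeps such single-database queries efficient is query pushdown: the filter is sent to the source so that only matching rows travel to the virtualization layer. The sketch below compares the two approaches, with an in-memory SQLite table standing in for a remote source; the table, data, and function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a remote source with 1000 rows,
# of which every tenth row has country "IN".
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER, country TEXT)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "IN" if i % 10 == 0 else "US") for i in range(1000)],
)


def without_pushdown(country: str):
    """Fetch every row, then filter inside the virtualization layer."""
    rows = source.execute("SELECT id, country FROM events ORDER BY id").fetchall()
    transferred = len(rows)
    result = [r for r in rows if r[1] == country]
    return result, transferred


def with_pushdown(country: str):
    """Send the filter to the source so only matching rows are returned."""
    rows = source.execute(
        "SELECT id, country FROM events WHERE country = ? ORDER BY id",
        (country,),
    ).fetchall()
    return rows, len(rows)


slow_result, slow_transferred = without_pushdown("IN")
fast_result, fast_transferred = with_pushdown("IN")
```

Both calls return the same 100 rows, but the first moves all 1000 rows out of the source while the second moves only the 100 that match.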

Big data Analytics

Data virtualization connects NoSQL databases, Hadoop, and the data lakes of big data platforms alongside conventional data sources, creates logical views over them, and lets modern visual analytics and business intelligence tools draw business insights from those views. Setting this up takes minimal time and effort because it involves no database creation or data migration.

Data Services

This platform simplifies access to data from a variety of sources, irrespective of their complexities and unique characteristics. Logical views are created for each data source individually, merged into an integrated framework, and presented so that all the data appears to users to come from a single database. Data transformations, data massaging, joins, and filters are built into the logical view definitions.

Data virtualization techniques are widely deployed in data services applications, where data is pulled from multiple sources and presented to users.

Data Catalog

Data virtualization helps an organization build a robust inventory of its existing data assets as a data catalog, without migrating them to a new platform. The catalog gives business teams, data analysts, and data scientists faster access to data while requiring very little technical knowledge of it, which accelerates business decisions.
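
A small sketch of how a catalog can be built from metadata alone, assuming SQLite databases stand in for the organization's sources. The `harvest` and `find_column` helpers and the sample schemas are hypothetical; the point is that only table and column descriptions are collected, while the data itself stays where it is.

```python
import sqlite3
from typing import Dict, List


def harvest(source_name: str, conn: sqlite3.Connection) -> List[Dict]:
    """Collect table and column metadata from one source (no rows are read)."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        entries.append(
            {
                "source": source_name,
                "table": table,
                "columns": [(c[1], c[2]) for c in cols],
            }
        )
    return entries


def find_column(catalog: List[Dict], column: str) -> List[str]:
    """Return 'source.table' for every table that has the given column."""
    return [
        f"{e['source']}.{e['table']}"
        for e in catalog
        if any(name == column for name, _ in e["columns"])
    ]


# Two hypothetical sources with different schemas.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")

catalog = harvest("crm", crm) + harvest("sales", sales)
tables_with_amount = find_column(catalog, "amount")
```

A user looking for order amounts can search the catalog by column name and learn which source holds it, without knowing anything about how that source is accessed.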

Uses

Data virtualization plays a key role in building a state-of-the-art IT landscape in an organization
The platform offers abundant flexibility and delivers results quickly, which appreciably improves time to market
It gives the business the agility to thrive and take on its competitors
Users are shielded from the complexity of the back-end data sources and their interfaces
It is highly scalable, and connections to new data sources can be added on the fly without any disruption
Data sources in the cloud and API services can be easily integrated into the platform
These platforms are easy to maintain because of their simple design
Reduced total cost of ownership (TCO)
Plenty of self-service options for users in their information retrieval journey

Advantages

Reduces data errors because the data is stored once and used in many places
Adds minimal load to the system because the data is accessed only from its source
Faster access to data in real time
Little development effort and reduced time to implement
No major storage space is required
Enhanced security and strong governance compliance

Disadvantages

Indiscriminate or poorly tuned user queries may degrade the performance of the platform
Managing change is a laborious process in a federated environment because the likely impact has to be assessed before incorporating any change
The platform is a potential single point of failure, so a strong backup needs to be built for it
Use cases involving batch processing can result in major conflicts
Resource allocation among the data sources can run into difficulty
It is not an ideal platform for storing historical snapshots; a data warehouse is better suited to that

Conclusion

Around 10 vendors offer data virtualization platforms, and some of the big players on this list are IBM, Oracle, Red Hat, TIBCO, Informatica, and Denodo. The selection should be based on the application that will consume the data, the performance level required, and the security requirements.
