Updated March 17, 2023
Introduction to Azure Data Factory Integration Runtime
Azure Data Factory integration runtime (IR) is the compute infrastructure that Azure Data Factory and Azure Synapse pipelines use to provide data integration capabilities across different network environments. In Data Factory and Synapse pipelines, an activity defines the action to perform, and a linked service defines the target data store or compute service. The integration runtime provides the bridge between activities and linked services.
Key Takeaways
- The location of the data factory or Synapse workspace is where its metadata is stored and where pipeline triggering is initiated. Pipelines can still connect to data stores and compute services in other Azure regions to move data between data stores or process it with compute services.
- When creating a data factory instance or a Synapse workspace, we need to specify its location.
What is Azure Data Factory Integration Runtime?
The integration runtime is referenced by a linked service or an activity, and it provides the compute environment where the activity either runs directly or is dispatched from. This allows the activity to be performed in the region closest to the target data store or compute service, which maximizes performance while keeping the flexibility to meet compliance requirements.
Using the management hub, we can create integration runtimes in the Azure Data Factory and Azure Synapse UI, or we can create them from the activities, datasets, and data flows that reference them. The integration runtime is the compute infrastructure that provides data integration capabilities across various network environments. The self-hosted integration runtime runs copy operations between a data store in a private network and a cloud data store.
How to Create Integration Runtime?
The steps below show how to create an Azure integration runtime from the Azure portal.
1. First, log in to the Azure portal with your Azure credentials.
2. After logging in, click the Create a resource tab to create the Azure data factory.
3. After the data factory is created, open its integration runtimes page and click New to create a new integration runtime.
4. Clicking New opens a new window. In it, select the integration runtime setup.
5. After selecting the setup, define the name, type, and region of the integration runtime.
6. After defining the name, type, and region, define the data flow runtime settings.
7. After defining the data flow settings, edit the linked service and select our integration runtime in it.
8. After adding the integration runtime, check in the dashboard that the Azure Data Factory integration runtime has been created.
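Step 7 can also be done outside the portal. The following is a minimal sketch, assuming the Az.DataFactory module is installed and the session is already signed in. The resource group, data factory, linked service, and integration runtime names and the connection string are hypothetical placeholders, not values from the walkthrough above. It writes a linked service definition whose connectVia property points at the integration runtime, then deploys it with Set-AzDataFactoryV2LinkedService.

```powershell
# Hypothetical names; replace them with your own resources.
$ResourceGroupName = "Azure_grp"
$DataFactoryName = "Azure-df"
$IntegrationRuntimeName = "Azure-IR"
$LinkedServiceName = "BlobStorageLinkedService"

# Linked service definition; connectVia selects the integration runtime.
$definition = @"
{
  "name": "$LinkedServiceName",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "<your storage connection string>"
    },
    "connectVia": {
      "referenceName": "$IntegrationRuntimeName",
      "type": "IntegrationRuntimeReference"
    }
  }
}
"@

# Save the definition to a temporary file so the cmdlet can read it.
$definitionFile = Join-Path ([System.IO.Path]::GetTempPath()) "$LinkedServiceName.json"
Set-Content -Path $definitionFile -Value $definition

# Create or update the linked service in the data factory.
Set-AzDataFactoryV2LinkedService `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $LinkedServiceName `
    -DefinitionFile $definitionFile
```

If connectVia is left out of the definition, the linked service falls back to the default auto-resolve Azure integration runtime.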
Azure-SSIS Integration Runtime
The Azure-SSIS integration runtime is a fully managed cluster of Azure virtual machines dedicated to running our SSIS packages. We can bring our own Azure SQL Database or SQL Managed Instance to host the SSIS catalog (SSISDB). We can scale up its compute power by specifying the node size and scale it out by specifying the number of nodes in the cluster. We can manage the running cost of the Azure-SSIS integration runtime by stopping and starting it as our requirements demand.
We can use familiar tools such as SQL Server Data Tools and SQL Server Management Studio, just as with on-premises SSIS, to deploy and manage existing SSIS packages with little change. When the SSIS catalog is hosted on an Azure SQL Database server or a managed instance, access to it can be controlled with IP firewall rules or virtual network service endpoints.
The SSISDB instance is created on our behalf as a single database, as part of an elastic pool, or in a managed instance. We can access it through the public network or through a virtual network.
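The scaling and start/stop behavior described above can be sketched in PowerShell. This is an illustrative example rather than a complete provisioning script: it assumes the Az.DataFactory module, an existing data factory, and an existing Azure SQL Database server to host SSISDB. All names, the node size, the node count, and the pricing tier are hypothetical values chosen for the example.

```powershell
# Hypothetical values; replace them with your own.
$ResourceGroupName = "Azure_grp"
$DataFactoryName = "Azure-df"
$AzureSSISName = "Azure-SSIS-IR"
$AzureSSISLocation = "EastUS"
$CatalogServerEndpoint = "<your-server>.database.windows.net"

# Prompt for the SQL admin credential used to create SSISDB.
$CatalogAdminCredential = Get-Credential

# Define the Azure-SSIS IR: NodeSize scales up, NodeCount scales out.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Type Managed `
    -Location $AzureSSISLocation `
    -NodeSize "Standard_D8_v3" `
    -NodeCount 2 `
    -CatalogServerEndpoint $CatalogServerEndpoint `
    -CatalogAdminCredential $CatalogAdminCredential `
    -CatalogPricingTier "S1"

# Start the cluster when packages need to run.
Start-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Force

# Stop it afterwards so that idle nodes are not billed.
Stop-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Force
```

Starting the runtime can take a while, so in practice the start and stop calls usually run on a schedule around the package execution window.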
Types of Azure Data Factory Integration Runtime
Azure Data Factory offers three types of integration runtime; we should choose the type that best matches our data integration needs and our network environment.
The types of integration runtime are as follows:
- Azure – Supports data flow, data movement, and activity dispatch in the public network, and supports the same capabilities with private link. This type is commonly used when creating an Azure Data Factory integration runtime.
- Self-hosted – Supports data movement and activity dispatch in the public network, and supports the same capabilities with private link. This type is frequently used when data has to be reached inside a private network.
- Azure-SSIS – Supports SSIS package execution in the public network, and supports SSIS package execution with private link as well. This type is used for running SSIS packages.
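To see which types already exist in a data factory, the runtimes can be listed from PowerShell. The sketch below assumes the Az.DataFactory module and a signed-in session; the resource group and data factory names are hypothetical.

```powershell
# Hypothetical names; replace them with your own.
$ResourceGroupName = "Azure_grp"
$DataFactoryName = "Azure-df"

# List every integration runtime in the data factory with its type.
Get-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName |
    Format-Table Name, Type, Description
```

Note that both Azure and Azure-SSIS runtimes are reported with the type Managed, while self-hosted runtimes are reported as SelfHosted.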
Integration Runtime Location
The integration runtime location defines the location of its back-end compute, which is where data movement, activity dispatch, and SSIS package execution are performed.
We can set the location for each type as follows:
- Azure IR location – We can set a specific region for an Azure IR, in which case activity execution or dispatch takes place in that region. With the auto-resolve setting, the copy activity makes a best effort to detect the location of the sink data store and then uses an IR in the same region. For example, when copying data to an Azure Blob in West US, the blob is detected as being in the West US region, and the copy activity is performed on an IR in West US. For data flows, the IR in the data factory or Synapse workspace region is used. When monitoring an activity run, we can see which IR location took effect.
- Self-hosted IR location – The self-hosted IR is logically registered to the data factory or Synapse workspace, and the compute used to support its functionality is provided by us. As a result, a self-hosted IR has no explicit location property. When performing data movement, the self-hosted IR extracts data from the source and writes it to the destination.
- Azure-SSIS IR location – Choosing the right location for the Azure-SSIS IR is essential for achieving high performance. The location of the Azure-SSIS IR does not have to be the same as that of our data factory, but it should be the same as that of the SQL database that hosts SSISDB. If we do not already have a database, we should create one in the same location as our virtual network. Creating the Azure-SSIS IR in that same location then reduces data movement and the associated costs.
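As an illustration of the Azure IR location setting, the sketch below pins an Azure integration runtime to one region instead of relying on auto-resolve. It assumes the Az.DataFactory module and an existing data factory; the names and the chosen region are hypothetical.

```powershell
# Hypothetical names; replace them with your own.
$ResourceGroupName = "Azure_grp"
$DataFactoryName = "Azure-df"
$IntegrationRuntimeName = "Azure-IR-WestUS"

# Create an Azure IR whose activities always run in the West US region.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $IntegrationRuntimeName `
    -Type Managed `
    -Location "West US"

# Show the runtime details to confirm the configured location.
Get-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $IntegrationRuntimeName |
    Format-List
```

Pinning the region is mainly useful when compliance rules require that data never leaves a given geography during a copy.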
Command
The example below uses PowerShell commands to set up an Azure Data Factory integration runtime.
1. First, launch Windows PowerShell on the local system.
2. After launching PowerShell, create the following variables by copying and pasting them into the script.
# Placeholder values; replace them with your own.
$SubscriptionName = "Azure_sub"
$ResourceGroupName = "Azure_grp"
$DataFactoryLocation = "EastUS"
# Data factory and integration runtime names may contain only letters,
# numbers, and hyphens; data factory names must also be globally unique.
$SharedDataFactoryName = "Azure-df"
$SharedIntegrationRuntimeName = "Azure-IR"
$SharedIntegrationRuntimeDescription = "Azure integration runtime"
$LinkedDataFactoryName = "Azure-LDF"
$LinkedIntegrationRuntimeName = "Azure-LDFR"
$LinkedIntegrationRuntimeDescription = "Azure integration runtime linked source"
3. After creating the script, sign in to Azure from Windows PowerShell and select the subscription.
Connect-AzAccount
Select-AzSubscription -SubscriptionName $SubscriptionName
4. After signing in, run the following command to create the data factory.
Set-AzDataFactoryV2 -ResourceGroupName $ResourceGroupName `
-Location $DataFactoryLocation `
-Name $SharedDataFactoryName
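The commands above stop once the data factory exists. One possible continuation, sketched here under the assumption that the variables from step 2 are still defined in the session and that the Az.DataFactory and Az.Resources modules are installed, creates a self-hosted integration runtime in the shared data factory and then shares it with the linked data factory. The choice of the Contributor role follows the usual sharing pattern and is an assumption, not something stated in this article.

```powershell
# Reuses the variables defined in step 2 of this section.

# Create the self-hosted integration runtime in the shared data factory.
$SharedIR = Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $SharedDataFactoryName `
    -Name $SharedIntegrationRuntimeName `
    -Type SelfHosted `
    -Description $SharedIntegrationRuntimeDescription

# Retrieve the authentication keys used to register an on-premises node.
Get-AzDataFactoryV2IntegrationRuntimeKey `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $SharedDataFactoryName `
    -Name $SharedIntegrationRuntimeName

# Create the second data factory that will borrow the shared runtime.
$factory = Set-AzDataFactoryV2 -ResourceGroupName $ResourceGroupName `
    -Location $DataFactoryLocation `
    -Name $LinkedDataFactoryName

# Grant the second factory's managed identity access to the shared runtime.
New-AzRoleAssignment `
    -ObjectId $factory.Identity.PrincipalId `
    -RoleDefinitionName 'Contributor' `
    -Scope $SharedIR.Id

# Create the linked integration runtime that points at the shared one.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $LinkedDataFactoryName `
    -Name $LinkedIntegrationRuntimeName `
    -Type SelfHosted `
    -SharedIntegrationRuntimeResourceId $SharedIR.Id `
    -Description $LinkedIntegrationRuntimeDescription
```

One of the returned keys is then entered in the integration runtime installer on the on-premises machine to register it as a node.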
Integration Runtime Network Environment
The Azure integration runtime connects to data stores and compute services that have publicly accessible endpoints. Enabling a managed virtual network lets the Azure integration runtime connect to data stores through the private link service in a private network environment.
In Synapse workspaces, we have options to limit outbound traffic from the managed virtual network of the integration runtime. In Azure Data Factory, all ports are open for outbound communications. The Azure-SSIS integration runtime can be integrated with our own virtual network to provide outbound communication controls.
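Joining an Azure-SSIS integration runtime to a virtual network can be sketched as follows. The example assumes the Az.DataFactory module, an existing Azure-SSIS IR, and an existing virtual network with a suitable subnet; every name and the resource ID are hypothetical placeholders.

```powershell
# Hypothetical values; replace them with your own.
$ResourceGroupName = "Azure_grp"
$DataFactoryName = "Azure-df"
$AzureSSISName = "Azure-SSIS-IR"
$VnetId = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Network/virtualNetworks/<vnet-name>"
$SubnetName = "<subnet-name>"

# The runtime has to be stopped before its network settings are changed.
Stop-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Force

# Attach the runtime to the virtual network and subnet.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -VNetId $VnetId `
    -Subnet $SubnetName

# Start the runtime again with the new network configuration.
Start-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Force
```

Once the runtime is inside the virtual network, outbound traffic can be restricted with the network's own controls, such as network security group rules.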
Conclusion
Azure Data Factory integration runtime is the compute infrastructure that Azure Data Factory and Azure Synapse pipelines use to provide data integration capabilities across multiple network environments. The self-hosted integration runtime performs copy operations between a private network and a cloud data store.