Azure Data Factory
Running your ETL/ELT flow into a cloud-based data warehouse is one of the main purposes of Data Factory. This is also one of the reasons why it can be a good idea to use ELT (Extract Load Transform) instead of ETL (Extract Transform Load). Often the source data is stored elsewhere, on premises or at another site, and the network connection to the cloud is the bottleneck. Loading gets faster if we compress and parallelize the data transfer, and once the data sits in a data lake or blob storage we transform it there.
Azure Data Factory (ADF) is, as the name suggests, modeled on a factory with assembly lines. The assembly line is called a Pipeline. A Pipeline consists of workstations, the activities that make up the ETL/ELT flow (a minimal sketch follows the list below):
- Ingest
- Orchestrate
- Transform Data
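To make the assembly-line idea concrete, here is a minimal, abbreviated sketch of how a pipeline is defined in JSON in the ADF authoring UI. The activity names, the stored procedure, and the omission of required properties such as inputs, outputs, and linked service references are assumptions for illustration; dependsOn is what expresses the orchestration between the steps:

```json
{
  "name": "ExamplePipeline",
  "properties": {
    "activities": [
      {
        "name": "IngestRawData",
        "type": "Copy",
        "typeProperties": {}
      },
      {
        "name": "TransformStagedData",
        "type": "SqlServerStoredProcedure",
        "dependsOn": [
          { "activity": "IngestRawData", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "storedProcedureName": "[dbo].[usp_TransformStaged]"
        }
      }
    ]
  }
}
```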
A simple example could be copying data from a CSV file into a database table in Azure SQL. To create a pipeline for this, we first need to know where the data is stored and how to connect to it, and save that information in a Linked Service. The data then needs to be structured in a dataset so it can be mapped.
The CSV file represents the source, which could be a blob storage container.
A linked service is needed to store the key and other connection-related information required to connect to the source.
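A linked service is stored as a JSON definition. A minimal sketch for an Azure Blob Storage source could look like the following; the name BlobSourceLinkedService and the connection string placeholders are assumptions, and in practice the account key is better kept in Azure Key Vault:

```json
{
  "name": "BlobSourceLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```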
The dataset is used to describe the structure and location of the data so a correct mapping can be used in the copy activity.
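As a sketch, a dataset for the CSV file points at the linked service above and describes the file format. The container name, file name, and delimiter settings are assumptions:

```json
{
  "name": "SourceCsvDataset",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "BlobSourceLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": "customers.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```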
Then it is the same thing towards the target environment, in this case the Azure SQL database: a linked service for the connection and a dataset for the target table.
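Sketched the same way, and with all names, the connection string, and the table dbo.Customers as placeholder assumptions, the sink side could look like this:

```json
{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<db>;User ID=<user>;Password=<password>"
    }
  }
}
```

```json
{
  "name": "SinkSqlDataset",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "AzureSqlLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "schema": "dbo",
      "table": "Customers"
    }
  }
}
```

The copy activity then references the source and sink datasets, and ADF maps the columns between them, by name or through an explicit mapping:

```json
{
  "name": "CopyCsvToSqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyCsvToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceCsvDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SinkSqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}
```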
A tutorial showing this example step by step is coming soon.