Azure Data Factory Blob to SQL tutorial
The goal of this tutorial is to create a simple pipeline that moves data from a CSV file stored in a blob storage into an Azure SQL table.
If you do not have an Azure subscription yet, you can get an Azure account here.
To download the CSV file I’m using, click here: AWEmp.csv
The finished product is going to look like the picture below.
Basic preparation
Adding resource group
First we need to create a resource group.
From Home in your Azure subscription, select Resource groups.
Select Create, then choose a Subscription, Name, and Region.
Press Review + create at the bottom of the screen.
In the resource group we are going to put the ingest area and the source file, as well as the database where we keep the destination table. Let’s begin with the blob storage.
1. To begin with we need a Storage Account. From the Resource Group, press the “Create” button.
2. Select “Storage account”.
3. In the Basics form:
Select Subscription, Resource group, Name, and Redundancy. See my choices below.
Note the menu at the top of the page; there you can set values for more advanced features, etc. You can also use the Next button to go through those settings. In this tutorial we use the default values except for the Basics values.
4. Click Review + create
5. Create
Add Blob to resource group
6. If you haven’t downloaded the file AWEmp.csv, you should do it now.
Go to the storage account and click on Blob service.
7. Click on + Container.
8. Click on Upload and select AWEmp.csv.
If you have succeeded with the previous steps, you have created a blob storage with an ingestion file.
(If you go to your blob container and click on the blob you just uploaded, you can right-click and select “View/Edit” to see the content of the file.)
Adding Database
Now we need a target database. To do this you need to create a new resource. In the top left corner, click on the three stripes and then + Create a resource.
Search for “SQL database” and select it.
On the Create SQL Database form, select the same resource group as you created in the previous steps (or another one if you already have a server in a different resource group). Add a database name, in my case ADFExamplesDB. In the other boxes I chose to use as small a database as possible. Use the wizard at the bottom to add other characteristics for the database; I used the default settings.
When setting up a database server you only have to add (and remember) an admin account, a location, and a server name. Don’t forget that you might want to use the server for other databases in the future, so keep that in mind when naming it.
Adding Table to database
Now you are ready to add a table where you can store the result.
1. Click “Go to resource”. In the left menu you should have a “Query editor (preview)” button. This is a cool tool that helps you manage the database without leaving Azure.
2. Log in using the admin account you created in the previous step.
3. Add the CREATE TABLE script:
CREATE TABLE [dbo].[AWEmployee](
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[Title] [nvarchar](50) NULL,
[HireDate] [nvarchar](10) NULL,
[BirthDate] [nvarchar](10) NULL,
[LoginID] [nvarchar](256) NULL,
[EmailAddress] [nvarchar](50) NULL,
[Phone] [nvarchar](25) NULL,
[MaritalStatus] [nchar](1) NULL,
[EmergencyContactName] [nvarchar](50) NULL,
[EmergencyContactPhone] [nvarchar](25) NULL,
[Gender] [nchar](1) NULL,
[BaseRate] [money] NULL,
[VacationHours] [smallint] NULL,
[SickLeaveHours] [smallint] NULL,
[DepartmentName] [nvarchar](50) NULL,
[StartDate] [nvarchar](10) NULL,
[EndDate] [nvarchar](10) NULL)
4. Press Run or “Shift+Enter”.
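As a quick sanity check, you can run a simple query in the same Query editor; at this point the table should exist but contain no rows:
-- The table should exist and be empty at this point
SELECT COUNT(*) AS NumberOfRows
FROM [dbo].[AWEmployee];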
Azure Data Factory
Set up Data Factory
Congratulations, now you are prepared for Data Factory. The steps until now were about setting up the environment, and many of these resources will normally be reused over and over again.
First, another preparation step: create the Azure Data Factory.
- Press Home so you see the Azure services. If you do not see Data Factory in the selection, use the search bar at the top by typing “data fact..”.
- Select Data factories.
- Press + Create.
- Use the resource group created for this tutorial, in my case ADFExamples. The Region should be the same as used for the other resources, in my case “North Europe”. For the Name I used ADFExampleDF, and version V2.
- Click “Next: Git configuration”.
- In my case I clicked Configure Git later.
- In all other cases I used the default values.
- Press “Go to resource”.
- Now you can go to ADF Studio (Azure Data Factory Studio) by clicking on the factory icon.
Yeeha! Now we are in Azure Data Factory.
Add Azure resources to ADF
Now is a good time to go back to the design to better understand what Data Factory actually is meant for. In previous steps we have created a blob storage and put a file there and created a SQL instance with an empty table.
By clicking on the pen symbol, you can start adding components to the factory. Let’s start with creating a pipeline for our load process.
This is done by clicking the three dots next to Pipelines and selecting “New pipeline”. In the properties, set a name for yours. My choice is ADFLoadBlobToSQL, which tells us what this pipeline does.
As you see in the overall picture, we need to interface with blob storage as well as the database. This is done using a Linked service. You can find the Linked services under “Manage” in the menu.
Create linked services for blob storage
1. We start by creating a link to the blob. Click the “Create linked service” button and find blob storage in the data store list. I used the search bar and wrote “blo” to find it.
2. Click on the blob icon and fill in the form. For the name, it could be a good idea to use a reusable name so that the same linked service can be used in the future. In this tutorial I am not going into more complex authentication methods. You should be able to select your Azure subscription in the dropdown, as well as your storage account. Now you should be able to test your connection. If successful, click “Create”.
Create linked services for database
We now need to repeat the same thing for the destination table, but this time create a linked service towards Azure SQL Database.
1. Click “+ new” on the linked services form.
2. Select an Azure SQL Database.
3. We are using the connection string method, where you select the Azure subscription, server, and database created earlier.
4. Test the connection (if this is the first time you have created a linked service to a SQL database, you probably have to open your firewall, explained here; see also the note after this list).
5. When the connection is successful, click “Create”.
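A note on that firewall step: the portal option “Allow Azure services and resources to access this server” (found under the SQL server’s Networking settings) corresponds to a server-level firewall rule covering the special range 0.0.0.0 to 0.0.0.0. As a rough sketch, assuming you have admin access, the same rule can be created with T-SQL against the master database of your server (the rule name AllowAzureServices is just an example):
-- Run against the master database of your Azure SQL server.
-- 0.0.0.0 to 0.0.0.0 is the special range meaning "allow Azure services".
EXECUTE sp_set_firewall_rule
    @name = N'AllowAzureServices',
    @start_ip_address = '0.0.0.0',
    @end_ip_address = '0.0.0.0';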
Create Datasets
It’s time to structure the data in order to map it in the coming copy activity. This is done in datasets.
1. In Data Factory, click on the pen and select “New dataset” from the three dots next to Datasets.
2. The first dataset is towards the blob, so select “Azure Blob Storage” (I found it when I selected Azure in the top menu).
3. Then we need to select the format of the file we are going to pick up. Select DelimitedText and press Continue.
4. I named the dataset GetAWEmployeeCSV.
5. Find your Linked service in the dropdown.
6. You should now be able to find your file using the folder icon under File path.
7. Check the “First row as header” checkbox, because the file has headers in its first row.
8. Use Import Schema from connection/store and click OK.
9. Use the default settings on the next screen except for Column delimiter, which you should change to “Semicolon”.
10. Test it by pressing Preview data; it should look like the picture below.
Now we must do the same thing with the database: create a dataset for the table we created earlier.
1. In Data Factory, click on the pen and select “New dataset” from the three dots next to Datasets.
2. This time we choose Azure SQL Database.
3. I named it DatasetSQL.
4. Find your Linked service in the dropdown.
5. You should now be able to find your Table using the dropdown under Table Name
6. Click OK.
7. Test the connection by clicking Test connection.
8. You can also preview data to check the empty table.
Create Copy Data Activity
The only thing missing in the overall picture is the Copy data activity that takes the data from the CSV dataset and passes it to the table dataset.
1. Go back to your pipeline. In Activities / Move & transform, select “Copy data” and drag and drop it onto the canvas to the right. On the General tab, give it a proper name.
2. On the Source tab, select your CSV dataset. You can test it by clicking “Preview data”.
3. On the Sink tab you specify your target destination by selecting the SQL dataset.
4. On the Mapping tab, use Import schemas to map the columns. There is also a possibility to help format data types like dates etc. under “Type conversion settings”. In my example I added a setting that doesn’t have any meaning in this mapping.
5. Click Validate to check for errors and then run Debug
6. If the status is “Succeeded” you can Publish all.
7. To test that the result arrived correctly in your database, either go to the Sink tab, open the dataset and click “Preview data”, or go to Azure Home, your database server, select your database and use the Query editor. Under Tables you should have the table; right-click the table, select the top 1000 rows, and you should get 296 rows of data (see the query example after this list).
8. Now you can create a trigger for the load; the trigger can also be scheduled and so on. This is just a tutorial, so it is not necessary to set up a trigger, and some logic is missing for truncating or deleting the table. So, if you run the flow more than once you will get a lot of duplicates (see the sketch after this list).
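To check the row count, and to handle the duplicates mentioned in the last step, you can use queries like the ones below in the Query editor. The TRUNCATE statement is only a minimal sketch of the missing reload logic; one way to use it is to put it in the copy activity’s “Pre-copy script” field on the Sink tab, assuming you want to empty the table on every run:
-- Verify the load: should return 296 rows after one successful run
SELECT COUNT(*) AS NumberOfRows
FROM [dbo].[AWEmployee];

-- Minimal sketch of the missing reload logic: empty the table before each run
-- so that re-running the pipeline does not create duplicates
TRUNCATE TABLE [dbo].[AWEmployee];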