top of page

SQL to Parquet using Python 3.7.6

First read SQL server using Python3 and PyODBC.  

Start out downloading PyODBC via pip from pypi.org. “pip install pyodbc”. I created a file called getsql.py. I used AdventureWorksDW2017 downloaded from Microsoft in this example. This function will return a list of customers that will be used in creating anParquet file. 

import pandas.io.sql

import pyodbc

import pandas as pd

conn = pyodbc.connect('Driver={SQL Server};'

                      'Server=name of the sql server;'

                      'Database=AdventureWorksDW2017;'

                      'Trusted_Connection=yes;')

query= """SELECT 

        GeographyKey,   

        case when Gender = 'F' then 0 else 1 end Gender, 

        cast((DATEDIFF(DAY, BirthDate, GetDate()) / 365) as int) Age,

        cast(YearlyIncome as int) YearlyIncome 

        FROM DimCustomer

        """

To create the Parquet file I used Pandas.

Run pip install pandas. Then its easy to just read the query into to the file compressed to gzip  (small and fast).

 df= pandas.io.sql.read_sql(query, conn)

df.to_parquet('TrainingData.gzip', compression='gzip')

bottom of page