Supercharging MS SQL Server with Python

Supercharging MS SQL Server with Python

  • 790

Supercharging MS SQL Server with Python .I hope it may help some other users find ways to incorporate some Python into their SQL routines. I personally use this almost daily, it is so simple yet incredibly powerful.

At work, I use SQL a lot. It is not without annoying nuances and limitations, but in the end, it is the foundation of all data professions. For that reason, it is an absolute necessity for anyone working in the land of data. Being competent in SQL is incredibly important.

Although SQL is great, why settle for great? Why don’t we supercharge SQL?

The limitations of SQL stem from it being a declarative language, meaning we tell SQL what we want and SQL will fetch that from a specified database. For many data extraction or simple data manipulation tasks, this is all that is needed.

But what if we want more?

That is exactly what I will show you how to do in this article.

It Starts with a Foundation

import pyodbc
from datetime import datetime

class Sql:
    def __init__(self, database, server="XXVIR00012,55000"):

        # here we are telling python what to connect to (our SQL Server)
        self.cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                                   "Server="+server+";"
                                   "Database="+database+";"
                                   "Trusted_Connection=yes;")

        # initialise query attribute
        self.query = "-- {}\n\n-- Made in Python".format(datetime.now()
                                                         .strftime("%d/%m/%Y"))

mssql_connection_short.py

This code is the foundation to supercharging MS SQL server. Once this is built we can connect to SQL from Python with just:

sql = Sql('database123')

Easy right? There are a few things going on, so let’s dissect the code.

class Sql:

The first thing to note, we are containing this within a class. I have found this approach logical as in this format we add or remove processes for this specific database. This will make more sense once seeing how this works.

We initialise our class:

def __init__(self, database, server="XXVIR00012,55000"):

Myself and my colleagues almost always connect to the same server. So I set this common server name as the default argument for server.

Your server name can be found in the ‘Connect to Server’ dialog box or at the top of your window in MS SQL Server Management Studio:

This is image title

Next, we establish our connection to SQL:

self.cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                           "Server="+self.server+";"
                           "Database="+self.database+";"
                           "Trusted_Connection=yes;")

This is made incredibly easy thanks to the pyodbc module. We simply pass a connection string to the pyodbc.connect(...) function, more details on this can be found here.

Finally, I like to create a query string within the Sql class which is updated with every query passed to the class:

self.query = "-- {}\n\n-- Made in Python".format(datetime.now()
                                              .strftime("%d/%m/%Y"))

This allows us to keep a log of our code and also doubles as a more readable output to colleagues that may be more comfortable reading SQL code.

Note that I will exclude the update to _self.query_ sections of the code in the following code snippets, please check the full code in the links provided if needed.

Building Blocks

There are a few essential functions that I find incredibly useful and use almost daily. These all focus on transferring data in or out of a database.

Let’s start with the following directory:
This is image title
For our current project, we need to:

  • import these files to SQL server
  • union them into a single table
  • dynamically create multiple tables based on categories within a column

After fleshing out our SQL class a little more this will be as easy as:

import sys
sys.path.insert(0, r'C:\\User\medium\pysqlplus\lib')
import os
from data import Sql

sql = Sql('database123')  # initialise the Sql object

directory = r'C:\\User\medium\data\\'  # this is where our generic data is stored

file_list = os.listdir(directory)  # get a list of all files

for file in file_list:  # loop to import files to sql
    df = pd.read_csv(directory+file)  # read file to dataframe
    sql.push_dataframe(df, file[:-4])
    
# now we convert our file_list names into the table names we have imported to SQL
table_names = [x[:-4] for x in file_list]

sql.union(table_names, 'generic_jan')  # union our files into one new table called 'generic_jan'

sql.drop(table_names)  # drop our original tables as we now have full table

# get list of categories in colX, eg ['hr', 'finance', 'tech', 'c_suite']
sets = list(sql.manual("SELECT colX AS 'category' FROM generic_jan GROUP BY colX", response=True)['category'])

for category in sets:
    sql.manual("SELECT * INTO generic_jan_"+category+" FROM generic_jan WHERE colX = '"+category+"'")
    

pysqlplus_example.py

Let’s start at the top.

push_dataframe

def push_dataframe(self, data, table="raw_data", batchsize=500):
    # create execution cursor
    cursor = self.cnxn.cursor()
    # activate fast execute
    cursor.fast_executemany = True

    # create create table statement
    query = "CREATE TABLE [" + table + "] (\n"

    # iterate through each column to be included in create table statement
    for i in range(len(list(data))):
        query += "\t[{}] varchar(255)".format(list(data)[i])  # add column (everything is varchar for now)
        # append correct connection/end statement code
        if i != len(list(data))-1:
            query += ",\n"
        else:
            query += "\n);"

    cursor.execute(query)  # execute the create table statement
    self.cnxn.commit()  # commit changes

    # append query to our SQL code logger
    self.query += ("\n\n-- create table\n" + query)

    # insert the data in batches
    query = ("INSERT INTO [{}] ({})\n".format(table,
                                              '['+'], ['  # get columns
                                              .join(list(data)) + ']') +
             "VALUES\n(?{})".format(", ?"*(len(list(data))-1)))

    # insert data into target table in batches of 'batchsize'
    for i in range(0, len(data), batchsize):
        if i+batchsize > len(data):
            batch = data[i: len(data)].values.tolist()
        else:
            batch = data[i: i+batchsize].values.tolist()
        # execute batch insert
        cursor.executemany(query, batch)
        # commit insert to SQL Server
        self.cnxn.commit()

mssql_push_dataframe_short.py

This function is contained inside our SQL class. It allows us to easily push a Pandas dataframe to the SQL database.

This is particularly useful when needing to upload a lot of files. However, the true power in allowing Python to import data to SQL comes with Python’s flexibility.

Importing a specific tab across a dozen Excel workbooks into SQL can be a nightmare. But with Python it’s easy. Now we have built a function that allows us to use Python to read those tabs and import them into SQL for us.

manual

def manual(self, query, response=False):
    cursor = self.cnxn.cursor()  # create execution cursor

    if response:
        return read_sql(query, self.cnxn)  # get sql query output to dataframe
    try:
        cursor.execute(query)  # execute
    except pyodbc.ProgrammingError as error:
        print("Warning:\n{}".format(error))  # print error as a warning

    self.cnxn.commit()  # commit query to SQL Server
    return "Query complete."

This function is actually used in the union and drop functions. It simply allows us to execute SQL code as simply as possible.

The response argument gives us the option to extract the output of our query to a DataFrame. Allowing us to extract all unique values from colX in our generic_jan table using just:

sets = list(sql.manual("SELECT colX AS 'category' FROM generic_jan GROUP BY colX", response=True)['category'])

union

Now we have built the manual function, creating the union function is easy:

def union(self, table_list, name="union", join="UNION"):
    
    # initialise the query
    query = "SELECT * INTO ["+name+"] FROM (\n"

    # build the SQL query
    query += f'\n{join}\n'.join(
                        [f'SELECT [{x}].* FROM [{x}]' for x in table_list]
                        )

    query += ") x"  # add end of query
    self.manual(query, fast=True)  # fast execute

pysqlplus_union_short.py

This simply loops through a list of table names we provide via table_list, building a UNION query for all table names given. This is then executed with self.manual(query).

drop

We have the ability to upload a very large number of tables to SQL Server. This is great but can quickly overpopulate our database. To remedy this, we will create a drop function:

def drop(self, tables):

    # check if single or list
    if isinstance(tables, str):
        # if single string, convert to single item in list for for-loop
        tables = [tables]

    for table in tables:
        # check for pre-existing table and delete if present
        query = ("IF OBJECT_ID ('["+table+"]', 'U') IS NOT NULL "
                 "DROP TABLE ["+table+"]")
        self.manual(query)  # execute

Again, this function is incredibly simple thanks to the manual function. This allows us to either drop a single table by providing a string to tables or to drop multiple tables by providing a list of table names to tables.

When combined these incredibly simple functions allow us to massively expand the capabilities of SQL Server by leveraging the strengths of Python.

If you are interested in seeing more please check out the GitHub repo for pysqlplus. This is a small project and is simply built to enhance interaction with MS SQL Server for my own workplace.

Nonetheless, I hope it may help some other users find ways to incorporate some Python into their SQL routines. I personally use this almost daily, it is so simple yet incredibly powerful.

Thanks for reading!