Python for the SQL Server DBA

Python is increasingly used by DBAs as a general-purpose scripting language, despite the pressure to adopt Microsoft's PowerShell. They find it easy to learn, powerful, and reasonably intuitive. Here Timothy Wiseman, a working DBA, explains the attraction of Python and gives a general introduction to the language, suitable for anyone who has been wondering whether to give it a try.

What is Python?

Python is a dynamic, interpreted language which was originally designed by Guido van Rossum with code readability as one of its primary goals.  As part of that, one its most unusual features is the use of meaningful white space. Python uses code blocks that are shown and enforced through indentation.  Although some of its extensions and libraries are specific to one operating system, the core language is available for all major operating systems and most Python scripts will run unaltered from one system to another.

Why would a SQL Server DBA be interested in Python?

The Microsoft T-SQL programming language that is built into SQL Server along with the capabilities of included tools such as SSIS and SSRS form a remarkably versatile platform without any third party tools whatsoever.  In fact, T-SQL itself is Turing complete.

However, they are not the best tool for all tasks.  For instance, SQL Server by itself does not have full support for regular expressions. While it is fully capable of interacting with the file system there are generally better tools for any complex manipulation of the operating system.  Pure SQL is also not the strongest option for complex mathematics or complicated file munging.

Python is one tool capable of handling those tasks.  It is certainly not the only option, with Microsoft C# and PowerShell also being excellent options for some of these tasks.  Perl and Lisp are amongst many other options, each with their own strong points.  

Python has several strong points which make it worthy of attention.  It is free and open source software, but far more significantly it is easy to learn and easy to read even for people who are not Python programmers.  It is capable of being used to write full scale applications and server software, but for most DBAs it is more significant that it is excellent for jotting off quick utility scripts to solve a specific problem.  For instance, it is excellent for writing scripts to reorganize files into a new directory structure and  automatically verify that all the automated backup files actually are where they are supposed to be each morning. 

How Python can be used with SQL Server

 Python can be used to parse data that is not in a standard format so it can be imported into SQL Server.

This is one of my primary uses for it for in connection with SQL.  I have received numerous files of data that was either structured in a nonstandard format or completely unstructured but with data such as e-mail addresses that need to be pulled out en mass.  While it is possible to do these in SSIS or even pull it into a SQL Temp Table line by line and then parse it using pure T-SQL it is generally easier and faster to use Python or occasionally C# depending on the file type. 

For instance, a file that is structured like this:

… does not work easily with tools such as bulk insert or SSIS. [2] However Python can be used to easily determine the needed number of commas and add them to all lines, such as:

Afterwards the output file is quite amenable to Bulk Insert.

There are several choices to actually connect with SQL Server within python.  The options include the default odbc which comes as a standard library, the win32com client tools, mxODBC (commercial product) and pyODBC. While the odbc module will handle almost any requirement, the other options add additional features which can simplify development.

Pyodbc, for example, permits access of values by name and makes cursors iterable.  These changes from the default odbc package make it much easier to use.  As an example,  pyodbc can be used to connect to a SQL Server and then extract all e-mail addresses from a free form memo field.

A similar script can scan an unstructured text document to extract e-mail addresses and insert them into a SQL table.  Of course, it is often more useful to be able to call stored procedures.  This can be done with pyodbc by sending the exec command for the procedure with all parameters through the execute method.  Many of the libraries automatically begin transactions, so those most be committed or rolled back in the code, though some such as pyodbc permit a connection to be opened in autocommit mode.

How to acquire Python

Python is free and open source software maintained by the Python Software Foundation (PSF).  The latest version can be downloaded directly from the PSF at Python.ORG Python Programming Language — Official Website

For many people that is not the best option though.  For one thing, depending on what libraries and compatibility with other older code is needed, it may be wise to use a slightly more established version than the  latest release.    Currently, both the 2.7.x and 3.x lines are being maintained.  The 3.x line is the more modern version with many enhancements, but not all libraries have been brought up to date and it is not entirely backwards compatible with the 2.7 line. Python 3 has many, but since it breaks backwards compatibility, a need to work with an older code base or with libraries that have not been updated may make 2.7 the better choice in some cases.

Also, it may be easier to download a pre-packaged distribution such as WinPython, ActivePython, or the commercially supported Enthought Canopy. They come with many packages and libraries that are not part of the core Python Distribution as well as pre-configured development environments which are more sophisticated than IDLE. Getting one of the distributions can save a lot of time downloading and installing individual components later.

There are many options for Python IDE’s and editors.  Personally, I tend to use Spyder when I want to work with Python interactively or write short, simple scripts.  It is lightweight and fast, but has some basic editing and introspection features that are convenient.  When I am writing a longer or more complicated script/program I tend to use Eclipse with Pydev.  It is loads slowly but has very sophisticated introspection and project management abilities.   It is largely a matter of taste.

How to learn and get help with Python

There are many excellent resources on learning python and getting help.  Once the absolute basics have been mastered, one of the best sources of information is the Doc strings for the entity being called.  Many IDE’s provide user friendly access to the doc strings, but they can always be called by typing:

at an interactive prompt.

A few of the more useful resources for learning Python are:

Also, there are some more generalized programming resources with excellent Python sections, such as Stack Overflow.

Further Reading:

Tags: , , , , , ,

  • 76099 views

  • Rate
    [Total: 2    Average: 4.5/5]
  • BuggyFunBunny

    Why python is my language of choice
    Python can be used from nearly every machine/OS combination on the planet, even the IBM z/series. That means, of course, that it will talk to nearly every database, not just SQLServer.

    Some organizations may be pure M$, but not many, and fewer by the day. Perl was once the universal glue/script language, and before that shell in the *nix world. Python is now.

    And a full-fledged development language, too. People do write applications with it. Just go to Google.

  • Peter

    Agreed
    I had to maintain a batch process that needed logging and the ability to process SQL Data into some rather odd formats or pull data out of some rather odd formats – things that DTS or SSIS didn’t handle very well. In addition, I often needed to Compress/Decompress and transmit files as well. Add in some rather rapidly changing environments and Python was a great win for me. I could easily edit the script, debug it, not need to compile/deploy, and it did everything I needed.

    The only major issue I found was trying to compress with the Python native commands. I eventually just shelled out to a compression program from within Python – much faster than trying to iterate through the file objects and compress them one by one, at least for me.

    Great introductory article. I look forward to seeing more.

  • Anonymous

    String formatting
    A more pythonic way of preparing string is using the % string formatting operator. For example,
    instead of
    connStr = ( r’DRIVER={SQL Server};SERVER=’ + server + ‘;DATABASE=’ + database + ‘;’ +’Trusted_Connection=yes’)

    use

    connStr = r’DRIVER={SQL Server};SERVER=%s;DATABASE=%s;Trusted_Connection=yes’ % (server, database)

  • Sarnu

    Also have a look into IronPython
    Beside Python, which we use for general loading/maintenance scripts, IronPython (the .Net incarnation of Python) can be very useful to the DBA.
    From IronPython you can access the entire .Net Class Library and especially SMO and AMO Classes for manipulating DB-Metadata and SSAS-Objects.
    We use this for quick and dirty mass creation of objects as well as for productive maintenance.

  • timothyawiseman@gmail.com

    Replies
    Samu, I am a huge fan of IronPython personally and think it is probably one of the better choices for a SQL Server DBA. In this particular article I wanted to focus on the basics of python for a SQLServer DBA, so centered on the reference implementation.

    And for the anonymous poster about the string formatting operator, I agree fully. However, for this piece I thought string concatentation would be more familiar to SQL Server DBA than the formatting options. In fact when dealing with pure strings I tend to use normal concatentation unless I specifically think about.

    However, the formatting options can give you more control and permits type conversions if you are working with mixed data types instead of only strings. http://diveintopython.org/native_data_types/formatting_strings.html provides a good overview on this topic.

  • Sridhar Ratnakumar

    Python cookbook
    ActiveState hosts a cookbook site for Perl, Python and Tcl. The Python cookbook is especially the most popular category so much that O’Reilly’s book “Python Cookbook” is based on this site.

    Here’s the link to the database category:

    http://code.activestate.com/recipes/langs/python/tags/database/

  • Curtis Koppang

    Python vs PowerShell
    As a programmer I am drawn toward Python (IronPython) but MSFT is shipping so many PowerShell scripts with it’s products it’s hard to justify porting them all to Python.

  • Rowland

    Interesting
    Thanks for a well written article. I’m not sure I’d bother with Python for many reasons but it’s nice to see someone approach this topic differently.

  • just3ws

    Another approach.
    Another approach is to use an ORM mapping tool for the Sql. I recently did pretty much the same thing as described in the article but used SqlAlchemy, a Python ORM to handle the data mapping for me. This way you don’t have to worry about building the sql strings. http://www.sqlalchemy.org/

  • timothyawiseman@gmail.com

    Python and PowerShell
    Curtis, PowerShell is another good option. Personally for development I prefer Python and IronPython, but there is certain no reason you cannot use both especially when the code for the PowerShell portion is already provided.

  • Ed.Carden

    Personally I hate Powerhell.. er Powershell.
    Personally I hate Powershell (aka Powerhell). I have never reviewed Python till reading your article above and I already can follow the Python sample code you have provided.

    Even with training and experience in C++, VB/VB.Net and some scripting like vb script, I still can’t do the simplest of tasks with Powershell because the darn thing is so convoluted/complicated. And I have tried on more the one occassion to sit down and go thru the Powerhell tutorials/samples. I get the whol reason for wanting to provide in a single scripting option something that can do it all but it seems to me that when the scripting language in question is harder to learn & follow then full fledged programming langauges like C++ then that should tell you that maybe you’ve over done the thing.

    Personally I hope Powerhell fades away like so moany other MS technologies that seem to go no where even after billions in hype.

  • nicholsonjf

    Just what I was looking for
    I work for large enterprise on a team that uses perl scripts to pull and parse user account data from over 100 business applications that run on sqlserver. My core competency is in Python, and I was worried I’d have to start learning Perl to contribute. After reading this article, I’m confident I’ll be able to use Python to accomplish what I need. Thanks!!