Click here to monitor SSC
Av rating:
Total votes: 43
Total comments: 11


Timothy Wiseman
Python for the SQL Server DBA
27 August 2009



  Python is increasingly used by DBAs as a general-purpose scripting language, despite the pressure to adopt Microsoft's PowerShell.  They find it easy to learn, powerful, and reasonably intuitive. Here Timothy Wiseman, a working DBA, explains the attraction of Python and gives a general introduction to the language, suitable for anyone who has been wondering whether to give it a try.

What is Python?

Python is a dynamic, interpreted language which was originally designed by Guido van Rossum with code readability as one of its primary goals.  As part of that, one its most unusual features is the use of meaningful white space, with code blocks being shown and enforced through indentation.  Although some of its extensions and libraries are specific to one operating system, the core language is available for all major operating systems and most Python scripts will run unaltered from one system to another.

Why would a SQL Server DBA be interested in Python?

The Microsoft T-SQL programming language that is built into SQL Server along with the capabilities of included tools such as SSIS and SSRS form a remarkably versatile platform without any third party tools whatsoever.  In fact, T-SQL itself is Turing complete.

However, they are not the best tool for all tasks.  For instance, SQL Server by itself does not have full support for regular expressions and while it is fully capable of interacting with the file system there are generally better tools for any complex manipulation of the operating system.  Pure SQL is also not the strongest option for complex mathematics or complicated file munging.

Python is one tool capable of handling those tasks.  It is certainly not the only option, with Microsoft C# and PowerShell also being excellent options for some of these tasks.  Perl and Lisp are amongst many other options, each with their own strong points.  

Python has several strong points which make it worthy of attention.  It is free and open source software for one thing, but far more significantly it is easy to learn and easy to read even for people who are not Python programmers.  It is capable of being used to write full scale applications and server software, but for most DBAs it is more significant that it is excellent for jotting off quick utility scripts to solve a specific problem.  For instance, it is excellent for writing scripts to reorganize files into a new directory structure and  automatically verify that all the automated backup files actually are where they are supposed to be each morning. 

How Python can be used with SQL Server

 Python can be used to parse data that is not in a standard format so it can be imported into SQL Server.

This is what I primarily use it for in connection with SQL.  There have been numerous files of data that was either structured in a nonstandard format or completely unstructured but with data such as e-mail addresses that need to be pulled out en mass.  While it is possible to do these in SSIS or even pull it into a SQL Temp Table line by line and then parse it using pure T-SQL it is generally easier and faster to use Python or occasionally C# depending on the file type. 

For instance, a file like this:

#Standardizes the number of commas per line prior to being read by
#programs such as BCP and required for compliance with RFC4180 for CSV files
#initialize the variables, change FileNames as appropriate
InFileName = r'C:\test\testimport.csv'
OutFileName = r'C:\test\testimport2.csv'
NumCommas = 0

File = open(InFileName)

for line in File:
    
if line.count(',') > NumCommas:
        NumCommas
= line.count(',')

#return to the start of the file
File.seek(0)

OutFile = open(OutFileName, 'w')
for line in File:
    OutFile.write
(line.rstrip() + ',' * (NumCommas - line.count(',')) + '\n')

OutFile.close()
File.close()

Afterwards the output file is quite amenable to Bulk Insert.

There are several choices to actually connect with SQL Server within python.  The options include the default odbc which comes as a standard library, the win32com client tools, mxODBC (commercial product) and pyODBC. While the odbc module will handle almost any requirement, the other options add additional features which can simplify development.

Pyodbc, for example, permits access of values by name and makes cursors iterable.  These changes from the default odbc package make it much easier to use.  As an example,  pyodbc can be used to connect to a SQL Server and then extract all e-mail addresses from a free form memo field.

import pyodbc as p
import re #RegEx library

server = 'TWISEMAN-PC'
database = 'test'
table = 'salesData'
field = 'memo'

connStr = ( r'DRIVER={SQL Server};SERVER=' +
            
server + ';DATABASE=' + database + ';' +
            
'Trusted_Connection=yes'    )
        
#e-mail regex examples available at
#http://www.regular-expressions.info/email.html
pattern = r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
emailMatcher = re.compile(pattern, re.I)
emails = [] #create a list

conn = p.connect(connStr)
dbCursor = conn.cursor()
sql = ('select ' + field + ' as memo '
  
    ' from '+ table+
        
' where ' + field + ' like \'%@%\")
dbCursor
= conn.cursor()
dbCursor.execute(sql)
for row in dbCursor:
    newEmails
= emailMatcher.finditer(row.memo)
    
if newEmails is not None:
        emails.extend
([addr.group() for addr in newEmails])
        
conn.close()
print emails

A similar script can scan an unstructured text document to extract e-mail addresses and insert them into a SQL table.  Of course, it is often more useful to be able to call stored procedures.  This can be done with pyodbc by sending the exec command for the procedure with all parameters through the execute method.  Many of the libraries automatically begin transactions, so those most be committed or rolled back in the code, though some such as pyodbc permit a connection to be opened in autocommit mode.

How to acquire Python

Python.Org (not Python.Com).

Python.Com is a pornographic site that is nothing to do with IT!  It has no affiliation with the PSF and to use a borrowed phrase is “Not Safe For Work.” I found this out the hard way when trying to go to the reference library without benefit of my collection of bookmarks.

Python is free and open source software maintained by the Python Software Foundation (PSF).  The latest version can be downloaded directly from the PSF at Python.ORG Python Programming Language -- Official Website

For many people that is not the best option though.  For one thing, depending on what libraries and compatibility with other older code is needed, it may be wise to use a slightly more established version than the  latest release.    Currently, both the 2.6 and 3.x lines are being maintained.  The 3.x line is the more modern version with many enhancements, but not all libraries have been brought up to date and it is not entirely backwards compatible with the 2.6 line.

Python(x,y), ActivePython or Enthought

I personally use Python(x,y). It is entirely a matter of taste though. Python(x,y) is entirely free and every component in it can be downloaded separately. ActivePython comes in both a free and Enterprise version. All the components of the free version can be downloaded separately. Enthought is a commercially supported distribution with a free trial period. Sage is also a fantastic distribution but it is available in its full version only on Linux.

Also, it may be easier to download a pre-packaged distribution such as Python(x,y), ActivePython, or the commercially supported Enthought Python Distribution. They come with many packages and libraries that are not part of the core Python Distribution as well as pre-configured development environments which are more sophisticated than IDLE.  Getting one of the distributions can save a lot of time downloading and installing individual components later.

Another option which is worth looking at is IronPython.  It is an implementation of Python that targets .Net using the Dynamic Language Runtime (DLR) and can make use of .NET libraries.  With its ability to take advantage of SQL Server Management Objects (SMO) it can be of particular interest to DBAs.  It is primarily written by Jim Hugunin and is mostly compatible with the reference version from the PSF. However, some Python extensions such as NumPy and SciPy will not work with it directly.  There are generally other ways of solving those problems using .Net libraries instead.  There are efforts to change this, such as the IronClad compatability layer project.

There are many options for Python IDE's and editors.  Personally, I tend to use PyDee when I want to work with Python interactively or write short, simple scripts.  It is lightweight and fast, but has some basic editing and introspection features that are convenient.  Spyder will be the next incarnation of PyDee and is currently in beta.  When I am writing a longer or more complicated script/program I tend to use Eclipse with Pydev.  It is loads slowly but has very sophisticated introspection and project management abilities.   It is largely a matter of taste.

How to learn and get help with Python

There are many excellent resources on learning python and getting help.  Once the absolute basics have been mastered, one of the best sources of information is the Doc strings for the entity being called.  Many IDE's provide user friendly access to the doc strings, but they can always be called by typing:

“print [object_name].__doc__” at an interactive prompt.

A few of the more useful resources for learning Python are:

Also, there are some more generalized programming resources with excellent Python sections, such as Stack Overflow.

Further Reading:



This article has been viewed 23985 times.
Timothy Wiseman

Author profile: Timothy Wiseman

Timothy A Wiseman is a Database Administrator for SAIC with a focus on efficiency and readability of the database and its surrounding ecosystem. In addition to administering the core SQL Server system, he has experience working with Python and Microsoft Access in conjunction with SQL Server. He holds a Bachelor's of Science in Mathematics as well as an MCDBA and MCITP.

Search for other articles by Timothy Wiseman

Rate this article:   Avg rating: from a total of 43 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Why python is my language of choice
Posted by: BuggyFunBunny (view profile)
Posted on: Wednesday, September 02, 2009 at 9:44 AM
Message: Python can be used from nearly every machine/OS combination on the planet, even the IBM z/series. That means, of course, that it will talk to nearly every database, not just SQLServer.

Some organizations may be pure M$, but not many, and fewer by the day. Perl was once the universal glue/script language, and before that shell in the *nix world. Python is now.

And a full-fledged development language, too. People do write applications with it. Just go to Google.

Subject: Agreed
Posted by: Peter (not signed in)
Posted on: Wednesday, September 02, 2009 at 12:23 PM
Message: I had to maintain a batch process that needed logging and the ability to process SQL Data into some rather odd formats or pull data out of some rather odd formats - things that DTS or SSIS didn't handle very well. In addition, I often needed to Compress/Decompress and transmit files as well. Add in some rather rapidly changing environments and Python was a great win for me. I could easily edit the script, debug it, not need to compile/deploy, and it did everything I needed.

The only major issue I found was trying to compress with the Python native commands. I eventually just shelled out to a compression program from within Python - much faster than trying to iterate through the file objects and compress them one by one, at least for me.

Great introductory article. I look forward to seeing more.

Subject: String formatting
Posted by: Anonymous (not signed in)
Posted on: Friday, September 04, 2009 at 12:46 AM
Message: A more pythonic way of preparing string is using the % string formatting operator. For example,
instead of
connStr = ( r'DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database + ';' +'Trusted_Connection=yes')

use

connStr = r'DRIVER={SQL Server};SERVER=%s;DATABASE=%s;Trusted_Connection=yes' % (server, database)

Subject: Also have a look into IronPython
Posted by: Sarnu (not signed in)
Posted on: Friday, September 04, 2009 at 3:56 AM
Message: Beside Python, which we use for general loading/maintenance scripts, IronPython (the .Net incarnation of Python) can be very useful to the DBA.
From IronPython you can access the entire .Net Class Library and especially SMO and AMO Classes for manipulating DB-Metadata and SSAS-Objects.
We use this for quick and dirty mass creation of objects as well as for productive maintenance.

Subject: Replies
Posted by: timothyawiseman@gmail.com (view profile)
Posted on: Friday, September 04, 2009 at 1:17 PM
Message: Samu, I am a huge fan of IronPython personally and think it is probably one of the better choices for a SQL Server DBA. In this particular article I wanted to focus on the basics of python for a SQLServer DBA, so centered on the reference implementation.

And for the anonymous poster about the string formatting operator, I agree fully. However, for this piece I thought string concatentation would be more familiar to SQL Server DBA than the formatting options. In fact when dealing with pure strings I tend to use normal concatentation unless I specifically think about.

However, the formatting options can give you more control and permits type conversions if you are working with mixed data types instead of only strings. http://diveintopython.org/native_data_types/formatting_strings.html provides a good overview on this topic.

Subject: Python cookbook
Posted by: Sridhar Ratnakumar (not signed in)
Posted on: Friday, September 04, 2009 at 6:09 PM
Message: ActiveState hosts a cookbook site for Perl, Python and Tcl. The Python cookbook is especially the most popular category so much that O'Reilly's book "Python Cookbook" is based on this site.

Here's the link to the database category:

http://code.activestate.com/recipes/langs/python/tags/database/

Subject: Python vs PowerShell
Posted by: Curtis Koppang (not signed in)
Posted on: Tuesday, September 08, 2009 at 8:54 AM
Message: As a programmer I am drawn toward Python (IronPython) but MSFT is shipping so many PowerShell scripts with it's products it's hard to justify porting them all to Python.

Subject: Interesting
Posted by: Rowland (view profile)
Posted on: Friday, September 18, 2009 at 9:35 AM
Message: Thanks for a well written article. I'm not sure I'd bother with Python for many reasons but it's nice to see someone approach this topic differently.


Subject: Another approach.
Posted by: just3ws (view profile)
Posted on: Friday, September 18, 2009 at 10:35 AM
Message: Another approach is to use an ORM mapping tool for the Sql. I recently did pretty much the same thing as described in the article but used SqlAlchemy, a Python ORM to handle the data mapping for me. This way you don't have to worry about building the sql strings. http://www.sqlalchemy.org/

Subject: Python and PowerShell
Posted by: timothyawiseman@gmail.com (view profile)
Posted on: Monday, September 21, 2009 at 7:44 PM
Message: Curtis, PowerShell is another good option. Personally for development I prefer Python and IronPython, but there is certain no reason you cannot use both especially when the code for the PowerShell portion is already provided.

Subject: Personally I hate Powerhell.. er Powershell.
Posted by: Ed.Carden (view profile)
Posted on: Wednesday, September 23, 2009 at 11:09 AM
Message: Personally I hate Powershell (aka Powerhell). I have never reviewed Python till reading your article above and I already can follow the Python sample code you have provided.

Even with training and experience in C++, VB/VB.Net and some scripting like vb script, I still can't do the simplest of tasks with Powershell because the darn thing is so convoluted/complicated. And I have tried on more the one occassion to sit down and go thru the Powerhell tutorials/samples. I get the whol reason for wanting to provide in a single scripting option something that can do it all but it seems to me that when the scripting language in question is harder to learn & follow then full fledged programming langauges like C++ then that should tell you that maybe you've over done the thing.

Personally I hope Powerhell fades away like so moany other MS technologies that seem to go no where even after billions in hype.

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
Converting String Data to XML and XML to String Data
 We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk