Python is increasingly used by DBAs as a general-purpose scripting language, despite the pressure to adopt Microsoft's PowerShell. They find it easy to learn, powerful, and reasonably intuitive. Here Timothy Wiseman, a working DBA, explains the attraction of Python and gives a general introduction to the language, suitable for anyone who has been wondering whether to give it a try.
What is Python?
Python is a dynamic, interpreted language which was originally designed by Guido van Rossum with code readability as one of its primary goals. As part of that, one its most unusual features is the use of meaningful white space, with code blocks being shown and enforced through indentation. Although some of its extensions and libraries are specific to one operating system, the core language is available for all major operating systems and most Python scripts will run unaltered from one system to another.
Why would a SQL Server DBA be interested in Python?
The Microsoft T-SQL programming language that is built into SQL Server along with the capabilities of included tools such as SSIS and SSRS form a remarkably versatile platform without any third party tools whatsoever. In fact, T-SQL itself is Turing complete.
However, they are not the best tool for all tasks. For instance, SQL Server by itself does not have full support for regular expressions and while it is fully capable of interacting with the file system there are generally better tools for any complex manipulation of the operating system. Pure SQL is also not the strongest option for complex mathematics or complicated file munging.
Python is one tool capable of handling those tasks. It is certainly not the only option, with Microsoft C# and PowerShell also being excellent options for some of these tasks. Perl and Lisp are amongst many other options, each with their own strong points.
Python has several strong points which make it worthy of attention. It is free and open source software for one thing, but far more significantly it is easy to learn and easy to read even for people who are not Python programmers. It is capable of being used to write full scale applications and server software, but for most DBAs it is more significant that it is excellent for jotting off quick utility scripts to solve a specific problem. For instance, it is excellent for writing scripts to reorganize files into a new directory structure and automatically verify that all the automated backup files actually are where they are supposed to be each morning.
How Python can be used with SQL Server
Python can be used to parse data that is not in a standard format so it can be imported into SQL Server.
For instance, a file like this:
#Standardizes the number of commas per line prior to being read by
#programs such as BCP and required for compliance with RFC4180 for CSV files
#initialize the variables, change FileNames as appropriate
InFileName = r'C:\test\testimport.csv'
OutFileName = r'C:\test\testimport2.csv'
NumCommas = 0
File = open(InFileName)
for line in File:
if line.count(',') > NumCommas:
NumCommas = line.count(',')
#return to the start of the file
File.seek(0)
OutFile = open(OutFileName, 'w')
for line in File:
OutFile.write(line.rstrip() + ',' * (NumCommas - line.count(',')) + '\n')
OutFile.close()
File.close()
Afterwards the output file is quite amenable to Bulk Insert.
There are several choices to actually connect with SQL Server within python. The options include the default odbc which comes as a standard library, the win32com client tools, mxODBC (commercial product) and pyODBC. While the odbc module will handle almost any requirement, the other options add additional features which can simplify development.
Pyodbc, for example, permits access of values by name and makes cursors iterable. These changes from the default odbc package make it much easier to use. As an example, pyodbc can be used to connect to a SQL Server and then extract all e-mail addresses from a free form memo field.
import pyodbc as p
import re #RegEx library
server = 'TWISEMAN-PC'
database = 'test'
table = 'salesData'
field = 'memo'
connStr = ( r'DRIVER={SQL Server};SERVER=' +
server + ';DATABASE=' + database + ';' +
'Trusted_Connection=yes' )
#e-mail regex examples available at
#http://www.regular-expressions.info/email.html
pattern = r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
emailMatcher = re.compile(pattern, re.I)
emails = [] #create a list
conn = p.connect(connStr)
dbCursor = conn.cursor()
sql = ('select ' + field + ' as memo '
' from '+ table+
' where ' + field + ' like \'%@%\")
dbCursor = conn.cursor()
dbCursor.execute(sql)
for row in dbCursor:
newEmails = emailMatcher.finditer(row.memo)
if newEmails is not None:
emails.extend([addr.group() for addr in newEmails])
conn.close()
print emails
A similar script can scan an unstructured text document to extract e-mail addresses and insert them into a SQL table. Of course, it is often more useful to be able to call stored procedures. This can be done with pyodbc by sending the exec command for the procedure with all parameters through the execute method. Many of the libraries automatically begin transactions, so those most be committed or rolled back in the code, though some such as pyodbc permit a connection to be opened in autocommit mode.
How to acquire Python
Python.Org (not Python.Com).
Python.Com is a pornographic site that is nothing to do with IT! It has no affiliation with the PSF and to use a borrowed phrase is “Not Safe For Work.” I found this out the hard way when trying to go to the reference library without benefit of my collection of bookmarks.
Python is free and open source software maintained by the Python Software Foundation (PSF). The latest version can be downloaded directly from the PSF at Python.ORG Python Programming Language -- Official Website
For many people that is not the best option though. For one thing, depending on what libraries and compatibility with other older code is needed, it may be wise to use a slightly more established version than the latest release. Currently, both the 2.6 and 3.x lines are being maintained. The 3.x line is the more modern version with many enhancements, but not all libraries have been brought up to date and it is not entirely backwards compatible with the 2.6 line.
Python(x,y), ActivePython or Enthought
I personally use Python(x,y). It is entirely a matter of taste though. Python(x,y) is entirely free and every component in it can be downloaded separately. ActivePython comes in both a free and Enterprise version. All the components of the free version can be downloaded separately. Enthought is a commercially supported distribution with a free trial period. Sage is also a fantastic distribution but it is available in its full version only on Linux.
Also, it may be easier to download a pre-packaged distribution such as Python(x,y), ActivePython, or the commercially supported Enthought Python Distribution. They come with many packages and libraries that are not part of the core Python Distribution as well as pre-configured development environments which are more sophisticated than IDLE. Getting one of the distributions can save a lot of time downloading and installing individual components later.
Another option which is worth looking at is IronPython. It is an implementation of Python that targets .Net using the Dynamic Language Runtime (DLR) and can make use of .NET libraries. With its ability to take advantage of SQL Server Management Objects (SMO) it can be of particular interest to DBAs. It is primarily written by Jim Hugunin and is mostly compatible with the reference version from the PSF. However, some Python extensions such as NumPy and SciPy will not work with it directly. There are generally other ways of solving those problems using .Net libraries instead. There are efforts to change this, such as the IronClad compatability layer project.
There are many options for Python IDE's and editors. Personally, I tend to use PyDee when I want to work with Python interactively or write short, simple scripts. It is lightweight and fast, but has some basic editing and introspection features that are convenient. Spyder will be the next incarnation of PyDee and is currently in beta. When I am writing a longer or more complicated script/program I tend to use Eclipse with Pydev. It is loads slowly but has very sophisticated introspection and project management abilities. It is largely a matter of taste.
How to learn and get help with Python
There are many excellent resources on learning python and getting help. Once the absolute basics have been mastered, one of the best sources of information is the Doc strings for the entity being called. Many IDE's provide user friendly access to the doc strings, but they can always be called by typing:
“print [object_name].__doc__” at an interactive prompt.
A few of the more useful resources for learning Python are:
Also, there are some more generalized programming resources with excellent Python sections, such as Stack Overflow.
Further Reading: