Python in SQL Server

Anyone using R in SQL Server employs the procedure sp_execute_external_script, the first parameter of this being the language to use. The documentation rather obliquely says that “the script must be written in a supported and registered language“. Until recently, the only language was R, but now a second supported language, Python, has appeared.

Yes, at their recent ‘Data Amp’ conference, Microsoft announced that SQL Server 2017 will add Python as a supported language. From now on, any Python package will run natively in SQL Server. SQL Server R Services will be renamed as SQL Server Machine Learning Services since it will support both languages. This brings it more in line with Azure’s ‘Machine Learning’ product.

It marks quite a remarkable turnaround since, back in 2010, Microsoft officially handed over responsibility for IronPython (and IronRuby) “to the community”. Despite optimistic reports that IronPython was not dead just resting, it was in fact, dead.

It was a shame to lose a simple, dynamic scripting language with robust typing, and strong data-processing capabilities, ideal for seeking out anomalies, trends, and patterns in data. The ‘Iron’ languages pioneered the dynamic runtime inherited subsequently by PowerShell, but they fell between two stools. Although there were great advantages for .NET programmers, in being able to reuse their existing C# assemblies directly within IronPython, it also meant straying from the idea of a true “cross-platform” language, and from the Python standard. It meant putting up with a few things that worked differently in IronPython (such as garbage collection). Python programmers weren’t interested, and PowerShell’s subsequent emergence dented the demand for a Python-based .NET scripting language.

Now of course, Microsoft’s attitude is entirely different. Python in SQL Server makes perfect sense. It has become the natural scripting language for data science; the glue to create large-scale analysis processes. Its advanced predictive analytics, or machine learning, capabilities make it a “natural peer” to the R language, and means Microsoft can fully support data science applications that are written in Python. We’ll no longer need to export large volumes of SQL Server data, or using only sample data, for Python processing. We deploy the Python model right inside the T-SQL stored procedure.

So, that strange first parameter to sp_execute_external_script now makes better sense. Why stop at one extra language, though? What other language should be added to ‘Machine Learning Services’ that would benefit from fast access to large sets of data? Scala and Spark maybe? It would be great to know your thoughts.

Commentary Competition

Enjoyed the topic? Have a relevant anecdote? Disagree with the author? Leave your two cents on this post in the comments below, and our favourite response will win a $50 Amazon gift card. The competition closes two weeks from the date of publication, and the winner will be announced in the next Simple Talk newsletter.

  • 4275 views

  • Rate
    [Total: 9    Average: 4.7/5]
  • Peter Schott

    Well, obviously LISP should be the next language to support. My parens keys need a good workout. 🙂

    Seriously, though, I’m glad that we can use other scripting languages to put them close to the data, but need to do some reading to see how we’d set up modules and such w/ Python to make them usable. I haven’t really followed the big data languages as much as I should have so don’t know about the next language. It would be interesting to see PowerShell support, though.

  • Seems SQL 2017 will be able to compete with statistical analysis tools like SAS with the introduction of Python and R into the platform. Excited to play with it!

  • Dave Poole

    I’m not sure about Scala. It’s one of those languages that is incredibly powerful but some of its advantages over Java have been neutralised by Java 8 and with Java 9 around the corner it appears to be having a problem with adoption.
    If you really wanted to throw the cat amongst the pigeons you could implement Java. The prospect of Oracle’s lawyers scrambling to find ways to apply Oracle licencing charges to SQL Server doesn’t bear thinking about.

    I am discovering a love of awk and other Linux command line utilities.

    I’m also beginning to think that a language designed for ETL would be beneficial. GUI based ETL is fine but sometimes you need something that fits with a test framework and deploys as code

  • robwestwood

    APL, definitely. Will add to the mystique…

  • Keith Rowley

    The big question of course is which one of these languages I should learn since I don’t know either of them. 😏

  • Nico Jacobs

    Maybe if Julia becomes more mature it could make up a good candidate. Not as popular as the two other, but apparently with nice performance… and SQL people love performance 🙂

    • Gina Taylor

      Hi Nico, congratulations on winning this week’s commentary competition! Please could you drop an email to newsletter@simple-talk.com so that we can send your Amazon gift card.

    • Gina Taylor

      Hi @disqus_D6jJlgeLp1:disqus , congratulations on winning last week’s commentary competition! Please could you drop an email to newsletter@simple-talk.com so that we can send your Amazon gift card.

  • jeffrey.yao

    I would say C#, which currently with Roslyn, it IS a script language. With C# script, it is not only about data science, machine learning but also about innovation.

  • Sarath Chandra

    Well i got what i wanted to know about.thanks for providing this info.
    sql dba training in hyderabad