Anyone using R in SQL Server employs the procedure sp_execute_external_script, the first parameter of this being the language to use. The documentation rather obliquely says that “the script must be written in a supported and registered language“. Until recently, the only language was R, but now a second supported language, Python, has appeared.
Yes, at their recent ‘Data Amp’ conference, Microsoft announced that SQL Server 2017 will add Python as a supported language. From now on, any Python package will run natively in SQL Server. SQL Server R Services will be renamed as SQL Server Machine Learning Services since it will support both languages. This brings it more in line with Azure’s ‘Machine Learning’ product.
It marks quite a remarkable turnaround since, back in 2010, Microsoft officially handed over responsibility for IronPython (and IronRuby) “to the community”. Despite optimistic reports that IronPython was not dead just resting, it was in fact, dead.
It was a shame to lose a simple, dynamic scripting language with robust typing, and strong data-processing capabilities, ideal for seeking out anomalies, trends, and patterns in data. The ‘Iron’ languages pioneered the dynamic runtime inherited subsequently by PowerShell, but they fell between two stools. Although there were great advantages for .NET programmers, in being able to reuse their existing C# assemblies directly within IronPython, it also meant straying from the idea of a true “cross-platform” language, and from the Python standard. It meant putting up with a few things that worked differently in IronPython (such as garbage collection). Python programmers weren’t interested, and PowerShell’s subsequent emergence dented the demand for a Python-based .NET scripting language.
Now of course, Microsoft’s attitude is entirely different. Python in SQL Server makes perfect sense. It has become the natural scripting language for data science; the glue to create large-scale analysis processes. Its advanced predictive analytics, or machine learning, capabilities make it a “natural peer” to the R language, and means Microsoft can fully support data science applications that are written in Python. We’ll no longer need to export large volumes of SQL Server data, or using only sample data, for Python processing. We deploy the Python model right inside the T-SQL stored procedure.
So, that strange first parameter to sp_execute_external_script now makes better sense. Why stop at one extra language, though? What other language should be added to ‘Machine Learning Services’ that would benefit from fast access to large sets of data? Scala and Spark maybe? It would be great to know your thoughts.