Why has Microsoft acquired Revolution Analytics, the company that provides open-source distributions of R, alongside commercial “Enterprise” extensions for big data infrastructures?
R is a programming language and platform for data manipulation, time series analysis, statistical modelling and graphics. Its statistical methods can, for example, explore the relationships between the many variables that affect an outcome, such as whether a customer buys a product or whether a candidate wins an election. In addition, thousands of downloadable R packages provide all sorts of statistical analysis and predictive modelling methods, including lattice, for ‘multivariate data visualization’, and googleVis, an R interface to the Google Charts API. As Sergei Dumnov demonstrated in a recent series of Simple Talk articles, these packages are powerful magic for exploring data.
So what does the acquisition mean for Microsoft? Judging from what’s been said so far, R is likely to find growing use for advanced analytics on big data platforms such as Hadoop, as well as on Microsoft Azure. R is already built into Microsoft’s Azure Machine Learning (ML) service.
It would seem that data scientists already like to use SQL Server. In a recent survey of the software data scientists use, quoted on the Revolution Analytics website, “only SQL was rated higher than R”. At present, many of them combine the analytic functions available through SQL with the statistical and graphical power of R.
Currently, you can pass data between R and SQL Server, but the process is messy, involving xp_cmdshell and OLE Automation procedures. Given the size of the data sets now in use, this is a problem that needs to be fixed.
So much for data scientists, but how would R affect the average SQL Server developer? It isn’t just regression analysis that puts a glint in developers’ eyes. R is also a data visualization platform, offering a wide range of graphical tools and techniques, from simple bar charts to complex “3D surfaces”, as well as impressive packages such as ggplot2, which is based on the “grammar of graphics”.
Is R a natural successor to Reporting Services (SSRS), which, despite Microsoft’s mild claims to the contrary, many consider to be in its ‘sunset years’? Timo Klimmer from Microsoft has already made available a Code R Graphics Device, which can be used as a Custom Report Item in SSRS to build ggplot2 data visualizations. By bringing R’s powerful data analytics to the huge community of SSRS users, Microsoft could more easily achieve its stated goal of “reducing the analytics skills gap inside their customer’s organizations”.