Microsoft introduced the Cortana Analytics Suite (CAS) in July 2015, at the Worldwide Partner Conference in Orlando. Since then, I’ve struggled to find concrete information about it. I decided it was time to learn more.
When Microsoft first announced CAS, it touted the suite as an integrated set of cloud-based services that vaguely promised to be “a huge differentiator for any business.” The suite would be available through a simple monthly subscription and be customizable to fit the needs of different organizations. The company planned to make CAS available that coming fall.
Two months later, Microsoft hosted the first-ever Cortana Analytics Workshop, a gathering of techies that would provide participants with a chance to learn about Microsoft’s advanced analytics vision. The workshop appeared to represent the suite’s official launch.
At some point during the build-up, Microsoft also set up a slick new website dedicated to the CAS vision (https://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/). The website featured rolling graphics with stylized icons, and large bold headlines that emphasized the suite’s imminent importance. Cortana Analytics, it would seem, had officially arrived.
Making sense of Cortana Analytics
When setting out to understand what CAS was all about, I headed straight to the CAS landing page in hopes of finding definitive information about the suite’s true nature. One of the first statements I encountered was a bewildering description of CAS as a “fully managed big data and advanced analytics suite that enables you to transform your data into intelligent action.”
As tidy as this statement sounded, it provided little in the way of specifics, and the assertions I found on the rest of the landing page grew even more vague:
- “Take action ahead of your competitors by going beyond looking in the rear-view mirror to predicting what’s next.”
- “Get closer to your customers. Infer their needs through their interaction with natural user interfaces.”
- “Get things done with Cortana in more helpful, proactive, and natural ways.”
Although Microsoft’s marketing mavens might have smiled in contentment at a job well done, I still had no idea what CAS actually was from a technical perspective, or how it fit into the larger context of Microsoft services, particularly all those Azure analytic offerings.
So I started checking out the links in the “See it in action” section at the bottom of the landing page. These led me to case studies about how companies such as Pier 1 Imports and Rockwell Automation used Azure services to build customized analytic solutions. Although the studies were quick to extol Azure’s virtues, they said nothing about Cortana Analytics.
I thought I was just having a run of bad luck in getting to the technical ‘beef in the sandwich’, so I dug and dug, having to wade through one sales pitch after the next, along with an assortment of heartwarming phrases about enabling and orchestrating and transforming. Finally, I landed on some useful information, in the form of the following table, which I proudly pulled off the CAS site.
It turns out that Cortana Analytics is made up mostly of Azure services, many of which existed before the CAS branding came along. In addition, the suite pulls Power BI and the Cortana personal assistant into the mix. What is also included, but mentioned only indirectly within the “Perceptual intelligence” category, is the Cortana Analytics Gallery, what was once referred to as the Azure Machine Learning Gallery.
As grateful as I was for finding the table, its discovery also led me to a number of other questions. What exactly is CAS? Is there something going on in the backend that sets CAS apart from other Azure services? Is CAS simply a label (or brand) imposed on a set of services that continue to operate independently of that label? After all, I can subscribe to Azure Machine Learning and Azure SQL Data Warehouse, as well as use Power BI and the Cortana personal assistant, without even encountering the Cortana Analytics moniker. What exactly sets CAS apart?
Because I could find so few answers, I turned to the resources I dreaded most: videos. Why do I dread them? In my experience in IT, such material often includes only two minutes of useful ‘hard’ information for every 20 minutes of marketing ‘positioning’, along with some really bad music. That’s not true of all videos and webcasts, of course, but you never know when and where you’ll find the gems until after you’ve dug through a lot of rubble.
When it came to the numerous videos and webcasts available through the CAS site, it was business as usual, and I found myself wasting a fair amount of time. To make matters worse, I had to register repeatedly to view much of this material, which meant filling out forms or parts of forms and waiting for surprisingly slow emails to reward me with the actual links.
Despite these hurdles, I persevered, forcing myself to sit through several of the videos and webcasts, and was eventually able to glean bits of useful information, at least enough to give me a rough idea of how the pieces might fit together. But it was hard work.
One issue I have yet to resolve is related to how the “one simple monthly subscription” is supposed to work. I found nothing on the CAS site to point me to more specifics about this pricing model, nor did I find anything in Azure’s pricing information about CAS. One of the workshop videos mentions a CAS-specific SKU, but provides few details about what that means. This and another video also indicate that customers can continue to use individual Azure services as they’ve been doing all along. In this sense, according to one video, they’re already using Cortana Analytics.
I did take one other step to try to get answers to my Cortana Analytics questions. I clicked the “Contact us now” link and filled out yet another form that allowed me to submit CAS-related questions to Microsoft. I have yet to receive an answer.
Despite all my digging, I still cannot say with any certainty what CAS is, other than being a name for a set of individual services that are designed to work together, but are available independently. After I got past the marketing and sales pitches and obfuscated language, all I found was a list of Azure services, with Power BI and the Cortana personal assistant added to the mix. I do not know if in the end you get anything extra with CAS outside of the Cortana Analytics brand, but I do know that the Azure services play an integral role, so let’s take a closer look at them.
The Azure side of CAS
When it comes to the individual Azure services, we can often find more concrete information than we can with Cortana Analytics. That’s not to say we won’t run into the same type of marketing clutter, but we can usually find details that are a bit more specific (even if it means going outside of Microsoft). What we don’t find are many references to Cortana Analytics, although that doesn’t prevent us from building the types of solutions that the CAS marketing material likes to show off.
The first group of CAS-related services has to do with storing and processing large sets of data:
- Azure SQL Data Warehouse: A database service that can distribute workloads across multiple compute nodes in order to process large volumes of relational and non-relational data. The service uses Microsoft’s massively parallel processing (MPP) architecture, along with advanced query optimizers, making it possible to scale out and parallelize complex SQL queries.
- Azure Data Lake Store: A scalable storage repository for data of any size, type, or ingestion speed, regardless of where it originates. The repository uses a Hadoop file system to support compatibility with the Hadoop Distributed File System (HDFS) and offers unlimited storage without restricting file sizes or data volumes.
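To make the MPP idea behind SQL Data Warehouse a little more concrete, here is a minimal Python sketch of the general technique, not of how Microsoft implements it. Rows are hash-distributed across a hypothetical number of compute nodes by a distribution key, each node computes a partial aggregate over its own slice in parallel, and a final step merges the partial results. The node count, the function names, and the use of CRC32 as the hash are illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical node count; a real MPP system decides its own layout.
NUM_NODES = 4


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node for a row by hashing its distribution key (deterministic)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes: int = NUM_NODES):
    """Split (key, amount) rows into one slice per node."""
    slices = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        slices[node_for(key, num_nodes)].append((key, amount))
    return slices


def local_sum(rows):
    """Work done on each node: SUM(amount) GROUP BY key over its own slice."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def parallel_group_sum(rows, num_nodes: int = NUM_NODES):
    """Run local_sum on every slice in parallel, then merge the partial results."""
    slices = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, slices))
    result = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)


if __name__ == "__main__":
    sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
    print(parallel_group_sum(sales))
```

Because rows are hashed on the grouping key, every row for a given key lands on the same node, so the merge step is cheap. In this model, picking a distribution key that matches common joins and aggregations is an important design consideration.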
Azure Data Lake Store is actually part of a larger unit that Microsoft refers to as Azure Data Lake. Not only does it include Data Lake Store, but also Data Lake Analytics and HDInsight, both of which share the CAS label. You can find additional information about the Data Lake services in the Simple-Talk article Azure Data Lake.
The next category of services under the CAS umbrella focuses on data management:
- Azure Data Factory: A data integration service that uses data flow pipelines to manage and automate the movement and transformation of data. Data Factory orchestrates other services, making it possible to ingest data from on-premises and cloud-based sources, and then transform, analyze, and publish the data. Users can monitor the pipelines from a single unified view.
- Azure Data Catalog: A system for registering enterprise data sources, understanding the data in those sources, and consuming the data. The data remains in its location, but the metadata is copied to the catalog, where it is indexed for easy discovery. In addition, data professionals can contribute their knowledge in order to enrich the source metadata.
- Azure Event Hubs: An event processing service that can ingest millions of events per second and make them available for storage and analysis. The service can log events in near real time and accept data from a wide range of sources. Event Hubs uses technologies that support low latency and high availability, while providing flexible throttling, authentication, and scalability.
For more information about Event Hubs, refer to the Simple-Talk article Azure Event Hubs. Next, here’s a quick overview of the analytic components included in the CAS package:
- Azure Machine Learning: A service for building, deploying, and sharing predictive analytic solutions. The service runs predictive models that learn from existing data, making it possible to forecast future behavior and trends. Machine Learning also provides the tools necessary for testing and managing the models as well as deploying them as web services.
- Azure Data Lake Analytics: A distributed service for analyzing data of any size, including what is in Data Lake Store. Data Lake Analytics is built on Apache YARN, the resource management framework that Hadoop clusters use to schedule data processing jobs. Data Lake Analytics also supports U-SQL, a new language that Microsoft developed for writing scalable, distributed queries that analyze data.
- Azure HDInsight: A fully managed Hadoop cluster service that supports a wide range of analytic engines, including Spark, Storm, and HBase. Microsoft has updated the service to take advantage of Data Lake Store and to maximize security, scalability, and throughput.
- Azure Stream Analytics: A service that supports complex event processing over streaming data. Stream Analytics can handle millions of events per second from a variety of sources, while correlating them across multiple streams. It can also ingest events in real time, whether from one data stream or multiple streams.
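Stream Analytics queries are written in a SQL-like language that groups events into time windows (its TumblingWindow function is one example). The following Python sketch illustrates only that windowing idea, under simplified assumptions. Events arrive as hypothetical (timestamp in seconds, device ID) pairs, and each event is counted in the fixed, non-overlapping window that contains it. It does not model late or out-of-order arrivals, or correlation across streams.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, device ID); hypothetical schema


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per device in fixed, non-overlapping windows.

    Each event belongs to exactly one window, identified by the window's
    start time, so no event is counted in more than one window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts: Counter = Counter()
    for timestamp, device in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


if __name__ == "__main__":
    readings = [(0.5, "car-1"), (3.0, "car-1"), (9.9, "car-2"), (10.0, "car-1"), (12.0, "car-1")]
    print(tumbling_window_counts(readings, window_seconds=10))
    # → {(0, 'car-1'): 2, (0, 'car-2'): 1, (10, 'car-1'): 2}
```

In Stream Analytics itself, the same grouping would be expressed declaratively, with something along the lines of GROUP BY DeviceId, TumblingWindow(second, 10) in the query, and the service takes care of the streaming mechanics.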
I’ve already mentioned how Data Lake Analytics and HDInsight are part of Azure Data Lake, and I’ve pointed you to a related article. If you want to learn more about Stream Analytics, check out the Simple-Talk article Microsoft Azure Stream Analytics.
Of course, you can also refer to the Azure site for more details about each of the services, though I can’t guarantee that the information will always be the most useful. Then again, why should I be the only one having fun?
Cortana Analytics Gallery
Another interesting component of the CAS package is the Cortana Analytics Gallery, formerly the Azure Machine Learning Gallery. The gallery provides an online environment for data scientists and developers to share their solutions, particularly those related to machine learning. Microsoft also publishes its own solutions to the site for participants to consume.
The Cortana Analytics Gallery is divided into the following six sections.
- Solution Templates: Templates based on industry-specific partner solutions. Currently, the category includes only the Vehicle Telemetry Analytics solution, published by Microsoft this past December. The solution demonstrates how those in the automobile industry can gain real-time and predictive insights into vehicle health and driving habits.
- Experiments: Predictive analytic experiments contributed by Microsoft and those in the data science community. The experiments demonstrate advanced machine learning techniques and can be used as a starting point for developing your own solutions. For example, the Telco Customer Churn experiment uses classification algorithms to predict whether a customer will churn.
- Machine Learning APIs: APIs that can access operationalized predictive analytic solutions. Some of the APIs are referenced within the “Perceptual intelligence” category in the table above. For example, the Face APIs were published by Microsoft and are part of Microsoft Project Oxford. They provide state-of-the-art algorithms for processing face images.
- Notebooks: A collection of Jupyter notebooks. The notebooks are integrated within Machine Learning Studio and serve as web applications for running code, visualizing data, and trying out ideas. For example, the notebook Topic Discovery in Twitter Tweets demonstrates how a Jupyter notebook can be used for mining Twitter text.
- Tutorials: Tutorials on how to use Cortana Analytics to solve real-world problems. For example, the iPhone app for RRS tutorial describes how to create an iOS app that can consume an Azure ML RRS API using the Xamarin development software that ships with Visual Studio.
- Collections: A site for grouping together experiments, templates, APIs, or other items within the Cortana Analytics Gallery.
Although Microsoft has changed the name of the gallery to make it more CAS-friendly, much of the content still focuses on the Machine Learning service. Even so, the gallery could prove to be a valuable resource for organizations jumping aboard the CAS train, particularly once the gallery has gained more momentum.
Power BI and the Cortana Personal Assistant
And now we come to the non-Azure side of the CAS equation: Power BI and the Cortana personal assistant.
Power BI is a cloud-based service and set of tools that provide self-service business intelligence (BI) capabilities to knowledge workers, business analysts, and anyone else who needs to gain quick insight into data. With Power BI, users can create reports that contain a variety of rich visualizations, post those visualizations to their dashboards, share the information with other users, and access the dashboards and reports from their mobile devices, via the Power BI apps.
Microsoft also provides Power BI Desktop, a free application for defining more comprehensive visualizations and reports that can then be published to the Power BI site. For information about Power BI Desktop, see the Simple-Talk article Working with SQL Server data in Power BI Desktop.
Microsoft has put a lot of time and energy into Power BI and it shows. It is a comprehensive service that keeps getting better every day. Currently, Power BI comes in two subscription models: the basic free version and the Pro version. Last I looked, the Pro version was running about US $10 per user per month.
Although the CAS website indicates that Power BI is included as part of the suite, the site does not specify which subscription model applies. Of course, if you’re putting together your own analytics solution, you can choose which model works best, just like you can pick which Azure services to include. However, if such a Cortana Analytics SKU does come into existence (or is lurking out there unbeknown to me), perhaps we’ll get a better sense of what’s included before 2017 rolls around.
The other non-Azure component is the Cortana personal assistant, which gained considerable prominence with the release of Windows 10. Cortana is a virtual assistant that provides advanced search capabilities and services integration within the Windows OS sphere. It can respond to both text and voice input, and can provide answers that are personalized for the specific user.
From what I could uncover in the videos and webcasts, it appears that Microsoft is integrating Cortana with Power BI, although it is not clear exactly how far along Microsoft has gotten with this project or the degree of that integration. Power BI already supports the ability to type questions directly from within the dashboard, but the Cortana integration appears to take this a step further. Does this mean that users will be able to retrieve Power BI data without being connected to the dashboard? That too is not clear.
It will be interesting to see what happens with Cortana and whether that integration will be tied exclusively to the mysterious CAS package. Microsoft must have deemed Cortana important enough to have made it part of the suite’s name. With such billing, I would expect the Cortana integration to be something quite extraordinary, once it has been fully realized, much more than simply enhancing our ability to talk into our computers or phones.
The CAS mystery
Cortana Analytics is still a relatively new concept, so I would expect a few rough edges, and I have to applaud Microsoft for its sweeping attempt to bring big data under control. The Azure service model has a lot going for it, and for many organizations, managed services are the only credible option for understanding their own data. Microsoft has also done a good job of keeping the Azure pricing model simple and comprehensible, without throwing too many surprises our way.
I’m still not sure what Microsoft aims to do with Cortana Analytics. In principle, CAS makes sense, pulling together an integrated suite of services for building analytic solutions. But I need more specifics upfront, not just about CAS, but about any new product or service. I want to know what it does, how it works, and how it is different from other products and services, without having to jump through infinite Google or Bing hoops or sit through countless videos that repeat the same marketing talking points over and over.