StreamInsight is an important technology for tracking streams of data from several sources to spot significant trends and react to them. It now runs as an Azure service as well. Roger Jennings describes how to install it and try it out.
Introduction: Move Complex Event Processing to the Cloud
The SQL Server team released its StreamInsight v1.0 complex event processing (CEP) feature in April 2010 and followed with StreamInsight as a Windows Azure Service in a private CTP codenamed “Project Austin” in May 2011. Project Austin’s third CTP of August 2012 updates the service to the Windows Azure SDK v1.7 and StreamInsight v2.1.
This article will describe how to provision a “Project Austin” service on Windows Azure with Visual Studio 2010 or 2012 and the downloadable AustinCtpSample solution. A second article will show you how to test the service with the SampleApplication and EventSourceSimulator projects and use the graphical Event Flow Debugger.
Why Complex Event Processing (CEP)
Complex event processing (CEP) is a way of tracking streams of information that combines data from several sources in order to identify meaningful events and respond to them rapidly. It is a hot topic for finance, energy and manufacturing enterprises around the globe, as well as participants in the Internet of Things (IoT) and social computing. Instead of traditional SQL queries against historical data stored on disk, CEP delivers high-throughput, low-latency event-driven analytics from live data streams.
CEP’s most notorious application is high-frequency algorithmic trading (HFAT) on regulated financial exchanges and unregulated over-the-counter (OTC) swaps in what are called “dark pools.” The U.S. Congress’s Dodd-Frank Wall Street Reform and Consumer Protection Act attempts to prevent the HFAT excesses that contributed to tanking the U.S. economy in late 2008. Less controversial CEP uses include real-time utility usage monitoring and billing with SmartMeters for natural gas and electricity, as well as product and personal sentiment analysis inferred from social computing data, such as Twitter streams. Health care providers envision a future in which patients wear blood-pressure, heart-rate and other physiological monitors, which communicate over the Internet to remote CEP apps that report running averages and issue alerts for health-threatening excursions. The largest potential CEP market appears to be real-time reporting of data from environmental sensors to discover actual and potential pollution sources and other threats to the earth and its atmosphere.
Microsoft’s CEP: StreamInsight
Major providers of relational database management systems (RDBMSs) are the primary sources of CEP apps. Microsoft’s Server & Tools Business group released v1 of its StreamInsight CEP feature, together with SQL Server 2008 R2, to manufacturing in April 2010. StreamInsight competes with IBM InfoSphere Streams, Oracle Event Processing (OEP), and SAP Sybase Event Stream Processor/Aleri RAP, as well as offerings from independent CEP specialists, such as StreamBase. Figure 1 represents on-premises StreamInsight architecture.
StreamInsight Services for Windows Azure, codenamed ‘Project Austin’
Recognizing that many enterprises won’t want to make large investments in on-premises hardware for CEP, the SQL Server team introduced a private Community Technology Preview (CTP) of StreamInsight Services for Windows Azure, codenamed “Project Austin,” in May 2011. CTP updates in February, June and August 2012 kept pace with new on-premises StreamInsight versions and Windows Azure SDK upgrades. This article covers the August 2012 update for the Windows Azure SDK v1.7.
StreamInsight Processing Taxonomy
CEP applications have two primary orientations: computing aggregates and detecting event patterns. Microsoft describes the querying concepts that StreamInsight supports as follows:
Given an input event in the data flow, projections perform calculations over the event fields or compose new event types based on the field values. With StreamInsight, calculations are represented by .NET expressions and new event shapes are defined by .NET types.
Given an input event in the data flow, filters check conditions over one or more of the event fields. The filter propagates the event to the output stream only if the filter conditions are satisfied. With StreamInsight, filter conditions are defined as .NET expressions.
Grouping partitions the incoming data flow into groups. Groups then are processed separately so that individual results can be computed on a per-group basis. Given an input event from the data flow, the grouping applies the partitioning function to the event and then routes the event to its group for further processing.
Given a set of input events, aggregations compute aggregate functions over the events. StreamInsight supports Sum, Avg, Count, Min and Max as aggregation functions.
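The four single-stream operations described so far can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not StreamInsight's .NET/LINQ API: the Reading type, its meter_id and kwh fields, and every function name below are hypothetical, and the sketch works on an in-memory list rather than a live stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    meter_id: str  # hypothetical field: which meter sent the event
    kwh: float     # hypothetical field: energy used in the interval


def project(events):
    """Projection: compose a new event shape (meter, watt-hours)."""
    return [(e.meter_id, e.kwh * 1000.0) for e in events]


def keep(events, predicate):
    """Filter: pass an event on only if the condition holds."""
    return [e for e in events if predicate(e)]


def group_by_meter(events):
    """Grouping: route each event to the group for its key."""
    groups = defaultdict(list)
    for e in events:
        groups[e.meter_id].append(e)
    return dict(groups)


def aggregate(groups):
    """Aggregation: Sum, Avg, Count, Min and Max per group."""
    result = {}
    for key, members in groups.items():
        values = [e.kwh for e in members]
        result[key] = {
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "count": len(values),
            "min": min(values),
            "max": max(values),
        }
    return result


readings = [
    Reading("A", 1.0), Reading("B", 4.0),
    Reading("A", 3.0), Reading("B", 0.0),
]
nonzero = keep(readings, lambda e: e.kwh > 0)    # drops the 0.0 reading
summary = aggregate(group_by_meter(nonzero))
print(summary["A"])
```

In a real CEP engine each step is applied continuously as events arrive; the batch version here only shows what each operator computes.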
Given input events from two data flows, the join operation matches events from one flow with corresponding events from the other. In temporal systems like StreamInsight, the join operation evaluates two conditions: (1) the traditional join condition over the fields of the events, and (2) an overlap check over the timestamps. If both conditions hold, the events are matched and output. With StreamInsight, only the first condition is defined by the user in a .NET expression. The second condition is always implicitly added by the system.
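The two join conditions can be illustrated with a small Python sketch. It is not StreamInsight code: the Event type and all names are hypothetical, and it assumes each event carries a half-open lifetime [start, end) and that a matched pair is reported with the overlapping part of the two lifetimes. The caller supplies only the field condition; the overlap check is always applied by the join itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    start: int    # inclusive start of the event's lifetime (assumed)
    end: int      # exclusive end of the event's lifetime (assumed)
    payload: str


def overlaps(a, b):
    """True if the two half-open lifetimes share at least one instant."""
    return a.start < b.end and b.start < a.end


def temporal_join(left, right, condition):
    """Match events that satisfy the user condition AND overlap in time."""
    matches = []
    for l in left:
        for r in right:
            if condition(l, r) and overlaps(l, r):
                # Report the pair with the overlap of both lifetimes.
                matches.append((l.payload, r.payload,
                                max(l.start, r.start), min(l.end, r.end)))
    return matches


orders = [Event("IBM", 0, 10, "order-1"), Event("MSFT", 5, 8, "order-2")]
quotes = [
    Event("IBM", 8, 12, "quote-1"),   # same key, overlaps order-1
    Event("MSFT", 8, 9, "quote-2"),   # same key, starts as order-2 ends
    Event("ORCL", 0, 20, "quote-3"),  # overlaps in time, no matching key
]
joined = temporal_join(orders, quotes, lambda l, r: l.key == r.key)
print(joined)
```

Only the first pair is matched: quote-2 fails the overlap check and quote-3 fails the field condition.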
Streaming data is time-based by definition. These are Microsoft’s descriptions of StreamInsight’s types of time-based windows:
Time-driven windows progress based on a schedule defined in the query. There are two types of time-driven windows:
- Hopping: The hopping window accumulates events over a fixed period of time. Once all events have been received over that period of time, the events are passed on for further processing as a set. Hopping windows "hop" forward in time by a fixed period. The window is defined by two time spans: the hop size H and the window size S. For every H time units, a new window of size S is created.
- Tumbling: Tumbling windows are a special case of hopping windows where the window instances are adjacent to each other on the timeline.
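As a rough illustration of the hop size H and window size S, the following Python sketch assigns point events to hopping windows and treats a tumbling window as the special case H = S. It is a batch approximation, not StreamInsight code; the function names, the (timestamp, value) event shape, the alignment of the first window at time 0 and the horizon parameter are all assumptions made for the example.

```python
def hopping_windows(events, hop, size, horizon):
    """Return [((start, end), values)] for windows starting every `hop` units.

    events  -- list of (timestamp, value) pairs (assumed point events)
    hop     -- H: distance between consecutive window starts
    size    -- S: length of each window, covering [start, start + size)
    horizon -- stop creating windows at this start time (assumption)
    """
    if hop <= 0 or size <= 0:
        raise ValueError("hop and size must be positive")
    windows = []
    start = 0
    while start < horizon:
        end = start + size
        members = [v for (t, v) in events if start <= t < end]
        windows.append(((start, end), members))
        start += hop
    return windows


def tumbling_windows(events, size, horizon):
    """Tumbling windows are hopping windows whose hop equals their size."""
    return hopping_windows(events, hop=size, size=size, horizon=horizon)


events = [(1, "a"), (4, "b"), (6, "c"), (9, "d")]
hopping = hopping_windows(events, hop=5, size=10, horizon=10)
tumbling = tumbling_windows(events, size=5, horizon=10)
print(hopping)
print(tumbling)
```

With H smaller than S the windows overlap, so events "c" and "d" appear in both hopping windows; with H equal to S each event lands in exactly one window.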
Event-driven windows produce output if there is activity in the input. Event-driven windows such as the snapshot window typically rely again on a window size and, upon activity, return the set of events that overlap with the window.
Given a count parameter n, the count-driven windows in StreamInsight return event sequences of length n.
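A count-driven window can be approximated in Python as every run of n consecutive events. This sketch assumes the events are already in timestamp order, that each has a distinct timestamp, and that windows slide forward one event at a time; those choices, and the function name, are assumptions for illustration and not a description of StreamInsight's implementation.

```python
def count_windows(events, n):
    """Return every sequence of n consecutive events, sliding by one."""
    if n <= 0:
        raise ValueError("n must be positive")
    # With fewer than n events the range is empty, so no window is produced.
    return [events[i:i + n] for i in range(len(events) - n + 1)]


prices = [10, 20, 30, 40]
windows = count_windows(prices, 3)
print(windows)
```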
Getting On Board and Setting up the “Project Austin” CTP
Access to “Project Austin” CTP deliverables—installers for client and local provider libraries, an Austin August CTP Sample C# project and documentation—requires membership in Microsoft Connect’s StreamInsight Customer Advisory Group. If you’re a member, you’ll see the download page here after you sign in. Otherwise send a request for admission to firstname.lastname@example.org. You’ll be notified by email when you’re admitted.
To set up prerequisites for the AustinCtpSample.sln solution, follow these steps:
- Download and install Visual Web Developer 2010 Express or Visual Studio Express 2012 for Web (or higher.)
- Download and install the Windows Azure SDK for .NET – June 2012 SP1 (v1.7, see the recommended Web Platform installer method in the Overview section.)
- If you don’t have a Windows Azure subscription, sign up for a free 90-day trial here. (You’ll need a Microsoft Account and credit card for identification, but you won’t be charged for Azure services unless you give explicit permission.)
- Go to the Windows Azure Management Portal and log in with the Microsoft Account you used for the subscription to open the “It looks like you’re new” page. Click the Create an Item button to open the New form and select Data Services, Storage, Quick Create. Type a unique DNS prefix in the URL text box (oakaustin for this example), select the closest data center location in the Region/Affinity Group box, and clear the Enable Geo-Replication check box to reduce usage, as shown in Figure 2.
- Select Storage in the portal’s navigation pane and click the name of the storage account you created in step 4 to open its dashboard. Scroll down and copy the Subscription ID value at the bottom right of the page (see Figure 3.)
- In the portal, click New, Compute, Cloud Service and Quick Create (see Figure 4) to open the Create a Cloud Service dialog. (Cloud Service is the new name for Hosted Service.)
- Complete the Create a Cloud Service dialog by specifying a DNS prefix for the service’s URL, oakaustin for this example, and selecting the same region as your storage account and the subscription (see Figure 5.)
- Activate the Windows Azure StreamInsight service for your subscription by typing your e-mail alias in the online signup form, pasting the Subscription ID, marking the Terms of Service check box and clicking its Submit button. Follow the instructions you receive to gain access to Microsoft Connect’s StreamInsight Customer Advisory Group.
- Download and extract files from the AustinAugustCTP.zip file to a folder on your development machine.
- Follow the instructions in the Getting Started with Austin.docx file’s “Management Certificate” section to create management certificate and private key files. To do this, execute the following command at the Visual Studio Developer Command prompt, then type and confirm a password when prompted:
makecert -r -pe -a sha1 -n "CN=Windows Azure Authentication Certificate" -ss My -len 2048 -sp "Microsoft Enhanced RSA and AES Cryptographic Provider" -sy 24 "testcert.cer" -sv "testcert.pvk"
- Upload testcert.cer as a management certificate by clicking ‘Settings’ in the portal’s navigation pane, clicking ‘Upload’, browsing to the *.cer file you created with the makecert command, and selecting your subscription in the list (see Figure 6.) The instructions in the *.docx file are for the old portal version.
- Create a Personal Information Exchange (*.pfx) certificate in the same folder for provisioning the Windows Azure StreamInsight instance by executing this command at the Visual Studio Developer Command prompt: pvk2pfx -pvk "testcert.pvk" -spc "testcert.cer" -pfx "testcert.pfx" -pi mypassword.
- Run the StreamInsightClient.msi version for your machine to add a Microsoft StreamInsight 2.1 node to your All Programs menu and enable connecting to the StreamInsight Cloud Service using the EventFlowDebugger item. The menu also has Stream Insight Documentation (MSDN) and a link to download StreamInsight Sample projects from CodePlex. (These samples are for on-premises, not cloud instances.)
- Optionally, run the StreamInsight.msi version for your machine to enable testing the application with a local StreamInsight instance: See the Getting Started with Austin.docx file’s “StreamInsight Server” section for details.
- Obtain the Storage Account Access key from the portal by selecting ‘Storage’ and your storage account, and clicking the ‘Manage Keys’ button to open the ‘Manage Access Keys’ dialog. Select the Primary Access Key and copy it to the Clipboard.
Provisioning the StreamInsight Service in Windows Azure
To deploy a StreamInsight instance with one compute and two event ingress roles as a Windows Azure service, do the following:
- Start Visual Studio as an Administrator, open the extracted AustinCtpSample.sln solution, expand the ServiceProvisioning node, and double-click the app.config item to display its content in the editor window. Replace the Storage Account Key with the value copied in the preceding section’s step 15 and replace defaults for the other user settings as shown in Figure 8.
- Right-click the ServiceProvisioning project in Solution Explorer, choose Set as Startup Project, then press F5 to build and run the solution and open a console that displays provisioning progress (see Figure 9).
- In the portal, select Cloud Services and your service name, oakaustin for this example. When provisioning is complete, which takes from about 5 to 10 minutes, the fully-scrolled page appears as shown in Figure 10.
Charges accrue only for Windows Azure compute and storage resources; there is no charge for the StreamInsight service itself when running CTP versions.
Warning: A Cloud Service with 12 CPU cores consumes 12 small compute units at US$0.12 (or the equivalent in other currencies) per clock hour. To prevent consuming US$1.44/hour of resources during off-hours, you must delete the service by configuring and running the ServiceDeletion project, as described in the next section.
Deleting a Service Instance Deployment
If you exceed your monthly free quota of 750 Cloud Service small instance (core) hours, your 90-day trial subscription will become inactive until the start of your next monthly billing period. If you’ve authorized billing for the trial subscription or use an existing paid subscription you’ll be billed US$0.12 per hour (or the equivalent in other currencies) for each CPU core deployed, regardless of whether it’s running. Therefore, you should delete and re-provision the service deployment when you aren’t actively using it.
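The arithmetic behind these figures is easy to check. The short Python sketch below uses only the numbers quoted in this article (12 cores, US$0.12 per core-hour and a 750 core-hour monthly free quota); the variable names are illustrative, and actual rates may differ from those quoted here.

```python
from decimal import Decimal

RATE_PER_CORE_HOUR = Decimal("0.12")  # US$ per small-instance core per hour
CORES = 12                            # cores deployed by the sample service
FREE_CORE_HOURS = 750                 # monthly free quota on the trial

hourly_cost = RATE_PER_CORE_HOUR * CORES       # cost of one clock hour
daily_cost = hourly_cost * 24                  # cost of leaving it up all day
free_clock_hours = Decimal(FREE_CORE_HOURS) / CORES  # hours until quota is gone

print(f"Hourly cost: US${hourly_cost}")               # Hourly cost: US$1.44
print(f"Daily cost: US${daily_cost}")                 # Daily cost: US$34.56
print(f"Free quota lasts {free_clock_hours} hours")   # Free quota lasts 62.5 hours
```

At 12 cores, the free quota covers about two and a half days of continuous deployment, which is why deleting the service between sessions matters.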
Warning: Deleting the Cloud Service in the Windows Azure Management Portal doesn’t work in this CTP.
To configure and run the ServiceDeletion project, do the following:
- In Solution Explorer, expand the ServiceDeletion node and double-click the App.config item to open the file in the editor window.
- Right-click the ServiceDeletion node and choose Set as Startup Project.
- Add your Cloud Service name, oakaustin for this example, as the value of the HostedServiceName key, the path to the *.pfx certificate as the value of the ServiceManagementCertificateFilePath key and the private key password as the value of the ServiceManagementCertificatePassword key (see Figure 11.)
- Press F5 to build and run the ServiceDeletion project, which displays a console window (see Figure 12.)
- Press Enter to close the console, open or refresh the Windows Azure Management Portal, open the Cloud Service and verify that a “You have nothing deployed to the production environment” message is present.
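Before pressing F5, it can help to confirm that all three settings have been filled in. The Python sketch below is a hypothetical helper, not part of the CTP sample. It assumes the keys are stored as add elements with key and value attributes inside an appSettings section, which is a common App.config layout but may not match the sample's actual file; the certificate path shown is also invented for the example.

```python
import xml.etree.ElementTree as ET

# The three key names come from the steps above.
REQUIRED_KEYS = (
    "HostedServiceName",
    "ServiceManagementCertificateFilePath",
    "ServiceManagementCertificatePassword",
)

# Hypothetical App.config content; the real file's layout may differ.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="HostedServiceName" value="oakaustin" />
    <add key="ServiceManagementCertificateFilePath" value="C:\\certs\\testcert.pfx" />
    <add key="ServiceManagementCertificatePassword" value="" />
  </appSettings>
</configuration>
"""


def missing_settings(config_xml):
    """Return the required keys that are absent or have an empty value."""
    root = ET.fromstring(config_xml)
    values = {add.get("key"): add.get("value", "") for add in root.iter("add")}
    return [key for key in REQUIRED_KEYS if not values.get(key, "").strip()]


missing = missing_settings(SAMPLE_CONFIG)
print(missing)
```

In this sample the password was left blank, so the helper reports that key as missing.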
In part 2, I’ll show you how to test the service with the SampleApplication and EventSourceSimulator projects and use the graphical Event Flow Debugger.