Frequently, when a new piece of tech that I’m excited about launches, total nerd that I am, I’ll start quoting Colin Clive in James Whale’s Frankenstein, still the best version. It’s ALIVE! ALIVE!
Well, time to get excited. On Monday, July 11, Azure SQL Data Warehouse moves from being in preview on Azure to a full-fledged service offering (insert my Colin Clive imitation here). I get it. You’re wondering, why on earth would I be this jazzed about a Data Warehouse? I mean, I write books on query tuning, not SSIS. I spend time figuring out how best to deploy databases, not design star schemas. You’re right, to a degree. Depending on how you look at it, I am a step removed from worrying about the Warehouse. However, this is much more than a warehouse, and the introduction of this technology on Azure has more impact than you might immediately realize. First up, why this technology is so exciting.
While the tech in Azure SQL Data Warehouse will absolutely lend itself to putting a traditional warehouse into the cloud, that’s not the most exciting part. The exciting part is parallel processing. Have you ever tried to shard your data across multiple servers so that you’re bringing multiple machines to bear, each with its own CPU, I/O, and memory, all independently managed, yet treated as a single entity? Was it fun? My answer, well… yeah, it was kind of fun, but it was a ridiculous amount of work, which I got wrong a bunch, and it was a house of cards in terms of maintenance. No more. Microsoft introduced this same technology within the Analytics Platform System (APS). The name is significant. The original name was Parallel Data Warehouse (PDW). When Microsoft renamed it, I attributed it to marketing just changing stuff around, again, so it’s fresh. Instead, I now believe the rename was clarifying what this technology does. It’s not simply about an efficient way to store and retrieve a lot of data. It’s about bringing parallel processing to the storage of data, the retrieval of data, and, probably most importantly, the processing of data. Creating aggregates using a few CPUs, cool. Creating aggregates using multiple machines simultaneously (insert Colin Clive again). This technology is much more about data manipulation in service to your analytics system than it is about simply storage and retrieval. The APS name makes sense. Now, all that power is available on the cloud. Which brings me to my next point.
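First, though, a quick sketch of what I mean by creating aggregates on multiple machines at once. This is a toy illustration, not how Azure SQL Data Warehouse is actually implemented: the "nodes" are just Python lists, threads stand in for separate machines, and every name here (`NODE_COUNT`, `distribute`, `local_aggregate`, `parallel_totals`, the `sales` rows) is made up for the example. The idea is that rows get spread across nodes, each node totals its own slice independently, and the partial totals are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 4  # hypothetical number of "machines" in this toy setup


def distribute(rows, node_count=NODE_COUNT):
    """Spread rows across nodes by order_id (a simple stand-in for sharding)."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        order_id = row[0]
        nodes[order_id % node_count].append(row)
    return nodes


def local_aggregate(node_rows):
    """Total the amount per region using only the rows held by one node."""
    totals = Counter()
    for _order_id, region, amount in node_rows:
        totals[region] += amount
    return totals


def parallel_totals(rows, node_count=NODE_COUNT):
    """Aggregate on every node at once, then merge the partial results."""
    nodes = distribute(rows, node_count)
    # Threads are only a stand-in here; real nodes would be separate machines.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds the partial totals
    return dict(combined)


# Made-up sample rows: (order_id, region, amount)
sales = [
    (1, "east", 100),
    (2, "west", 250),
    (3, "east", 50),
    (4, "north", 75),
    (5, "west", 25),
    (6, "east", 10),
]

totals = parallel_totals(sales)
print(totals)
```

Even in a toy like this, the part I used to get wrong is visible: somebody has to decide how rows are distributed and how the partial results are stitched back together. Having the platform own that work is the whole appeal.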
APS was an appliance. It was a very expensive appliance. Because of this, there was an exceedingly small number of people using it around the globe. Heck, you couldn’t even learn it without buying it. While many of us got excited about the capabilities offered, only a handful of us were able to learn the tech. Enter Azure and the Azure SQL Data Warehouse. It’s not free, don’t get me wrong, but it’s affordable. You can now get access to this technology to learn it, without any outlay at all (you are using your free credits from MSDN, right?). You can experiment and see if being able to throw parallel, machine-level processing at your data is going to help. Don’t get stuck on the concept of a warehouse either. Think processing first. Could you load massive amounts of data into this (quickly too, if you use the right methods to take advantage of parallel processing; time to learn more than SSIS) and then process that data into a different form that you then load into other, cheaper mechanisms for longer-term storage and querying? Yes you can (let’s just go with CC for each of these moments going forward). With the availability of this technology within Azure, Microsoft has radically democratized parallel processing at industrial scale, so that medium and small shops can take advantage of it, if they need it. It’s not simply how cool the tech is that has me excited. It’s the fact that I can get my hands on it, and that I can readily suggest that you can get your hands on it too. That’s what makes it thrilling.
I recognize that I can be a little too Pollyanna about things. Let’s be fair and point out that, like CC’s little creation, there might be a few glitches to work out (Abby Someone). The language is different. You’re going to have to learn D-SQL. Foreign keys? Not really. Which disturbs me. You can absolutely design your storage mechanisms incorrectly in multiple ways. Messing up the design can force you to purchase a higher service tier in an attempt to fix it, driving up costs unnecessarily. In short, I’m not arguing that this is some level of perfection. It is a giant leap forward in tech, but an even more giant leap forward in tech availability (CC).
Many of us have been using this technology quite successfully while it was in preview. The kinks have largely been worked out, and it’s moving into release. Explore this new service offering. Add this tool to your tool belt. Get ready to have your own CC moment when you see what’s possible with Azure SQL Data Warehouse and true parallel processing of your data.