David DeWitt and his team at Microsoft have been exploring the 'next frontier' of architectures for building the parallel, scalable database systems that will be needed to support the "petabyte" data warehouse. The way forward is the "shared-nothing" grid architecture, which will underpin the likes of Madison, and will offer commodity SMPs connected by a commodity interconnect, and almost "limitless" scalability.
Ever since Oracle introduced RAC, Microsoft has been sniffy and dismissive, in the manner befitting someone with no solution to offer at all. When David presented his "Scalable Architectures" keynote speech at PASS, he could not resist taking several side-swipes at RAC, presenting it as a thoroughly second-rate solution that was "grid in name only".
Oracle had beefed up an existing shared-disk technology for database scale-out (Oracle Parallel Server), called it RAC, and marketed it mercilessly. The problem was that it was eye-wateringly expensive and very few people really needed it. Some people were doing creative "out of the box" clustering with standby databases, log shipping, Data Guard, and so on, but few needed to move up to a full-on "grid" architecture.
However, won't the same be true of Microsoft's 'true grid' offering, when it arrives? According to Andrew Kelly, many people think they need to scale out but, in almost all cases, they would be better served by scaling up, i.e. by just adding more memory/CPU to their existing box. Unless you can't afford to be down for more than a few minutes when a server crashes, or your monstrous CPU/memory demands simply cannot be sated by a single machine, chances are you can live without clusters, and certainly without "the grid".
Ironically, an old Oracle friend once told me that when you move from a standalone machine to a 2-node cluster, your availability can actually go down slightly. A reliable, standalone SQL Server box, unmolested by human intervention, will just run for 99% of the time. Once you add in more hardware, more software, more complexity, things can and do go wrong a little more frequently, and it's a simple fact that it takes longer to fix a problem on a cluster than it does on a single box.
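To put rough numbers on that argument, here is a back-of-the-envelope sketch using the textbook steady-state formula, availability = MTBF / (MTBF + MTTR). The failure and repair figures are invented purely for illustration, and the model deliberately ignores the failover a cluster provides, so treat it as a way of seeing how "fails a little more often, takes a little longer to fix" drags the number down, not as a measurement of any real system.

```python
# Back-of-the-envelope availability arithmetic.
# All MTBF/MTTR figures below are hypothetical, chosen only to illustrate the point.

HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Assumed standalone box: fails every 990 hours, takes 10 hours to fix (99% up).
standalone = availability(mtbf_hours=990.0, mttr_hours=10.0)

# Assumed 2-node cluster: more moving parts, so problems arrive a little more
# often (900 hours) and take a little longer to diagnose and fix (12 hours).
# Failover is NOT modelled here.
cluster = availability(mtbf_hours=900.0, mttr_hours=12.0)

for name, avail in (("standalone", standalone), ("cluster", cluster)):
    print(f"{name}: {avail:.4%} available, "
          f"{downtime_hours_per_year(avail):.1f} hours down per year")
```

Under these made-up inputs the cluster comes out slightly behind the standalone box. In practice the cluster only wins if failover masks enough of those incidents to outweigh the extra complexity.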
That's not to say SQL Server people shouldn't be interested in clustering, per se. If you have a valid business reason for building redundancy into your database system, then it is natural that you will want to investigate clustering, replication, database mirroring, distributed partitioned views…or whatever 'high-availability' solution works for you.
Just bear in mind that with clusters come extra complexity, extra cost, more manageability issues, and the need for dedicated, trained people to deal with the clustered environment. Multiply each of these many times and you have the same argument applied to "the grid". When the Microsoft shared-nothing grid arrives it will be incredibly impressive, endlessly debated, and fabulously expensive; and almost no-one will need it.
As always, we'd love to hear what you think. Post your comments to this blog, and the best one will receive a prize.
Cheers,
Tony.