Scalability remains an exasperatingly vague term, even though there are well-established ways of ensuring that a web-based application reacts well to wide variations in usage. Dino cuts through the mystique to pin down what it is, what it isn't, and how to achieve it.
Some fifteen years ago, at the first-ever geek dinner meetings, it was common to joke about the sudden popularity of the concept of ‘scalability’. For example, when we were all heading off to restaurant X: “Does anybody know the route? We need a scalable route that can accommodate the ten of us, and maybe more if other people join.” It was a flippant remark, but it cheered us up.
Scalability had become a craze by then, subject to a great deal of hype and marketing spin, but it had been an aspect of software since the beginning of the software era. The sudden interest in the idea arose because it had become crucial with the explosion in usage of the web. Before then, the likely future scale of usage was fairly easy to predict, and to prepare for, but no longer. Now, with web-based applications, usage can change dramatically within minutes. It’s this volatility that makes it vital for a site to know in advance its load limits and its resilience under stress. In this article, I’ll share some ideas about ways to achieve scalability and to maintain it as the application evolves.
What’s Scalability, Anyway?
Scalability is very important for business because, if you get it wrong, it can bite into revenues and, worse yet, hurt the reputation of a company. As simple as it may sound, a web site that is slow to respond may drive customers off to a competitor. Similarly, a web site that collapses when too many people connect will hit the revenues and reputation of the company, not to mention leaving your business vulnerable to liability claims. If you set up a web site for, say, a worldwide sport or music event, you simply can’t afford to have it go down the day that tickets go on sale. If that happens... well, it doesn’t have to happen!
There’s a commonly-agreed definition for scalability. Let’s use the words of Wikipedia: scalability is the ability of a software system to handle a growing amount of work in a capable manner. In addition, scalability relates to the ease with which the system’s capacity can be enlarged in order to accommodate that growth.
Whichever way you look at scalability, it is the performance that the site delivers across a range of loads that tells you whether the site is achieving it. The equation “scalability equals performance”, though, is misplaced. Scalability and performance are different aspects of the way the application behaves and should be addressed independently. Sometimes there is even a trade-off between the two: the choices you make to improve scalability can actually lower the performance of individual interactions.
Trying to abstract as much as possible from technical details and specific business domains, we could say that a scalable system is a system that can serve any (high) number of requests in a unit of time without slowing down too much. In other words, its performance degrades gracefully under increasingly high load. Scalability is a relative concept: it’s not about the system being fast in absolute terms. While that is desirable, it is often unrealistic. It’s good enough to have a system that is acceptably fast with any load of users. If you go shopping on Black Friday in some US mall, you can’t realistically expect to meet just a few other people. Yet, a scalable mall is a mall that can handle all visitors and give each an acceptably good buying experience. This means taking measures such as refilling products on shelves and limiting checkout queues as much as possible.
As a software architect, what can you do?
It’s more about queues than raw performance
The example of the shopping mall shows an affinity between scalability and queuing theory. This affinity, in my opinion, is even stronger than any affinity between scalability and performance. At the end of the day, in a web application, more efficient handling of queues reduces the wait times of requests and also increases the number of requests that can be served. Focusing on pure performance and code optimization only addresses the time it takes to serve a single request.
If you have a bottleneck, a single web server endpoint for example, you won’t see much improvement with large volumes of requests even if you reduce the service time towards zero. Removing such bottlenecks is largely a matter of infrastructure, and touches on things such as networking, caching, geographical distribution of servers and web farms, as well as the organization and performance of the actual code.
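A minimal sketch can make the queuing argument concrete. Assuming an idealized M/M/1 queue (a single endpoint with random arrivals), the average time a request spends in the system is W = 1/(μ − λ), where μ is the service rate and λ the arrival rate. The point is that response times explode as the arrival rate approaches capacity, no matter how fast a single request is served:

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Average time in system for an idealized M/M/1 queue: W = 1/(mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

# A single endpoint that serves 100 requests/sec:
for load in (50, 90, 99):
    print(load, mm1_response_time(load, 100))
# 50 requests/sec -> 0.02 s average; 90 -> 0.1 s; 99 -> 1.0 s:
# the last 10% of extra traffic costs a 50x increase in wait time.
```

Doubling the service rate only moves the cliff; once traffic doubles too, you are back at the same saturation point, which is why queue handling, not raw speed, dominates scalability.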
Load Testing and Stress Testing
When it comes to measuring the overall performance of a web site and its ability to scale, two terms come up: load testing and stress testing. Together, they are the best way to understand the dynamics of the system’s behavior. Load testing measures how the system reacts under a given workload; stress testing increases the load progressively until you find the upper limit of capacity. Stress testing is essential to avoid a site crashing at launch or under peak load.
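The shape of a stress test is simple: ramp concurrency up in steps and watch how latency behaves at each level. Here is a minimal sketch; `handle_request` is a stand-in of my own invention for a real HTTP call, and real tools would of course record much more than the worst latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> float:
    """Stand-in for a real HTTP call; returns the elapsed seconds."""
    start = time.perf_counter()
    time.sleep(0.005)  # simulated service time
    return time.perf_counter() - start

def stress_test(max_concurrency: int, step: int = 10) -> dict:
    """Ramp the load up in steps and record the worst latency at each level."""
    results = {}
    for concurrency in range(step, max_concurrency + 1, step):
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            latencies = list(pool.map(lambda _: handle_request(),
                                      range(concurrency)))
        results[concurrency] = max(latencies)
    return results

report = stress_test(30)
for level, worst in report.items():
    print(f"{level:3d} concurrent -> worst latency {worst * 1000:.1f} ms")
```

The level at which the worst-case latency stops degrading gracefully and starts climbing steeply is the capacity limit the stress test is meant to reveal.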
Scalability is not necessarily an attribute you can fine-tune like the volume of a TV; treating it as something you can raise or lower at a whim is, in reality, just an ideal to aim for. The real challenge is to find the tools and solutions that make it possible. Stress testing is a sort of cold measurement: the analysis that you do on forecasts and expectations rather than live data. Hot measurement, instead, is when you observe performance degrading as the number of users grows. This is a slower process that often goes hand in hand with the success of the site. Hot measurement is based on actual data such as access logs. It represents a solid foundation for planning improvements to the code, the architecture and/or the network infrastructure.
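In practice, hot measurement often starts with mining the access logs for latency percentiles, since averages hide exactly the slow requests that users notice. A sketch, assuming a hypothetical log format of my own (`timestamp path status duration_ms`); adjust the parsing to whatever your server actually writes:

```python
import math

# Hypothetical access-log lines: "timestamp path status duration_ms"
LOG_LINES = [
    "2014-06-01T10:00:01 /home 200 120",
    "2014-06-01T10:00:02 /cart 200 340",
    "2014-06-01T10:00:02 /home 200 95",
    "2014-06-01T10:00:03 /cart 500 2100",
    "2014-06-01T10:00:04 /home 200 110",
]

def percentile(values, pct):
    """Nearest-rank percentile over a list of numbers."""
    ordered = sorted(values)
    rank = max(0, math.ceil(pct / 100.0 * len(ordered)) - 1)
    return ordered[rank]

durations = [int(line.split()[3]) for line in LOG_LINES]
print("median ms:", percentile(durations, 50))  # 120
print("p95 ms:", percentile(durations, 95))     # 2100
```

A median of 120 ms with a 95th percentile of 2100 ms tells you that most users are fine but a meaningful minority are suffering, which is exactly the kind of signal that drives a planned improvement.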
As an architect, you are mainly concerned with:
- Ensuring that the application doesn’t crash at launch;
- Ensuring that the application gives signs of degrading performance in a predictable enough way to allow you to plan proper countermeasures;
- Making sure you know the most common tools and strategies to address scalability needs.
The need for scalability generally grows slowly beneath the threshold of perception before exploding into your consciousness at some point. You don’t need scalability just because scalability is cool; you need your site to be in shape and able to serve requests effectively.
Yet the question remains open: how should you design the site so that you can keep the level of service nearly constant with any number of users?
Vertical vs. Horizontal
In general, scalability improvements are referred to as either horizontal or vertical. Horizontal scalability is known as scale-out; vertical scalability as scale-up. To stick with the queuing metaphor, vertical scalability is about improving the performance of the service itself: it aims at reducing as much as possible the time it takes to deliver a service. For the web, that means bringing the response time down as far as possible. You can achieve that through code optimization and/or optimization of the infrastructure, whether network, databases, or computing power. Vertical scalability essentially boils down to buying more memory, more CPUs and more powerful computers. In this regard, it is the easiest strategy to implement, but it is also limited. As one of my professors used to say when teaching computational complexity, faster hardware is much more beneficial to fast algorithms than to poorly designed ones. Another wry way to look at the inherent limits of vertical scalability is in the words of Fred Brooks, author of The Mythical Man-Month: nine women can’t deliver a baby in one month.
Vertical scalability also involves caching, and caching is an excellent performance accelerator to introduce into software and web infrastructure. Think, for example, of output caching in ASP.NET. Think also of specific products such as Varnish, Squid or Nginx which, with various degrees of functionality, operate as proxy servers with caching and load-balancing capabilities over HTTP and other web protocols. These products mostly do caching, but they do it outside the core code of the application. In a way, configuring any of these products is a form of vertical scalability aimed at making a server (or a web farm) more powerful. Today, quite a few high-traffic web sites, from airlines to news and media portals, are based on a classic ASP.NET web site schema deployed to a web farm with several layers of cache in front. The power of the cache is amazing.
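The mechanics behind output caching are the same everywhere: store a rendered result under a key, serve it back until it expires, and only redo the expensive work on a miss. A minimal sketch (the `render_page` function and the 30-second TTL are illustrative assumptions, not any product's actual API):

```python
import time

class TtlCache:
    """A tiny output cache: computed results expire after ttl seconds."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]                       # cache hit: skip the work
        value = compute()                       # cache miss: do the work
        self._store[key] = (now + self.ttl, value)
        return value

calls = 0
def render_page():
    global calls
    calls += 1                                  # counts how often we really render
    return "<html>...</html>"

cache = TtlCache(ttl=30)
cache.get_or_compute("/home", render_page)
cache.get_or_compute("/home", render_page)      # served from cache
print(calls)                                    # the page was rendered only once
```

This is the whole trick: one render serves many requests, which is why a few layers of cache in front of a classic web farm go such a long way.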
Role of Architecture
Caching is an approach that works well for queries; but what about writes? Write-intensive systems such as social applications may run into performance issues when write access to relational databases is too frequent. Avoiding relational databases altogether, though, is still not so common. The main issue with write operations is the updating of indexes and the consequent growing complexity and execution time of queries. Relational databases therefore lend themselves to vertical scalability. A relational database can definitely be scaled out, but that’s often quite an expensive operation as it involves sharding, administration, load balancing and synchronization. If caching is not enough because write operations are too heavy, an interesting alternative is the CQRS architecture.
CQRS (Command Query Responsibility Segregation) is all about designing two separate stacks for queries and commands. Each stack possibly has its own persistence layer, made to measure for its purpose. For example, the persistence layer of the query stack can be based on denormalized relational tables optimized for reading. At the same time, the persistence layer of the command stack may take advantage of asynchronous writes: the HTTP request returns immediately after placing the request for a write, and the burden of writing the data and synchronizing the command and query data stores is left to the implementation of the backend system.
CQRS is an ingeniously simple solution because it separates the read and write stacks. Not only does it simplify development but it also leads to systems where full ACID consistency is just an option. Architects, perhaps surprisingly, find that eventual consistency is good enough in most cases. And, not coincidentally, a system based on eventual consistency scales out very well.
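The essence of the two-stack design fits in a few lines. In this sketch (the domain, names and stores are illustrative assumptions, not a framework API) a command merely enqueues the write and returns; a backend worker later applies it to the authoritative store and refreshes the denormalized read model that queries hit:

```python
from collections import deque

command_queue = deque()   # commands accepted but not yet applied
event_store = []          # authoritative write-side store
read_model = {}           # denormalized store the query stack reads

def place_order(order_id: str, amount: float) -> None:
    """Command: returns immediately after enqueueing the write."""
    command_queue.append(("OrderPlaced", order_id, amount))

def process_pending_commands() -> None:
    """Backend worker: drains the queue and syncs the read model."""
    while command_queue:
        event = command_queue.popleft()
        event_store.append(event)
        _, order_id, amount = event
        read_model[order_id] = {"amount": amount, "status": "placed"}

def get_order(order_id: str):
    """Query: touches only the read model; it may be momentarily stale."""
    return read_model.get(order_id)

place_order("A1", 99.0)
print(get_order("A1"))    # None: the read model hasn't caught up yet
process_pending_commands()
print(get_order("A1"))    # {'amount': 99.0, 'status': 'placed'}
```

The brief window in which the query returns nothing is the eventual consistency the article mentions; accepting it is precisely what lets each stack be stored, tuned and scaled independently.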
Role of the Cloud in Scaling-Out
Recently, one particular strategy has been emerging as the most successful and rewarding in terms of scalability: the Platform-as-a-Service (PaaS) model pushed by Microsoft Azure. In a PaaS scenario, you find that having a simple and compact ASP.NET stack deployed to a web role is preferable to a multi-tier system in which you take care of all the middleware yourself. As long as you have a single instance of the web role, you’re in a scenario very close to leveraging the services of a classic Internet Service Provider. However, you can treat the Azure platform like the volume of a TV and raise or lower the number of web role instances as it suits you. In doing so, you increase the capacity of the site almost linearly while gaining some noticeable side benefits. This is the power of horizontal scalability.
For example, with multiple web role instances you have a higher chance of avoiding any downtime in your application. The more instances you have, the higher the overall availability of the system: if one instance goes down, the others keep the application alive and kicking. Finally, multiple instances also make it far easier to distribute them geographically across data centers, moving them closer to critically important customers. This is like first-aid treatment for unexpected peaks of traffic from a specific area of the world.
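The availability benefit comes from the dispatching layer in front of the instances, which any platform load balancer provides for you. A toy sketch of the idea (the `WebFarm` class is my own illustration, not an Azure API): requests rotate round-robin over the healthy instances, and a dead instance is simply skipped, so traffic keeps flowing:

```python
import itertools

class WebFarm:
    """Round-robin dispatch over N identical web role instances;
    an unhealthy instance is skipped, so the site stays up."""
    def __init__(self, instance_count: int):
        self.healthy = set(range(instance_count))
        self._rr = itertools.cycle(range(instance_count))

    def dispatch(self) -> int:
        """Return the instance that should serve the next request."""
        for _ in range(len(self.healthy) * 2 + 2):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances left")

farm = WebFarm(3)
print([farm.dispatch() for _ in range(3)])   # [0, 1, 2]
farm.healthy.discard(1)                      # instance 1 goes down
print([farm.dispatch() for _ in range(3)])   # instance 1 is skipped
```

With one instance this degrades to the single-server scenario; with many, losing a node costs you a fraction of capacity instead of the whole site, which is exactly the side benefit described above.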
Scalability is a serious matter in software, and it comes in good and bad flavors. It is good because it denotes traffic and possibly success; it is bad because, if not handled properly, it turns possible success into certain failure. There’s a lot of hype around scalability too, as if everybody needed scalability in the same way and in the same dosage.
In this article I’ve tried to explain the mechanics of scalable sites and singled out three architectural devices as providing the safest way out of the mess. Caching is my choice for first place, followed by the CQRS architecture and, thirdly, horizontal deployment on a cloud platform. These are the three main moves I’d make to ensure scalability. Your take?