When the Cloud was new, it was often presented as an 'all or nothing' solution. Nowadays, the canny Systems Architect will exploit the best advantages of 'cloud' distributed computing in the right place, and use in-house services where most appropriate. So what are the issues that govern these architectural decisions?
I consider the term “Distributed Computing” to be more accurate than “Cloud”, because any organization that performs some or all of its computing functions under the control of other organizations is distributing that computing effort. It’s a far more straightforward and understandable definition, but of course it hides the complexity that comes with it. Even so, to maintain a level of understanding in this article, “Cloud” can be used interchangeably with “Distributed Computing”.
The term “Distributed Computing” helps frame one of the primary benefits of this new way of working; using more than one location to fulfill a set of computing needs. Not only can these locations be separate, but they can also use different architectures. That represents an amazing palette of options that the organization can use to extend and expand its solutions.
But with great options comes great stress – at least I think that’s the way the quote goes. Each option has both advantages and disadvantages, not always clearly defined. It’s far easier to fall back on the comfort of what the organization is currently familiar with, at the cost of the advantages that other architectures provide.
An alternative approach is to pick and mix the palette of options by combining various parts together to create a hybrid. Although this is viable, there are factors to consider for combining the various options for a solution.
Defining the Architecture Choices
Almost any set of categories of Distributed Computing architecture turns out to be inadequate, but there are some categories of architecture that most of us are familiar with that serve as a pattern for discussion. I’ll start with just a few of the choices and how I define them.
“Canned” Software Running On-Premises
This is the model that is most common in both small and large organizations. The organization owns all the hardware including the servers, network wiring and routers, buildings, rooms and the entire infrastructure required to make the system work. They also employ people to operate and maintain the systems and, in many cases, pay a license fee for operating systems and software. All this is done to provide a service to the end-users of the system. The advantage is that the organization maintains complete control over the environment, and enjoys a monopoly on the use of the hardware and software.
Almost anything is available as pre-written software, in both closed and open-source models, to suit almost any need. Sometimes the software can be customized; in other cases the system provides a set series of functions. In any case, this model allows complete control, a high degree of security, and a set cost for the system, just as long as you don’t factor in the ongoing costs of managing and maintaining it.
Custom Code Running On-Premises
Sometimes the software that is offered by various vendors doesn’t quite do, or even remotely do, the things that an organization needs. In this case, the organization can hire or contract people to use any number of languages, methodologies and more to code what they need. They can use the same systems that they have for the pre-written solutions, and this setup allows for the ultimate in control, and the ultimate in responsibility. The organization is now responsible for all vectors within the system, from the hardware to the software code, and all of the infrastructure and people needed to run it.
But the advantages often outweigh the responsibility, especially if the organization provides software functions not only for internal users but for paying customers, such as a web site or service the organization uses to sell products or services. In this case, there are even more responsibilities including security and other ancillary functions that are required to operate the service.
There is another advantage for code running on-premises, whether it’s written by the organization or a vendor: some data has stringent security and ownership requirements, such as health, personal, or financial data about customers or employees. These requirements, usually driven by the government, are quite unforgiving and very specific. Hosting the application under the complete control of the organization, from hardware to software to personnel, is non-negotiable.
Infrastructure as a Service
One level up from hosting the hardware internally is an architecture that is known as “Infrastructure as a Service” or IaaS. This architecture can be hosted internally (called a “private cloud”) or by a vendor (called a “public cloud”) but in both cases has a few salient characteristics that separate it from simply virtualizing a computer. These characteristics include a consistent, automated deployment mechanism, monitoring the systems as a whole and in part, and being able to balance the load of Virtual Machines (VMs) and storage across a contiguous network fabric.
Note: There is a more strict definition for IaaS, but these three concepts will do for this discussion.
This architecture is quite similar to owning the physical systems – the organization has complete control of the operating system and everything above it; only the hardware below is abstracted away. If the organization uses an internal IaaS, then it is still responsible for the hardware, but can gain greater utilization and standardization across the platform, and possibly see economic gains from using the hardware more fully. The organization does, however, have to provide the full capacity it will need for any contingency, which means there may be times when the systems are under-utilized.
If a vendor provides the IaaS, then the organization can essentially grow and shrink the size of the systems at will, depending on how the vendor charges for the utilization. With some vendors the organization can create the VMs internally and copy those to the vendor’s environment (as is the case with Windows Azure) or choose a set of pre-built systems from a gallery. The vendor might also allow the images to be copied back down to the organization when not deployed to the cloud.
An IaaS system can essentially run any application that can be run on physical hardware, assuming the requirements involve software only. IaaS can run either pre-written or custom-developed software in the same way that those packages can run on physical hardware. If the IaaS is provided internally, the same government requirements can be satisfied the same way as on physical systems.
If the IaaS runs in a vendor’s datacenter, the vendor shares the responsibility for the certifications and requirements around government compliance with the owner. It’s usually a more difficult process to host sensitive data in a vendor’s system than on local servers, but it is possible.
Custom Software on another Platform
If the system is required to host both custom code and the organization’s data (or the customer data of the organization), then it is better to abstract out not only the hardware but also the operating system and the runtimes such as .NET or Java.
In this case – often called “Platform as a Service” or PaaS – the organization focuses on writing code and storing data. The cloud vendor that is providing the platform will dictate what languages, frameworks and components they offer for the code to run. Developers have the option of using, and paying for, the components they use.
In most cases a PaaS provider has a mechanism to scale the code for the developer. It’s more common for a developer to use a scale-out, stateless model than a scale-up, monolithic model, since a VM can only get so large, and its performance is capped by the abstraction of the hardware. That isn’t to say that a VM can’t achieve high performance – it most certainly can – it’s just that a scale-up pattern is less amenable to a virtualized environment.
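The scale-out, stateless pattern described above can be sketched in a few lines. This is an illustrative example only – the function names and request shapes are assumptions, not part of any PaaS API. The point is that each request carries everything the handler needs, so any copy of the function, on any VM, can serve any request.

```python
# A minimal sketch of the stateless, scale-out pattern: no shared
# in-memory state survives between calls, so adding capacity is just
# running more copies of the same handler on more VMs.

def handle_request(request: dict) -> dict:
    """Process one order request without touching shared in-memory state."""
    # All context arrives with the request; nothing is remembered between calls.
    items = request["items"]
    total = sum(item["price"] * item["qty"] for item in items)
    return {"order_id": request["order_id"], "total": total}

# Because the handler is stateless, these could run on one VM or fifty:
requests = [
    {"order_id": 1, "items": [{"price": 9.99, "qty": 2}]},
    {"order_id": 2, "items": [{"price": 4.50, "qty": 1}]},
]
results = [handle_request(r) for r in requests]
```

A scale-up design would instead keep session state in the process and demand an ever-larger machine – exactly the pattern that fits a virtualized environment poorly.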
It is possible for an organization to host their own PaaS, but it is less common than simply making an IaaS available to the developers and working with them on operating system patches, upgrades and other platform-element requirements.
Software as a Service
If IaaS abstracts just the hardware, whereas PaaS abstracts the hardware, operating system and runtimes, then “Software as a Service”, or SaaS, abstracts almost everything.
SaaS involves providing a complete software solution to an end-user. The user logs on to the system, performs some work, and logs off. There’s usually little to configure, and nothing to install. In essence, this is what an organization’s end-user uses every day – it’s just that the internal IT teams provide the software for the user. In the case of a public cloud provider, IT doesn’t usually get as involved.
It’s common to see commodity functions such as accounting and finance, or even office automation software such as e-mail and word processing, farmed out to a SaaS provider. There are usually no licensing fees; instead, there is a recurring per-seat cost based on use.
With those definitions of the different models of distributed architecture, we can now turn our attention to the cases where their use is most appropriate. An organization can certainly choose to use only one model – all on-premises with their own hardware, an internal or external IaaS solution, PaaS or even SaaS, depending on their needs. They can also choose whatever architecture suits each purpose. In fact, many organizations are doing this now, even though they may not think of it in these terms. Payroll, for instance, is a function that many companies handed off to a SaaS provider long ago. In most cases it makes little economic sense to host, maintain and staff an internal payroll system, so they outsource this function to a provider. A few internal employees access the remote system on behalf of the organization – and this is the very definition of a SaaS architecture.
The true power of a distributed computing architecture is that it allows the organization to concentrate on the problem the organization wants to solve, rather than creating and maintaining a complete infrastructure for all possible problem spaces. It can then fit the solution to the problem rather than shoehorning the problem into the existing solution. Even better, the organization can ensure that the solution not only works for a given problem, but also works together with other architectures to allow for synergies where available. This is often called a “hybrid approach”, and it is a powerful concept.
To do this, it is important to understand each of the options, and then layer in the constraints, which I’ll come to in a moment. The requirements, combined with the options and constraints, should drive the eventual solution.
The first hybrid approach for a distributed computing environment involves starting with a function that is located on-premises, and then using a provider for the same function when the demand is higher. This might be a permanent arrangement, but more often is a “burst” of systems that come online during peak usage.
I’ve been involved in designs of these kinds of systems that use everything from IaaS to PaaS. The main component in this hybrid architecture is that an application is anchored to an on-premises system, and then, based on a calendar event or load trigger, the remote systems are brought on-line. Depending on the system, this can be a seamless transfer of functionality or a batch-processing approach where the systems send work off to the remote system and the results are returned later.
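The anchoring-and-burst decision described above can be reduced to a small routing function. This is a sketch under stated assumptions – the capacity figure and function names are illustrative, and a real system would be driven by a scheduler, calendar event or monitoring trigger rather than a single integer.

```python
# A sketch of the "burst" decision in a hybrid setup: work stays on the
# anchored on-premises system until demand exceeds local capacity, and
# only the overflow is routed to the remote provider.

ON_PREM_CAPACITY = 100  # jobs the local cluster can absorb at once (assumed)

def route_jobs(pending_jobs: int) -> dict:
    """Split pending work between local capacity and burst (cloud) capacity."""
    local = min(pending_jobs, ON_PREM_CAPACITY)
    burst = pending_jobs - local
    return {"local": local, "burst": burst}

split = route_jobs(160)  # {"local": 100, "burst": 60}
```

On a quiet day the burst count is zero and the organization pays nothing to the provider; during the peak, the overflow is sent out and the results returned later, batch-style, or handled seamlessly, depending on the system.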
As a concrete example, I’ve worked with an on-premises High Performance Computing (HPC) version of Windows Server which ran financial calculations using a “Monte-Carlo Simulation” computation pattern. During certain periods, a workload became too great for the local systems to handle. This was then sent to Windows Azure, which hosted more worker-nodes. The transfer of work between the systems was automatic for the workload; only the scheduling function needed to be aware of the location of the worker nodes.
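What makes the Monte-Carlo pattern move so easily between local and cloud worker nodes is that each node runs independent random trials, and the scheduler only needs to combine partial results. The toy version below estimates pi as a stand-in for the financial calculation, which the text does not describe in detail; the function names are illustrative, not part of Windows HPC Server or Azure.

```python
# A toy Monte-Carlo simulation: each "worker node" counts random points
# landing inside the unit circle; the "scheduler" splits trials across
# nodes and sums the counts. Workers need no knowledge of each other,
# so a node can live on-premises or in the cloud interchangeably.

import random

def worker(trials: int, seed: int) -> int:
    """One worker node: count random points that land inside the unit circle."""
    rng = random.Random(seed)  # per-node generator, independent of the others
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def schedule(total_trials: int, nodes: int) -> float:
    """The scheduler: split trials across nodes and combine partial counts."""
    per_node = total_trials // nodes
    hits = sum(worker(per_node, seed=n) for n in range(nodes))
    return 4.0 * hits / (per_node * nodes)  # estimate of pi

estimate = schedule(total_trials=100_000, nodes=4)
```

Because only `schedule` knows where the workers run, adding cloud-hosted worker nodes during peak periods changes the node list, not the computation – which is exactly why only the scheduling function needed to be location-aware in the system described above.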
Other platforms provide similar functions, using IaaS systems to balance load or handle overflow work.
The key is to describe the architecture in terms of functions, not in terms of a specific kind of implementation. For instance, use the term “Data Access Layer” rather than specifying “.NET Data Class Factory”. That way, if you need to swap one piece of the architecture for a different implementation, the change doesn’t break the design.
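Describing the architecture by function can be made concrete with an interface: application code depends on the functional name, and the class behind it – an on-premises database, a cloud store, anything – can be swapped freely. The class and method names here are illustrative assumptions, not a prescribed API.

```python
# A sketch of function-oriented design: the architecture names a
# "Data Access Layer", and the concrete implementation behind that name
# is interchangeable without breaking anything that depends on it.

from abc import ABC, abstractmethod

class DataAccessLayer(ABC):
    """The function the architecture names -- not a specific technology."""
    @abstractmethod
    def get_customer(self, customer_id: int) -> dict: ...

class InMemoryDataAccess(DataAccessLayer):
    """One interchangeable implementation; a cloud-backed one could replace it."""
    def __init__(self, rows: dict):
        self._rows = rows

    def get_customer(self, customer_id: int) -> dict:
        return self._rows[customer_id]

def order_summary(dal: DataAccessLayer, customer_id: int) -> str:
    # Application code sees only the functional layer, never the implementation.
    customer = dal.get_customer(customer_id)
    return f"Order for {customer['name']}"

dal = InMemoryDataAccess({1: {"name": "Contoso"}})
summary = order_summary(dal, 1)
```

Swapping `InMemoryDataAccess` for a cloud-hosted implementation changes one constructor call; `order_summary` and everything above it are untouched.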
Traditionally, organizations require a second copy of data, another set of servers, and a separate facility to ensure that, if the primary computing environment is compromised, another can take its place. This is an expensive proposition, but the alternative of not having a backup of the environment can be devastating – especially if the organization’s business depends significantly on its technology.
By using a cloud provider, an organization can create an exact copy of the functions it needs to continue business. While there is no requirement that the systems be identical, it’s more work in the case of a disaster to recover if they are not.
The key to this hybrid use is to decide whether the disaster recovery (DR) system will be a “hot” standby – constantly on and available for use – or a warm/cold standby that is merely ready to deploy and instantiate when required. The former is more costly; the latter has a longer recovery time. The organization should make this decision carefully. It’s easiest with an IaaS installation, especially if the organization has an internal IaaS.
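The hot-versus-warm tradeoff boils down to a cost-against-recovery-time calculation. The figures below are illustrative assumptions only – not vendor prices – but they show why the decision deserves a deliberate model rather than a guess.

```python
# A rough sketch of the hot-versus-warm standby tradeoff: the hot
# standby accrues cost every hour but recovers almost immediately;
# the warm standby costs little until a disaster, but takes longer
# to deploy and instantiate. All numbers are assumed for illustration.

def annual_standby_cost(hourly_rate: float, hours_running: float) -> float:
    """Compute yearly spend for standby capacity kept running."""
    return hourly_rate * hours_running

# Hot: runs all year. Warm: essentially zero compute until deployed.
hot_cost = annual_standby_cost(hourly_rate=0.50, hours_running=24 * 365)
warm_cost = annual_standby_cost(hourly_rate=0.50, hours_running=0)

hot_recovery_minutes = 5      # already on-line (assumed)
warm_recovery_minutes = 240   # time to deploy and instantiate (assumed)
```

Plugging in the organization’s real rates and its tolerance for downtime turns a vague preference into a defensible decision.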
By far the most compelling reason to use a hybrid architecture, or any IaaS, PaaS or SaaS external provider for that matter, is to use the best-of-breed software functions for a solution.
Consider that many of us own a motor vehicle. We also walk, take a train, boat, aircraft, bus or taxi. Sometimes we use all of those things to make a single trip. That’s a perfect example of using components we own, components we rent, and components we simply use because they are the best way to do something. So it is with a hybrid use of multiple architectures for a single solution.
The best way to describe this kind of hybrid use is by an example. Assume for a moment that the organization wants to provide their customers with a mobile-device experience. They want them to be able to search for products or services, compare that to industry standards or prices, and then allow them to purchase something. The organization needs to take payment, track and ship (or fulfill) the order, and see analytics across purchases.
The organization could certainly stand up all of the components for this solution using their own hardware, exposing the external-facing portions over the Internet, and so on. But there are options available to use a mix of components, based on the needs and constraints for the solution. One possible architecture is to write the presentation layer for each device, to take advantage of the richness of that particular interface. They could then use a PaaS solution to act as a web service listening for order calls from the device. A SaaS solution might be available for industry data and pricing. An IaaS (or PaaS) solution could house the organization’s catalog of services or products, since they probably want to make that public to their partners and customers from a central location. Internal systems could be coded to retrieve the customer order for analytics, keeping private data secret. Another SaaS solution could be used to process the payments securely.
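The example above can be written down as a simple mapping from each solution function to the architecture that hosts it. The components and placements are the ones just described; none of them names a specific product, and the mapping itself is only a way of making the componentized view concrete.

```python
# The mobile-commerce example, sketched as function -> architecture.
# Describing the solution this way makes each component individually
# swappable, which is the point of the hybrid approach.

solution = {
    "presentation layer (per device)": "custom client code",
    "order web service":               "PaaS",
    "industry data and pricing":       "SaaS",
    "product/service catalog":         "IaaS or PaaS",
    "order analytics":                 "on-premises custom code",
    "payment processing":              "SaaS",
}

# Swapping a component means changing its placement, not the whole design:
solution["product/service catalog"] = "PaaS"
```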
This is only one example – and even within this example, the components can be interchanged with others. A gradual approach allows organizations to migrate slowly to an external provider, all without disturbing the system.
The important thing to remember is that architects should now think about solutions in a more componentized way, rather than a monolithic internal stack of hardware and software.
Of course, this all sounds great until the actual requirements – and, more importantly, the constraints – are exposed. While there are many things to consider, I’ll focus on three that can help determine which parts of an application can move where.
At the outset, the organization needs to consider the security requirements. Some requirements – enforced by the government – dictate the level of ownership an organization must have over its data and data processing. While external cloud vendors can provide various levels of certifications and guarantees, it’s important to work with the vendor to carefully understand what they can and cannot do.
That isn’t to say the organization can’t use a public cloud provider. As I mentioned earlier, many organizations have already outsourced their payroll even though it has an amazing amount of private data.
The key is to understand the security profile for a particular component within the solution, and ensure that it is protected properly.
The cost model for an organization’s internal systems is quite simple – hardware, buildings, utilities, licensing, training and people. That being said, it’s not a simple matter to quantify those costs. It’s not quite a “sunk” cost, meaning a one-time payment, since computers are replaced, licenses are renewed, and people come and go.
It might seem that a cloud solution’s cost is harder to estimate, especially in the case of PaaS, since each component is often billed separately. However, it’s a “pay as you go” model, so it can actually be tied to a revenue-generating system quite directly. In the case of a SaaS, the model is usually very simple – the organization pays for a usage-period or per-seat cost. It’s a far more direct model than trying to estimate an internal system.
Even with this more direct model, it’s easy for an organization to get into trouble with this billing system. It’s a bit like a teenager with a cell-phone texting plan – easy to purchase, and very easy to over-use. It’s important to develop a cost-modeling system that can be tested over time for accuracy.
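A cost-modeling system of the kind suggested above can start very small: estimate the monthly bill from metered usage, then compare the estimate against the actual invoice each month and correct the model. The meter names and unit rates below are illustrative assumptions, not any vendor’s price sheet.

```python
# A sketch of a testable pay-as-you-go cost model: forecast a month's
# bill from metered usage, then track the forecast error against the
# real invoice so the model's accuracy can be tested over time.

RATES = {
    "compute_hours": 0.12,   # per hour (assumed)
    "storage_gb":    0.05,   # per GB-month (assumed)
    "egress_gb":     0.09,   # per GB transferred (assumed)
}

def estimate_monthly_cost(usage: dict) -> float:
    """Sum metered usage times assumed unit rates, rounded to cents."""
    return round(sum(RATES[meter] * amount for meter, amount in usage.items()), 2)

forecast = estimate_monthly_cost(
    {"compute_hours": 720, "storage_gb": 100, "egress_gb": 50}
)

def forecast_error(forecast: float, actual_bill: float) -> float:
    """Compare the model to the invoice -- the monthly accuracy check."""
    return actual_bill - forecast
```

Running the comparison every billing cycle is what keeps the teenager-with-a-texting-plan surprise from happening: the model either proves itself or gets corrected before the overage grows.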
Control of a system is the final consideration. Control defines what the organization is responsible for, and what the cloud vendor is responsible for. In the case of IaaS, the vendor owns the design, purchase and maintenance of the hardware. In the case of PaaS, that responsibility moves all the way up to the operating system and even the runtime layer.
So the organization has to consider what level of control it wants to maintain, and what level of control it’s willing to abdicate. I like to drive myself to the airport, but I’m happy to let the pilots fly the plane.
In general, the job of the IT professional is to expand what he or she knows about the options, and what the organization’s requirements are, and as always, to offer possible architectures that will solve the problem. Now those solutions involve another resource – Distributed Computing.