Ming Lee

What do you mean the cloud isn’t free?

04 May 2012

Because cloud services are so easy to deploy and scale up, it is easy to start racking up unexpectedly high bills. There are plenty of ways of keeping costs down, but don't assume that cloud services are necessarily a cheap option: it depends on how well you control costs.

Cloud computing has now entered most executives’ common vernacular, and certain ideas and themes come through. A common misconception is that the cloud is cheap, right? In some circumstances it is even free (check out the Amazon Web Services ‘Free Web Tier’), so one would be a fool not to consider it as a viable deployment option. However, the reality can be rather different, and some people’s perceptions of the cost of cloud computing need to be adjusted to the reality of running live, production, customer-paying websites, applications, and services, complete with high-availability options and failover. In this article I’m going to look at some of the ways your cloud solution can accidentally bleed money, and what you can do about it.

My Experience

The advantages of cloud computing are well known: a pay-for-what-you-use subscription service, elasticity of supply to match demand, and little or no hardware maintenance. So it was a surprise to some when I presented a quarterly report on our cloud computing costs. Now, the amount was high only relative to expectations: the quarterly cost was still only 75% of the cost of the hardware alone for a typical project.

However, I was asked to explain the main areas of cost to the stakeholders.

As per my previous post, most of the expense when running in the cloud is determined by your architecture: how many servers (instances) are you running and for how long? In most cases, you will be running something 24/7. That suggests calculating the cost should be quite easy, right? I was pressed on this point – did I get the initial architecture wrong?
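On paper, the arithmetic really is that simple. As a back-of-the-envelope sketch (the server counts and hourly rates below are invented for illustration, not our actual figures):

```python
# Naive cloud cost model: every instance runs 24/7 at a flat hourly rate.
# All counts and rates are illustrative, not real AWS prices.

HOURS_PER_MONTH = 24 * 30

fleet = {
    # role: (instance_count, hourly_rate_usd)
    "live web servers": (4, 0.46),
    "staging servers": (2, 0.46),
    "load balancers": (2, 0.025),
}

def monthly_cost(fleet, contingency=0.15):
    """Base running cost plus a contingency margin."""
    base = sum(n * rate * HOURS_PER_MONTH for n, rate in fleet.values())
    return base * (1 + contingency)

print(f"Estimated monthly cost: ${monthly_cost(fleet):,.2f}")
```

The 15% contingency in that sketch matches the margin I describe below; as you will see, we needed nearly all of it.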

Checking the current system against the submitted plans, there were no major changes. We have live servers, staging servers, elastic load balancers, elastic IPs, and other bits of AWS EC2-related paraphernalia. All was in order, yet the cost, broken down per month, was consistently higher than expected. Clearly, there was something that I had either missed or underestimated.

When I first submitted the cost, I added a contingency of an extra 15% of the total, just in case. It was just enough but I was personally surprised how close I came to going over this amount. Those in charge of the budgets were even more surprised.

After some investigation I discovered the culprits and drew some valuable lessons from them. Now, I’d like to share some hard-won insight into controlling the cost of using AWS.

Truly be elastic – shut down unwanted instances

We made the mistake, which I think many have made and will make, of merely transferring physical hardware into the cloud. We continued to treat the instances like our own hardware and kept them running 24/7 even when they were not being used, partly because our monitoring and logging systems were all designed around continuous, 24/7 monitoring. In reality, outside office hours some of the services had no customers at all. They should have been switched off, either manually or via a batch script, and then restarted in time for the first customers.

Lesson: Think elastic; shut down all unused instances; re-architect your design to accommodate this.
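A small scheduled script can do the switching off for you. The sketch below uses the boto3 AWS SDK for Python; the ‘schedule’ tag, its value, and the region are illustrative conventions of my own, not anything AWS prescribes:

```python
# Stop every running instance tagged schedule=office-hours.
# Run from cron or the Windows Task Scheduler at close of business;
# a mirror-image script calling start_instances runs each morning.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # illustrative region

reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:schedule", "Values": ["office-hours"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print("Stopped:", ", ".join(instance_ids))
```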

It’s easy to spin up new instances, almost too easy

Related to the above point, it’s easy to spin up a new instance for testing or out of sheer curiosity. It’s also pretty simple to tear down an instance and spin up a new one from an AMI. The problem is that each time a new instance is spun up, it incurs a full hour’s charge even if it runs for only a few minutes.
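To see how that rounding bites, here is the arithmetic for a handful of short sessions (the rate and durations are invented for illustration):

```python
# Under per-hour billing, every launch is rounded up to a whole hour.
import math

hourly_rate = 0.46                       # illustrative rate, USD/hour
session_minutes = [12, 8, 25, 5, 40]     # five spin-up/tear-down cycles

actual_hours = sum(session_minutes) / 60                        # 1.5 h of real use
billed_hours = sum(math.ceil(m / 60) for m in session_minutes)  # 5 billable hours

print(f"Actual use: {actual_hours:.1f} h, billed: {billed_hours} h "
      f"(${billed_hours * hourly_rate:.2f})")
```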

We had some issues with applications not working properly due to misconfiguration, so a lot of troubleshooting was performed, with each major step saved as a snapshot to enable rollback. Rollbacks were frequent as various troubleshooting avenues were explored and rejected. Over time the fixes were made, but the net effect was that at times we were paying for two to three times the usual number of instance-hours.

Lesson: A subscription service isn’t efficient when it comes to troubleshooting; one is better off working on local resources or local virtual machines. Understand the ‘rhythm’ of your environment. The elasticity people speak of is only worthwhile when handling usage peaks in live, production systems where extra use = extra revenue. For testing, research and development, and other related functions, consider using local resources.

Being human and forgetful

There were a few occasions when AWS Quadruple Extra Large instances ($2.44 per hour) were accidentally left switched on and idle over a long weekend. On another occasion someone spun up a new instance in a different region, forgot to switch it off, and only remembered a couple of weeks later.

Accidents happen but it all builds up.

Lesson: Institute some sort of policy that controls and monitors which instances are running, which regions they are in, and who owns them. Use meta-tags to label the instances properly (for example: ‘test – can be deleted after 01/01/2012’) to aid in their management.
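A daily report is a cheap way to enforce such a policy. This boto3 sketch walks every region and lists running instances with their tags; the ‘owner’ and ‘expires’ tag keys are our own hypothetical convention:

```python
# Report every running instance in every region, with owner/expiry tags.
import boto3

regions = [
    r["RegionName"]
    for r in boto3.client("ec2", region_name="us-east-1")
                  .describe_regions()["Regions"]
]

for region in regions:
    ec2 = boto3.client("ec2", region_name=region)
    for reservation in ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]:
        for inst in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            print(f"{region}  {inst['InstanceId']}  "
                  f"owner={tags.get('owner', 'UNKNOWN')}  "
                  f"expires={tags.get('expires', 'unset')}")
```

An instance with no owner, or sitting in a region nobody uses, stands out immediately, rather than a couple of weeks later.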

Use the new medium instances wherever possible

Recently, AWS made its medium instance type part of the main product line. You could use it before via the EC2 command line, but it is now a standard option. The medium instance has very good specifications, and we are actively monitoring the performance metrics of this instance type. Already, our ‘test’ and ‘staging’ environments are running as medium instances.

Lesson: Consider using the micro or medium instance types instead of the default large instance.
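At launch time this is a one-parameter decision. A minimal boto3 sketch (the AMI ID is a placeholder for your own image):

```python
# Launch a staging box as an m1.medium instead of defaulting to large.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder: substitute your own AMI
    InstanceType="m1.medium",    # the medium type discussed above
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "environment", "Value": "staging"}],
    }],
)
```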

Use Linux where you can

A comparable Linux instance usually costs around 50% less per hour than its Windows counterpart, so definitely look at porting what code and applications you can to open-source variants. Websites, forums, and e-business back-end applications have hundreds of open-source options that can run happily under Linux. AWS offers almost all of the main flavours of Linux, including Ubuntu, SUSE, Red Hat, Debian, and CentOS, as well as Amazon’s own Linux offering. Recently, a core part of our application stack was significantly re-engineered so that it works very happily under Red Hat and SUSE Linux.

Identify and eliminate ‘dead-time’

Any process that requires human interaction can be plagued by delays, with each stage waiting for human input. A process that finishes at 02:00 in the morning and then waits for a human to click the ‘next’ button at 09:00, when the office opens, is not very efficient: you have just paid for seven hours of ‘dead time’.

Lesson: Automate processes through scripts and programs, using well-known options such as the Windows Task Scheduler and Python to drive the EC2 command-line tools and AWS Elastic Beanstalk. Automation should also shut down instances when a process completes, as in the sketch below.
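As a sketch of that last point, a wrapper running on the instance itself can stop the machine the moment the job finishes rather than at 09:00 the next morning. The job path is hypothetical, and this assumes the plain EC2 metadata service is reachable from the instance:

```python
# Run a batch job, then stop this instance immediately on completion.
import subprocess
import urllib.request

import boto3

# An instance can discover its own ID from the metadata service.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

try:
    subprocess.run(["/opt/jobs/nightly_export.sh"], check=True)  # hypothetical job
finally:
    # Stop (not terminate) so the EBS root volume survives for next time.
    boto3.client("ec2", region_name="eu-west-1").stop_instances(
        InstanceIds=[instance_id]
    )
```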

Actively manage the AWS environment

Over time, snapshots and copies of Elastic Block Store (EBS) volumes will litter the dashboard. Everything there costs money, and a big contributor is orphaned disk volumes, especially those belonging to instances that are unused or have already been terminated: the volumes linger, billed but doing nothing useful. Because it is so easy to spin up an instance and then tear it down, AWS sometimes leaves a number of artefacts behind, including the disks. The same applies to the Relational Database Service (RDS): snapshots take up space, so look at downloading a copy of the data outside of AWS as part of any disaster-recovery plan.

Lesson: Be proactive and seek out orphaned disk volumes, redundant snapshots, and other artefacts. Then see about removing or archiving them.
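The seeking-out part is easy to script. This boto3 sketch reports unattached volumes and snapshots older than 90 days (an arbitrary cutoff) without deleting anything, so removal stays a human decision:

```python
# Report orphaned EBS volumes and elderly snapshots in one region.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Volumes in the 'available' state are attached to nothing.
for vol in ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]:
    print(f"Orphaned volume: {vol['VolumeId']} ({vol['Size']} GiB)")

cutoff = datetime.now(timezone.utc) - timedelta(days=90)
for snap in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
    if snap["StartTime"] < cutoff:
        print(f"Old snapshot: {snap['SnapshotId']} from {snap['StartTime']:%Y-%m-%d}")
```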

Cost control and monitoring

Since AWS makes it so easy to manage the account, I realised that the task of keeping track of spend had actually devolved to me. I manage the technical operations but not the budget; yet those who worry about the budget had no visibility of the regular spending, as they had neither the login details nor billing through the usual purchase-order/invoice mechanism. It was simply too easy to subscribe and consume resources, because subscriptions could be started with a credit-card sign-up. In the past, one had to raise a purchase order, which went through a number of approval steps.

Lesson: Controlling and managing cost needs to be embraced by everyone in your organization.
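One concrete way to give the budget holders visibility without handing out console logins is a billing alarm that emails them directly. Here is a sketch using CloudWatch; the threshold and SNS topic ARN are placeholders, billing metrics live only in us-east-1, and estimated-charges data must first be enabled in the account’s billing preferences:

```python
# Alarm when the month-to-date estimated bill crosses a threshold.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-bill-over-1000-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                 # six hours; billing data updates slowly
    EvaluationPeriods=1,
    Threshold=1000.0,
    ComparisonOperator="GreaterThanThreshold",
    # Placeholder SNS topic that the budget holders subscribe to:
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
)
```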

Final thoughts

If you perform a cost/benefit analysis between a ‘traditional’ hosting platform (where you buy your own server and run it in your own server room) and the cloud, I am still quite sure that the cloud option will turn out significantly cheaper, both to start up (lower capital expenditure) and to run (lower operational costs).

However, the bills will come in regularly and some months will be higher than others. The cloud is elastic, so it’s easy to activate more resources; there’s nothing like throwing CPU power at a problem! Because it is so easy, a particular job can balloon in resources as more hardware is thrown at it to speed things up, and this can add significantly to the cost. Loose release procedures and inadequate resource tracking can be a big issue. Being able to think and act elastically means reviewing existing processes and working practices. Working with a public cloud provider as if it were still a finite bunch of servers downstairs in your server room can be just as costly a mistake as a poorly-thought-out plan of execution.

Be warned, but do not be afraid: the cloud is great. Just don’t get too soaked when it pours down!

Ming Lee

Author profile:

Ming has worked for Esri UK, a leader in GIS software and services in the UK, for ten years. Specifically, he manages all the operational aspects of the online services and has in-depth experience in all things GIS and cloud computing. More generally, Ming has worked as a GIS consultant with the World Bank, DFID, and the United Nations around the world over the last 20 years.


Have Your Say


Posted by: David Sheardown
Posted on: Monday, December 03, 2012 at 8:02 AM
Message: Great article. My experience saw a relatively small cost issue where I had, for some reason, set up an elastic IP in a different region (EU) which wasn't mapped to a server - I must have removed the server at some point, and the IP remained. It took me two months to realise the extra cost was there, as I had checked my current region (USA) and of course found no IPs floating around. So make sure you switch to the correct region when checking!

Posted by: robin sasson
Posted on: Monday, December 03, 2012 at 8:02 AM
Message: Great article and useful - I signed up for the free web tier today.

 
