Microsoft Azure has a service called Azure Blob Storage, which is one of the components of the Azure Storage features of the platform. The other services in Azure Storage are Queues, Tables, and Files (file shares, in Preview at the time of this writing). Blob stands for “binary large object”; blobs are basically files like any you store on your regular computer. They can be VHDs, pictures, Excel files, or pretty much anything else. The blobs are stored in the cloud on Microsoft’s servers, and you can access them through URLs with or without authentication tokens, through the REST APIs, or using a Storage Client Library.
What would I use Azure Blob Storage for?
One example of using Blob Storage is to store static files that are frequently used by a website, such as images, CSS files, and PDF files. Changing the URLs to retrieve those files from Blob Storage reduces the load of requests on the web server hosting the website, and you can get more bang for your buck running the website. (Or, as Mike Wood put it, “you can service many more requests for dynamic content on fewer compute resources because they aren’t busy serving up static resources”.)
If you are using PaaS features such as Web Roles and Worker Roles, storing files in Blob Storage instead of in the application speeds up the publishing time, especially if the files are large. You also might want to change one or more files frequently, and putting them in Blob Storage allows you to easily update the files without republishing the entire web site. For example, if you put your company logo in Blob Storage, then every time your company changes the tag line, you can just change that image and it will magically change the web site. You laugh, but I worked at a company that changed the tag line on their logo at least ten times in a couple of years.
At one company I worked for, each customer would upload images and record audio to create a video-type message. We stored the images and audio in Blob Storage. Each customer had their own “container” that the web application would read from and write to. This enabled the customer to go to any computer and log into the web application, and all of the assets for their message were available. They could work on their project on different computers, always accessing the files in Blob Storage. This was more flexible for the customer and we never had to worry about sync issues with their cache.
A third example is to use Blob Storage as a temporary holding place for large queue messages. Let’s say you have a queue that feeds data to a worker role in Azure for processing, but some messages carry more data than the queue message size limit allows. In that case, store the data as a file in Blob Storage and put a URL to the file in the queue message. Then, when the worker is ready to process the message, it can retrieve the data from Blob Storage and process it.
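The queue example above can be sketched in a few lines of Python. This is only an illustration: the dictionary and list are in-memory stand-ins for a blob container and a queue, not calls to the real Storage Client Library, and the 64 KB limit and the "overflow" container name are assumptions I picked for the example.

```python
import json
import uuid

# Assumed limit for this sketch; check the queue documentation for the real value.
MAX_MESSAGE_BYTES = 64 * 1024

blob_store = {}  # stand-in for a blob container: URL -> data
queue = []       # stand-in for a storage queue: list of JSON strings


def enqueue(payload):
    """Queue small payloads directly; park large ones in blob storage."""
    if len(payload.encode("utf-8")) <= MAX_MESSAGE_BYTES:
        message = {"kind": "inline", "data": payload}
    else:
        # Hypothetical container name; the URL is only a lookup key here.
        url = ("http://contosostorage1.blob.core.windows.net/overflow/"
               + uuid.uuid4().hex)
        blob_store[url] = payload
        message = {"kind": "blob", "url": url}
    queue.append(json.dumps(message))


def dequeue():
    """Return the next payload, fetching it from blob storage if needed."""
    message = json.loads(queue.pop(0))
    if message["kind"] == "inline":
        return message["data"]
    # Remove the temporary blob once the worker has picked up the data.
    return blob_store.pop(message["url"])
```

The worker never needs to know which path a message took; it just calls dequeue() and gets the full payload back.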
What kinds of blobs can I have?
There are two kinds of blobs – block blobs and page blobs. When you create the blob you specify the type. The maximum size for a block blob is 200 GB. You can upload block blobs with a size up to 64 MB in one operation. For larger block blobs you can upload them by programmatically splitting them into blocks and using multiple threads to upload the blocks in parallel. When you commit the blocks, Azure puts them back together and makes them available as a file. (This is how I always imagined the transporter in Star Trek works.)
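To make the split, upload, and commit sequence concrete, here is a minimal sketch. FakeBlockBlob is a hypothetical in-memory stand-in I wrote for the example, not the real client library; its two methods are named after the Put Block and Put Block List REST operations. The 4 MB block size and the block ID format are assumptions as well.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size for this sketch


class FakeBlockBlob:
    """In-memory stand-in for a block blob (not the real client library)."""

    def __init__(self):
        self.uncommitted = {}  # block id -> bytes
        self.content = b""

    def put_block(self, block_id, data):
        # Stage one block; nothing is visible to readers yet.
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        # Commit: stitch the staged blocks together in the given order.
        self.content = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()


def upload_in_blocks(blob, data, block_size=BLOCK_SIZE, workers=4):
    """Split data into blocks, stage them in parallel, then commit."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Equal-length, base64-encoded IDs that preserve the block order.
    block_ids = [base64.b64encode(f"{i:08d}".encode()).decode()
                 for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Blocks may finish in any order; the commit list fixes the sequence.
        list(pool.map(blob.put_block, block_ids, chunks))
    blob.put_block_list(block_ids)
    return block_ids
```

The point of the design is that upload order does not matter. Only the list passed to the commit step decides how the blocks are put back together.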
Page blobs can be up to 1 TB in size and consist of a collection of 512-byte pages. You set the maximum size when creating a page blob and then you can write or update specific pages. The primary use I have seen for these is to back the IaaS Virtual Machines in Azure – the Virtual Hard Drives (VHDs) that represent the data disks and OS disks are stored as page blobs in Azure Storage.
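Because a page blob is addressed in 512-byte pages, a write to an arbitrary byte range has to be widened to whole pages. The arithmetic looks roughly like this; the function names are mine, and I am assuming that writes must start and end on page boundaries.

```python
PAGE_SIZE = 512  # bytes per page, as described above


def page_aligned_range(offset, length):
    """Widen a byte range to whole pages; returns (start, end), end exclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    # Ceiling division rounds the end up to the next page boundary.
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE
    return start, end


def pages_needed(size_bytes):
    """Number of pages required to hold size_bytes."""
    return -(-size_bytes // PAGE_SIZE)


print(page_aligned_range(700, 100))  # (512, 1024)
print(pages_needed(1024 ** 4))       # 2147483648 pages in a 1 TB page blob
```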
What is this blob.core.windows.net stuff?
Every file placed into Blob Storage gets a URL. The default base URL for accessing blobs is in the pattern of http://contosostorage1.blob.core.windows.net/, where contosostorage1 is the name of the storage account. This isn’t very friendly, and it doesn’t tell the consumer of the files where they come from, which might make the files look untrustworthy.
To eliminate this problem, you can assign a custom domain to your storage account. In this example, we could assign “storage1.contoso.com” to “contosostorage1.blob.core.windows.net”. You would then access the blobs in that account with a URL starting with http://storage1.contoso.com/. Aside from being more understandable to whoever is accessing the file, it also eliminates cross-domain issues when accessing files in Blob Storage from a website for the same company, such as the aforementioned example of using Blob Storage as a cache for each customer using a web application. Blob Storage also supports CORS to help with this type of cross-origin usage.
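Here is a small sketch of how the two URL forms relate. The helper names, the "images" container, and the "logo.png" blob are hypothetical; I am also assuming the container and blob name follow the base URL as path segments.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def blob_url(account, container, blob_name, scheme="http"):
    """Build the default URL for a blob in a storage account."""
    host = f"{account}.blob.core.windows.net"
    # quote() escapes spaces and similar characters but keeps "/" in names.
    return f"{scheme}://{host}/{quote(container)}/{quote(blob_name)}"


def with_custom_domain(url, custom_domain):
    """Swap the storage host for a custom domain, keeping the rest of the URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, custom_domain, parts.path,
                       parts.query, parts.fragment))


default = blob_url("contosostorage1", "images", "logo.png")
print(default)
# http://contosostorage1.blob.core.windows.net/images/logo.png
print(with_custom_domain(default, "storage1.contoso.com"))
# http://storage1.contoso.com/images/logo.png
```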
For information on configuring a custom domain for your storage account, check out the “Configure a custom domain name for blob data in an Azure storage account” article in the Azure documentation.
Failure is not an option
What happens if the hard drive that my blobs are sitting on fails? What happens if the rack my hard drive is in fails? What happens if a meteor hits the data center? In the case of the latter, they call up Bruce Willis to save the world, which may or may not include your data. Fortunately, Azure has thought of this. They support something called “replication”. They support something called “replication”. 😉
There are four kinds of replication, and you get to select which one to use when you create the storage account. In most cases you can change the replication setting later, except for Zone Redundant Storage, which can’t be changed after the account is created.
- Locally redundant storage (LRS): This means three copies of your blobs are stored in a single facility in a single region. The replicas reside in separate fault domains and upgrade domains. This means that data is available even if the rack where your data is stored fails or is taken offline to be updated. When you make a request to update storage, Azure sends the request to each of the three replicas and waits for successful responses from all of them before responding to you. This means that the copies in the primary data center are always in sync. I use this for test data, and data that I can live without in case Bruce Willis is busy when they call. This is the least expensive option.
- Zone Redundant Storage (ZRS): This is a brand new option, and it only applies to block blobs at this time. The official word on this is that “it replicates your data across 2 to 3 facilities, either within a single region or across two regions”. This means that, unlike the LRS option, which stores your data in triplicate within a single facility, the replicas are spread across facilities that are close to each other. I would use this if I wanted my data to be safer, but I didn’t need the gold-plated redundancy offered through Geo-Redundant Storage.
- Geo-Redundant Storage (GRS): This is the cream of the crop in terms of redundant storage. This replicates your data three times in your chosen data center, and then replicates it three times in a secondary data center that is far away. For example, if you put your data in North Central US, it will be replicated in South Central US. If you put it in West US, it will be replicated in East US. Basically, if you use this, then it doesn’t matter if Bruce Willis is busy, because you have another copy of the data in a completely different region. When you write to your primary data center, the data in the secondary region is updated asynchronously, so this doesn’t impact the performance.
- Read-Access Geo-Redundant Storage (RA-GRS): This is geo-redundant storage plus the ability to read the data in the secondary data center. If Bruce Willis is unsuccessful (or doesn’t answer the phone), you can change your application to read your data from the secondary data center, and all will not be lost. Also, if you have an application where only a few users can write to storage, but lots of people read the data, you could point the application that writes to storage at the primary data center, and then do all the reads from the secondary.
Note that for Geo-Redundant Storage, you can set the location of the primary data center, but the secondary data center location is selected for you. If you’re curious, there’s a complete list of the primary/secondary data center locations in the Azure Documentation.
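With RA-GRS, falling back to the secondary for reads could look something like the sketch below. The fetch argument is a hypothetical callable you supply (anything that takes a URL and returns bytes), and the "-secondary" host suffix is my assumption about how the secondary endpoint is named, so verify it against the documentation for your account.

```python
def read_with_fallback(blob_path, fetch, account="contosostorage1"):
    """Try the primary endpoint first; fall back to the secondary on failure."""
    primary = f"https://{account}.blob.core.windows.net/{blob_path}"
    # Assumed naming convention for the read-only secondary endpoint.
    secondary = f"https://{account}-secondary.blob.core.windows.net/{blob_path}"
    try:
        return fetch(primary)
    except OSError:
        # Connection errors derive from OSError in Python's standard library.
        return fetch(secondary)
```

Keep in mind that the secondary is updated asynchronously, so a read served from it may be slightly behind the primary.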
How much does it cost?
You pay only for your usage – the amount of space the blobs take up, egress charges out of Azure, and the transactions of reading and writing to/from storage.
Part of the brilliance of the business case mentioned previously, where we kept all of the customer’s data in Blob Storage and read it with a web application running in Azure, was that we had no charges for egress out of Azure, because the web application ran within Azure in the same region. We only had to pay for the storage of the actual data and the transactions of reading and writing the data.
The cost of storing block blobs and page blobs varies depending on the amount of space you are using and the redundancy level. For example, at the time of writing, for 1 TB of block blob storage the cost is $0.024/GB/month if it is locally redundant. If it is Read-Access Geo-Redundant, it is $0.061/GB/month.
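To put those two rates side by side, here is the arithmetic for a full terabyte. I am assuming 1 TB = 1024 GB and a flat rate across the whole terabyte, and I am leaving out transaction and egress charges; the prices are the ones quoted above and will have changed since.

```python
# Per-GB monthly prices quoted above (at the time of writing).
PRICE_PER_GB_MONTH = {"LRS": 0.024, "RA-GRS": 0.061}


def monthly_storage_cost(gigabytes, redundancy):
    """Storage-only cost per month in dollars, rounded to cents."""
    return round(gigabytes * PRICE_PER_GB_MONTH[redundancy], 2)


print(monthly_storage_cost(1024, "LRS"))     # 24.58
print(monthly_storage_cost(1024, "RA-GRS"))  # 62.46
```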
For a full list of prices by size, redundancy, and type of blob, check out Microsoft’s pricing page.
In this article, I provided an overview of Azure Blob Storage. I talked about the types of blobs available, what you would use it for, custom domains, storage redundancy, cost, and Bruce Willis. Next up in the series I’ll talk more about containers, demonstrate how to upload and download blobs, and show what Blob Storage looks like.