In this article, I’m going to talk about using the Storage Client Library to programmatically manage blob storage. There are storage client libraries for Java, Node.js, and a few other languages as well; I’m going to use the .NET storage client library. All of these libraries are wrappers around the REST API, which will be discussed in a future article.

I’m going to show you how to build a class called BlobMethods that has a lot of the commands used for accessing blob storage – uploading blobs, downloading blobs, copying blobs, deleting blobs, etc. The full class is available on GitHub or you can download it from the link at the bottom of the article, in case you’re one of those people who likes to know the ending before reading the book. For developing the code, I’m using Visual Studio 2013 and .NET storage client library version 4.2.1, which is the newest version available at the time of this writing.

The storage client library allows you to use objects to represent the different parts of storage, such as the storage account, the container in blob storage, and the blobs themselves. There is a bit of plumbing that you have to do to drill down to the actual blob, and each call uses the object created before, which means there’s a hierarchy to the calls. First you get a reference to the storage account, then you get the blob client service in that storage account, then you get the container in that blob client service, and finally, you get the blob in that container. The formal hierarchy looks like this:

  • CloudStorageAccount – this is a reference to the storage account, created using the account name and key.
  • CloudBlobClient – this is a reference to the object used to perform operations on the blob storage account. This is created using the CloudStorageAccount object.
  • CloudBlobContainer – this is a reference to the container that your blobs reside in. This is used in most blob operations, and is created using the CloudBlobClient.
  • CloudBlockBlob – this is a reference to the actual blob. This is created using the CloudBlobContainer object and the name of the blob. (If using page blobs, this would be CloudPageBlob).

You could put code for creating this hierarchy of objects in each method that accesses a blob in the container. However, that means it would retrieve the same CloudStorageAccount, CloudBlobClient, and CloudBlobContainer objects over and over again. The only thing that really changes is the reference to the CloudBlockBlob. Therefore, we’ll instantiate those objects only once, in the constructor of the class. Then for each blob action, we will set the reference to the specified CloudBlockBlob and then perform the action.

I put the setup of the hierarchy objects in a method called SetUpContainer; this is called when the class is instantiated. There are two class-level members that are used throughout the class, CloudBlobContainer and ContainerName, and both are set when the class is instantiated.

When we instantiate this class, it creates all of these objects and sets up access to the container. After that, you can pass in the blob name to use the different methods performing actions on the blobs in that container (such as copying, uploading, deleting, etc.). If you need to access different containers, you can simply create multiple instances of the BlobMethods class – one for each container.

Having set the groundwork, let’s see some code.

Set up the Hierarchy of Objects: SetUpContainer(…)

First we have the class-level variables and the constructor for the BlobMethods class. The class-level variables are set by the constructor, then used repeatedly to access the blobs in the container. Storage account credentials and container name are required for the class to work, so our constructor accepts those fields as input.

BlobMethods helper

Now let’s set up a method to create the hierarchy discussed earlier, starting with the storage account and continuing down through the reference to the container. This is only called from the constructor. Our method is called SetUpContainer. The following is the signature of the method; the rest of the code in this section goes inside SetUpContainer.

SetUpContainer constructor

In SetUpContainer, the first thing we need is the connection string to the storage account and a reference to the CloudStorageAccount object.

Creating the connection string

Next in our hierarchy of objects is the CloudBlobClient, which we can get now that we have a CloudStorageAccount object.

Reference to CloudBlobClient

Next, we create a reference to the CloudBlobContainer object. This is used to perform operations on the container, and to access blobs in that container. This code gets a reference to the CloudBlobContainer using the cloudBlobClient object.

Reference to CloudBlobContainer

The CloudBlobContainer object is returned to the constructor, which puts it in the class variable. This section contains all of the code that goes in SetUpContainer.
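To pull the pieces of this section together, here is a minimal sketch of what the constructor and SetUpContainer could look like. It is not necessarily identical to the class on GitHub: the parameter names, the connection string format, and the choice to make SetUpContainer static are my assumptions, and it assumes the WindowsAzure.Storage NuGet package (4.x) is referenced.

```csharp
using Microsoft.WindowsAzure.Storage;       // assumes NuGet package WindowsAzure.Storage 4.x
using Microsoft.WindowsAzure.Storage.Blob;

public class BlobMethods
{
    // Class-level members, set once in the constructor and reused by every blob method.
    private readonly CloudBlobContainer cloudBlobContainer;
    public string ContainerName { get; private set; }

    // Parameter names are illustrative, not taken from the original class.
    public BlobMethods(string storageAccountName, string storageAccountKey, string containerName)
    {
        ContainerName = containerName;
        cloudBlobContainer = SetUpContainer(storageAccountName, storageAccountKey, containerName);
    }

    private static CloudBlobContainer SetUpContainer(string storageAccountName,
        string storageAccountKey, string containerName)
    {
        // 1. Connection string -> CloudStorageAccount.
        string connectionString = string.Format(
            "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
            storageAccountName, storageAccountKey);
        CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(connectionString);

        // 2. CloudStorageAccount -> CloudBlobClient.
        CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();

        // 3. CloudBlobClient -> CloudBlobContainer.
        //    This only builds a local reference; no request is sent to the service yet.
        return cloudBlobClient.GetContainerReference(containerName);
    }
}
```

Because nothing here talks to the storage service, constructing the class is cheap; the first network call happens when one of the blob methods runs.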

Defensive Programming

When writing solutions for the cloud, you must program defensively. Cloud solutions are often composed of multiple sometimes-connected products and features that rely on each other to work. For example, a program that allows someone to upload pictures to blob storage could consist of the following: (a) the client application running in a cloud service (PaaS), in a VM, or in an Azure website, (b) a backend service called by the client application to access the database, and (c) blob storage.

Any of those bits could stop working for some reason: Azure websites could go down, the network between the VM and the backend service could start denying access, or the disk your blobs are stored on could hit a bad patch and the controller could be in the process of repointing your main blob storage to one of the replicas. You have to assume any of these things (and more) can happen, and develop your code so that it handles any of these cases.

You also have to handle the possibility that human intervention can cause you a problem. In the case of using blob storage, maybe someone changed the storage account to point to a different Azure subscription and didn’t create the container needed. Or maybe someone changed the container name or even deleted the container, not realizing it would have an impact. Maybe someone changed the container’s access from private to public. (You’re wondering how this would ever happen. Sometimes you might work with people who don’t understand everything they think they understand, and they make changes thinking they’re helping you. Think of putting in defensive code as saving them from themselves. 😉 )

It would be a good idea to wrap all of the code in each method in a try/catch block. Additionally, if you know you’re going to work with specific containers, it’s a good idea to make sure the container exists and to set the permissions to what you want them to be. You don’t want to do this every single time someone accesses a blob, but it’s a good idea to put this code in the startup of your application so it runs at least once during the lifetime of the website, web role, or worker role.

Keeping that in mind, let’s add a method, RunAtAppStartup, to our BlobMethods class. Then when using blob storage, the application can instantiate BlobMethods and call this method to ensure the container exists and has the right permissions. Here’s the code:

RunAtAppStartup to ensure the container is set up correctly
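As a rough sketch of that startup check, the method below creates the container if it is missing and then resets its access level. I am assuming the container should be private and that the method reports success with a bool; the class name ContainerStartup and the console logging are placeholders of mine, and the version in the finished class may differ.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;       // assumes NuGet package WindowsAzure.Storage 4.x
using Microsoft.WindowsAzure.Storage.Blob;

public static class ContainerStartup
{
    // Returns true if the container exists and is private afterwards, false on a storage error.
    public static bool RunAtAppStartup(CloudBlobContainer cloudBlobContainer)
    {
        try
        {
            // Creates the container only if it is not already there.
            cloudBlobContainer.CreateIfNotExists();

            // Assumption: the container should be private. Reset it in case someone changed it.
            BlobContainerPermissions permissions = new BlobContainerPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Off;
            cloudBlobContainer.SetPermissions(permissions);
            return true;
        }
        catch (StorageException ex)
        {
            // Placeholder logging; a real application would use its own logger.
            Console.Error.WriteLine("Container setup failed: " + ex.Message);
            return false;
        }
    }
}
```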

Blob Operations

Now that we have a reference to the CloudBlobContainer, let’s look at the different tasks you can perform with blobs.

Upload a file to a blob with the same name as the source file. Uploading files from the local computer is pretty standard fare.

Uploading a file

Write some text (string textToUpload) to a blob. I’ve used this when providing a text area for the customer to type in, and then saving the text directly from there to blob storage. This is quicker than writing the text out to a file and uploading the file.

Writing text to a blob

Upload from a byte array to a blob; myByteArray is defined as Byte[]. I used this in an app where the customer could select a picture from his computer, and the app would resize it and put it in a picture box on the screen. You can take the byte array representing the resized picture directly from the picture box control and upload it to blob storage.

Uploading from a byte array

Upload from a stream. You could use this if you have a memory stream and you want to write it to a file in blob storage. Be sure to set the position back to 0 before you start, so you don’t get a partial file.

Uploading from a stream
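The four upload variants above can be sketched roughly as follows. The static class BlobUploads and its method names are hypothetical; each method takes the CloudBlobContainer explicitly so the block stands on its own. The UploadFromFile overload with a FileMode argument is the one I would expect in the 4.x library; later versions changed that signature.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;  // assumes NuGet package WindowsAzure.Storage 4.x

public static class BlobUploads
{
    // Uploads a local file to a blob with the same name as the file.
    public static void UploadFile(CloudBlobContainer container, string localFilePath)
    {
        string blobName = Path.GetFileName(localFilePath);
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        blob.UploadFromFile(localFilePath, FileMode.Open);
    }

    // Writes a string straight to a blob, with no temporary file.
    public static void UploadText(CloudBlobContainer container, string blobName, string textToUpload)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        blob.UploadText(textToUpload);
    }

    // Uploads the whole byte array, starting at index 0.
    public static void UploadBytes(CloudBlobContainer container, string blobName, byte[] myByteArray)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        blob.UploadFromByteArray(myByteArray, 0, myByteArray.Length);
    }

    // Uploads from a seekable stream such as a MemoryStream.
    public static void UploadStream(CloudBlobContainer container, string blobName, Stream stream)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        stream.Position = 0;   // rewind first, otherwise only the remainder is uploaded
        blob.UploadFromStream(stream);
    }
}
```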

Download a blob to the local machine. First, check to see if the blob exists before trying to download it. That’s one of those defensive programming tasks we talked about earlier. Also, when I download the file, I want to match the hierarchy of the “folders” in the blob name (see my previous article for an explanation of blob paths). I have to replace forward slashes with backslashes, because blob storage uses forward slashes but Windows uses backslashes. Then I’m going to check and see if the folder exists on the local computer, and if it doesn’t, I’ll create it before downloading the file. For example, if the blob name is “NatGeo/Tiger.jpg”, I want to change this to “NatGeo\Tiger.jpg”, create a folder called NatGeo, and put the Tiger.jpg blob into it.

Downloading a blob
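Here is one way the download steps could be written. The class name, the downloadFolder parameter, and the ToWindowsRelativePath helper are my own illustrative choices, so treat this as a sketch rather than the exact code from the class.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;  // assumes NuGet package WindowsAzure.Storage 4.x

public static class BlobDownloads
{
    // "NatGeo/Tiger.jpg" -> "NatGeo\Tiger.jpg"
    public static string ToWindowsRelativePath(string blobName)
    {
        return blobName.Replace('/', '\\');
    }

    // Returns false if the blob does not exist, true once the file has been written.
    public static bool DownloadFile(CloudBlobContainer container, string blobName, string downloadFolder)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        if (!blob.Exists())
        {
            return false;
        }

        string localPath = Path.Combine(downloadFolder, ToWindowsRelativePath(blobName));

        // Create the local "folder" part of the blob name if needed (no-op when it exists).
        string localFolder = Path.GetDirectoryName(localPath);
        if (!string.IsNullOrEmpty(localFolder))
        {
            Directory.CreateDirectory(localFolder);
        }

        blob.DownloadToFile(localPath, FileMode.Create);
        return true;
    }
}
```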

Download to a byte array. Referring to my previous example, you could download an image to a byte array and put it in a picturebox control without saving the file to local disk. Note that this calls blob.FetchAttributes(), which populates the blob properties so we can get the length of the blob.

Downloading a byte array

Download to a stream. Reversing my previous example, you could download directly from blob storage to a memory stream.

Downloading direct to a stream
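The two in-memory downloads might look something like this. The class and method names are hypothetical, and rewinding the returned stream is a convenience I added for the caller.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;  // assumes NuGet package WindowsAzure.Storage 4.x

public static class BlobMemoryDownloads
{
    // Downloads the blob into a byte array sized from the blob's properties.
    public static byte[] DownloadToByteArray(CloudBlobContainer container, string blobName)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

        // Properties.Length is not populated until the attributes are fetched.
        blob.FetchAttributes();
        byte[] bytes = new byte[blob.Properties.Length];

        blob.DownloadToByteArray(bytes, 0);
        return bytes;
    }

    // Downloads the blob into a MemoryStream and rewinds it for reading.
    public static MemoryStream DownloadToStream(CloudBlobContainer container, string blobName)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        MemoryStream stream = new MemoryStream();
        blob.DownloadToStream(stream);
        stream.Position = 0;   // so the caller reads from the start
        return stream;
    }
}
```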

Rename a blob. There is no rename method for blobs. You have to get a reference to the original blob, copy it to the new blob name, then delete the original blob. This uses StartCopyFromBlob, which only starts the copy; the storage service carries it out asynchronously. This means the call doesn’t stop and wait for the copy to finish, or the delete, for that matter. It will delete the source blob when it finishes copying it.

Renaming a blob
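If you would rather be sure the copy has finished before the source is removed, a more cautious variant is sketched below. This is not the approach described above: it polls the target blob’s CopyState and only deletes the source after the copy reports success. The class name, the bool return value, and the 500 ms polling interval are assumptions for illustration.

```csharp
using System.Threading;
using Microsoft.WindowsAzure.Storage.Blob;  // assumes NuGet package WindowsAzure.Storage 4.x

public static class BlobRename
{
    // Returns true if the blob was copied to the new name and the original deleted.
    public static bool RenameBlob(CloudBlobContainer container, string blobName, string newBlobName)
    {
        CloudBlockBlob source = container.GetBlockBlobReference(blobName);
        if (!source.Exists())
        {
            return false;
        }

        CloudBlockBlob target = container.GetBlockBlobReference(newBlobName);
        target.StartCopyFromBlob(source);   // only starts the server-side copy

        // Poll until the service reports that the copy is no longer pending.
        target.FetchAttributes();
        while (target.CopyState.Status == CopyStatus.Pending)
        {
            Thread.Sleep(500);
            target.FetchAttributes();
        }

        if (target.CopyState.Status != CopyStatus.Success)
        {
            return false;   // leave the source in place if the copy failed or was aborted
        }

        source.Delete();
        return true;
    }
}
```

The trade-off is that the caller blocks while the copy runs, which is usually brief for a copy within the same storage account.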

Delete a blob. To do this, you call DeleteIfExists(). If the blob was there and it was deleted, this returns true. If the blob wasn’t there, this returns false.

Delete a blob

Now let’s take a look at listing the blobs. When you ask for a list of blobs, you get a list of blob URIs. Example: http://robinstorage.blob.core.windows.net/images/NatGeo/Tiger.jpg. If that works for you, great. But what if you want to show the list of files in a list box? You don’t need the base URL displayed over and over again, and you don’t need the container name displayed over and over again. To show just the actual file names, you have to parse this and strip off the base URL and the container name (“http://robinstorage.blob.core.windows.net/images/”). Remember that the blob names can have a relative path in the name.

Here’s a routine that parses the whole URI and returns just the blob name. For the above link, it would return “NatGeo/Tiger.jpg”.

Extract blob name from URI
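A small sketch of that parsing step, using only System.Uri, is shown here. The class and method names are mine, and it assumes the standard endpoint layout in which the path starts with the container name (the storage emulator, for example, puts the account name in the path instead).

```csharp
using System;

public static class BlobNameParser
{
    // Strips the base URL and container name, leaving the blob name with its "folders".
    public static string GetBlobNameFromUri(Uri blobUri, string containerName)
    {
        // For the example URI, AbsolutePath is "/images/NatGeo/Tiger.jpg".
        string path = Uri.UnescapeDataString(blobUri.AbsolutePath);
        string containerPrefix = "/" + containerName + "/";

        if (path.StartsWith(containerPrefix, StringComparison.Ordinal))
        {
            return path.Substring(containerPrefix.Length);
        }

        // Fallback: no container prefix found, so just drop the leading slash.
        return path.TrimStart('/');
    }
}
```

Calling GetBlobNameFromUri with the example URI and the container name "images" gives back the blob name including its relative path.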

Now let’s look at getting a listing of the blobs in a container, and only show the blob names (including their relative path “folders”, which are part of the blob name). CloudBlobContainer.ListBlobs returns an IEnumerable<IListBlobItem> that you can iterate through to access each blob in the container.

Retrieving a list of files in a container

If you want to only get files that are in one of the “relative path folders”, you just specify that as the first argument in the call to cloudBlobContainer.ListBlobs. For example, if you wanted to get a list of the NatGeo “folder” and all the files in it, that’s the same thing as asking for all the files starting with “NatGeo/”.

Filtering blob retrieval to those starting with ‘NatGeo’
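Both listings, the whole container and a single “folder”, can be sketched with one helper. Two assumptions here: I use a flat listing (useFlatBlobListing set to true) so that blobs inside “folders” come back as blobs rather than directory entries, and I read the name from ICloudBlob.Name rather than parsing the URI. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Blob;  // assumes NuGet package WindowsAzure.Storage 4.x

public static class BlobListing
{
    // Pass null as the prefix to list every blob in the container,
    // or a prefix such as "NatGeo/" to list only the blobs in that "folder".
    public static List<string> GetBlobNames(CloudBlobContainer container, string prefix)
    {
        List<string> blobNames = new List<string>();

        // true = flat listing, so nested "folders" are expanded into their blobs.
        foreach (IListBlobItem item in container.ListBlobs(prefix, true))
        {
            ICloudBlob blob = item as ICloudBlob;
            if (blob != null)
            {
                blobNames.Add(blob.Name);   // for example "NatGeo/Tiger.jpg"
            }
        }

        return blobNames;
    }
}
```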

This should provide enough code to get started managing your own blob storage. Roll your own using the above code samples, or download the finished class from GitHub and try it out!

Retry policies

What if there is a problem reading/writing to blob storage? What kind of problem could you have? Here are three common problems:

  • Azure could have a drive failure and have problems writing to all of your replicas.
  • Azure could be doing something in the background, like moving a partition, which could have an impact on your performance.
  • You could have intermittent connection issues.

Again, we come back to coding defensively, and always handling the case of failure, whether long-term or intermittent. So what should you do? Should you wrap the call in a loop and try it three or four times until it succeeds or it loops too many times? Well, you can definitely do that, but it sure would be a lot of work. Wouldn’t it be nice if Azure storage had retry ability built in? Well, it’s your lucky day, because it does!

For blob storage, there is a retry policy implemented by default, so if you do nothing, it will do what’s called exponential retries. It will fail, then wait a bit of time and try again; if it fails again, it will wait a little longer and try again, until it hits the maximum retry count. (As W. C. Fields once said, “If at first, you don’t succeed, try, try again. Then quit. There’s no point in being a damn fool about it.”)

There are multiple kinds of retry policies, and you can write your own custom retry policy as well. Rather than reinvent the wheel, if you want details I’m going to refer you to a blog entry written by Gaurav Mantri (the original author of the Cerebrata tools). This article was written when the Azure Storage team introduced breaking changes between version 1.7 and version 2.0 of the Storage Library. Gaurav does a great job explaining the different policies, when to use them, and how to implement them. The relevant sections are the ones referring to Storage Client Library 2.0, which still apply with version 4.2 today.
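If you do want to set a policy explicitly, one option is to pass a BlobRequestOptions object on the call. The sketch below configures an exponential retry; the two-second back-off, five attempts, and sixty-second overall limit are arbitrary values picked for illustration, and the class and method names are hypothetical.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Blob;            // assumes NuGet package WindowsAzure.Storage 4.x
using Microsoft.WindowsAzure.Storage.RetryPolicies;

public static class RetryExamples
{
    // Builds request options with an explicit exponential retry policy.
    public static BlobRequestOptions CreateOptions()
    {
        BlobRequestOptions options = new BlobRequestOptions();

        // Illustrative values: back off from roughly 2 seconds, give up after 5 attempts.
        options.RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 5);

        // Upper bound on the whole operation, retries included.
        options.MaximumExecutionTime = TimeSpan.FromSeconds(60);
        return options;
    }

    // Passes the options on a single call; other blob operations accept them the same way.
    public static void UploadTextWithRetry(CloudBlobContainer container, string blobName, string text)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        blob.UploadText(text, null, null, CreateOptions(), null);
    }
}
```

Swapping in new LinearRetry(TimeSpan.FromSeconds(2), 5) or new NoRetry() changes the behavior without touching the rest of the call.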

Summary

In this blog entry, I’ve created a class called BlobMethods that illustrates using many of the storage client library commands to manage Azure blob storage. Feel free to use the final version, which is downloadable from the link at the bottom of the article, or get the latest code available on GitHub. I’ve also discussed retry policies and provided a link to an article with more information. In the next entry in this series, I will be talking about how to upload large files to blob storage in blocks, and will show you how to use the built-in features of Azure storage to do that – a feat only 3 people in the world understand at the time of this writing. I will also show you how to stop an upload in the middle, then restart it and finish successfully.