
In this post I’m going to talk about snapshots of blobs. I don’t mean you open a blob on your computer and get your fancy camera or your phone out and take a picture, although you can certainly do that.

So what exactly are snapshots? Snapshots remind me of the VAX/VMS operating system, where you could have multiple versions of the same file. You could tell because they ended with a semicolon and a version number. So you would have “mybestfile.txt;1”, “mybestfile.txt;2”, “mybestfile.txt;3” and so on. When you asked to open mybestfile.txt, it would open the most recent version. You could always revert to an earlier version by copying that version of the file on top of the most recent version. This is pretty much how snapshots work.

Taking snapshots

Each time you take a snapshot, the metadata of the base blob is copied to the snapshot, as are the system properties of the blob, which were detailed in my last article (see the section about properties). For each blob that has snapshots, we can see the list of snapshots and their metadata, as well as the Snapshot Time, which is different for each snapshot. To show this, I’ll upload a file multiple times, setting the metadata and taking a snapshot after each upload, and then examine the results.
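The copy-on-snapshot behavior described above can be sketched with a small in-memory model. This is not the storage client library: the `Blob` class and `create_snapshot` function are hypothetical stand-ins, written in Python only to show that a snapshot captures the content, metadata, and properties as they were at that moment, and is distinguished from the base blob by its snapshot time.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Blob:
    """Toy stand-in for a blob; not a storage client library type."""
    name: str
    content: bytes = b""
    metadata: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)
    snapshot_time: Optional[datetime] = None  # None means "base blob"

    @property
    def is_snapshot(self) -> bool:
        return self.snapshot_time is not None


def create_snapshot(base: Blob) -> Blob:
    """Return an independent copy of the blob, stamped with the current UTC time."""
    snap = deepcopy(base)
    snap.snapshot_time = datetime.now(timezone.utc)
    return snap


base = Blob(
    "onepicture.jpg",
    b"doorway-bytes",
    {"OriginalFilename": "DoorwaysAndSandPic.jpg"},
    {"ContentType": "image/jpeg"},
)
first = create_snapshot(base)

# Overwrite the base blob; the snapshot keeps what it captured.
base.content = b"frames-bytes"
base.metadata["OriginalFilename"] = "HugeWallOfPictureFrames.png"

print(first.is_snapshot, first.metadata["OriginalFilename"])
print(base.is_snapshot, base.metadata["OriginalFilename"])
```

The deep copy is the important design choice in the model: changing the base blob's metadata afterwards must not leak into the snapshot.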

First I’m going to upload a picture to blob storage with the blob name of onepicture.jpg. Then I’ll add metadata (key = “OriginalFilename”, value = name of the file uploaded) and take a snapshot. After that, I’ll examine the metadata and blobs. We should be able to see the base blob with the metadata and the snapshot with the metadata. In the following code, cloudBlobContainer is already populated (see the hierarchy of blob storage objects in Part 3 of this series). To work with the blob, all we have to do is get a reference to the cloudBlockBlob in that container.

Here’s the code to upload DoorwaysAndSandPic.jpg to a blob called onepicture.jpg. Inline comments explain what the code is doing.

Notice in particular that when the code creates the snapshot, it assigns the result to a CloudBlockBlob object called newBlob, whose properties I can then examine. The other way to do this is to query all of the versions of the blob and select the last snapshot, but this is easier!

Here is the DebugPrintBlobInfo method. This simply prints out the blob snapshot properties and metadata, whether or not the blob is a snapshot. If the blob is a snapshot and not the base blob, it prints the time and URI for the snapshot.

Running the code above yields the following results:

Note that the name of the file I uploaded is retained in the metadata; the blob is still called onepicture.jpg. The snapshot time is in UTC and is also displayed in the URI. If I click on that link, I see the picture I uploaded.

List the snapshots

If I retrieve all of the versions of the blob I should see the one I uploaded and I should see the snapshot. To do this, you can use the following code. (You can put the IEnumerable in the foreach loop; it’s separate here because I wanted to examine it after it retrieved the blobs and before it started looping through them.) Inline comments explain what the code is doing.

If I run this code with my blob’s name, at this point I should see the base blob and a snapshot. This is what I get:

theBlob IsSnapshot = True, SnapshotTime = 12/07/2014 04:11:56 +00:00, snapshotURI = https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg?snapshot=2014-12-07T04:11:56.1488004Z .MetaData 1 = OriginalFilename,DoorwaysAndSandPic.jpg

theBlob IsSnapshot = False, SnapshotTime = , snapshotURI = https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg .MetaData 1 = OriginalFilename,DoorwaysAndSandPic.jpg

The second entry is the base blob; the first is the snapshot. Note that these are the same picture, because all I’ve done is upload the picture and take a snapshot of it. I haven’t made any changes to the actual content of the blob. It worked as expected. Note that they both have a snapshotURI, but only the actual snapshot has a timestamp in the URL to ensure that it retrieves the right version.
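The difference between the two URIs comes down to the `snapshot` query parameter. As a minimal sketch (the helper name `snapshot_id` is made up for illustration, and it only inspects the URI text rather than talking to the storage service), the parameter can be pulled out like this:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def snapshot_id(uri: str) -> Optional[str]:
    """Return the value of the `snapshot` query parameter, or None for a base blob."""
    values = parse_qs(urlsplit(uri).query).get("snapshot")
    return values[0] if values else None


base_uri = "https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg"
snap_uri = base_uri + "?snapshot=2014-12-07T04:11:56.1488004Z"

print(snapshot_id(snap_uri))  # 2014-12-07T04:11:56.1488004Z
print(snapshot_id(base_uri))  # None
```

Both URIs point at the same blob name; the timestamp alone selects which version is retrieved.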

Now I’m going to upload another picture, take a snapshot, upload a third picture, take another snapshot, and look at the listing again. My second picture is called “HugeWallOfPictureFrames.png” and my third is “SnakesOnABus.jpg”. I reused the upload-and-snapshot code shown above for “DoorwaysAndSandPic.jpg”. Here is the resulting listing:

theBlob IsSnapshot = True, SnapshotTime = 12/07/2014 04:11:56 +00:00, snapshotURI = https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg?snapshot=2014-12-07T04:11:56.1488004Z .MetaData 1 = OriginalFilename,DoorwaysAndSandPic.jpg

theBlob IsSnapshot = True, SnapshotTime = 12/07/2014 04:17:10 +00:00, snapshotURI = https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg?snapshot=2014-12-07T04:17:10.7112535Z .MetaData 1 = OriginalFilename,HugeWallOfPictureFrames.png

theBlob IsSnapshot = True, SnapshotTime = 12/07/2014 04:17:10 +00:00, snapshotURI = https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg?snapshot=2014-12-07T04:17:10.8502674Z .MetaData 1 = OriginalFilename,SnakesOnABus.jpg

theBlob IsSnapshot = False, SnapshotTime = , snapshotURI = https://testsnapshots.blob.core.windows.net/a-testblob/onepicture.jpg .MetaData 1 = OriginalFilename,SnakesOnABus.jpg

Again, the last one is the current version of the blob. I wouldn’t rely on this ordering being true forever; it’s always safer to check the IsSnapshot property.
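To make the "check the flag, not the position" advice concrete, here is a small sketch over the listing shown above. The `Entry` tuple is a hypothetical stand-in for what the listing returns, and the entries are deliberately shuffled; the base blob is found by its missing snapshot time, and the snapshots are ordered by their timestamps, which sort correctly as plain strings because they share one fixed-width UTC format.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical listing row: snapshot timestamp (None for the base blob) and metadata value."""
    snapshot_time: Optional[str]
    original_filename: str


listing = [
    Entry("2014-12-07T04:17:10.8502674Z", "SnakesOnABus.jpg"),
    Entry(None, "SnakesOnABus.jpg"),
    Entry("2014-12-07T04:11:56.1488004Z", "DoorwaysAndSandPic.jpg"),
    Entry("2014-12-07T04:17:10.7112535Z", "HugeWallOfPictureFrames.png"),
]

# Identify the base blob by the absence of a snapshot time, never by list position.
base = next(e for e in listing if e.snapshot_time is None)

# Order the snapshots explicitly, oldest first.
snapshots = sorted(
    (e for e in listing if e.snapshot_time is not None),
    key=lambda e: e.snapshot_time,
)

print(base.original_filename)
print([e.original_filename for e in snapshots])
```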

Another interesting thing about snapshots is that if the blob has a lease, the lease is not retained with the snapshot. Also, snapshots themselves cannot be leased. Only the base blob will retain a lease.

Now you know how to make snapshots and query the blob to see the snapshots. So next up is restoring a snapshot, also known as “promoting a snapshot” (its parents will be so proud!).

Promoting a snapshot

Snapshots are read-only. You can read them, copy them, and delete them, but you cannot modify them. Each snapshot includes the system properties of the blob, such as the content type, length, content language, and Content-MD5. Each snapshot also includes a copy of the metadata as it was at that date and time.

“Promoting the snapshot” is simply copying a snapshot over its base blob. This is how you can restore an earlier version of the blob. When you do this, the original snapshot is left in place, and the base blob is replaced with a copy. The base blob’s system properties and metadata are also overwritten with that data from the snapshot. Promoting a snapshot will completely replace the base blob, so if you want to be able to recover the base blob later, you may want to take a snapshot before doing the promotion.
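Promotion can be sketched with a toy in-memory layout, one `base` entry plus a list of `snapshots`. The dictionary shape and the `take_snapshot` and `promote` helpers are assumptions made for illustration, not library calls. The sketch also follows the advice above by snapshotting the current base before it is overwritten.

```python
from copy import deepcopy

# Hypothetical in-memory layout: the current base version plus its snapshots, oldest first.
blob = {
    "base": {"content": b"snakes", "metadata": {"OriginalFilename": "SnakesOnABus.jpg"}},
    "snapshots": [
        {"content": b"doorways", "metadata": {"OriginalFilename": "DoorwaysAndSandPic.jpg"}},
        {"content": b"frames", "metadata": {"OriginalFilename": "HugeWallOfPictureFrames.png"}},
    ],
}


def take_snapshot(blob: dict) -> None:
    """Append a copy of the current base version to the snapshot list."""
    blob["snapshots"].append(deepcopy(blob["base"]))


def promote(blob: dict, index: int, keep_current: bool = True) -> None:
    """Copy the chosen snapshot over the base; the snapshot itself stays in place."""
    if keep_current:
        take_snapshot(blob)  # so the version being replaced can still be recovered
    blob["base"] = deepcopy(blob["snapshots"][index])


promote(blob, 0)
print(blob["base"]["metadata"]["OriginalFilename"])
print(len(blob["snapshots"]))
```

After the call, the base holds the oldest picture again, the promoted snapshot is still in the list, and the replaced version has been kept as a new snapshot.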

One use of this is to allow your user to see the list of blobs (with original file names), and let them select one to copy over the base to restore it. Imagine someone trying to select a picture for their profile, and they keep changing the picture trying to find one they like, and then decide they liked the one six versions ago. (Not that I have any experience at that. *cough*)

You can also copy a snapshot to a blob with a different name. When you do this, the resulting blob is writable. So if you have a bunch of snapshots on a blob, and you want writable versions of each of those snapshots, you can copy each snapshot to a name different from the original blob. To restore one of the snapshots to the base blob, you simply copy it from the snapshot to the blob.

As an example of copying out the snapshots to their own blobs, I’ve created some code to put into the loop that prints the list of versions. This code takes each version, copies it to a new blob, and calls the copy “CopyOf[OriginalFileName]”. So in the case above, I would end up with the newest base blob, and then blobs for each snapshot called CopyOfDoorwaysAndSandPic.jpg, CopyOfHugeWallOfPictureFrames.png, and CopyOfSnakesOnABus.jpg. Here’s the code you can add in that foreach loop to do this (I put this at the end, after outputting the information about the metadata).

This copies each picture from the snapshot to a regular blob, ending up making all of the pictures available.

More information about snapshots

When you copy a snapshot to a new blob or copy the base blob to another blob, the snapshots of the original blob are not copied to the new blob. However, if you copy the snapshot to a blob that exists and has snapshots, it overwrites the base blob, but doesn’t affect the snapshots for the target blob. So if you have a blob called A.jpg that has snapshots and you copy B.jpg over it, the blob will now contain the contents of B.jpg; the previous snapshots of A.jpg still exist, so you can easily revert to a previous version.
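The two rules in this paragraph can be checked against a toy container. The dictionary layout and the `copy_blob` helper are hypothetical and only model the behavior described here: a copy carries the source's base content but never its snapshots, and an existing target keeps its own snapshots.

```python
# Hypothetical container: each blob has base content and its own list of snapshots.
container = {
    "A.jpg": {"base": b"a-v2", "snapshots": [b"a-v1"]},
    "B.jpg": {"base": b"b-v3", "snapshots": [b"b-v1", b"b-v2"]},
}


def copy_blob(container: dict, source: str, target: str) -> None:
    """Copy only the source's base content; the target keeps whatever snapshots it had."""
    existing = container.get(target, {"snapshots": []})
    container[target] = {
        "base": container[source]["base"],
        "snapshots": existing["snapshots"],
    }


copy_blob(container, "B.jpg", "A.jpg")  # overwrite an existing blob that has snapshots
copy_blob(container, "B.jpg", "C.jpg")  # copy to a brand-new blob

print(container["A.jpg"])
print(container["C.jpg"])
```

A.jpg now holds B.jpg's content but can still be reverted through its own earlier snapshot, while C.jpg starts with no snapshots at all.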

Another thing to note is that when you create a snapshot, if the blob has uncommitted blocks, they are not copied to the snapshot. Only committed blocks are copied to the snapshot, along with the list of committed blocks. You can read more about uncommitted blocks and editing blobs at the block level in my previous article, Uploading Large Blobs.

You can also specify an access condition when you create a snapshot. If the condition is met, the snapshot is created; if the condition is not met, the snapshot is not created and an error is returned.
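A conditional snapshot can be modeled with an ETag comparison. Everything in this sketch is an assumption made for illustration: the dictionary layout, the `create_snapshot_if_match` helper, and the `PreconditionFailed` exception, which stands in for the error the service returns when the condition is not met (the REST API documents that case as status code 412, Precondition Failed).

```python
class PreconditionFailed(Exception):
    """Hypothetical error raised when the access condition is not met."""


def create_snapshot_if_match(blob: dict, expected_etag: str) -> dict:
    """Snapshot the blob only if its current ETag equals the one the caller last saw."""
    if blob["etag"] != expected_etag:
        raise PreconditionFailed(f"expected {expected_etag}, found {blob['etag']}")
    snapshot = {"content": blob["content"], "etag": blob["etag"]}
    blob["snapshots"].append(snapshot)
    return snapshot


blob = {"content": b"v1", "etag": "0x1", "snapshots": []}
create_snapshot_if_match(blob, "0x1")  # condition met: snapshot created

# Another writer changes the blob, so the ETag we hold is now stale.
blob["content"], blob["etag"] = b"v2", "0x2"

try:
    create_snapshot_if_match(blob, "0x1")
    outcome = "created"
except PreconditionFailed:
    outcome = "rejected"

print(outcome, len(blob["snapshots"]))
```

This is the usual reason to pass a condition: it guarantees the snapshot captures the version you inspected, not one that changed underneath you.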

One last thing to note is that blob storage doesn’t actually keep a complete copy of every version of the blob; it keeps the incremental differences. For example, if you have a text file that’s 10KB and you take a snapshot and then add 2KB of text to the bottom of it, the amount of storage it takes up is 10KB + 2KB, not 10KB + 12KB. If you’re completely changing the content of the file (as I am above with the images), it probably doesn’t help much, but if it’s subsequent versions of an Office document, it could have a significant effect.
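The arithmetic can be worked through with a simplified model in which each version is a list of (block ID, size) pairs and a block shared between versions is stored once. That sharing rule is an assumption that stands in for the incremental storage described above; the `stored_bytes` helper and the block IDs are invented for the example.

```python
KB = 1024


def stored_bytes(versions: list) -> int:
    """Sum each distinct block once across the base blob and its snapshots."""
    unique = {}
    for version in versions:
        for block_id, size in version:
            unique[block_id] = size
    return sum(unique.values())


# Snapshot taken at 10KB, then 2KB appended to the base blob as a new block.
snapshot = [("block-1", 10 * KB)]
appended_base = [("block-1", 10 * KB), ("block-2", 2 * KB)]
print(stored_bytes([snapshot, appended_base]) // KB)  # 12

# Replacing the whole content instead shares nothing with the snapshot.
replaced_base = [("block-3", 12 * KB)]
print(stored_bytes([snapshot, replaced_base]) // KB)  # 22
```

The second case matches the images in this article: every upload replaces all of the content, so each snapshot costs its full size.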

Deleting a blob with snapshots

If a blob has snapshots, you can’t delete just the blob and leave the snapshots behind; trying that returns an error. You either delete the snapshots first or delete the blob and its snapshots together. You can delete one snapshot of a blob, specific snapshots, or all of the snapshots on a blob. If you want to delete one snapshot, you can iterate through the list of snapshots and just delete that one.

To delete the snapshots from a blob, but not the blob itself:

To delete the snapshots and the blob:

To delete the blob, but give an error and don’t delete the blob if there are snapshots:

I think the last one is interesting. If the blob has no snapshots, that line of code will delete the blob. If the blob does have snapshots, an error will be returned.
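The three delete behaviors can be summarized in a small in-memory model. The `DeleteOption` enum, the `HasSnapshotsError` exception, and the `delete_blob` helper are hypothetical names; they are meant to mirror the three values of the .NET storage client library's DeleteSnapshotsOption enumeration (None, IncludeSnapshots, DeleteSnapshotsOnly), but the sketch does not call that library.

```python
from enum import Enum


class DeleteOption(Enum):
    NONE = "none"                        # delete the blob only if it has no snapshots
    INCLUDE_SNAPSHOTS = "include"        # delete the blob and all of its snapshots
    SNAPSHOTS_ONLY = "snapshots-only"    # delete the snapshots, keep the blob


class HasSnapshotsError(Exception):
    """Hypothetical error for deleting a blob that still has snapshots."""


def delete_blob(container: dict, name: str, option: DeleteOption) -> None:
    blob = container[name]
    if option is DeleteOption.SNAPSHOTS_ONLY:
        blob["snapshots"].clear()
    elif option is DeleteOption.INCLUDE_SNAPSHOTS:
        del container[name]
    else:
        if blob["snapshots"]:
            raise HasSnapshotsError(name)
        del container[name]


container = {"onepicture.jpg": {"base": b"snakes", "snapshots": [b"doorways", b"frames"]}}

try:
    delete_blob(container, "onepicture.jpg", DeleteOption.NONE)
    first_attempt = "deleted"
except HasSnapshotsError:
    first_attempt = "refused"

delete_blob(container, "onepicture.jpg", DeleteOption.SNAPSHOTS_ONLY)
delete_blob(container, "onepicture.jpg", DeleteOption.NONE)  # succeeds now

print(first_attempt, container)
```

The refusing mode is useful as a safety check: it lets you delete a blob while being certain you are not silently discarding version history.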

Summary

In this article, I discussed how to keep multiple versions of the same blob using snapshots and showed the code for using the storage client library to manage them. In the next article, we’ll look at how you can lock blobs to keep them from being modified by other processes by taking leases on them.