
Here we are at the final article in this series. So now you know how to set up your blob storage, upload blobs, download blobs, examine blobs, take snapshots and leases, and everything else I could think of to cover. Our last topic is how to move those blobs around. How do you back them up? What if you want to copy them to a storage account on the other side of the world? What if you want to upload your entire iTunes folder to blob storage? (Mine’s well over 100GB). What if you want to download your entire iTunes folder to a different machine after uploading it? What if you have hundreds and hundreds of GBs of data you want to upload to blob storage?

Azure Import/Export Service – an overview

If you have large amounts of file data that you want to move to Azure Blob storage, you have a couple of choices. Your first choice is to use a storage explorer tool like Cerebrata’s Azure Management Studio to upload the files. You can start it and then watch it upload over the next several days (or weeks, or even months, depending on your Internet upload speed and the amount of data). I did something similar to this once, and it took 4 days to upload all of my data. Frankly, I don’t really have that much patience — I’m not a doctor. (Haha! Get it? Patience? Patients? Haha! Hey, I saw that! Stop rolling your eyes!)

Thank goodness that the Powers That Be at Microsoft decided to add a feature to Azure called the Import/Export Service. Here’s how it works. You copy all of your data onto one or more hard drives. (For God’s sake, don’t send your only copy of the data!) Next, you encrypt the data on those drives with BitLocker, then pack the drives in a box carefully with lots of bubble wrap and use a lot of tape to seal the box. (I always use lots of tape — I think opening Christmas presents should be a challenge). After you put a shipping label on the box, you simply mail the box to Microsoft. They will decrypt the data using the BitLocker keys you supply, copy it into your storage account as blobs, and send the hard drives back to you (probably packed with a lot less tape).

This also works in reverse. What if you have a ton of data in Azure and you want to get a copy of it so you can run it through an analyzer on your local infrastructure? You can send empty hard drives to Microsoft. They will copy the blobs from your storage account onto your hard drives, encrypt the drives with BitLocker, and return them to you (again, probably with a reasonable amount of tape).

The Nitty Gritty

Let’s talk about the nitty gritty details (not to be confused with the Nitty Gritty Dirt Band). To start this process, you create an import job if you want to move data from your local infrastructure into blob storage, and an export job if you want to get data back out of Azure. You can do this by using the Azure Management Portal or by writing an application to call the REST interface to the service. (The second option is for those of you who want to automate the process and those of you with copious amounts of free time.)

This tells the Import/Export Service that you are shipping one or more hard drives to an Azure data center. (It doesn’t warn them about the amount of tape on the box 🙂 ). For import, they will know you are sending them drives full of data; for export, they will know you are sending empty drives. Be sure to set your job up correctly so they don’t do it backwards!

Requirements

Let’s look at the requirements for using this service:

  1. You are going to need the Microsoft Azure Import/Export Tool. Luckily, you can download the Import/Export Tool for free as a standalone package. For details about using this tool, check out the Azure Import/Export Tool Reference documentation.
  2. You must have an Azure subscription and one or more storage accounts. Each job that you set up can transfer data to or from one storage account. If you want to copy data to multiple storage accounts, you have to set up multiple jobs – one for each storage account.
  3. Hard Drives: They only accept 3.5″ SATA II/III hard drives that are no larger than 4 TB in size. They must be formatted as NTFS. If they are going to import your data, only the first data volume on the drive will be processed, so you probably don’t want to partition the drive unless you are putting all of your data in the first volume. Note that you can attach a SATA II/III drive externally to most computers using a SATA II/III USB adapter. You also need to keep track of the drive ID for each drive. This is the serial number assigned by the drive manufacturer to a specific hard disk; it should be displayed on the exterior of the drive. And one last thing – don’t ship them the power cords or USB cables for the drives. Ship only the hard drives.
  4. BitLocker encryption: The data must be encrypted using BitLocker with encryption keys protected with numerical passwords. (There’s a note on checking a drive’s numerical password right after this list.)
  5. Blob types: You can upload to or download from both block blobs and page blobs.
  6. You can have up to 20 jobs active for each storage account. Each job can contain up to 10 hard drives. (If you do the math, this means you can’t send them more than 200 drives for a single storage account. Imagine how much tape that package would require!)
  7. Return shipping: You must provide the carrier and carrier account number to be used when returning the drives. They only support FedEx for regions in the US and Europe; for regions in Asia, they only support DHL.
  8. Destination region: You need to know the data center region where the target storage account resides. (When you select this while setting up the job, Azure will give you the mailing address.)
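
A quick aside on requirement 4: the Import/Export Tool can turn on BitLocker for you, but if you encrypt a drive yourself you’ll want to confirm it has a numerical password protector and write that key down. One way to check is the built-in Windows manage-bde command (the drive letter E: here is just an example):

manage-bde -protectors -get E:

This lists the drive’s key protectors; the “Numerical Password” entry is the 48-digit key BitLocker generated for that drive.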

Steps to follow

Let’s look at an example of what steps you would follow to send in drives with data to be imported.

  1. Determine the data to be imported and the number of hard drives you will need.
  2. Identify the destination for your data.
  3. Use the Import/Export Tool to copy your data to the hard drive(s). This generates a drive journal file for each drive on your local computer; you will need to upload this when you create the job. (There’s an example command right after this list.)
  4. Log into the Management Portal and navigate to your storage account. Under Quick Glance on the dashboard page, click Create an Import Job. This brings up a wizard that will prompt you for the necessary information, including the data center region, return carrier information, tracking number of the outgoing shipment, and name of the job.
  5. Ship your disks. (You might want to add more tape to the package before doing this, just to make sure there’s enough.)
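
To give you an idea of step 3, here is roughly what a first copy session with the Import/Export Tool (WAImportExport.exe) looks like. Treat this as a sketch: the journal file name, session ID, drive letter, source folder, and destination container below are made up, and you should check the Import/Export Tool Reference for the exact parameters of the version you download.

WAImportExport.exe PrepImport
/j:FirstDrive.jrn
/id:session01
/sk:[storage account key for the target account]
/t:e
/format /encrypt
/srcdir:d:\data\myimages
/dstdir:myimages/

The /format and /encrypt switches prepare the drive (format it as NTFS and turn on BitLocker), /srcdir is the local folder to copy, and /dstdir is the destination container (and optional virtual directory) in blob storage. The journal file named by /j is what you upload when you create the job in the portal.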

If you’re sending in drives to have data exported from blob storage, the only difference in the steps to follow is that you also specify the blobs to be exported. You can select all of them, or filter based on the container and blob names.

Other helpful information

You can see the status of your job in the portal under the Import/Export tab for the storage account being used. You can also view the BitLocker keys in the same tab if you have an export job.

Not all regions are supported. The regions currently supported are East US, West US, North Central US, South Central US, North Europe, West Europe, East Asia, and Southeast Asia. More regions will likely be added after this article is posted, so check the Microsoft documentation about the Import/Export service to see the most current list (it’s close to the bottom of the page).

This is the best way to move large amounts of data between your local infrastructure and an Azure data center.

What if you just want to move a medium amount of data, or replicate the data in a data center to another location? In this case, you could use the AzCopy utility.

AzCopy – a free utility from the Azure Storage Team

AzCopy is a command-line utility that you can use to copy files from one blob storage account to another. You can also use it to transfer files between your local filesystem and Blob storage.

Aside from writing your own application, this is one of the few methods available to back up a storage account. It doesn’t sync between two storage accounts; it just copies from one to the other. (You can also have it do incremental copies, so it only copies files that have been added or changed since they were originally copied to the target account.)

Download and install

The most recent version of AzCopy out in General Availability is 3.1. There is a preview release of 4.x that handles Azure Tables, but since tables are irrelevant to Blob storage, I’m going to stick with the GA release. The download link gives you an msi called MicrosoftAzureStorageTools.msi. If you install the Azure SDK manually, this is one of the msi’s you can download and install. Download the msi and run it to install the tools. Don’t forget to read the End-User License Agreement as carefully as you always do when downloading software from the internet. *cough*

So you’ve installed it. Now what? Good question. If you start looking on your start menu for “AzCopy”, you will be disappointed. It’s actually not in an obvious location – it’s added to your Start Menu as “Microsoft Azure Storage command line”. Selecting this will take you to the Azure SDK folder in a command-line window. At that point, you can change directory to AzCopy (CD .\AzCopy) and you’re off like a herd of turtles.

When I use this, I usually write several scripts for a specific customer, and I end up with scripts and log files and all kinds of flotsam and jetsam in that directory. I don’t want to keep all of that in the same folder as the original application. I deal with this by copying the contents of the AzCopy folder to a new folder under My Documents, then I open a command line window and navigate to the new location. Then I can add as many scripts as I want without worrying about accidentally deleting AzCopy.exe. My point is that you should know you can copy all the contents of that directory to somewhere else and it will still work beautifully. I’ve even put it on a VM in Azure and run it!
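
For example, something like this copies the AzCopy files into a working folder under My Documents (this assumes the default install location on a 64-bit machine and a made-up folder name; adjust the paths for your setup):

xcopy "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\*.*" "%USERPROFILE%\Documents\AzCopyWork\" /E /I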

Using AzCopy

So now that we have it, what do we do with it? Open a command window and navigate to that folder with the AzCopy files in it. If you type in

azcopy /?

it will give you all of the options available, with some examples at the end. Feel free to read this later. In the meantime, I’m going to give you some examples and wisdom from actually using this tool that should get you going on your way.

From my perspective, what I’m interested in is copying the files from one storage account to another, i.e. backing up a storage account. In the AzCopy command parameters, the source and target are either a file system directory or a blob container. You can use the file system directory if you want to copy to your local computer or upload from your local computer to blob storage. For our case, both the source and destination will be a URL to a blob storage container.

Unfortunately, you can’t say “copy everything from storage account A to storage account B”. You have to do the copy one container at a time. This means you will run AzCopy for each container in a storage account (or at least for each container that you want backed up). I’m hoping they will eventually add this feature to the product.

Basic commands

Let’s start with the basic command to back up everything from one storage account container to another, including all subfolders. I’m going to put the different parameters on separate lines so we can see them. This is the format of the command.

AzCopy
/Source:[URL to the container to be copied from]
/Dest:[URL to the container to copy to]
/sourcekey:[storage account key for source account]
/destkey:[storage account key for dest account]
/S

  • The /Source and /Dest are URLs pointing to the source and destination containers in blob storage.
  • The /sourcekey and /destkey are the keys to the actual storage account(s).
  • /S tells AzCopy to do the copy recursively, i.e. to include all subfolders.

Let’s say you have the following files in your source container:

  • /myimages/tigers/picture01.jpg
  • /myimages/tigers/picture02.jpg
  • /myimages/bears/bear01.png
  • /myimages/bears/bear02.png
  • /myimages/puffin01.png
  • /myimages/puffin02.png

If you add the /S, it will include all of the files. If you exclude it, you will only get the last two files that are in the top level of the container.

Here’s an example (I’ve snipped most of the storage account keys):

AzCopy
/Source:https://testsnapshots.blob.core.windows.net/myimages
/Dest:https://testsnapshots.blob.core.windows.net/myimages01
/sourcekey:bW4T…dPg== 
/destkey:bW4T…dPg==
/S

This copies all of the files from myimages to myimages01. You could easily copy these to a different storage account simply by changing the destination URL and the key to the storage account. When I run this, I get the following output:

Finished 45 of total 45 file(s).

Transfer summary:
—————–
Total files transferred: 45
Transfer successfully:   45
Transfer skipped:        0
Transfer failed:         0
Elapsed time:            00.00:00:04

It shows how many files were transferred, and of those, how many succeeded and how many failed, and how long it took.

If you add the switch /XO, then when doing the copy it will exclude files where the source file is older than the destination file. In other words, this does incremental copies, copying only files that have changed since they were last copied over to the target container. If I run the command above with /XO on the end, we get this output:

Finished 0 of total 0 file(s).

Transfer summary:
—————–
Total files transferred: 0
Transfer successfully:   0
Transfer skipped:        0
Transfer failed:         0
Elapsed time:            00.00:00:00

This makes sense, as we haven’t changed any of the files in the source. If we go and modify a couple of the files in the source and run that same command again, it prompts us to approve the overwriting of the files that already exist. If you’re running a script, you don’t want it to prompt you because it will stay there until you respond. If you’ve kicked off the script and gone to dinner, you’re going to be very disappointed in the progress when you get back. To suppress this, add the switch /Y and it will overwrite any files that it finds that have the same name. So now we have the following (the difference is the last line):

AzCopy
/Source:https://testsnapshots.blob.core.windows.net/myimages
/Dest:https://testsnapshots.blob.core.windows.net/myimages01
/sourcekey:bW4T…dPg== 
/destkey:bW4T…dPg==
/S /XO /Y

After running this, I get the following output:

Finished 2 of total 2 file(s).

Transfer summary:
—————–
Total files transferred: 2
Transfer successfully:   2
Transfer skipped:        0
Transfer failed:         0
Elapsed time:            00.00:00:01

If you remove the /Y again, and say NO when it asks if you want to replace the files, this will show a transfer skipped count of 2.

Verbose logging

What else do they have that helps us with our use case?  The /V switch will allow you to get a verbose log. So if I change my last line to the following, it will look at all files and copy them regardless of date (I removed /XO), and replace the target.

/S /Y /V:.\verboselog.txt

This shows every single file processed. Here’s the tail end of the file:


[2014-12-07 23:23:36.642] Finished transfer: DogInCatTree.png
[2014-12-07 23:23:36.646] Finished transfer: GuyEyeingOreos.png
[2014-12-07 23:23:36.653] Finished transfer: Scotland_crathes-castle-grampian_9061_600x450.jpg
[2014-12-07 23:23:36.693] Finished transfer: AngryBritishGuy.png
[2014-12-07 23:23:36.699] Transfer summary:
                          —————–
                          Total files transferred: 45
                          Transfer successfully:   45
                          Transfer skipped:        0
                          Transfer failed:         0
                          Elapsed time:            00.00:00:01

You probably don’t want to do that if your storage account has 400,000 files unless you have a really good reason. It can be useful when debugging your commands, though.

Running multiple AzCopy commands in sequence

You can type in the command and hit enter and wait for it to run, then do the next one, but if your storage account is large or all of the containers together have a large number of files, you might be at your desk a long time. One container that I’ve backed up has 400,000+ files, and it takes a couple of hours. Frankly, I’d rather go play Plants vs Zombies Garden Warfare on my XBox One than sit and watch the copying of the files.

You can create a text file, call it something like do_backups.cmd, put the command line statements in the file, and then execute the file from the command line window. It will run all of the backups sequentially. If you do this and direct the output to a text file, you can come back later and check its progress; if it’s finished, you can see how many files were transferred. The other advantage of this is that you have the commands you executed. If there were any errors, you can correct the text file and run it again. This is much easier than typing in that command again.
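
As a sketch, here’s what a do_backups.cmd that backs up two containers incrementally might look like. The storage account names and the second container name are placeholders, the keys are snipped just like in the earlier examples, and in a cmd file each AzCopy command goes on a single line:

AzCopy /Source:https://sourceaccount.blob.core.windows.net/myimages /Dest:https://backupaccount.blob.core.windows.net/myimages /sourcekey:bW4T…dPg== /destkey:bW4T…dPg== /S /XO /Y
AzCopy /Source:https://sourceaccount.blob.core.windows.net/documents /Dest:https://backupaccount.blob.core.windows.net/documents /sourcekey:bW4T…dPg== /destkey:bW4T…dPg== /S /XO /Y

One line per container; /XO keeps the copies incremental, and /Y keeps it from stopping to prompt you while you’re off at dinner.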

When I do backups, I always run the cmd file and redirect the output to a file. To run test.cmd, I do this:

@test.cmd>output_20141205.txt

This is plain old DOS-style output redirection. If you do this, you can create your cmd file, run it and redirect the output, then come back once in a while and check the output file. If you put multiple azcopy commands in one cmd file, you should be able to see at least part of what has run so far if you open the output file while the cmd file is still running. If there are any problems or errors, you have captured them in a file so you can look at them later.

Note: if you redirect the output of the azcopy /? command to a text file, you’ll have a much easier time reading it than in a command window.

For example: azcopy /?>azcopy_info.txt

Miscellaneous information

When doing the copy from one storage account to another, the files are copied server-side within Azure; they are not downloaded to the local machine at all. If the source and destination are in the same region, this will be very fast. If they are not in the same region, it will take a little longer, but still not as long as downloading all of the files and uploading them again.

If you want to move files between the local file system and blob storage, just change the source or destination to a directory (depending on which way you are going). So you could do this:

AzCopy
/Source:d:\_azcopy\myimages
/Dest:https://testsnapshots.blob.core.windows.net/myimages02
/destkey:bW4T…dPg==
/S /Y

This will upload everything in the myimages folder (including subfolders) to the container myimages02.
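
To go the other direction, downloading from a container to your local file system, you just swap the source and destination and supply /sourcekey instead of /destkey (the local folder here is just an example):

AzCopy
/Source:https://testsnapshots.blob.core.windows.net/myimages02
/Dest:d:\_azcopy\download
/sourcekey:bW4T…dPg==
/S /Y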

Summary

In this article, we’ve looked at the Import/Export service for transferring large amounts of data between your local infrastructure and Azure blob storage. We’ve also looked at the AzCopy tool and how to use it to back up one storage account to another or transfer files between blob storage and the local machine. This is the end of our 10-part series on Azure blob storage. I hope it was helpful.

Thanks very much to Mike Wood and Gaurav Mantri for their invaluable feedback and edits.