Because SQL HyperBac works at the Windows I/O level, it can be configured to work not just with database files, but also with the files for BCP, DTS, SSIS and SQL Trace/Profiler. This means that many ETL processes can be done faster, more securely, and using less disk space than before.
I wanted to start this article off with “Data is like underwear…”, a cheap gimmick to encourage the reader to keep going, so they could find out whether that statement could possibly be true, or whether I’d lost my mind in my SQL DBA dotage. Fortunately, I was smart enough to side-step that initial, potentially offensive thought and will start the article like this: Data and underwear have several things in common, the most important of which is that they can both be compressed.

I realized this similarity on a recent two-day trip where I decided it would be cheaper and faster to just bring one bag with me as a carry-on. No checked bag fees, and I could walk straight past my fellow passengers as they slumped in desperation by the baggage claim carousel. My roll-aboard bag was stuffed to near capacity with all the accoutrements of a traveling IT professional, but it worked beautifully. As we know, data, like underwear, often spends its time in transit: surely it too would benefit its owner if it were compressed. It would fly faster and, ultimately, more cheaply to its destination.

Red Gate’s SQL HyperBac provides an easy way to compress data from SQL Server with tools you already have and most likely use frequently. I am going to show you two of them in this article.
Note: SQL HyperBac does not compress underwear.
You may already be aware of SQL HyperBac’s ability to compress SQL Server backup files up to 95% or more. But you may not know that SQL HyperBac can also be used to compress data directly as it is exported from SQL Server via standard export tools like an SSIS package or BCP. In my experience, data compression for data file extracts that are part of an ETL process is usually an additional step, adding time to the process and one more moving part to administer. The same can be said for encryption of file extracts, which SQL HyperBac also supports simultaneously with the compression process.
To demonstrate using SQL HyperBac with SSIS and BCP we need data and lots of it. Of course, the amount of data we are talking about exporting or importing is relative to your individual experience. I will be using just over a million rows of data but it’s not uncommon to work with tens or hundreds of millions of rows. The sample data is taken from a table which has been gathering SQL Server connection information for me for many months now. The table is called SQL_Conn and you can learn more about that solution here if you are interested: http://www.simple-talk.com/sql/ssis/using-ssis-to-monitor-sql-server-databases-/.
Native SQL Export
In this sample database I have loaded 1,035,818 rows into the SQL_Conn table. To get a benchmark of space utilization, and of the time it would take to export that many rows without the benefit of compression, let’s walk through the Export Wizard in SQL Server Management Studio. Figure 1 shows the Wizard, which can be launched in SSMS by right-clicking the database, selecting Tasks and choosing “Export Data…”. You can see that I’m choosing to use SQL Server Native Client 10.0 as the Data Source on the (local) SQL instance and will be connecting to the DBA_Rep database.
On the next tab, I am going to choose a Flat File Destination, delimited, and save it with a .csv extension. Figure 2 shows the options.
Next, I’ll choose to copy data from one or more tables or views and finally select the SQL_Conn table, where the data I want to export resides, as in Figure 3.
After choosing to run immediately, the export begins to dump out the million-plus records to the file. After approximately 36 seconds it’s complete and the file is ready to be distributed and consumed by whatever process requires it. Typically, the exported file will not build a nest and move in as a permanent resident to the location where it was exported. It will be picked up and moved downstream to another location where another automated process will take over. Sometimes this transfer will be via an FTP mechanism over a slow link connection. The rate of transfer will add to the time of completion of the entire procedure. For the number of records we’re outputting here, the file created was 619,816 KB (about 605 MB), as reported in Figure 4.
Copying the file to another location on my network, I see an average data transfer rate of about 210KB a second (granted, it is a slow shared wireless network to a remote VPN). While this scenario is not typical of the speeds you will see on a local network, it’s not uncommon for remote connections and file transfers, and serves to bring home the point about the importance of data compression. The only number that really matters is the final one, but Figure 5 shows it ‘in flight’, while I wait for it to complete, with a countdown of 55 minutes.
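A quick back-of-the-envelope check, using only the two figures observed above (the 619,816 KB file size and the ~210 KB/s transfer rate), shows the countdown in Figure 5 is in the right ballpark; the dialog's 55-minute estimate also folds in overhead and a fluctuating rate:

```python
# Rough transfer-time estimate from the figures reported above.
file_size_kb = 619_816   # native .csv export, as reported in Figure 4
rate_kb_per_s = 210      # average observed rate over the slow VPN link

seconds = file_size_kb / rate_kb_per_s
print(f"Estimated copy time: {seconds / 60:.0f} minutes")  # roughly 49 minutes
```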
Imagine if you had 20 files to copy and 20 tables to export. Obviously, the amount of time to copy will increase substantially, holding back critical reports, bank statements, and, Heaven forefend, payroll files containing the expenses data from your recent business trip to Hawaii (I did mention SQL HyperBac does encryption too, right?). Now let’s look at how SQL HyperBac can help us pack all of this data using the same exact steps.
Export Data with SQL HyperBac
I will bypass the steps to install SQL HyperBac because, like all of Red Gate’s products, that process is intuitive, simple, and reliable. Once it’s installed, you can look through the configuration settings if you like, but unless you need to add a custom extension other than “.zip” or “.rar”, the installation takes care of the basic setup for you, and you’re ready to fly.
You recall the steps above where I ran through the Import and Export Wizard to dump the million-plus records to a comma delimited file. I will do the same here, with two differences. First, instead of using “SQL_Conn_Native.csv” as the filename and extension I am going to use “SQL_Conn_Hyperbac.zip”. Secondly, I am not going to run the package immediately, but instead save it and load it in Business Intelligence Development Studio (BIDS), so that I can get some very specific benchmarks. Figure 6 shows how to save the export SSIS package from the Wizard. Notice I have changed the defaults so that I am saving to a File System, and that I chose not to save sensitive data.
Now that I have an SSIS package, it’s easy to change a few properties and get some interesting comparisons fairly quickly. Simply changing the file’s extension to any supported HyperBac compression format, such as .zip or .rar, tells SQL Server to write the file through SQL HyperBac; this works for backup files as well. Since the package was written to a .zip file, I am going to run it first and see how long it takes to create the export file for the SQL_Conn table. You recall it took 36 seconds to create the native 619,816 KB export file. After executing the package in BIDS, the data flow task dumped out the expected 1,035,818 rows in under 24 seconds, as you can see in Figure 7.
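For comparison's sake, here is the export throughput implied by those two timings (1,035,818 rows in 36 seconds natively versus 24 seconds through SQL HyperBac), as a small arithmetic sketch:

```python
rows = 1_035_818
native_seconds = 36      # Export Wizard writing a plain .csv
hyperbac_seconds = 24    # same package writing through SQL HyperBac to .zip

print(f"Native:   {rows / native_seconds:,.0f} rows/sec")
print(f"HyperBac: {rows / hyperbac_seconds:,.0f} rows/sec")
print(f"Speed-up: {native_seconds / hyperbac_seconds:.1f}x")
```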
More important than the time saved dumping this much data with SQL HyperBac is the saving in size on disk, which will, of course, reduce the copy time as well. You can see in Figure 8 that the compressed file is 43,774 KB (42.7 MB), down from 619,816 KB. The copy process is now down to under 4 minutes.
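Putting the two file sizes from Figures 4 and 8 side by side, and reusing the observed transfer rate from earlier, the numbers line up with the "under 4 minutes" copy time:

```python
native_kb = 619_816      # uncompressed .csv export (Figure 4)
compressed_kb = 43_774   # SQL HyperBac .zip export (Figure 8)
rate_kb_per_s = 210      # observed transfer rate from the earlier copy

ratio = compressed_kb / native_kb
print(f"Compressed size: {ratio:.1%} of original ({1 - ratio:.0%} saved)")
print(f"Copy time at {rate_kb_per_s} KB/s: "
      f"{compressed_kb / rate_kb_per_s / 60:.1f} minutes")
```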
I know what you’re saying to yourself, and probably contemplating posting as a comment to this article: “That’s all fine, funny man with the data-as-underwear analogy, but why don’t you just copy directly to the destination folder via a UNC path?” That is a great question, SQLPanicAttack77, and I will be glad to answer you, by way of another wonderful tool that is almost as old as I am: BCP. Combining the powers of BCP and SQL HyperBac I will show not only how to export that same data directly to the destination path, but also how to load that data directly in from the .zip file.
BCP and SQL HyperBac
If you’ve been a DBA long enough, you’ll have heard about and have embraced BCP.exe. It’s a command line tool that’s been provided with SQL Server for as long as I can recall (I started with SQL Server version 6.5). Bulk operations in SQL Server, BCP’s raison d’être, are extremely fast for both output and input operations. Of course, you can perform bulk operations with other techniques besides BCP.exe, including SSIS Fast Table Load and T-SQL. I cover several of these techniques in the SQL Server Tacklebox.
An example of a BCP command line to extract the data directly to the network destination path would look like this:
BCP DBA_Rep..SQL_Conn out "\\Server127\d$\Temp\SQL_Conn_BCP.ZIP" -T -N
Executing this command will perform exactly the same operations I did manually before, extracting the data to a compressed .zip file on a remote location. Instead of copying the file afterwards, it’s created and written to over the network. While this may or may not be the method that achieves the fastest results, it does reduce the number of steps, and therefore the number of places where things could go wrong. It’s really up to you to decide which method to use. The important point is the time savings you will experience with SQL HyperBac compression, and not having to wait around for 605 MB to crawl across your network in the wee hours, contending with 5,000 other tasks careering across the network. You’re probably wondering what the final result of the BCP export over the slow network connection was. Figure 9 shows that it took roughly 4 and a half minutes to export and push the data to a remote .zip file. That is not an astounding number compared to a local disk, but it is still quite fast.
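If you really do have 20 tables to push out, wrapping that same bcp invocation in a small script is straightforward. The sketch below only builds the command strings; the table names other than SQL_Conn are hypothetical placeholders, and it assumes the same -T (trusted connection) and -N (native format) switches used above:

```python
# Build bcp export commands for a list of tables.
# Table names other than SQL_Conn are hypothetical examples.
tables = ["SQL_Conn", "SQL_Jobs", "SQL_Logins"]
dest = r"\\Server127\d$\Temp"   # remote destination share

def bcp_export_cmd(table: str) -> str:
    # The .zip extension is what tells SQL HyperBac to compress the output.
    return f'BCP DBA_Rep..{table} out "{dest}\\{table}.zip" -T -N'

for cmd in map(bcp_export_cmd, tables):
    print(cmd)   # or hand each one to subprocess.run(cmd, shell=True)
```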
The final step is to load the data back into SQL Server to demonstrate that SQL HyperBac, again using standard tools like BCP, can operate on compressed files directly. I don’t have to extract the contents of the .zip file I created in order to be able to load the data into SQL Server. All I need to do in this example with BCP is change the “out” to “in” and begin the load directly from the SQL_Conn_BCP.zip file:
BCP DBA_Rep..SQL_Conn in "\\Server127\d$\Temp\SQL_Conn_BCP.ZIP" -T -N
The end result is that the data was loaded in the expected time. I could continue to test various scenarios and tweak the process to get the best possible performance. The gains I have seen thus far are remarkable in disk space savings alone. And this is only for exported data. We haven’t even touched on SQL HyperBac’s database backup compression or on-disk data (.mdf and .ldf) compression. I’ll leave those topics for another time, the irony being that as I write this it is 5:30am and I have a flight to catch in less than 3 hours and have yet to pack. I just realized that I am spending too much time on the road and I am not even a consultant or Red Gate Product Evangelist. This is where you’re thinking I am going to make another cheap comment about underwear and travelling and compare that to data, right? Not a chance.