Rodney Landrum

Using SQL HyperBac to Compress Exported Data from SSIS and BCP

01 June 2011

Because SQL Hyperbac works at the Windows I/O level, it can be configured to work not just with the database files, but also with the files for BCP, DTS, SSIS and SQL Trace/Profiler. This means that many ETL processes can be done faster, more securely and using less disk space than before.

I wanted to start this article off with “Data is like underwear…”, a cheap gimmick to encourage the reader to keep going, so they could find out whether that statement could possibly be true, or whether I’d lost my mind in my SQL DBA dotage. Fortunately, I was smart enough to side-step that initial, potentially offensive thought and will start the article like this: Data and underwear have several things in common, the most important of which is that they can both be compressed. I realized this similarity on a recent two-day trip where I decided it would be cheaper and faster to just bring one bag with me as a carry-on. No checked bag fees and I could walk straight past my fellow passengers as they slump in desperation by the baggage claim carousel. My roll-aboard bag was stuffed to near capacity with all the accoutrements of a traveling IT professional, but it worked beautifully. As we know, data, like underwear, often spends its time in transit: surely it too would benefit its owner if it were compressed. It would fly faster and, ultimately, more cheaply to its destination. Red Gate’s SQL HyperBac provides an easy way to compress data from SQL Server with tools you already have and most likely use frequently. I am going to show you two of them in this article.

Note: SQL HyperBac does not compress underwear.

You may already be aware of SQL HyperBac’s ability to compress SQL Server backup files up to 95% or more. But you may not know that SQL HyperBac can also be used to compress data directly as it is exported from SQL Server via standard export tools like an SSIS package or BCP. In my experience, data compression for data file extracts that are part of an ETL process is usually an additional step, adding time to the process and one more moving part to administer. The same can be said for encryption of file extracts, which SQL HyperBac also supports simultaneously with the compression process.

To demonstrate using SQL HyperBac with SSIS and BCP we need data and lots of it. Of course, the amount of data we are talking about exporting or importing is relative to your individual experience. I will be using just over a million rows of data but it’s not uncommon to work with tens or hundreds of millions of rows. The sample data is taken from a table which has been gathering SQL Server connection information for me for many months now. The table is called SQL_Conn and you can learn more about that solution here if you are interested: http://www.simple-talk.com/sql/ssis/using-ssis-to-monitor-sql-server-databases-/.

Native SQL Export

In this sample database I have loaded 1,035,818 rows into the SQL_Conn table. To get a benchmark of space utilization, and of the time it would take to export that many rows without the benefit of compression, let’s walk through the Export Wizard in SQL Server Management Studio. Figure 1 shows the Wizard, which can be launched in SSMS by right-clicking the database, selecting Tasks and choosing “Export Data…”. You can see that I’m choosing to use SQL Server Native Client 10.0 as the Data Source on the (local) SQL instance and will be connecting to the DBA_Rep database.


Figure 1. Connect to local SQL Server in Import and Export Wizard.

On the next tab, I am going to choose a Flat File Destination, delimited, and save it with a .csv extension. Figure 2 shows the options.


Figure 2. Flat File Destination.

Next, I’ll choose to copy data from one or more tables or views and finally select the SQL_Conn table, where the data I want to export resides, as in Figure 3.


Figure 3. Selecting the SQL_Conn table for export to flat file.

After choosing to run immediately, the export begins to dump out the million-plus records to the file. After approximately 36 seconds it’s complete and the file is ready to be distributed and consumed by whatever process requires it. Typically, the exported file will not build a nest and move in as a permanent resident to the location where it was exported. It will be picked up and moved downstream to another location where another automated process will take over. Sometimes this transfer will be via an FTP mechanism over a slow link connection. The rate of transfer will add to the time of completion of the entire procedure. For the number of records we’re outputting here, the file created was 619,816KB (about 605 MB), as reported in Figure 4.


Figure 4. Size of native output file on disk.
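As an aside, you don’t have to use the wizard to produce this native extract; roughly the same comma-delimited file can be generated from the command line with BCP in character mode. The sketch below is only illustrative: the output path is an assumption on my part, and the wizard and BCP differ in details such as column headers and text qualifiers.

BCP DBA_Rep..SQL_Conn out "D:\Temp\SQL_Conn_Native.csv" -T -c -t,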

Copying the file to another location on my network, I see an average data transfer rate of about 210KB a second (granted, it is a slow shared wireless network to a remote VPN). While this scenario is not typical of the speeds you will see on a local network, it’s not uncommon for remote connections and file transfers, and serves to bring home the point about the importance of data compression. The only number that really matters is the final one, but Figure 5 shows it ‘in flight’, while I wait for it to complete, with a countdown of 55 minutes.


Figure 5. Waiting and waiting for 605MB to copy.
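The arithmetic backs up that estimate: at roughly 210KB a second, a 619,816KB file needs about 619,816 ÷ 210 ≈ 2,950 seconds to transfer, which is a little under 50 minutes and in the same ballpark as the 55-minute countdown Windows was showing.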

Imagine if you had 20 files to copy and 20 tables to export. Obviously, the amount of time to copy will increase substantially, holding back critical reports, bank statements, and, Heaven forefend, payroll files containing the expenses data from your recent business trip to Hawaii (I did mention SQL HyperBac does encryption too, right?). Now let’s look at how SQL HyperBac can help us pack all of this data using the same exact steps.

Export Data with SQL HyperBac

I will bypass the steps to install SQL HyperBac because, like all of Red Gate’s products, that process is intuitive, simple, and reliable. Once it’s installed, you can look through the configuration settings if you like, but unless you need to add a custom extension other than “.zip” or “.rar”, the installation takes care of the basic setup for you, and you’re ready to fly.

You recall the steps above where I ran through the Import and Export Wizard to dump the million-plus records to a comma-delimited file. I will do the same here, with two differences. First, instead of using “SQL_Conn_Native.csv” as the filename and extension, I am going to use “SQL_Conn_Hyperbac.zip”. Second, I am not going to run the package immediately, but instead save it and load it in Business Intelligence Development Studio (BIDS), so that I can get some very specific benchmarks. Figure 6 shows how to save the export SSIS package from the Wizard. Notice I have changed the defaults so that I am saving to the File System, and that I chose not to save sensitive data.


Figure 6. Saving the Export SSIS package.
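Incidentally, once the package has been saved to the file system you don’t strictly need BIDS to run it at all; dtexec will execute it from the command line. The path and package name below are just placeholders for wherever you chose to save it.

dtexec /FILE "D:\SSIS\SQL_Conn_Export.dtsx"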

Now that I have an SSIS package, it’s easy to change a few properties and get some interesting comparisons fairly quickly. Because the package writes its output to a .zip file, SQL Server will route that file through SQL HyperBac; simply changing the extension to any supported HyperBac compression format, such as .zip or .rar, is enough to trigger this, and it works for backup files as well. I am going to run that package first and see how long it takes to create the export file for the SQL_Conn table. You recall it took 36 seconds to create the native 619,816KB export file. After executing the package in BIDS, the data flow task dumped out the expected 1,035,818 rows in under 24 seconds, as you can see in Figure 7.


Figure 7. Exporting via SSIS using SQL HyperBac.
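The same extension trick applies to backups: point an ordinary BACKUP DATABASE command at a .zip file name and SQL HyperBac will compress the backup as it is written. A minimal sketch, run from a command prompt via sqlcmd and using an illustrative path, might look like this:

sqlcmd -E -Q "BACKUP DATABASE DBA_Rep TO DISK = 'D:\Backups\DBA_Rep.zip'"

But back to the export at hand.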

More important than the time saving of dumping this much data with SQL HyperBac is the saving in size on disk, which will, of course, reduce the copy process time as well. You can see in Figure 8 that the size of the compressed file is 43,774 KB (42.7 MB), down from 619,816KB. The copy process is now down to under 4 minutes.


Figure 8. Copying the now compressed 43MB file.
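To put that in perspective, 43,774KB is only about 7% of the original 619,816KB, a reduction of roughly 93%; at the same 210KB a second, that works out to about 43,774 ÷ 210 ≈ 208 seconds of transfer time, which squares with the under-4-minute copy.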

I know what you’re saying to yourself, and probably contemplating posting as a comment to this article: “That’s all fine, funny man with the data-as-underwear analogy, but why don’t you just copy directly to the destination folder via a UNC path?” That is a great question, SQLPanicAttack77, and I will be glad to answer you, by way of another wonderful tool that is almost as old as I am: BCP. Combining the powers of BCP and SQL HyperBac I will show not only how to export that same data directly to the destination path, but also how to load that data directly in from the .zip file.

BCP and SQL HyperBac

If you’ve been a DBA long enough, you’ll have heard about and have embraced BCP.exe. It’s a command line tool that’s been provided with SQL Server for as long as I can recall (I started with SQL Server version 6.5). Bulk operations in SQL Server, BCP’s raison d’être, are extremely fast for both output and input operations. Of course, you can perform bulk operations with other techniques besides BCP.exe, including SSIS Fast Table Load and T-SQL. I cover several of these techniques in the SQL Server Tacklebox.
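If the T-SQL route is more your style, a BULK INSERT statement does the equivalent of a BCP “in” operation. The sketch below is only an illustration: the file path and terminators are assumptions, it loads an uncompressed flat file, and I haven’t tested whether this particular call is intercepted by SQL HyperBac in the same way the BCP examples are.

sqlcmd -E -Q "BULK INSERT DBA_Rep..SQL_Conn FROM 'D:\Temp\SQL_Conn_Native.csv' WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')"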

An example of a BCP command line to extract the data directly to the network destination path would look like this:

BCP DBA_Rep..SQL_Conn out "\\Server127\d$\Temp\SQL_Conn_BCP.ZIP" -T -N
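(Here -T uses a trusted Windows connection and -N keeps the data in SQL Server’s native format; because the target file name ends in .ZIP, SQL HyperBac compresses it as it is written.)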

Executing this command performs much the same work I did manually before, extracting the data to a compressed .zip file at a remote location. The difference is that instead of the file being created locally and then copied, it is created and written to directly over the network. While this may or may not be the fastest method, it does reduce the number of steps, and therefore the number of places where things could go wrong. It’s really up to you to decide which method to use. The important point is the time savings you will experience with SQL HyperBac compression, and not having to wait around for 605MB to crawl across your network in the wee hours, contending with 5,000 other tasks careering across the network. You’re probably wondering what the final result of the BCP export over the slow network connection was. Figure 9 shows that it took roughly four and a half minutes to export and push the data to a remote .zip file. That is not an astounding number compared to a local disk, but it is still quite fast.


Figure 9. Copying out to a network location with BCP and SQL HyperBac.

The final step is to load the data back into SQL Server to demonstrate that SQL HyperBac, again using standard tools like BCP, can operate on compressed files directly. I don’t have to extract the contents of the .zip file I created in order to be able to load the data into SQL Server. All I need to do in this example with BCP is change the “out” to “in” and begin the load directly from the SQL_Conn_BCP.zip file:

BCP DBA_Rep..SQL_Conn in "\\Server127\d$\Temp\SQL_Conn_BCP.ZIP" -T -N
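Once the load finishes, a quick row count is an easy sanity check that everything arrived; a minimal example, assuming you simply want to compare against the source count:

sqlcmd -E -Q "SELECT COUNT(*) FROM DBA_Rep..SQL_Conn"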

The end result is that the data was loaded in the expected time. I could continue to test various scenarios and tweak the process to get the best possible performance, but the gains I have seen thus far are remarkable in disk space savings alone. And this is only for exported data. We haven’t even touched on SQL HyperBac’s database backup compression or its on-disk data (.mdf and .ldf) compression. I’ll leave those topics for another time, the irony being that as I write this it is 5:30am, I have a flight to catch in less than 3 hours, and I have yet to pack. I just realized that I am spending too much time on the road and I am not even a consultant or Red Gate Product Evangelist. This is where you’re thinking I am going to make another cheap comment about underwear and travelling and compare that to data, right? Not a chance.

Rodney Landrum

Author profile:

Rodney Landrum has been architecting solutions for SQL Server for over 10 years. He has worked with and written about many SQL Server technologies, including DTS, Integration Services, Analysis Services, and Reporting Services. He has authored three books on Reporting Services, including his most recent 2008 edition for Apress. He is a regular contributor to SQL Server Magazine and Simple-talk.com, where he blogs about things like spiders, beer, somnambulance and SQL. His three recent articles in SQL Server Magazine on building a DBA repository with SSIS and SSRS have been well received and implemented widely by DBAs around the world. Rodney also speaks regularly on SQL topics at such events as SQL Saturday and the Pensacola SQL Server Users Group. His day job finds him overseeing the health and well-being of over 100 SQL Servers as manager of database administration in Pensacola, Florida.


Have Your Say


Subject: Nice!
Posted by: the_SQL
Posted on: Thursday, August 18, 2011 at 8:57 AM
Message: I really like the fact that you can load the resulting file into SQL Server straight from the .zip file. I have had many occasions myself where I have needed to export data, transfer it across physical locations and load the data back into SQL Server (a fancy spin on ETL), which has taken far longer than I would like to admit. I have often wondered if there was a worm hole hidden somewhere that I was not aware of, or some way to reduce the data footprint using some sort of bitwise algorithm to send my smoke signals faster. Thanks Rodney, for bringing this very useful tool to my attention, and thanks Red Gate for again helping me to become a more efficient DBA! Now, about the compression of underwear...

 
