Click here to monitor SSC
  • Av rating:
  • Total votes: 51
  • Total comments: 13
Simon Sabin

How to Read Raw Files in SSIS

01 May 2007

If you've ever used SQL Server 2005 Integration Services (SSIS), you will have come across raw files. Microsoft introduced them with SSIS as a mechanism for storing data on the local file system. The design of raw files enables data to be written very quickly but, unfortunately, to achieve this performance Microsoft has used a proprietary binary storage format. This means that you can't open these files and view their contents using a normal editor. What's more frustrating is that Microsoft has yet to supply a tool for reading these files.

To get around this limitation, I developed my own custom "Raw File Reader" tool, which this article briefly describes.

Why raw files?

One of the big benefits of working with raw files is the ease with which you can add them to your packages. The raw file destination requires you to specify only the name of file to use and the columns you want to store. Conversely, if you use a text file destination, then you have to build the structure of the file and ensure it makes your data flow. If you use an OLEDB destination, then you have to have a database and also a table created to store the data.

Their ease of configuration makes raw files ideal for storing errors in your data flows. In the past I've been guilty of configuring certain components of my packages to ignore failures, on the assumption that it "can't error". Of course, this isn't ideal and a much better approach is to capture all errors and handle them appropriately. This is where raw files are very useful. All you need to do is direct your error flows to a raw file destination, specify and filename and select the columns you want to store. If errors do occur, then they will be stored in the raw files.

So, you're storing any errors in a raw file, and now you want to review them – but you can't because of the proprietary format used. This is the exactly the srot fo scenario for which my raw file reader is designed.

Using the Raw File Reader

The Raw File Reader is designed to make it very simple to read the contents or raw files. A few of the key features are as follows ( a full list can be seen at the above link):

  • Read any SSIS Raw file
  • Results displayed in a grid
  • Columns automatically resized
  • Order of columns can be changed
  • Data can be copied from the results grid using CTRL+C
  • Column headers can be included or excluded from the copied data
  • Notification (and option to reload) when the currently loaded raw file has changed i.e. package is rerun

Opening a raw file

The simplest operation is to open a raw file (File | Open). This process will, by default, read the raw file configuration and then immediately read the data from the raw file. The configuration is displayed in a list at the top of the form and the data is displayed in a grid at the bottom of the form (you can change this default behaviour so that the data is not automatically loaded – see later).

Once the data from your raw file has been loaded you can sort the data and copy it into another application. Once a raw file has successfully been read, the filename will appear in the Recent Files section of the File menu. The last 10 files opened will be displayed here.

Sorting and copying data

Data on the grid can be sorted and copied using standard "excel-style" functionality. To sort data, simply click on the column heading of the column you wish to sort. Clicking on a column heading for the second time will reverse the order of the sorting.

You can select a cell from the grid (or multiple cells, by dragging the cursor and/or using the CTRL key) and copy to the clipboard using CRL+C, from where they can be pasted into other applications. When the data is copied to the clipboard the area copied will be the maximum area required to include all the selected cells, however cells not selected will be blank. In the example below, we copy only the object name and the object type:

When the copied data is pasted into Excel you can see that, by default, column headings are included but the data in the unselected cells are not included (you can change this default behaviour – see the Options section, later).

Opening the raw files associated with a package

Opening every raw file individually may be time consuming. Therefore, the tool provides the option of opening every raw file associated with a package. Simply, open the package configuration, identify the raw file destinations and open the files that are specified for these destinations. At present the filename for the raw file destinations must be configured directly, not via expressions or variables.

When you open the package, all of the raw files for the package will be opened, and can be organized using the options in the Window menu. As with opening raw files, recently opened packages are included in the Recent Projects list on the File menu.

Working with raw files whilst debugging packages

The Raw File Reader will automatically reload a raw file when it detects the file has changed. This is a configurable option and so can be turned off if required. If you turn this option off you can manually reload the file by using the Read File command in the Tools menu.

Reading large files

Raw files are often large. If you are using large files I recommend that you disable the option that enables the raw file data to be loaded as soon as the file is opened (see the Options section). Once the configuration has been read, you choose to load only the required columns, by checking and un-checking the columns in the Raw file columns pane at the top of the window. This can drastically reduce the time it takes to load a file. If you want to select/de-select more than one column at a time, select them using the SHIFT or CTRL keys and then click on one of the check boxes. All the selected items will then be changed to the state of the checkbox selected.

Once you have selected the columns you want, the raw file can be read by using the Read File command in the Tools menu.

Options

The configurable options can be accessed by the Options menu in the Tools menu. The copy options configures when headers are included with copied cells.

Option

Description

Always include headers

The column headers, for the cells copied, are always included, irrespective of the number or distribution of cells selected.

Never include headers

Headers are never included

Auto include headers

Column headers are only included if the cells selected are in more than one column. This enables you to copy data from a single column without column headers being added.

Other options are as follows:

Option

Description

Reload file when it changes

This enables the application to reload the data for a raw file automatically when the application detects the raw file has changed

Load file when opened

If selected the data from a raw file is read immediately that a file is opened. If deselected then only the raw file configuration is read when the file is opened.

Raw file extensions

There is no standard naming for raw file extensions, this allows you to configure the extension that the Open File dialog uses to determine a raw file. Multiple extensions can be specified by separating them by semi colons ";"

Future enhancements

I am looking to develop the application further and would appreciate comments on what features you would like to see in future versions. Some of those currently on my list are

  1. Changing the raw file columns pane to display the meta data for each column i.e. data types, sizes etc, this will allow sorting of columns
  2. Row numbers
  3. Extra column headers details
  4. Paging rows
  5. Monitoring a folder for raw files
Simon Sabin

Author profile:

Simon Sabin is a independent consultant specialising in database architecture. He has been working with SQL Server for almost 10 years, was awarded as a Microsoft MVP in April 2006 and is an active member of UK SQL Server Usergroup. You can read his ramblings on SQL Server on his blog, http://sqlblogcasts.com/blogs/simons.

Search for other articles by Simon Sabin

Rate this article:   Avg rating: from a total of 51 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: HIi Simon Sabin
Posted by: Ashwin Kumar.H (not signed in)
Posted on: Tuesday, June 05, 2007 at 6:40 AM
Message: Your article is nice.....can u pls guide me to transfer data from multiple excel files to oledb destination in ssis .....it can be in any way by the designer or by code.....

Subject: Misleading subject
Posted by: Dr. Blue (not signed in)
Posted on: Thursday, June 07, 2007 at 12:41 AM
Message: Given the titel of this article, I expected an answer on the question "How to Read Raw Files in SSIS". It turns out, that you describe how to read raw files in your tool, "which this article briefly describes."

Nevertheless, a good tool I guess, however I keep prefering storing errors in a relational environment to make some analysis easier. I guess you will not implement any SQL dialect to query the raw data within your tool ?!

Subject: From Jayaa Sathiyanarayan
Posted by: Jayaa Jayaa Sathiyanarayan (not signed in)
Posted on: Thursday, June 07, 2007 at 3:12 AM
Message: Hi Simon Sabin,

Happy to read your intelligent article, which would be very much helpful to me. My wishes to you for your future articles.
I would like to have a favour from you, Please could you please help me with the xml document import and export in sql server 2005 using the Management studio. Since the sqlxml configuration settings is bit confusing. Do you need a special xml driver file for creating in odbc the datasource and then call in the tasks wizard in the management studio Or is there any solution. I know there is a solution in using vb script and write XSD and xmlschema document, but I am trying an easy way out to import using the tasks wizard. Hope I am not disturbing amidst ur busy schedule. You can drop an email to me

jayaseeli@hotmail.com

Thanking you
Regards
Jayaa Sathiyanarayan

Subject: RAW file layout
Posted by: Steve Williams (not signed in)
Posted on: Thursday, June 07, 2007 at 3:41 PM
Message: Good initiative to develop your own tool - a "Convert to database" option would probably be useful i.e. create a table with the same layout as the raw file and then you could use it in other reporting

Do you have a link you can publish to the format of the raw file itself for others who may want to read the data but not be reliant on a third-party tool?

Subject: RAW file layout
Posted by: Simon Sabin (not signed in)
Posted on: Monday, August 13, 2007 at 10:06 AM
Message: I don't do anything clever. I just use SSIS to read the file. That way I don't need know the structure of the RAW file

Subject: SSIS
Posted by: Anonymous (not signed in)
Posted on: Wednesday, December 05, 2007 at 7:51 AM
Message: hi michael,

Please help me out in implementing the logic stated below while using SSIS package.

Suppose a flat file contains 1000 records with a header and a trailer records.And SSIS after reading 500 records successfully into database, it fails.Now I want the SSIS to read from the 501 th record when it is started again instead of starting from the first.

Please reply , thanks in advance.

Subject: SSIS
Posted by: Stais (view profile)
Posted on: Wednesday, May 14, 2008 at 9:12 AM
Message: Can any one help me in developing a simple task.
I have few files in my import directory i have to append date end of the each file and archive them in another Folder.So to do this which tasks i have to use.If i use script task what could be the vbscript for this.

Thanks in advance.

Subject: File created by SSIS and having a problem with the tool..
Posted by: Sanjeev (view profile)
Posted on: Monday, June 16, 2008 at 5:43 PM
Message: Tool displays the configuration but no data grid as you mentioned here. There is no error message displayed.

Please help, as I need to contact the input file creator with the errorneous rows.

Thanks...

Subject: Read Raw
Posted by: Anonymous (not signed in)
Posted on: Sunday, June 22, 2008 at 1:43 PM
Message: need one player for read this format on my pc anyone can help me ? RAW RAW RAW APFFFFFFFFFF

Subject: Read Raw
Posted by: Anonymous (not signed in)
Posted on: Sunday, June 22, 2008 at 1:49 PM
Message: need one player for read this format on my pc anyone can help me ? RAW RAW RAW APFFFFFFFFFF

Subject: SSIS: RAWFile pause on 1000 rows
Posted by: mit_2807 (view profile)
Posted on: Tuesday, October 14, 2008 at 2:44 AM
Message: Hi Simon,

This is mit here. Well i have tried-out one thing earlier that was RAWFile to TextFile conversion using SSIS 2005.

Secondly, I tried somewhat simillar to RAWReader.
(Reading rawfile loading it in recordset to fill datagridview.)

Now, I want to have a pause while executing the package for 1000 records and if Msgbox yes is pressed next 1000 records starting from 1001 should be loaded into datagridview.

Can you please help me?

Subject: hi Simon...!!!
Posted by: mit_2807 (view profile)
Posted on: Tuesday, October 14, 2008 at 9:17 AM
Message: Well,

I am glad to have the solution to my querry today itself but still i feel not reached to satisfaction level.

B'coz i did paging (pause with 1000 records in recordset)here after Package execution i feel you better understand what i have done.

Still i wish it should happen during execution itself actually it has performance issue attached to it.Before loading whole package i should be doing 1000 rows load.

Can you please answer to my querry?

Thanking you. Please asap i need to prepare utility which might be helpful to our client n might be so many of my friends out there.






Subject: How to open a raw file
Posted by: Arifa (view profile)
Posted on: Wednesday, July 14, 2010 at 5:55 AM
Message: Hi Simon...

This is what u stated in the above post, "The simplest operation is to open a raw file (File | Open)." but still i coundn't be able to open the raw file.

Please help me by little more explanatory.

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Highway to Database Recovery
 Discover the best backup and recovery articles on Simple-Talk, all in one place. Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.