The TSQL of Text Files

Phil returns to the old subject of 'Getting text-based data in and out of SQL Server'. He shows various easy ways of getting a file listings of directories from the file system, shows how one can access the Shell automation Objects, and demonstrates several ways of reading or writing data between database and file

This article revisits the theme of an article I wrote for Simple-Talk two years ago called Reading and Writing Files in SQL Server using T-SQL , which to my amazement, has proved to be the most popular article I’ve ever written.

In this belated second part (I’ll assume you have SQL Server 2005 or 2008 most of the time) We’re going to show you the TSQL you can use to

  • Get a List of Files in a directory/folder, with, if you with, the subfolders and attributes such as file size, revision date, or short-form.
  • Access the SHELL.APPLICATION to do file operations
  • Read data from file into a TSQL variable
  • Read data into a table, each line in a table row
  • Write data from a TSQL variable into a file
  • Write the String-based results of a SQL Expression into a file

These are simple operations, but it is some of the simplest operations in a database that can prove to be the most taxing:  By way of contrast when you are working in DOS, the DIR or ls commands are one of the first commands that one learns. The operation of getting data to or from files is so easy; but not always so in SQL Server. This is because SQL Server is usually the worst place to do such an operation. Very occasionally, it isn’t. In the hard reality of life in an IT department, it is sometimes the only practical option. For the majority of operations, it is the obvious choice to use SSIS/DTS, but not invariably.

The first problem you face is security. The Database User that runs these routines that I show will need ‘special powers’, powers that can’t possibly be given to general users for reasons of security .It also requires you to open up hatches in your database security ‘Surface Area Configuration’, that are normally battened down, such as OLE Automation, and use of the command line. (you’ll probably need to use the utility called ‘Surface Area Configuration for Features’) We’re not going to cover these issues here, or show you ways to solve the problem of keeping your database secure. Instead, refer to The SQL Server Security Workbench and The Process Delegation workbench/

Getting a List of Files in a directory/folder.

Why should you ever want to get a list of files in SQL Server, perhaps with their properties, such as creation date or size as well? Most commonly, it is when one wants to archive off old backup files as part of a maintenance program, or delete old log-shipping files. I often find myself needing a list of files from a directory and all its subdirectories when I’m importing HTML or data files into a database. There are quite a few occasions where one wants an automatic data feed based on files appearing in a directory. I’m routinely having to pick up text-based log files of various description, and import them into tables. I’ve even used routines like these to index up an intranet store of documents for fast access. All this requires you to be able to check what’s on disk. read stuff in, and sometimes write stuff out., preferably at lightening speed. It is not always as simple as one might wish.

The most obvious way of getting a list of files into TSQL as a table is to use the following technique. I give it to you in a stored procedure that you will find is of little use except where it is part of a longer process. I’ll explain why later.

Fine, but there are problems. Try capturing the result into a table, …

…and you get an error saying  ‘An INSERT EXEC statement cannot be nested

There are other problems to tackle. You only get the full path when you use the /S switch (to list subdirectories) and the DIR command goes doolally if you use the /S switch and if it can’t find the directory spec. Also, if you want to use this to archive all backups made over three weeks ago then you’re stuffed. the bare command doesn’t pass back the file sizes, or dates, and the information can’t be got reliably from the DIR command anyway for a variety of reasons (varying Date Settings, DOS version differences in the column settings, not all useful info there etc). Another problem is that you usually want the directory too without having to parse the filename. If you omit the /B /S, you get the filename, with the /B you get the whole path: Hassle.

There is one problem we can solve pretty smartly. It is due to the fact that it is a stored procedure rather than a function. insert into ...Exec cannot be nested. You cannot use the obvious solution of doing an xp_cmdshell in a function because insert into…execute isn’t allowed in a function. You can do an OPENROWSET to do the EXECUTE, but the OPENROWSET only takes string literals, not variables, so you cannot specify the directory to get a directory listing from! This requires patience and cunning. Our first technique will be to use an XML output variable.
 

We can use the routine like this…

so, fired with enthusiasm, we construct a solution that allows us to do both subdirectory and simple directory listings, and tells us the directory that every file was found in. Just to show off a bit, we can get it just to list the directories.

The code for the procedure spGetFilePaths is here. You need to click here to see the code! And then, maybe you could use it something like this, to get a listing of files

Let’s try a more sophisticated approach to try to encapsulate it in a function. We can use OLE automation in functions but there is a big snag. The File System Object (FSO) doesn’t allow file listings from xp_oaMethod since the item property of a folder doesn’t have an index value. Strangely, Microsoft also have a slightly flawed method that uses SHELL.APPLICATION. This last one works, more or less (It hates it if the directory you specify doesn’t exist, so we have to program around that)

Accessing the SHELL.APPLICATION to get a directory listing.

Here is a function that uses SHELL.APPLICATION automation. You’ll have to click here to see the source code!

As you can see, it makes the whole process a lot simpler, but it is very slow.

This routine returns a table representing all the items in a folder. It takes as parameter the path to the folder. It does not take wildcards in the same way as a DIR command. Instead, you would be expected to filter the Table results of the function using SQL commands.

Notice that the size of the item (e.g. file) is not returned by this function. It is just not in the list of properties. Microsoft can be cussed at times.

This function uses the Windows Shell COM object via OLE automation. It opens a folder and iterates though the items listing their relevant properties. You can use the SHELL object to do all manner of things such as printing, copying, and moving filesystem objects, accessing the registry and so on. Powerful medicine.

This seems great. But the problem is that it doesn’t iterate over subdirectories. Oops. Also, it is a bit short on properties. Still it does for most purposes. It isn’t a complete substitute for using DIR because it is s…l…o..w. Most of the time, this doesn’t matter much, but just occasionally it is a pain.

So we now do a function that addresses most of these problems except for the slowness.

You will have to click here to see the source code for the dbo.files function. This is an example of how it might be used.

Getting Text into a TSQL Variable

Now we have solved the awful problem of getting the names of the files we want, we can have some fun. It is now easy to read a file into SQL Server. There are a number of ways of getting stuff in to SQL Server but this is the simplest.

Yes, but some silly old moo in Microsoft has cobbled the OPENROWSET function so it won’t take a variable so unless you hard-wire your OPENROWSET commands to literals rather than variables (of course you don’t) you can’t use it.

However, there is a cunning trick you can use with sp_ExecuteSQL to get round this. It is easier to demonstrate than explain.

So let’s try it out by writing a routine to find a particular string, ‘Explosive’ in this example, in a whole lot of files. We’ll use our ListPathsXML procedure from our file-listing armoury just because we only need the paths, but no other file information and we want it fast.

Well, that is fine, but sometimes you want something that is going to bring the text from a file line-by-line so you can process it in a table, line-by-line. 

Getting Text into a Table

 We’ll illustrate this with a version of the routine that reads in all the errorlog files for an instance line-by-line as a table, and prints out each line that has the word ‘error’ in it. This uses a new procedure called spLoadTextFromAFile which does just that. It loads the text from a file line buy line and returns it as a result. You’ll have to see the code by clicking here. And here is the procedure in use.

Write data from a TSQL variable into a file

If you want to write text out from a variable to a file, here is something that will do the trick.

We’ll first read text in and then write it back out again to a different file just to get confidence that nothing gets corrupted. We’ll use a handy large text file for this. In effect, we are copying the file.

Write the String-based results of a SQL Expression into a file

We can do other things with the same idea we’ve just seen. The following procedure even allows you to write the results of a SQL Statement to a file as long as the result is a single string column. This sort of idea can be adapted to a number of purposes.

So here we are. We have a range of functions and stored procedures to do the basic file I/O, which you can combine with the routines I provided in the first article. None of this is pretty or elegant, but it is all part of the armoury of the DBA and database developer for those occasional routines where, for one reason or another, there isn’t a better alternative solution.

Downloads

Tags: , , , , ,

  • 158220 views

  • Rate
    [Total: 65    Average: 4.6/5]
  • Phil Factor

    Doing folder access the CLR way.
    In case you were wondering whether it would be better to do directory listings with a table-valued UDF, I’ll be tackling that method next week, God willing.

  • Rich

    A way I have done this
    Great article, ways of doing this I have not thought of. My way is too run two dos dir commands. One to get a table of file names, the other to get the file data. Using both we can get the following information: File Date, File Size, File Name, File Type, File Path. It works and was fun to do 🙂 code here:

    SELECT @CmdStrInfo = ‘dir /O-S /-C /A-D “‘ + @Path + ‘”‘        
    SELECT @CmdStr = ‘dir /B “‘ + @Path + ‘”‘        

    INSERT INTO [#DirFileInfo]
          
    (
          
    [DirFileInfo]
            
    )  
      
    EXEC xp_cmdshell @CmdStrInfo
      
    UPDATE [#DirFileInfo]
      
    SET [DirFilePath] = @PathType
    WHERE [DirFilePath] IS NULL    
      
    INSERT INTO [#DirFile]
          
    (
          
    [DirFile]
            
    )  
      
    EXEC xp_cmdshell @CmdStr  

    INSERT INTO #Files
          
    (
          
    [FileDate],
          
    [FileSize],
          
    [FileName],
          
    [FileType],
          
    [FilePath]
          
    )
    SELECT CONVERT(DATETIME, LEFT([DirFileInfo], 17)),
          
    RTRIM(LTRIM(REVERSE(SUBSTRING(REVERSE(dfi.[DirFileInfo]), LEN(df.[DirFile]) + 2, (CHARINDEX(‘ ‘, REVERSE(dfi.[DirFileInfo]), LEN(df.[DirFile]) + 2) – LEN(df.[DirFile]) + 2))))),
          
    df.[DirFile],
          
    UPPER(REVERSE(LEFT(REVERSE([DirFileInfo]), CHARINDEX(‘.’, REVERSE([DirFileInfo])) – 1))),
          
    dfi.[DirFilePath]
      
    FROM #DirFileInfo dfi
    INNER JOIN [#DirFile] df ON LEFT(REVERSE(dfi.[DirFileInfo]), LEN(df.[DirFile])) = REVERSE(df.DirFile)
    WHERE (dfi.DirFileInfo LIKE ‘% AM %’
        
    OR dfi.DirFileInfo LIKE ‘% PM %’)  
       AND
    SUBSTRING(REVERSE(dfi.[DirFileInfo]), LEN(df.[DirFile]) + 1, 1) = ‘ ‘ — This means we dont get dupes

  • Saravanan

    CPU
    I tried these methods and it was rejected by the Security group in our company. A small SSIS Script works like a charm.

  • Phil Factor

    re: CPU
    As I say in the article, ‘for the majority of operations, it is the obvious choice to use SSIS/DTS, but not invariably’. I suspect that SSIS is inherently safer, though it would be interesting to hear from an SSIS expert on its potential risks.

  • Dustin

    This a must read
    Phil,

    Great article, I have ran into the need to do these types of things many times during my time as a DBA

  • Jim

    Good Stuff!
    The quality of the writing and thought on this site is outstanding!

  • Kraig Kerr

    Awesome Stuff!
    But you didn’t cover using “Bulk Insert” to get text into a table

    create table #tmpDept
    (
    [CurrentLSN] [nvarchar](23) NOT NULL,
    [AllocUnitId] [bigint] NULL,
    [AllocUnitName] [nvarchar] (387) NULL,
    [DateTime] [datetime] NOT NULL,
    [RowLogContents] [varbinary](8000) NULL
    )

    create nonclustered index [IX_AllocUnitID] on #tmpDept ([AllocUnitId] ASC)

    bulk insert #tmpDept
    from ‘C:logfiles

  • cosmic

    Corrections
    Hi, thanks for the code. Very useful.

    Some minor corrections in the files, function:

    – In the update statement, the column [DateLastModified] is miswritten.
    – At the end of the function, is required an END clause.

  • wilkanah

    On the Dir Function
    The way sp_OACreate is used….with Shell.Application…

    You’re right its slow… but it also uses SQLServer memory…and is a serious potential for instability in SQLServer.

    I tried to do it like so…
    EXEC @hr = sp_OACreate ‘Shell.Application’, @Object OUT,4

    but i keep getting error…
    0x80040154 ODSOLE Extended Procedure Class not registered

    The extra parameter 4 to sp_OACreate is supposed to say use system memory not memory allocated to SQLServer…

    I’ve even gone on to register shell32.dll… i imagine that can’t have been the problem… no luck yet…

    Any suggestions will be appreciated.

  • igorship

    File Function
    Next code call in the WHILE cycle for every file(and reserves memory):
    IF @hr=0 EXEC @hr = sp_OAMethod @objFileSystem,
    ‘GetFile’, @objFile OUT, @Filename

    but code:
    EXEC sp_OADestroy @objFile
    call only one time in the end function.
    I getting error “Error whilst getting the attributes of … ” when files count more 233.
    If EXEC sp_OADestroy @objFile put in the WHILE cycle everything is ok.

  • NESQLDude

    Here is a way to get the directory listing quickly and with no xp_cmdshell
    CREATE TABLE #MyFile
    (directory varchar(100),
    depth int,
    isfile int
    )
    INSERT INTO #MyFile (directory, depth, isfile)
    EXEC xp_dirtree ‘C:CMDATADATAFILES’,1,1
    select * from #MyFile

    this runs in about 1 second flat…