Andras

Software Architect - Red Gate Software

The unexpected behaviour of DirectoryInfo.GetFiles() with three letter extensions

Published Friday, August 01, 2008 11:02 AM

There is a documented but certainly counterintuitive issue with the DirectoryInfo.GetFiles() method in .NET. This method returns the files that match a particular search pattern. For instance, the following code returns all the files on drive Z: that have the exact extension “.foobar”:


DirectoryInfo folder = new DirectoryInfo(@"z:\");

FileInfo[] files = folder.GetFiles("*.foobar",
    SearchOption.AllDirectories);



However, DirectoryInfo.GetFiles() behaves very differently when the extension in the search pattern contains exactly three characters. Consider the following example:


FileInfo[] files = folder.GetFiles("*.sql", SearchOption.AllDirectories);


This will, as expected, return all the files with the extension “.sql”. However, it will also return all the files that have an extension such as “.sql-backup”, “.sqlold”, “.sql~”, etc. Surprisingly, this is exactly the behaviour that the documentation describes. A quote from MSDN (http://msdn.microsoft.com/en-us/library/ms143327.aspx):


“The matching behavior of searchPattern when the extension is exactly three characters long is different from when the extension is more than three characters long. A searchPattern of exactly three characters returns files having an extension of three or more characters. A searchPattern of one, two, or more than three characters returns only files having extensions of exactly that length.



The following list shows the behavior of different lengths for the searchPattern parameter:
•    "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
•    "*.abcd" returns only files having an extension of .abcd.
•    "*.abcde" returns only files having an extension of .abcde.
•    "*.abcdef" returns only files having an extension of .abcdef.”


The reason for this strange behaviour is the support for the 8.3 file name format. A file with the name “alongfilename.longextension” has an equivalent 8.3 filename of “alongf~1.lon”. If we filter on the extension “.lon”, then this 8.3 filename is a match, so the file is returned even though its long extension is different.
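To make the reasoning concrete, here is a deliberately simplified model of that matching. It is only a sketch: the class ShortNameMatchSketch and its methods are made up for this illustration, the real file system performs full wildcard matching rather than a plain suffix test, and the short name is passed in as an assumed value instead of being read from the disk.

```csharp
using System;

// Hypothetical helper, for illustration only: it models a "*.ext" pattern
// being tested against both the long name and the 8.3 short name of a file.
public static class ShortNameMatchSketch
{
    // Simplification: a name matches "*.ext" when it ends with ".ext".
    private static bool EndsWithExtension(string name, string extension)
    {
        return name.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }

    // A file is reported as a match if either of its two names matches.
    public static bool Matches(string longName, string assumedShortName, string extension)
    {
        return EndsWithExtension(longName, extension)
            || EndsWithExtension(assumedShortName, extension);
    }
}
```

Assuming a file called “schema.sql-backup” has the short name “SCHEMA~1.SQL”, Matches("schema.sql-backup", "SCHEMA~1.SQL", ".sql") is true only because of the short name. An extension longer than three characters can never equal the three-character extension of a short name, which is why longer patterns match on the long name alone.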


This behaviour has bitten me with a tool I’ve been working on. This tool reads “.sql” files and builds up a database schema from these files. This schema can then be compared with live database schemata. The primary motivation for such a tool is to support database schemata in source control. However, there were two different scenarios in which the application started to fail. In one case I used Emacs to edit a file, and it left me (as expected) a backup file suffixed with a ~ character. On another occasion, I used a source control system that decided to store caching information in the same folder where the SQL scripts were located, and the cached files had extensions that started with “sql” followed by a timestamp. In both of these cases the database schema that was built by my application was inconsistent, due to objects being duplicated.


The only solution to the strange behaviour of DirectoryInfo.GetFiles() seems to be to check the extension of each returned file explicitly whenever the search pattern uses an extension with exactly three characters. The FileInfo.Extension property returns the full extension of the file (including the leading dot), not only the first three characters.
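A minimal sketch of such an explicit check might look like the following. The class name ExactExtensionFilter and its two methods are hypothetical names chosen for this example, and the case-insensitive comparison is an assumption that suits Windows file systems; adjust it if your environment differs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keeps only the files whose full extension
// is exactly the requested one.
public static class ExactExtensionFilter
{
    // FileInfo.Extension includes the leading dot, e.g. ".sql" or ".sql~".
    public static bool HasExactExtension(FileInfo file, string extension)
    {
        return string.Equals(file.Extension, extension,
            StringComparison.OrdinalIgnoreCase);
    }

    // Lets GetFiles() do the coarse search, then discards false positives
    // such as "script.sql~" or "script.sql-backup".
    public static List<FileInfo> GetFilesWithExactExtension(
        DirectoryInfo folder, string extension)
    {
        List<FileInfo> result = new List<FileInfo>();
        foreach (FileInfo file in folder.GetFiles("*" + extension,
            SearchOption.AllDirectories))
        {
            if (HasExactExtension(file, extension))
            {
                result.Add(file);
            }
        }
        return result;
    }
}
```

With this in place, a call such as ExactExtensionFilter.GetFilesWithExactExtension(folder, ".sql") should return only the genuine “.sql” files. The extension is passed with its leading dot so that it can be compared directly with FileInfo.Extension.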


by András

Comments

 

Tom Groszko said:

That list of files you get back can also contain files that you can't do anything with, because a path that is legal for Windows can be illegal for .NET. The exception thrown is System.IO.PathTooLongException: “The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.”

I cannot find a way to process these files using .NET. These files are in a network drive connected to a Macintosh system. Back to WIN32 for this task.
August 4, 2008 3:00 PM

About András

András Belokosztolszki is a software architect at Red Gate Software Ltd. He is a frequent speaker at many UK user groups and events (VBUG, NxtGen, Developer’s Group, SQLBits). He is primarily interested in database internals and database change management. At Red Gate he has designed and led the development of many database tools that compare database schemata and enable source control for databases (SQL Compare versions 4 to 7), refactor databases (SQL Refactor) and show the history of databases by analyzing the transaction log (SQL Log Rescue). András has a PhD from Cambridge and an MSc and BSc from ELTE, Hungary. He is also an MCSD and MCPD Enterprise. See his articles on simple-talk.