Andras

Software Architect - Red Gate Software

The unexpected behaviour of DirectoryInfo.GetFiles() with three letter extensions

Published Friday, August 01, 2008 11:02 AM

There is a documented but decidedly counterintuitive issue with the DirectoryInfo.GetFiles() method in .NET. This method returns the files that match a particular search pattern. For instance, the following code returns all the files on drive Z: that have the exact extension “.foobar”:


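// @"z:\" is the root of drive Z:; a bare @"z:" would resolve to the current directory on that drive
DirectoryInfo folder = new DirectoryInfo(@"z:\");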

FileInfo[] files = folder.GetFiles("*.foobar",
    SearchOption.AllDirectories);



However, DirectoryInfo.GetFiles() behaves very differently when the extension in the search pattern contains exactly three characters. Consider the following call:


FileInfo[] files = folder.GetFiles("*.sql", SearchOption.AllDirectories);


This will, as expected, return all the files with the extension “.sql”. However, it will also return all the files that have the extension “.sql-backup”, “.sqlold”, “.sql~”, etc. Surprisingly, this is the documented behaviour. A quote from Visual Studio’s documentation (http://msdn.microsoft.com/en-us/library/ms143327.aspx):


“The matching behavior of searchPattern when the extension is exactly three characters long is different from when the extension is more than three characters long. A searchPattern of exactly three characters returns files having an extension of three or more characters. A searchPattern of one, two, or more than three characters returns only files having extensions of exactly that length.



The following list shows the behavior of different lengths for the searchPattern parameter:
•    "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
•    "*.abcd" returns only files having an extension of .abcd.
•    "*.abcde" returns only files having an extension of .abcde.
•    "*.abcdef" returns only files having an extension of .abcdef.”


The reason for this strange behaviour is the support for the 8.3 file name format. A file with the name “alongfilename.longextension” has an equivalent 8.3 file name of “along~1.lon”. If we filter on the extension “.lon”, the 8.3 file name is a match, so the file is returned even though its full extension is “.longextension”.


This behaviour has bitten me with a tool I’ve been working on. This tool reads “.sql” files and builds up a database schema from these files. This schema can then be compared with live database schemata. The primary motivation for such a tool is to support database schemata in source control. However, there were two different scenarios in which the application started to fail. In one case I used Emacs to edit a file, and it left me (as expected) a backup file suffixed with a ~ character. On another occasion, I used a source control system that decided to store caching information in the same folder where the SQL scripts were located, and the cached files had an extension that started with “sql” and was followed by a timestamp. In both of these cases the database schema that was built by my application was inconsistent, because objects were duplicated.


The only solution to the strange behaviour of DirectoryInfo.GetFiles() seems to be to check the extension of each returned file explicitly whenever you search for an extension with exactly three characters. The FileInfo.Extension property returns the full extension of the file, not only the first three characters.
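The check described above could be sketched as follows. This is only a minimal illustration, not the code from the tool itself: the class name ExactExtensionSearch and the method GetFilesWithExactExtension are hypothetical names made up for this example, and the case-insensitive comparison is an assumption that suits Windows file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
static class ExactExtensionSearch
{
    // Returns the files under 'folder' (searched recursively) whose full
    // extension equals 'extension', e.g. ".sql". The leading dot is required
    // because FileInfo.Extension includes it.
    public static List<FileInfo> GetFilesWithExactExtension(
        DirectoryInfo folder, string extension)
    {
        List<FileInfo> matches = new List<FileInfo>();

        // GetFiles may over-match three-character extensions, so its
        // result is treated only as a list of candidates.
        foreach (FileInfo file in folder.GetFiles("*" + extension,
            SearchOption.AllDirectories))
        {
            // FileInfo.Extension is the full extension, so ".sql-backup"
            // and ".sql~" are rejected here.
            if (string.Equals(file.Extension, extension,
                StringComparison.OrdinalIgnoreCase))
            {
                matches.Add(file);
            }
        }

        return matches;
    }
}
```

With this helper, the earlier call would become ExactExtensionSearch.GetFilesWithExactExtension(folder, ".sql"). When GetFiles does not over-match, the extra comparison is simply redundant, so it is safe to apply it for every extension rather than only for three-character ones.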


by András

Comments

 

Tom Groszko said:

The list of files you get back can also contain files that you can't do anything with, because a path that is legal for Windows can be illegal for .NET. The exception thrown is System.IO.PathTooLongException: "The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters."

I cannot find a way to process these files using .NET. These files are in a network drive connected to a Macintosh system. Back to WIN32 for this task.
August 4, 2008 3:00 PM

About András

András Belokosztolszki is a software architect at Red Gate Software Ltd. He is a frequent speaker at many UK user groups and events (VBUG, NxtGen, Developer’s Group, SQLBits). He is primarily interested in database internals and database change management. At Red Gate he has designed and led the development of many database tools that compare database schemata and enable source control for databases (SQL Compare versions 4 to 7), refactor databases (SQL Refactor) and show the history of databases by analyzing the transaction log (SQL Log Rescue). András has a PhD from Cambridge and an MSc and BSc from ELTE, Hungary. He is also an MCSD and MCPD Enterprise. See his articles on simple-talk.

















