Andras

Software Architect - Red Gate Software

The unexpected behaviour of DirectoryInfo.GetFiles() with three letter extensions

Published Friday, August 01, 2008 11:02 AM

There is a documented, but certainly counterintuitive, issue with the DirectoryInfo.GetFiles() method in .NET. This method returns a list of files that match a particular pattern. In the following example it returns all the files on drive Z: that have the exact extension “.foobar”:


// @"z:\" is the root of drive Z; a bare @"z:" would mean the current directory on that drive.
DirectoryInfo folder = new DirectoryInfo(@"z:\");

FileInfo[] files = folder.GetFiles("*.foobar",
    SearchOption.AllDirectories);



However, the DirectoryInfo.GetFiles method behaves very differently when you use it with an extension that contains exactly three characters. Consider the following example:


FileInfo[] files = folder.GetFiles("*.sql", SearchOption.AllDirectories);


This will, as expected, return all the files with the extension “.sql”. However, it will also return all the files that have the extension “.sql-backup”, “.sqlold”, “.sql~”, etc. Surprisingly, this is the documented behaviour. A quote from the MSDN documentation (http://msdn.microsoft.com/en-us/library/ms143327.aspx):


“The matching behavior of searchPattern when the extension is exactly three characters long is different from when the extension is more than three characters long. A searchPattern of exactly three characters returns files having an extension of three or more characters. A searchPattern of one, two, or more than three characters returns only files having extensions of exactly that length.



The following list shows the behavior of different lengths for the searchPattern parameter:
•    "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
•    "*.abcd" returns only files having an extension of .abcd.
•    "*.abcde" returns only files having an extension of .abcde.
•    "*.abcdef" returns only files having an extension of .abcdef.”


The reason for this strange behaviour is the support for the 8.3 file name format. A file with the name “alongfilename.longextension” has an equivalent 8.3 file name such as “alongf~1.lon”. If we filter on the extension “.lon”, then this 8.3 file name is a match, and so the file is returned.
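The behaviour described above can be observed with a small reproduction. This is only a sketch: the class name GetFilesRepro, the method MatchingNames and the sample file names are made up for illustration. Which names come back depends on the .NET runtime and on whether the file system generates 8.3 names, so no expected output is shown.

```csharp
using System;
using System.IO;

static class GetFilesRepro
{
    // Hypothetical helper: creates a scratch folder with a few sample files
    // and returns the names that GetFiles reports for the given pattern.
    public static string[] MatchingNames(string pattern)
    {
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        DirectoryInfo folder = Directory.CreateDirectory(root);
        try
        {
            // One file with the exact extension, three "near misses", one unrelated file.
            foreach (string name in new[] { "a.sql", "b.sql-backup", "c.sqlold", "d.sql~", "e.txt" })
            {
                File.WriteAllText(Path.Combine(root, name), string.Empty);
            }

            FileInfo[] files = folder.GetFiles(pattern, SearchOption.AllDirectories);
            return Array.ConvertAll(files, f => f.Name);
        }
        finally
        {
            // Remove the scratch folder and everything in it.
            folder.Delete(true);
        }
    }

    static void Main()
    {
        foreach (string name in MatchingNames("*.sql"))
        {
            Console.WriteLine(name);
        }
    }
}
```

If the three character rule applies on your system, the list will contain more than just a.sql.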


This behaviour has bitten me with a tool I’ve been working on. This tool reads “.sql” files and builds up a database schema from these files. This schema can then be compared with live database schemata. The primary motivation for such a tool is to support database schemata in source control. However, there were two different scenarios in which the application started to fail. In one case I used Emacs to edit a file, and it left me (as expected) a backup file suffixed with a ~ character. On another occasion, I used a source control system that decided to store caching information in the same folder where the SQL scripts were located, and the cached files had an extension that started with “sql” followed by a timestamp. In both of these cases the database schema that was built by my application was inconsistent, because objects were duplicated.


The only solution to the strange behaviour of DirectoryInfo.GetFiles() seems to be to check the extension of each returned file explicitly whenever you search with an extension of exactly three characters. The FileInfo.Extension property returns the full extension of the file (including the leading period), not only the first three characters.
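A minimal sketch of such an explicit check follows. The class ExactExtensionFilter and its two methods are hypothetical names, not part of the framework, and the case-insensitive comparison is an assumption that suits Windows file systems; adjust it if your files live somewhere case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ExactExtensionFilter
{
    // True only when the file's full extension equals the requested one,
    // e.g. ".sql" matches "schema.sql" but not "schema.sql~".
    // Assumption: extensions are compared without regard to case.
    public static bool HasExactExtension(FileInfo file, string extension)
    {
        return string.Equals(file.Extension, extension, StringComparison.OrdinalIgnoreCase);
    }

    // Lets GetFiles do the coarse search, then drops every file whose
    // full extension is not exactly the requested one.
    public static List<FileInfo> GetFilesWithExactExtension(DirectoryInfo folder, string extension)
    {
        List<FileInfo> result = new List<FileInfo>();
        foreach (FileInfo file in folder.GetFiles("*" + extension, SearchOption.AllDirectories))
        {
            if (HasExactExtension(file, extension))
            {
                result.Add(file);
            }
        }
        return result;
    }

    static void Main(string[] args)
    {
        // Searches the folder given on the command line, or the current folder.
        DirectoryInfo folder = new DirectoryInfo(args.Length > 0 ? args[0] : ".");
        foreach (FileInfo file in GetFilesWithExactExtension(folder, ".sql"))
        {
            Console.WriteLine(file.FullName);
        }
    }
}
```

The extension is passed with its leading period (".sql") because that is the form FileInfo.Extension returns.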


by András

Comments

 

Tom Groszko said:

The list of files you get back can also contain files that you can't do anything with, because a path that is legal for Windows can be illegal for .NET. The exception thrown is System.IO.PathTooLongException: “The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.”

I cannot find a way to process these files using .NET. These files are on a network drive connected to a Macintosh system. Back to Win32 for this task.
August 4, 2008 3:00 PM

About András

András Belokosztolszki is the architect of SQL Compare versions 4, 5, 6 and 7, SQL Log Rescue and SQL Refactor. He is focused on database internals, database synchronization and database schema evolution.