If you ask me, and I take it as implicit in your visit that you do
(sorry about that, but it's Friday afternoon and I need a weekend),
Windows is far too fascist about filenames. Most notably in terms of the
characters one can put into a filename, and the obstreperous way it will cough and whine at you if you dare to break the rules.
For the application I'm working on, we automatically generate filenames
based on certain criteria. These filenames can contain descriptive text
which we don't necessarily want to restrict to the set of
filename-legal characters (I'll return to this point momentarily). So
we have to filter the descriptive text and strip out illegal characters.
So what is the set of legal characters? I consulted my trusty internet, and after the usual few minutes throwing seaweed over it
and sticking pins in effigies, it yielded the following rather useful
bit of wisdom from the Windows Platform SDK, under Win32 and COM
Development, System Services, Files and I/O, SDK Documentation,
Storage, Storage Overview, File Management, Creating, Deleting and
Maintaining Files, Naming a File.
...
Use any character in the current code page for a name, except
characters in the range of 0 through 31, or any character that the file
system does not allow. A name can contain characters in the extended
character set (128–255). However, it cannot contain the following
reserved characters:
< > : " / \ |
The following reserved device names cannot be used as the name of a
file: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7,
COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
Also avoid these names followed by an extension, for example, NUL.txt.
...
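Taken literally, the quoted rule fits in a small validator. This is a sketch of my own reading, not anything shipped with the SDK: `sdk_violations`, `RESERVED_CHARS` and `RESERVED_NAMES` are hypothetical names, and matching device names case-insensitively on the part before the first dot is my assumption. Note that it deliberately says nothing about "*" or "?", because the quoted text doesn't.

```python
# The reserved characters listed in the quoted SDK text.
RESERVED_CHARS = set('<>:"/\\|')
# The reserved device names listed in the quoted SDK text.
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL",
                  *(f"COM{n}" for n in range(1, 10)),
                  *(f"LPT{n}" for n in range(1, 10))}


def sdk_violations(name: str) -> list[str]:
    """List the ways `name` breaks the quoted rule (sketch only)."""
    problems = []
    for ch in name:
        if ord(ch) < 32:
            problems.append(f"control character {ord(ch)}")
        elif ch in RESERVED_CHARS:
            problems.append(f"reserved character {ch!r}")
    # "NUL.txt" is as bad as "NUL": compare the part before the first dot.
    # Assumption: the match is case-insensitive.
    stem = name.split(".", 1)[0].upper()
    if stem in RESERVED_NAMES:
        problems.append(f"reserved device name {stem}")
    return problems


print(sdk_violations("report.txt"))  # []
print(sdk_violations("nul.txt"))     # ['reserved device name NUL']
```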
Now at first, as I was in a hurry, I read this as "everything except < > : " / \ | is ok".
But I forgot that characters below 32 are explicitly disallowed.
And then I noticed that (quite naturally) applications don't let you
save filenames containing the "*" character, as this is used as a wildcard.
Filenames containing the "?" character are disallowed too. This is also
a wildcard character, and is used to escape long paths by prepending
"\\?\" (I always liked that little hack. Such style.)
But with a firm conviction that MSDN Does Not Lie, I checked my ASCII
table to make sure I wasn't hallucinating. Sure enough, "*" and "?" are
ASCII 42 and 63 (please don't anyone tell me they knew that from
memory, or I may cry), so neither falls below 32, and neither appears
in the reserved list.
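The same check in code, for anyone who prefers `ord()` to an ASCII table. `SDK_RESERVED` is simply the quoted list transcribed; the name is mine.

```python
# The reserved characters listed in the SDK text quoted above.
SDK_RESERVED = set('<>:"/\\|')

for ch in "*?":
    code = ord(ch)
    # Neither character is a control character (0 to 31) or in the list.
    print(f"{ch!r}: ASCII {code}, below 32: {code < 32}, "
          f"in reserved list: {ch in SDK_RESERVED}")
# '*': ASCII 42, below 32: False, in reserved list: False
# '?': ASCII 63, below 32: False, in reserved list: False
```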
But, and aha, there's another clause I hadn't read properly: "...or any
character that the file system does not allow". So, time to work out
what characters are considered illegal by FAT, FAT32 and NTFS,
methought.
Around this point I came across the system structure BIGFATBOOTFSINFO,
which I only mention as it's an excellent name. But I digress.
So, I wandered towards the Platform SDK again, by way of Win32 and COM
Development, Development Guides, Windows 95/98/Me Programming, Long
File Names, Long File Names and the Protected-Mode FAT System. It was a
long walk, but I sustained myself on the way with a coffee. Herein I
found more wisdom.
...
When an application creates a file or directory that has a long file name,
the system automatically generates a corresponding alias for that file
or directory using the standard 8.3 format. The characters used in the
alias are the same characters that are available for use in MS-DOS file
and directory names. Valid characters for the alias are any combination
of letters, digits, or characters with ASCII codes greater than 127,
the space character (ASCII 20h), as well as any of the following
special characters.
$ % ' - _ @ ~ ` ! ( ) { } ^ # &
The space character has been available to applications for file names
and directory names through the functions in current and earlier
versions of MS-DOS. However, many applications do not recognize the
space character as a valid character, and the system does not use the
space character when it generates an alias for a long file name. MS-DOS
does not distinguish between uppercase and lowercase letters in file
names and directory names, and this is also true for aliases.
The set of valid characters for long file names includes all the
characters that are valid for an alias as well as the following
additional characters.
+ , ; = [ ]
...
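Transcribed into code, the two quoted lists give a per-character check. This is a sketch under my own assumptions: "letters" is read as ASCII letters; "ASCII codes greater than 127" is stretched to mean any non-ASCII code point (the original refers to the 128 to 255 range of the current code page); and the period is skipped because it separates name from extension and appears in neither list. The function names are hypothetical.

```python
import string

# Special characters the quoted text allows in an 8.3 alias.
ALIAS_SPECIALS = set("$%'-_@~`!(){}^#&")
# Characters long file names allow on top of the alias set.
LFN_EXTRAS = set("+,;=[]")


def valid_alias_char(ch: str) -> bool:
    """Letters, digits, code points above 127, space, or a listed special."""
    return (ch in string.ascii_letters
            or ch in string.digits
            or ord(ch) > 127       # assumption: any non-ASCII code point
            or ch == " "
            or ch in ALIAS_SPECIALS)


def valid_lfn_char(ch: str) -> bool:
    """Everything valid in an alias, plus + , ; = [ ]."""
    return valid_alias_char(ch) or ch in LFN_EXTRAS


def invalid_lfn_chars(name: str) -> set[str]:
    """Characters in `name` outside the long-file-name list.

    The period is skipped (assumption): it separates name and extension
    and appears in neither quoted list.
    """
    return {ch for ch in name if ch != "." and not valid_lfn_char(ch)}


print(sorted(invalid_lfn_chars('what?*<"x".txt')))  # ['"', '*', '<', '?']
```

Happily, "*" and "?" fall outside both lists without any special-casing.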
So, we're getting a little closer. At least FAT fesses up to not allowing "*" and "?", since neither appears in either list.
As to exactly which characters NTFS disallows, that remains a
mystery to me. It doesn't fess up to disallowing "*" and "?", and
indeed googling suggests that NTFS allows any Unicode character in
filenames. So presumably it is Windows itself, quite possibly no
deeper than the Win32 API, which disallows "*" and "?" as well as
the characters listed in the first quoted section above.
Could it disallow others? Since "*" and "?" are not documented
explicitly, one has to presume so.
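One way to answer that empirically, rather than trusting documentation, is to try creating a file for each candidate character and see which attempts fail. A rough sketch: `probe_illegal_chars` is a hypothetical helper, it only tests whatever file system holds the temp directory, and it lumps every layer (Python, the Win32 API, the file system driver) into a single yes or no. One trap on NTFS: a colon may appear to "succeed", because `name:stream` is taken as an alternate data stream rather than rejected.

```python
import os
import tempfile


def probe_illegal_chars(candidates):
    """Return the candidates that could not appear in a new file's name.

    Each character is sandwiched between ordinary ones so that trailing
    dots or spaces (which Windows trims) don't skew the result. Every
    failure counts, whichever layer raised it.
    """
    rejected = []
    with tempfile.TemporaryDirectory() as scratch:
        for i, ch in enumerate(candidates):
            # The index keeps names unique, even on case-insensitive systems.
            path = os.path.join(scratch, f"probe{i}_{ch}_x")
            try:
                with open(path, "x"):  # "x": create, fail if it exists
                    pass
            except (OSError, ValueError):  # ValueError: embedded NUL
                rejected.append(ch)
    return rejected
```

Calling it with `[chr(c) for c in range(1, 128)]` gives a quick table for ASCII on the current machine. The null character never reaches the OS at all: Python rejects it with a `ValueError`.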
If it weren't for the fact that our application has to support Unicode,
I would write a simple filter based on ASCII character codes, and be
rather fascist about it. As it is, we can simply replace offending
characters with underscores ("_") and have done with it. Once, of course,
we've determined what those offending characters are.
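For what it's worth, here is the sort of filter I mean. It is a sketch built only from the characters discussed above (control characters 0 to 31, the SDK's reserved list, and the "*" and "?" wildcards), not a definitive list, since Windows may well reject others. `sanitize_filename` is a hypothetical name, and appending the replacement character to reserved device names is my own hypothetical policy. Because Python strings are Unicode, every other character passes through untouched.

```python
import re

# Control characters 0-31, the SDK's reserved set, plus the wildcards * and ?.
_ILLEGAL = re.compile(r'[\x00-\x1f<>:"/\\|*?]')
# Reserved device names from the SDK text.
_DEVICE_NAMES = {"CON", "PRN", "AUX", "NUL",
                 *(f"COM{n}" for n in range(1, 10)),
                 *(f"LPT{n}" for n in range(1, 10))}


def sanitize_filename(text: str, replacement: str = "_") -> str:
    """Replace characters Windows rejects; leave all other Unicode alone."""
    cleaned = _ILLEGAL.sub(replacement, text)
    # Defuse reserved device names (with or without an extension) by
    # appending the replacement to the part before the first dot.
    stem, dot, rest = cleaned.partition(".")
    if stem.upper() in _DEVICE_NAMES:
        cleaned = stem + replacement + dot + rest
    return cleaned
```

For example, `Q3 "final" report?.txt` comes out as `Q3 _final_ report_.txt`, and `nul.txt` as `nul_.txt`.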
I'm sure I've encountered this issue before, and I seem to recall
wishing then just for a simple table, for both Unicode and ASCII
characters, showing which were disallowed and by what level of the
system (shell, file system API, file system itself). My wish is still
pending, but since I have some pressing work to do I should Alt-Tab and
get on with it...