Thursday, March 13, 2014

When Directory.GetFiles gets crappy and grabs *.htm but not *.html (here's one reason why)

The Old New Thing - Why does the Directory.GetFiles method sometimes ignore *.html files when I ask for *.htm?


A customer reported that one of their programs stopped working, and they traced the problem to the fact that a search for *.htm on some machines was no longer returning files like awesome.html, contrary to the documentation. What's going on?

What's going on is that the documentation is trying too hard to explain an observed behavior. (My guess is that some other customer reported the behavior, and the documentation team incorporated the customer's observations into the documentation without really thinking it through.)

The real issue is that the GetFiles method matches against both short file names and long file names. If a long file name has an extension that is longer than three characters, the extension is truncated to form the short file name. And it is that short file name that gets matched by *.htm or *.txt.

Even as originally written, in the presence of short file names, the documentation is wrong, because it would imply that a search for reallylong*.txt could match reallylong_filename.txtother. But try it: It doesn't. That's because the short name is probably REALLY~1.TXT, and that doesn't match reallylong*.txt.
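The matching rule described above can be modelled in a few lines. This is only a rough sketch of the idea, not how Windows or .NET implements it: the function name `matches_either_name` is hypothetical, Python's `fnmatch` wildcards only approximate Windows wildcard rules, and the short names are supplied by hand rather than read from the file system.

```python
from fnmatch import fnmatchcase


def matches_either_name(pattern, long_name, short_name=None):
    """Model: a file is a hit if the pattern matches its long name
    OR its short (8.3) name, ignoring case.

    short_name is None when the file has no short name, for example
    because short name generation was disabled when it was created.
    """
    candidates = [long_name]
    if short_name is not None:
        candidates.append(short_name)
    lowered = pattern.lower()
    return any(fnmatchcase(name.lower(), lowered) for name in candidates)


if __name__ == "__main__":
    # The long name fails, but the truncated short name ends in .HTM.
    print(matches_either_name("*.htm", "awesome.html", "AWESOM~1.HTM"))  # True
    # Same file without a short name: nothing left to match.
    print(matches_either_name("*.htm", "awesome.html"))  # False
    # Neither the long name nor REALLY~1.TXT matches reallylong*.txt.
    print(matches_either_name(
        "reallylong*.txt", "reallylong_filename.txtother", "REALLY~1.TXT"))  # False
```

In this model, awesome.html only shows up for *.htm because of its short name, and the reallylong*.txt search misses for the reason given above.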

What happened is that short file name generation was disabled on the drive at the time the files were created, so the files had no short names, and consequently there was no SHORTN~1.HTM file to match against.
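One way to avoid depending on whether short names exist is to enumerate with a broad pattern and then compare the extension of each long name yourself. The sketch below shows only that filtering step on a plain list of names; `filter_by_extension` is a hypothetical helper, not part of any library. In .NET the same check could presumably be done on the results of GetFiles with `Path.GetExtension`.

```python
import os


def filter_by_extension(names, extension):
    """Keep only names whose long-name extension equals `extension` exactly.

    The comparison ignores case, so ".htm" keeps NOTES.HTM but never
    awesome.html, whether or not the file has a short name.
    """
    wanted = extension.lower()
    return [name for name in names
            if os.path.splitext(name)[1].lower() == wanted]


if __name__ == "__main__":
    names = ["index.htm", "awesome.html", "NOTES.HTM", "readme.txt"]
    print(filter_by_extension(names, ".htm"))   # ['index.htm', 'NOTES.HTM']
    print(filter_by_extension(names, ".html"))  # ['awesome.html']
```

If the program actually wants both .htm and .html files, it is safer to ask for each extension explicitly than to rely on the short-name side effect.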



This is one of those things that you might never find, and might never even know you're missing. No exception is thrown; the search just quietly returns fewer files than you expected...

Trust, but verify.
