Saturday, October 11, 2008

Do we really need to say goodbye to MD5’s? There are 340,282,366,920,938,463,463,374,607,431,768,211,456 reasons there’s maybe no rush…

Ride The Lightning - OF MD5 COLLISIONS AND TRAIN WHISTLES

“…

And on to MD5 collisions – in a previous post, I mentioned that all MD5 collisions which have thus far been documented have been “forced” and have not occurred in the wild. A reader wrote with a follow up question – and I must apologize because I seem to have “misfiled” (never say lost) his e-mail, so I cannot tip my hat and thank him by name.

To answer his question, the likelihood of two naturally occurring differing files having identical MD5 hash values has been calculated by experts to be 340 billion, billion, billion, billion. In short, you are not likely to ever see it. Just for grins, the exact number is 340,282,366,920,938,463,463,374,607,431,768,211,456.

You could be struck by lightning and win the lottery many times over before you would see this in real life.

…”

Every so often, the subject of MD5 collisions comes up at work (given my job, that’s no surprise), so I found these paragraphs interesting.

While it’s an important fact that there can be MD5 collisions, it’s also important to put the collision chance into perspective…
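To put that number into perspective, here is a minimal Python sketch. It assumes MD5 values behave like uniformly random 128-bit numbers, which is reasonable for unrelated, naturally occurring files but not for deliberately crafted ones. It uses the standard birthday approximation; the function name and the file counts are mine, picked only for illustration.

```python
from fractions import Fraction

MD5_BITS = 128
HASH_SPACE = 2 ** MD5_BITS  # number of distinct MD5 values


def accidental_collision_chance(n_files: int) -> Fraction:
    """Approximate chance that any two of n_files share an MD5.

    Birthday approximation: (number of file pairs) / (number of hash values).
    Only meaningful while the result is small, and only for files that
    nobody has crafted on purpose.
    """
    pairs = n_files * (n_files - 1) // 2
    return Fraction(pairs, HASH_SPACE)


if __name__ == "__main__":
    # Prints 340,282,366,920,938,463,463,374,607,431,768,211,456
    print(f"{HASH_SPACE:,}")
    for n in (10**6, 10**9, 10**12):
        chance = float(accidental_collision_chance(n))
        print(f"{n:>18,} files: roughly {chance:.3e}")
```

For a single pair of files the chance is 1 in 2^128, the big number quoted above. Across a large collection it grows with the square of the file count, but it stays vanishingly small at any realistic collection size.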

What worries me most is not the real world chance, but the “bad guy manufactured” chance. If I were a bad guy, and there was a file I wanted to hide, I might think about consciously altering it so its MD5 matches one on the NSRL RDS list (aka a known system file… say, making its MD5 match that of notepad.exe). I wonder how hard it would be to write a utility that, given an MD5 hash, forces/tweaks another file to match it. THAT’s the MD5 thought that keeps me up at night…
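One hedged way to narrow that worry is to compare more than one digest. The sketch below hashes a file with both MD5 and SHA-1 and treats it as "known" only when both match the same entry. The `known` dictionary is a hypothetical stand-in for a known-file list, assuming the list carries a second hash per file; it is not the actual NSRL RDS format or any real API, and the function names are mine.

```python
import hashlib


def file_digests(path: str, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Return (md5_hex, sha1_hex) for a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def is_known_file(path: str, known: dict[str, str]) -> bool:
    """True only if the file's MD5 is listed AND its SHA-1 matches that entry.

    `known` maps md5_hex -> sha1_hex (hypothetical layout, for illustration).
    """
    md5_hex, sha1_hex = file_digests(path)
    return known.get(md5_hex) == sha1_hex
```

This does not make MD5 itself any stronger. It only means a file tweaked to match a listed MD5 would also have to match the listed SHA-1 before it gets filtered out as a known system file.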

 

Related Past Post XRef:
MD5 Collisions
Strategic Legal Technology :: E-discovery and De-duplicating

2 comments:

Anonymous said...

Why use an algorithm that possibly has flaws when there are better alternatives?

Greg said...

Because many legacy systems use MD5. And because other algorithms produce hashes of a different size, it's not easy to convert to them.

But I hear you, and agree that as an industry, EDD should move towards a different one, such as one of the SHAs.

But then again, that will cost a good bit, and do we REALLY need to? If MD5 isn't really, really broken and can be considered "good enough" for EDD purposes, then why spend the time and effort on the chance?