Monday, December 27, 2004

Strategic Legal Technology :: E-discovery and De-duplicating


"...
Existing approaches to detecting duplicates have limitations. One approach is to use a “hash,” a mathematical technique. This approach determines only if documents are completely identical; a single difference in one character or the file path makes two documents different. Another approach is to use meta-data to detect possible duplicates.

Software start-up Equivio has software that, upon first evaluation, allows litigators to identify near duplicates and adjust what is meant by “near.” For example, drafts of the same document prepared by different authors on different days with different file names could be identified as potential duplicates. (Hashes and meta-data cannot do this.) Such differences may be relevant to the case, but often they are not. Clustering near duplicates and reviewing them simultaneously can be a great advantage in helping to insure consistent responsiveness and privilege designations and in saving review time.
..."

Interesting... (for me, in my business at least).

Deduplication by MD5 is currently pretty standard. But, as the author points out, MD5/hash deduplication is very all-or-nothing. Equivio appears to offer a different solution. What I find most interesting is that they offer it as an OEM product with a wide range of APIs.
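Equivio's actual method isn't described in the article, so the following is only a sketch of one common, generic way to get a tunable notion of "near": split each document into overlapping 3-word "shingles" and compare the sets with Jaccard similarity (shared shingles divided by total distinct shingles). The `threshold` parameter plays the role of adjusting what "near" means. The function names, sample drafts, and default values are all hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short documents become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.5, k: int = 3):
    """Yield (name_a, name_b, score) for pairs whose similarity >= threshold.

    Compares every pair, which is fine for a demo but quadratic in the
    number of documents.
    """
    names = sorted(docs)
    sets = {n: shingles(docs[n], k) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                yield a, b, score


# Hypothetical drafts: two differ by one word, the third is unrelated.
docs = {
    "draft_a.txt": "The parties agree that payment is due within thirty days of delivery.",
    "draft_b.txt": "The parties agree that payment is due within sixty days of delivery.",
    "minutes.txt": "Minutes of the quarterly budget meeting held on Tuesday.",
}

for a, b, score in near_duplicates(docs, threshold=0.5):
    print(f"{a} ~ {b}: {score:.2f}")
```

An MD5 comparison would treat the two drafts as unrelated. Here they score 7/13 (about 0.54) and are flagged at a 0.5 threshold but not at 0.8, which shows how loosening or tightening "near" changes what gets clustered for review.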

Something to look at at least...
