Thursday, July 11, 2013

Time ENF? "ENF, a New Standard for Managing Native Files"

EDRM - Proposing ENF, a New Standard for Managing Native Files

July 11, 2013 – In a pair of white papers published today, EDRM member Wade Peterson, writing on behalf of the EDRM Native Files Project, proposes the creation and adoption of a new ENF (encapsulated native file) standard for the production of native files.

In Can Native File Productions be ENF (Enough)?, Peterson presents the conceptual framework for defining the new standard.

In This is Just About ENF, he illustrates a sample ENF, describes some of its elements, and describes the operation of a basic utility to view ENF files.

Can Native File Productions be ENF (Enough)?

...

Executive Overview

Legal document productions have come a long way since the mid-80s, and yet we still use standards defined almost two decades ago! Today, litigation support professionals face the growing challenge of forcing newer-technology documents into older-technology standards. In today’s environment we have “3-dimensional documents”. Examples of these include Word documents with embedded links to web sites; embedded Excel worksheets in Word documents; Excel cells dynamically obtaining data from SQL databases; PowerPoint slides with embedded videos, Excel graphs, and animation; hidden rows/columns; pivot tables; etc.

Attempting to represent these 3-dimensional documents in a 2-dimensional world of TIFF or PDF is challenging, if not impossible.

This paper presents a conceptual framework for defining a new standard. A standard designed for litigation support professionals. A standard designed for legal document productions.

Challenges

I’ve already outlined some of the challenges faced by litigation support professionals doing document productions. Listed below (although not exhaustive) are a few other challenges:

  • Courts and opposing counsel are increasingly demanding “native file productions”.
  • Native files can be altered (either intentionally or not).
  • Native files cannot be redacted.
  • Native files cannot be bates stamped or endorsed or watermarked.
  • Native file names and paths can often exceed the standard Windows limits (255 characters for a file name, 260 for a full path).
  • Native files are typically never printed, so how can they be “represented” in a print format (which is what the TIFF and PDF standards were designed for)?
  • Native files with “embedded content” do not translate well to print format, therefore arbitrary mechanisms were created to support them (e.g., “parent” and “child” relationships).
  • Emails with attachments must be handled using add-on standards (e.g., BegAttach, EndAttach).
  • Extracted text from native files must be delivered separately from the TIFF files, and a mechanism built to associate that text with the actual TIFF image(s) (e.g., TIF/TXT files).
  • Color files cannot be produced as TIFF; they are often supplemented as JPEG images.
  • Vendor-specific formats have been created and adopted as pseudo-standards to support these new 3-dimensional documents (e.g., Summation DII files; Concordance delimiters; OPT and DAT files; and recently XML standards).
  • A single page TIFF (and its accompanying OPT file) production cannot be viewed outside of normal document review platforms.
  • Smaller firms do not own document review platforms.
  • Document productions are difficult, if not impossible, to use in deposition or courtroom settings.

Solution

The solution is to develop a new standard. One which addresses today’s concerns yet has an open architecture to meet future requirements as they occur. Develop a new standard for litigation support document productions. One which eventually is adopted by courts as the legal standard, much like standard legal citation formats were adopted centuries ago.

The solution is not to transform native files into something they are not. The solution is to embrace native files, and build an architecture around them that addresses the challenges our industry faces.

...

This is Just About ENF

This article illustrates a sample ENF (encapsulated native file), describes some of the elements within it, and describes the operation of a very basic utility to view ENF files (viewENF). The purpose of this article is to continue the discussion of the ENF standard, which was proposed in the previous article, “Can Native File Productions be ENF (Enough)?”.

The ENF standard is a new, proposed standard for the production of native files. It uses an encapsulation concept to bundle a single native file (or multiple files for a family of email and attachments), along with metadata and security features, into a single deliverable file. Native files are delivered on a one-to-one basis as ENFs. The contained native file, stored in encrypted form within the ENF, is forensically the same as the original native file. Security and metadata features augment the native file so that the viewENF program can render it, overlaying endorsements and redactions, as well as other “vendor enhancements” as needed. Security features are built in to allow (or prevent) export, printing, and viewing of certain features.

This article will also point out some areas of concern that will need to be addressed for full adoption of ENF as an industry standard.

  • Latent content
  • Original native file corruption and encryption
  • Hidden content
  • Native compound documents

...

Native Files

This element is the heart of the ENF, and may be repeated one or more times (e.g., in the case of an ENF encapsulating an email and its family of attachments). The element contains three sub-elements in our sample.

  • MetaData – these include the extracted metadata fields of the native file, commonly done with ESI processing tools such as LAW PreDiscovery, IPRO, NUIX, etc. In this sample, I have included three sub-elements for (a) file properties; (b) document properties; and (c) extended properties for an “Attorney Comments” field added during a document review. In this example, the document properties are the standard Microsoft Word document properties.
  • Redactions – this element may be repeated one or more times and is included here rather than in the Vendors element since it is such a common occurrence in production files and most people would consider it part of the standard, rather than a vendor enhancement. The redaction shown here includes the minimum requirements for placing a redaction over the native file view. Other attributes may include the font, color, positioning, and min/max point size of the redaction tag text. Redaction coordinates will be highly dependent on the viewer code. Translation of various document review platform redaction coordinate architectures will be needed to come to an agreement on a standard for ENF redaction coordinate conversions.

    The placement of redactions over a native file view will also be critical. Since most native file viewers use a browser window to render the representation of the native file, the redaction overlays will need to “float” yet remain statically over the intended redaction position. As the browser or viewer application window is resized, the redaction overlays must be re-sized in perfect synchronization. As the user scrolls the native view, the redactions, again, must move perfectly in sync. Techniques for layering the HTML will need to be developed as I consider this to be a critical component of widespread acceptance of redactions within the ENF standard.

    Other techniques for incorporating redactions in a viewer will need to be investigated.

  • EncodedFile – since a binary file (in its native form) cannot be contained within an XML file as is, the binary file must first be encoded to a “base64” string before inclusion. That base64 string can then be encrypted using a variety of standard encryption mechanisms. The viewer would first decrypt the base64 string, and then convert it from base64 back to binary to display it.

While the above example is just an early sampling of an ENF, it does provide enough background for further discussion, debate, and refinement.
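
As a rough illustration only (this is not the white paper's sample), the EncodedFile round trip might look something like the Python sketch below. The element names MetaData, Redactions, and EncodedFile come from the description above. The ENF root, the NativeFile wrapper, the FileName and FileSize fields, and the redaction attributes are assumptions made for the example. The encryption step is left out because Python's standard library has no built-in cipher.

```python
import base64
import xml.etree.ElementTree as ET


def build_enf(native_bytes: bytes, file_name: str) -> str:
    """Wrap one native file in a hypothetical ENF-style XML document."""
    root = ET.Element("ENF")                    # hypothetical root element
    native = ET.SubElement(root, "NativeFile")  # hypothetical wrapper element

    meta = ET.SubElement(native, "MetaData")
    ET.SubElement(meta, "FileName").text = file_name           # hypothetical field
    ET.SubElement(meta, "FileSize").text = str(len(native_bytes))  # hypothetical field

    # Hypothetical attributes: a page number and a rectangle in viewer units.
    ET.SubElement(
        native,
        "Redactions",
        {"page": "1", "x": "10", "y": "20", "width": "100", "height": "12"},
    )

    # Encryption is omitted here; the proposal would encrypt this payload.
    encoded = ET.SubElement(native, "EncodedFile")
    encoded.text = base64.b64encode(native_bytes).decode("ascii")

    return ET.tostring(root, encoding="unicode")


def extract_native(enf_xml: str) -> bytes:
    """Recover the original bytes, as a viewer would before rendering."""
    root = ET.fromstring(enf_xml)
    encoded = root.find("./NativeFile/EncodedFile")
    # An empty payload parses back as None, so fall back to an empty string.
    return base64.b64decode(encoded.text or "")
```

Calling `extract_native(build_enf(data, "memo.docx"))` should hand back exactly the bytes that went in, which is the "forensically the same" property the proposal relies on.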

...

While my current industry desperately needs this, I have some concerns about this spec. The first one that jumps out at me is that last bullet point: encoding the original binary and putting it in the XML? Nope, not going to work for large docs. Sorry...
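
To put a rough number on that concern: base64 turns every 3 bytes into 4 ASCII characters, so the payload grows by about a third before the XML parser even touches it. A quick Python sketch (the function name is mine, not anything from the papers):

```python
import base64


def base64_encoded_size(raw_size: int) -> int:
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    # Each group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_size + 2) // 3)


if __name__ == "__main__":
    # Sizes chosen only for illustration: 1 MB, 100 MB, 2 GB.
    for raw_size in (1_000_000, 100_000_000, 2_000_000_000):
        encoded_size = base64_encoded_size(raw_size)
        growth = encoded_size / raw_size
        print(f"{raw_size:>13,} bytes -> {encoded_size:>13,} chars ({growth:.2f}x)")

    # Sanity check against the real encoder on a small buffer.
    sample = bytes(3000)
    assert len(base64.b64encode(sample)) == base64_encoded_size(len(sample))
```

And that is just the size on disk. A viewer that loads the whole XML document into memory would likely hold the encoded string and the decoded binary at the same time.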

The concept around this is awesome, but they need to talk to Microsoft about using something like the Open Packaging API, i.e. what Office/OpenXML uses for all the "...X" formats (which is really a zip file, with XML components, catalogs, etc.). This way the spec could include the original binary, but in a form that won't insanely bloat the XML. That said, there's hope. In reading through the two papers it sounds like they might be looking for a "super container" kind of solution (cough... Open Packaging API... cough).
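
Here's a minimal sketch of what I mean, using nothing but Python's zipfile module. To be clear, this is not the real Open Packaging format (that has its own content-types and relationships parts), and the part names `enf.xml` and `native/...` are made up for the example. The point is only that the XML stays small and the native file rides along as untouched binary.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def write_package(native_bytes: bytes, file_name: str) -> bytes:
    """Build a zip container holding a small XML manifest plus the raw native file."""
    manifest = ET.Element("ENF")                     # hypothetical root element
    native = ET.SubElement(manifest, "NativeFile")   # hypothetical element
    native.set("part", f"native/{file_name}")        # points at the binary part

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        package.writestr("enf.xml", ET.tostring(manifest, encoding="unicode"))
        package.writestr(f"native/{file_name}", native_bytes)  # no base64 needed
    return buffer.getvalue()


def read_native(package_bytes: bytes) -> bytes:
    """Look up the native part named in the manifest and return its bytes."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        manifest = ET.fromstring(package.read("enf.xml"))
        part_name = manifest.find("NativeFile").get("part")
        return package.read(part_name)
```

Metadata, redactions, and vendor extensions could each live in their own XML part, and a viewer could pull out just the part it needs instead of parsing one giant document.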

Anyway...

I'll be following this closely, but won't hold my breath. Everyone knows we need something like this, but it's going to take a while to see broad acceptance and implementation. At least it's being discussed!
