Electronic Discovery — Myths, Fables and Folklore
Published by John Hopkins in Commercial Litigation, Law Technology, Mass Torts
Next in our Electronically Stored Information (ESI) series I had planned to discuss the collection, review and culling of ESI.
I changed my mind after reading a few articles and recently decided cases that discussed some crazy myths and fables about ESI and e-discovery.
Let’s talk about myths, legends and folklore in the world of ESI.
Portable Document Format (PDF) documents are always “full text searchable”:
Wrong. What makes a PDF document searchable is machine-readable text embedded in the file. A PDF generated directly from a word processor usually carries that text. A PDF created by scanning paper, however, is only a picture of each page until optical character recognition (OCR) software analyzes the image and converts the shapes of the letters into text, so that a search for “dog” can actually find the word. You can create a PDF document (or series of documents) that is NOT full text searchable, and determining whether a given PDF is searchable is not always as obvious as one may think. To search an image-only PDF, you must have the OCR data.
Tagged Image File Format (TIFF) documents are always “full text searchable”:
Wrong. A TIFF is a “picture” of the document. See the answer above concerning OCR.
Paper documents are the same as ESI, just a copy of the electronic file:
Wrong. First, the sheer volume of ESI far outweighs that of paper documents. Second, the types of ESI vary greatly. A picture of a dog printed on a piece of paper is not much different from text printed on a piece of paper: both are simply paper documents. In the ESI world, though, the picture of the dog may be a GIF, JPEG, BMP, RAW, PNG, TIFF, PDF, RGBE or CGM file, or one of more than 25 other formats. A text document in ESI form might be a Word, WordPerfect, WordPad, OpenOffice, Notepad, WordStar or TextEdit file, or a file from any of over 75 additional text editors.
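Because a file extension can be changed, processing tools commonly identify a format by the signature bytes at the start of the file. The sketch below uses a hypothetical `sniff_format` helper and recognizes only six of the picture formats named above (PNG, JPEG, GIF, TIFF, PDF, BMP); RAW, RGBE, CGM and the rest are left out, and a real processing tool checks far more signatures.

```python
# Leading "magic number" bytes for a few of the formats named above.
# Longer, more specific signatures come first; BMP's two-byte "BM" is checked last.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"BM", "BMP"),
]


def sniff_format(header: bytes) -> str:
    """Name the format whose signature starts `header`, or "unknown"."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first eight bytes of a file and sniff its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

The same picture of a dog saved as `dog.png` and then renamed `dog.jpg` would still be reported as PNG, since only the bytes are examined.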
Now add “metadata” to this equation: what is referred to in ESI as “data about data”.
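Some of this “data about data” is kept by the operating system rather than inside the file. As a minimal sketch (the function name `filesystem_metadata` is hypothetical), Python's standard `os.stat` reads two such fields; neither would appear on a printout of the document, and the demo changes one without touching the content at all.

```python
import os
import tempfile
from datetime import datetime, timezone


def filesystem_metadata(path: str) -> dict:
    """Return two filesystem-level metadata fields for the file at path.

    The operating system stores these values outside the file's content,
    so they never appear on a printed copy of the document.
    """
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("A picture of a dog")
        sample_path = handle.name
    try:
        # Backdate the modification time: the content is unchanged, only the metadata differs.
        os.utime(sample_path, (86400, 86400))
        print(filesystem_metadata(sample_path))
    finally:
        os.remove(sample_path)
```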
Also consider “portability” when distinguishing paper from electronic documents. A single gigabyte of information, which you could carry around on a flash drive, might equate to over 150,000 pages of paper, take up over 30 cubic feet of space and weigh roughly 1,500 pounds. If a party anticipates producing a terabyte of ESI, that’s roughly 150,000,000 pages of paper and 30,000 cubic feet of space.
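Scaling these estimates is simple multiplication. This sketch uses the per-gigabyte figures from the paragraph above and assumes a decimal terabyte of 1,000 gigabytes; the `paper_equivalent` name is illustrative only.

```python
# Per-gigabyte estimates taken from the paragraph above.
PAGES_PER_GB = 150_000
CUBIC_FEET_PER_GB = 30

# Assumption: decimal terabyte. A binary terabyte (TiB) would be 1,024 GiB.
GB_PER_TB = 1_000


def paper_equivalent(gigabytes: float) -> tuple[float, float]:
    """Return (pages, cubic feet) for a given volume of ESI in gigabytes."""
    return gigabytes * PAGES_PER_GB, gigabytes * CUBIC_FEET_PER_GB


pages, cubic_feet = paper_equivalent(GB_PER_TB)
print(f"{pages:,.0f} pages, {cubic_feet:,.0f} cubic feet")
```

Changing `GB_PER_TB` to 1,024 shows how much the binary convention adds (about 2.4%), which matters little next to the uncertainty in the per-gigabyte estimates themselves.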
When asking for the production of ESI, you should always ask for and get “metadata”:
Wrong. First, you should know what metadata really is as it relates to a given document format. The extent, amount and type of metadata you can recover from program-produced documents vary greatly from format to format. The quick answer to what metadata is, “data about data”, is absolutely accurate and could not be more useless.
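To make “varies from format to format” concrete: a Word .docx file is a ZIP package whose `docProps/core.xml` part can record the author, the last person to modify it and creation and modification dates, while a Notepad .txt file has no place inside it to store any of that. The sketch below, with a hypothetical `docx_core_properties` helper, reads a few of those fields using only the standard library; it ignores the many other metadata locations in Office files (custom properties, tracked changes, comments).

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used in the docProps/core.xml part of an Office Open XML package.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}

# Friendly name -> qualified tag of a child element of cp:coreProperties.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def docx_core_properties(data: bytes) -> dict:
    """Return selected core properties from .docx bytes ({} if the part is absent)."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        try:
            core = package.read("docProps/core.xml")
        except KeyError:
            return {}  # no core-properties part in this package
    root = ET.fromstring(core)
    result = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            result[name] = element.text
    return result
```

Any field the authoring program did not write is simply omitted from the result, which is itself a reminder that two files of the “same” format can carry different metadata.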





