There are important differences between searching for documents with technology-assisted review (TAR) and searching with keywords. According to a U.S. Magistrate Judge based in Chicago, math should not be one of them.
In a new opinion by Judge Iain Johnston of the Northern District of Illinois,[1] a party using keywords was required to test the effectiveness of its search by sampling the set of documents that contained none of the keywords. Reflecting the parallels between keyword search and TAR, Judge Johnston explained that sampling the null set (documents with no keyword hits) would determine the “elusion rate.” In eDiscovery, elusion has typically been used as a metric to help determine the success of TAR projects,[2] even though it can and should apply in other contexts as well, particularly the use of search terms.
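For readers who want to see the math, below is a minimal sketch of how an elusion rate is computed from a null-set sample; the sample counts are hypothetical:

```python
# Minimal sketch: the elusion rate is the fraction of a random sample drawn
# from the null set (documents with no keyword hits) that reviewers
# nonetheless judge responsive. The counts below are hypothetical.

def elusion_rate(sample_judgments):
    """sample_judgments: reviewer calls (True = responsive) on a random
    sample drawn from the null set."""
    return sum(sample_judgments) / len(sample_judgments)

# Example: reviewers find 12 responsive documents in a 400-document sample.
sample = [True] * 12 + [False] * 388
print(f"Estimated elusion rate: {elusion_rate(sample):.1%}")  # 3.0%
```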
A producing party must be able to show that its search process was reasonable, and in many cases the best way to do so is with objective metrics. Producing parties often put significant effort into brainstorming keywords, interviewing witnesses to identify additional terms, negotiating terms with the other party, and testing the documents that hit on their keywords to eliminate false positives. Even so, these efforts can still miss responsive documents if important keywords were never included, and sampling the null set is a simple, reasonable way to test whether additional keywords are needed.
Overcoming Fear of Technology and Math
If this seems technically complex, Judge Johnston also shared advice for lawyers who may feel intimidated by search technology and its associated jargon:
In life, there are many things to be scared of, including, but not limited to, spiders, sharks, and clowns – definitely clowns, even Fizbo. ESI is not something to be scared of. The same is true for all the terms and jargon related to ESI.[3]
It is important to overcome the fear of technology and its related jargon, because doing so helps counsel demonstrate the reasonableness of the search and production process. As Judge Johnston explains, sampling the null set is a process to determine “the known unknown,” which “is the number of the documents that will be missed and not produced.” Judge Johnston disagreed with the defendants’ argument “that searching the null set would be costly and burdensome.” The Order requires Defendants to sample their null set at a 95% confidence level with a +/-2% margin of error (which, even for a very large set of documents, would be about 2,400 documents to review).[4] By taking these measures, whether with TAR or with search terms, counsel can more appropriately represent that they have undertaken a “reasonable inquiry” for relevant information within the meaning of FRCP 26(g)(1).
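The Order’s figures can be checked with the standard formula for the sample size needed to estimate a proportion. The sketch below assumes the most conservative responsiveness rate (p = 0.5); the opinion does not spell out the exact method used:

```python
import math

# Standard sample-size formula for estimating a proportion: n = z^2 * p(1-p) / E^2.
# z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the most
# conservative (largest-sample) assumption about the underlying rate.

def sample_size(margin_of_error, z=1.96, p=0.5):
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

print(sample_size(0.02))  # 2401 -- roughly the ~2,400 documents in the Order
print(sample_size(0.05))  # 385  -- the figure in note [4]
```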
Testing Overinclusiveness vs. Underinclusiveness
The dispute in City of Rockford over sampling and testing the null set raises important issues about the vulnerabilities of electronic search methods. A problem that commonly plagues counsel is searches that are overinclusive, underinclusive, or both.
Both TAR and keyword search strategies are intended to be iterative: multiple rounds of sampling are needed to measure, and then improve, accuracy. This should include identifying both false positives and false negatives. “False positives” are documents that contain a keyword hit but are not substantively responsive to a particular request. A “false negative,” in contrast, is a document that is responsive to a request but does not appear in the positive search results because it matches none of the search criteria.
Most attorneys are familiar with identifying false positives in the keyword context: overly broad or unhelpful keywords result in wasted review of nonresponsive documents. In a TAR context, the analogous measure is usually referred to as precision, which is the number of documents TAR has categorized as responsive that are actually responsive, divided by the total number of documents TAR has categorized as responsive. In both cases, decreasing the rate of false positives increases the efficiency of a search that was otherwise overinclusive.
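As a concrete sketch, false positives, false negatives, and precision can all be computed from paired search decisions and reviewer judgments; the labels below are hypothetical:

```python
# Hypothetical sketch: "predicted" is the search or TAR decision and "actual"
# is the reviewer's responsiveness judgment for the same six documents.

def precision_and_errors(predicted, actual):
    true_positives = sum(p and a for p, a in zip(predicted, actual))
    false_positives = sum(p and not a for p, a in zip(predicted, actual))
    false_negatives = sum(a and not p for p, a in zip(predicted, actual))
    precision = true_positives / (true_positives + false_positives)
    return false_positives, false_negatives, precision

predicted = [True, True, True, False, False, False]
actual    = [True, True, False, True, False, False]
fp, fn, prec = precision_and_errors(predicted, actual)
print(f"{fp} false positive(s), {fn} false negative(s), precision {prec:.0%}")
```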
While overinclusive searches represent inefficiency, underinclusive searches are a more serious problem, because they may mean that many documents (the false negatives) are being missed by the search altogether. Accordingly, effective keyword and TAR processes should iteratively identify false negatives as well.
As Judge Johnston observed in City of Rockford, in a keyword search this is done in part by sampling the null set to determine what percentage of the documents without keyword hits are responsive. A similar technique is often applied in TAR reviews that use a workflow in which documents predicted most likely to be responsive are promoted for review and documents predicted nonresponsive are never reviewed (often called “TAR 2.0” or “continuous active learning”). Unless the parties have stipulated to a keyword process that does not require this sampling, a producing party should sample the null set to ensure it has made reasonable efforts to satisfy its FRCP 26(g) obligations.
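Below is a rough, self-contained simulation of that promote-and-review loop. The scoring model and the “reviewer” are stand-ins so the sketch runs; it is not any particular vendor’s implementation:

```python
import random

# Toy simulation of a "TAR 2.0" / continuous active learning promote-and-review
# loop. The scores and the reviewer are stand-ins so the sketch runs; a real
# workflow retrains a classifier on the human judgments after each round.

random.seed(1)
N = 2_000
truth = {i: random.random() < 0.10 for i in range(N)}          # ~10% responsive
score = {i: (0.7 if truth[i] else 0.3) + random.gauss(0, 0.1)  # noisy stand-in
         for i in range(N)}                                    # for model scores

reviewed = {}
while len(reviewed) < N:
    unreviewed = [i for i in range(N) if i not in reviewed]
    batch = sorted(unreviewed, key=score.get, reverse=True)[:50]
    judgments = {i: truth[i] for i in batch}     # the "reviewer" judges the batch
    reviewed.update(judgments)
    if not any(judgments.values()):              # an all-nonresponsive batch is
        break                                    # a common stopping signal

print(f"Reviewed {len(reviewed)} of {N} documents; "
      f"found {sum(reviewed.values())} of {sum(truth.values())} responsive")
```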
Elusion Testing is Helpful but May Not Prove Reasonableness by Itself
Despite their importance, elusion rates, standing alone, may not demonstrate reasonableness, even when they are low. Instead, they should be viewed in the context of the entire document set. Additional steps to prove reasonableness should include comparing the elusion rate to the overall richness (percent responsiveness) of the entire document set, along with qualitatively evaluating the missed documents. As a first step, the elusion rate should be compared to the richness rate: a three percent elusion rate could be considered low for a document set that is 35% responsive overall, but not low if the entire document set is only 5% responsive.[5]
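To make that arithmetic concrete, the corpus figures below are hypothetical, but they show how the same 3% elusion rate implies very different recall (see note [5]) depending on richness:

```python
# Hypothetical arithmetic: the same elusion rate implies very different recall
# depending on how rich (responsive overall) the collection is.

def implied_recall(total_docs, null_set_docs, richness, elusion):
    total_responsive = richness * total_docs
    missed = elusion * null_set_docs       # responsive documents left behind
    return 1 - missed / total_responsive

# 100,000 documents, 60,000 with no keyword hits, 3% elusion in both cases:
print(f"35% richness: recall ~ {implied_recall(100_000, 60_000, 0.35, 0.03):.0%}")  # ~95%
print(f" 5% richness: recall ~ {implied_recall(100_000, 60_000, 0.05, 0.03):.0%}")  # 64%
```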
In addition, the missed documents in the null set sample should be evaluated qualitatively in two ways: First, are these key documents? Even if the search has found almost all of the responsive documents, it may be inadequate if the missed documents are important. If they are the most important documents to a particular issue in the case, then the search must be adjusted. Second, do the documents suggest additional keywords or other criteria that should be added to the search so similar documents would no longer be missed? If changes might significantly improve the result, another iteration of searching and sampling may be needed.
Like TAR, Keyword Use Should Employ Metrics
Until now, some attorneys for producing parties may have selected keyword searching over TAR because they perceived that it required less stringent testing, or no testing at all. That proposition was always dubious (U.S. Magistrate Judge Andrew Peck’s 2009 opinion in William A. Gross Construction belied the notion), and its time has surely passed. As City of Rockford makes clear, the testing and sampling process associated with search terms is essential to establishing the reasonableness of a search under FRCP 26(g).
[1] City of Rockford v. Mallinckrodt ARD Inc., No. 3:17-cv-50107, 2018 WL 3766673 (N.D. Ill. Aug. 7, 2018).
[2] See Maura R. Grossman & Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Cts. L. Rev. 1, 15 (2013).
[3] Attorneys may be even more inclined to read this decision when they learn that Judge Johnston also cites a Simpsons episode to explain why some attorneys may have been resistant to adopting technology-assisted review.
[4] Sampling at the same 95% confidence level with a wider +/-5% margin of error would require only about 385 documents.
[5] In some TAR workflows, this comparison is done by calculating “recall,” which is the percentage of all responsive documents in the collection that the search has found.