Same Car, New Engine: Implementing Generative AI Within Current eDiscovery Workflows

  • Published on Jul 25, 2024

Much ink (digital, of course) has been spilled as commentators, thought leaders, and other self-anointed experts have rushed to herald the imminent paradigm shift coming for everyone – from the arts and education to the financial sector and, most notably for this audience, legal. While the promise of “next-generation” artificial intelligence tools and platforms is vast (and has the hype to match), in truth, we have all been leveraging machine learning in our work and personal lives for many years. Yes, even in legal.

Machine Learning in Contemporary eDiscovery Tools

Though machine learning is no longer bleeding edge, many tools we use daily in eDiscovery leverage it in approachable and efficient ways. Continuous active learning (CAL), sometimes called TAR 2.0, can increase the accuracy and efficiency of just about any review workflow. CAL is built on supervised machine learning: the algorithm “learns” from human feedback, generally in the form of coded documents. The algorithms used in these current-generation tools are generally (but not exclusively) classifiers, meaning they sort existing data into desired binary categories: e.g., Responsive/Not Responsive, Key/Not Key, etc. Importantly, they do not create any new information; instead, they group existing information into sorted piles. As such, they are comparatively lightweight computationally and can therefore ingest large amounts of data quickly.
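
For readers who like to see the mechanics, the sketch below is a minimal illustration of the classifier-style approach, assuming a generic Python/scikit-learn setup rather than any vendor's actual implementation; the sample documents and coding values are placeholders.

```python
# A minimal sketch of the classifier-style (TAR/CAL) approach, assuming
# scikit-learn is available; documents and labels are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Documents a human reviewer has already coded (1 = Responsive, 0 = Not Responsive)
coded_texts = ["email about the merger terms", "lunch plans for friday"]
coded_labels = [1, 0]

# Documents not yet reviewed
unreviewed_texts = ["draft of the merger agreement", "holiday party invitation"]

# Turn text into numeric features and fit a binary classifier
vectorizer = TfidfVectorizer()
X_coded = vectorizer.fit_transform(coded_texts)
model = LogisticRegression()
model.fit(X_coded, coded_labels)

# Score the unreviewed documents; higher-scoring documents are served to
# reviewers first, which is the core of a continuous active learning loop
scores = model.predict_proba(vectorizer.transform(unreviewed_texts))[:, 1]
for score, text in sorted(zip(scores, unreviewed_texts), reverse=True):
    print(f"{score:.2f}  {text}")
```

In a real CAL workflow this loop repeats: newly coded documents are added to the training set, the model is refit, and the ranking is refreshed.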

Balancing Cost and Efficiency in AI eDiscovery Workflows

Generative AI tools, on the other hand, are premised upon creating new information in the form of text or images. To make this practical in the document review world, several software companies, including some of the most prominent players, have essentially shifted much of what would be human review under current technologies to machine review using generative AI.  

At a high level, this process looks as follows (a simplified sketch follows the list): 

  1. Subject Matter Experts (SMEs) draft a review protocol in a format the tool can understand, including key players, key facts, and allegations, or even just an uploaded list of RFPs.  
  2. The tool “reads” the protocol and then individual documents to determine whether a given document meets the criteria set out in the protocol and associated instructions.
  3. The tool then provides its answer and, in the case of most tools on or entering the market, its rationale for its determination.
  4. The human SMEs review the tool’s determinations and rationale and revise their instructions as necessary to improve the tool’s responses.
  5. Humans then perform statistical validation to ensure that a reasonable, sufficient, and defensible quantity of responsive material has been identified.  
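
To make the division of labor concrete, here is a heavily simplified sketch of how such a tool might loop over documents, sending the protocol plus a single document in each prompt and capturing a determination and rationale. The call_llm placeholder, the protocol text, and the JSON response format are assumptions for illustration only, not any vendor's actual API.

```python
import json

# Hypothetical stand-in for whatever model endpoint the review platform uses.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with the provider's actual API call")

REVIEW_PROTOCOL = """
Matter: Acme v. Widgets (illustrative only)
A document is RESPONSIVE if it discusses the 2023 supply agreement,
pricing negotiations, or communications with the key custodians listed below.
Key custodians: J. Smith, R. Jones
Return JSON: {"determination": "Responsive" or "Not Responsive", "rationale": "..."}
"""

def review_document(doc_text: str) -> dict:
    # Each document is assessed in isolation: protocol + instructions + one document
    prompt = f"{REVIEW_PROTOCOL}\n\nDocument:\n{doc_text}\n\nAnswer with the JSON object only."
    raw = call_llm(prompt)
    return json.loads(raw)  # determination plus the tool's stated rationale

# SMEs would then sample these results, read the rationales, and revise
# REVIEW_PROTOCOL before re-running as needed.
```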

This process is well summarized in the Redgrave Data paper.  

The architecture of generative AI tools imposes some limitations on the workflow hinted at in the above summary, including the need to assess each document individually (or, at best, in small groups) and recurring costs for each assessment by the tool. In more technical terms: generative AI tools do not operate at the document or even word level but on tokens, which are often fragments of words. A tool can only operate on a given number of tokens at any one time, a limitation called the “context window” that precludes it from evaluating more than one document (or a small handful) at a time. Context windows have grown significantly larger since generative AI first made waves, but they still act as a hard ceiling on the combined total of instructions and document content a tool can handle before it effectively starts to forget something. Thus, the review protocol and additional instructions must be provided as a prompt for each individual document in isolation rather than applied to the population as a cohesive whole. Unlike current predictive coding tools, however, generative AI tools provide clear and direct feedback on why document-level determinations were made, allowing for significantly increased transparency and making it easier to identify areas for correction or clarification in the provided instructions. 
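
As a rough illustration of why the context window matters, the sketch below approximates token counts and splits an oversized document into segments that fit alongside the protocol. The four-characters-per-token heuristic and the 128,000-token window are assumptions; real tokenizers and model limits vary.

```python
# Illustrative only: approximate tokens as ~4 characters each and assume a
# 128,000-token context window; real tokenizers and model limits differ.
CONTEXT_WINDOW_TOKENS = 128_000
PROTOCOL_TOKENS = 3_000          # budget consumed by the protocol and instructions
CHARS_PER_TOKEN = 4

def estimated_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def split_for_window(doc_text: str) -> list[str]:
    """Split a long document into segments that fit in the remaining window."""
    budget_tokens = CONTEXT_WINDOW_TOKENS - PROTOCOL_TOKENS
    budget_chars = budget_tokens * CHARS_PER_TOKEN
    return [doc_text[i:i + budget_chars] for i in range(0, len(doc_text), budget_chars)]

# A very long document would be reviewed as several segments, each prompted
# separately with the same protocol, and the segment-level results combined.
```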

Relatedly, because the tools operate on tokens, software providers are charged based on the number of tokens passed through to the models. If the review protocol requires adjustment, new facts are discovered, a key custodian is added, etc., then the documents requiring revised analysis must have their constituent tokens passed through the model again, and the corresponding cost is passed on to the client through the software provider. This stands in stark contrast to current analytics tools, which have become effectively free outside of the hourly charges to administer them.  
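
A back-of-the-envelope model helps illustrate why revisions are not free. The per-token rate, document sizes, and document counts below are placeholders rather than any provider's actual pricing.

```python
# Illustrative arithmetic only; rates and sizes are placeholders, not real pricing.
DOCS = 100_000
TOKENS_PER_DOC = 1_500            # average document length in tokens
PROTOCOL_TOKENS = 3_000           # protocol/instructions sent with every document
PRICE_PER_1K_INPUT_TOKENS = 0.01  # hypothetical dollars per 1,000 input tokens

def pass_cost(docs: int) -> float:
    tokens = docs * (TOKENS_PER_DOC + PROTOCOL_TOKENS)
    return tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

first_pass = pass_cost(DOCS)      # initial review of the full population
rerun = pass_cost(30_000)         # e.g., re-analyzing 30,000 docs after a protocol change
print(f"First pass: ${first_pass:,.2f}; re-run of 30,000 docs: ${rerun:,.2f}")
```

Output tokens (the determinations and rationales the tool writes back) would add to the total, but the pattern is the same: every re-analysis of a document re-incurs its token cost.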

Of course, these tools have the potential to mitigate one of the largest cost centers in eDiscovery: document review. Even if the unit cost of generative AI-powered document review is significantly higher than that of current tools, it may ultimately be lower than the cost of having humans review the same data.  

Unified Processes of Modern and Generative AI Tools in eDiscovery

Ultimately, current and generative AI tools follow a similar process that depends heavily on where you start, i.e., your data set, and where you finish, i.e., your validation methodology. The component in the middle (the actual algorithm) changes, not the overall workflow. This similarity has been noted across the industry, and a comprehensive summary is provided in the Redgrave/Relativity paper referenced earlier.  

The TAR process for using machine learning in lieu of human review (TAR 1), whether using current technology or generative AI, is as follows:  

  1. Identify the subset of the document population that will be included in the machine learning part of the overall review process. 
    • Typically, this involves excluding documents with no extractable text or with very large amounts of text, e.g., hundreds of pages.
    • However, this can be modified to some extent for a generative AI workflow, as documents with a significant amount of text can effectively be submitted and analyzed as multiple document segments.
    • Furthermore, some of the more recent generative AI models are multimodal and can analyze images and audio as well as text, meaning that once software providers include the necessary support for these models and features, generative AI will be able to apply the same rubric to text, images, audio, and video alike.
  2. An SME drafts a review protocol to guide the reviewers, whether human or machine.
  3. Under this guidance, a sample of the documents is reviewed to ensure that the instructions are well understood and provide the desired outcome.
  4. Update the instructions as necessary.
  5. Run the documents through the analysis process, whether utilizing a revised prompt in the case of generative AI or revised human coding in the case of more traditional predictive coding.
  6. Validate the process’s performance and provide iterative revisions as needed through additional document coding or prompting.

A TAR 2 process differs from the above in that it generally assumes humans will review all (or substantially all) of the responsive documents until they are sufficiently exhausted, rather than stopping the iterative training process once the model is performing adequately. In other words, steps 4 and 5 above repeat until responsive documents are effectively and defensibly exhausted. The most critical and consistent steps across current TAR 1 and TAR 2 workflows, as well as nascent generative AI workflows, are correctly determining the starting population and accurately validating the ultimate outcome.  
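
The difference in stopping rules can be sketched as two simple loops; the helper callables and the notion of a batch being “mostly nonresponsive” are illustrative assumptions, not a prescribed standard.

```python
# Illustrative sketch of the differing stopping rules; all helper callables
# and thresholds are hypothetical placeholders.

def tar1_workflow(review_sample, update_instructions, run_model, validation_passes):
    """TAR 1: iterate steps 3-6 until validation shows adequate performance, then stop."""
    while True:
        sample = review_sample()        # step 3: SMEs review a sample under the protocol
        update_instructions(sample)     # step 4: revise the prompt or coding guidance
        run_model()                     # step 5: re-run the model over the population
        if validation_passes():         # step 6: e.g., the recall estimate meets the target
            break

def tar2_workflow(next_ranked_batch, review_batch, mostly_nonresponsive):
    """TAR 2 / CAL: humans review model-ranked batches until responsive
    documents are effectively and defensibly exhausted."""
    while True:
        batch = next_ranked_batch()     # highest-scoring unreviewed documents
        review_batch(batch)             # humans code the batch; the model keeps learning
        if mostly_nonresponsive(batch):
            break
```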

Statistical validation of a TAR process often involves one or both of two complementary measures: a control set and an elusion test. A “control set” is a random sample drawn from the set of documents included in the machine learning process and sized to identify a set number of positive documents, typically responsive documents but more generally whatever the “yes” side of the binary determination might be. Importantly, this process can be fairly painless or quite burdensome depending on the prevalence of positive documents in the set. These documents are reviewed by SMEs and also evaluated by the selected TAR process, including generative AI. The two sets of responsiveness determinations are compared, and the proportion of responsive documents the process correctly identified (often referred to as “recall”) is estimated. A control set, then, is an affirmative measurement of the process's performance in identifying the desired material.  
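
The recall arithmetic itself is straightforward, as the short sketch below shows; the labels are invented for illustration.

```python
# Illustrative recall calculation for a control set: each document has an SME
# label and a machine (TAR or generative AI) label; 1 = responsive, 0 = not.
sme_labels = [1, 1, 1, 0, 0, 1, 0, 1]
machine_labels = [1, 0, 1, 0, 0, 1, 1, 1]

true_positives = sum(1 for s, m in zip(sme_labels, machine_labels) if s == 1 and m == 1)
false_negatives = sum(1 for s, m in zip(sme_labels, machine_labels) if s == 1 and m == 0)

recall = true_positives / (true_positives + false_negatives)
print(f"Estimated recall: {recall:.0%}")  # share of responsive documents the process found
```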

An elusion test, on the other hand, seeks to estimate how much responsive material has eluded the process by drawing the SME-reviewed sample only from the set of documents that would be left behind, sometimes called the “null set.” The elusion test is an indirect but effective measurement of the process's performance, estimating what may have been missed rather than what was directly identified.  
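
A matching sketch for the elusion test: draw a random sample from the null set, have SMEs review it, and estimate how much responsive material the process left behind. The counts below are placeholders, and in practice a confidence interval would accompany these point estimates.

```python
# Illustrative elusion estimate; sample size and counts are placeholders.
null_set_size = 400_000            # documents the process would leave behind
sample_size = 1_500                # randomly drawn from the null set for SME review
responsive_in_sample = 9           # responsive documents the SMEs found in the sample

elusion_rate = responsive_in_sample / sample_size
estimated_missed = elusion_rate * null_set_size
print(f"Elusion rate: {elusion_rate:.2%}; "
      f"estimated responsive documents left behind: {estimated_missed:,.0f}")
```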

When used in conjunction, a control set can help ensure that a minimum number of responsive documents are identified, while an elusion test can help ensure that meaningful documents are not among those that have eluded the overall process. 

Generally, eDiscovery practitioners can expect to implement many of the same workflows they know and love (or love to hate) but with a potentially smarter and inarguably much more transparent underlying machine learning process. That reward is not without some attendant risk, which may take the form of added expense, additional judicial scrutiny, or pointed questions from counsel. 

Written by: Wayland Radin