Getting Started with GenAI: Practical Use Cases in Discovery
-
Published on Sep 23, 2025
I recently had the pleasure of leading a webinar discussing how legal teams are actually using generative AI tools in discovery right now. We covered a lot of ground, but a consistent throughline emerged: there are many opportunities to use generative AI that let case teams get their feet wet while also reducing much of the friction associated with traditional ediscovery tasks.*
Starting “small” with generative AI can have several large benefits:
- Learning new tools and workflows in a lower-risk environment;
- Garnering information otherwise unavailable (metadata from flat PDFs, anyone?);
- Showing the end client meaningful cost and time savings;
- Mitigating the need to disclose the use of generative AI to the court or opposing counsel.
This article will highlight several such use cases that stop short of using generative AI to identify documents for potential production or disclosure. Importantly, these selected use cases barely scratch the surface of what we (and others) have been able to accomplish with generative AI-enabled workflows. While there are of course some technological limitations that are worth bearing in mind, the practical limit is really the creativity of the teams using the available tools.

“…the resulting efficiencies far outweighed the costs and in fact reduced risk…”
We’ll start with a workflow that is broadly applicable and addresses a frustration I think just about every litigator has encountered, and wrap up with a use case that is a bit more complex but helps solve an even more intractable problem.
Metadata (and Meta-Metadata) Extraction
We have all, I expect, received an opposing production composed of non-searchable PDFs, conspicuously lacking an accompanying load file containing things like email senders, subjects, dates, etc. Solutions to this problem have included having first-level reviewers go through the documents and record the “missing” information, or using regular expressions to pull information that matches specified patterns, like those found in email message headers. Human review is more time- and cost-intensive than regular expressions, but it is also more flexible and allows for extraction of information that doesn’t necessarily follow known patterns, e.g., locations, names not present in the headers, etc.
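To make the pattern-matching approach concrete, here is a minimal sketch of regex-based header extraction, assuming the OCR text contains conventional “From:/To:/Subject:” style lines; the field names and patterns are illustrative, not drawn from any particular matter.

```python
import re

# Illustrative header patterns; real productions vary widely, and OCR
# noise frequently breaks even these simple matches.
HEADER_PATTERNS = {
    "Email From": re.compile(r"^From:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "Email To": re.compile(r"^To:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "Email CC": re.compile(r"^Cc:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "Email Subject": re.compile(r"^Subject:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "Date Sent": re.compile(r"^(?:Date|Sent):\s*(.+)$", re.MULTILINE | re.IGNORECASE),
}

def extract_header_fields(ocr_text: str) -> dict:
    """Pull whatever header-style fields the patterns can find."""
    fields = {}
    for name, pattern in HEADER_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1).strip() if match else None
    return fields
```

Note what this sketch cannot do: a location mentioned in the body, or a name that never appears on a header line, slips straight through, which is exactly the gap that human review (and, as discussed next, generative AI) fills.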
Generative AI-based tools combine the cost-effectiveness of technological solutions and the flexibility of human review, allowing case teams to quickly append missing metadata to documents that don’t otherwise have it, along with non-metadata information like a summary of the document, the tone of the discussion, etc. We implemented such a workflow on a recent matter and were able to provide the case team with a single prompt that returned the following (a sketch of what such a prompt might look like appears after the list):
- Traditional Metadata
  - Email From
  - Email To
  - Email CC
  - Email BCC
  - Email Subject
  - Date First Sent
  - Date Last Sent
- Additional Document Information (or meta-metadata, if you will)
  - Document Subject – a phrase or sentence fragment about what the document actually covers
  - A two-sentence summary of the document
  - Any locations referenced in the document
  - The tone of the document (positive/negative/neutral)
  - Any mention of dollar amounts
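For readers curious what a single-prompt extraction can look like in practice, below is a minimal sketch assuming an OpenAI-style chat completions API; the model name, prompt wording, and field list are illustrative stand-ins, not the actual prompt used on the matter.

```python
import json
from openai import OpenAI  # assumes the openai Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACTION_PROMPT = """Extract the following from the document text below.
Return a single JSON object with exactly these keys, using null for any
field that is absent: email_from, email_to, email_cc, email_bcc,
email_subject, date_first_sent, date_last_sent, document_subject,
two_sentence_summary, locations, tone, dollar_amounts.
- tone must be one of: positive, negative, neutral
- locations and dollar_amounts must be JSON arrays

Document text:
{document_text}
"""

def extract_metadata(document_text: str) -> dict:
    # One call per document keeps the per-document cost small and predictable.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": EXTRACTION_PROMPT.format(document_text=document_text),
        }],
        response_format={"type": "json_object"},  # request parseable JSON
        temperature=0,  # extraction, not creativity
    )
    return json.loads(response.choices[0].message.content)
```

Because the output is structured JSON, results like these can be loaded back into a review platform as overlay fields, which is what makes the prioritization described below possible.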
The case team was then able to immediately leverage the resulting information to identify the documents of greatest interest and prioritize review accordingly. At only pennies per document, the resulting efficiencies far outweighed the costs and, in fact, reduced risk: without the corresponding metadata, the odds that the case team would have missed an important document were higher. Finally, because this process was used strictly for internal purposes, the case team determined that no disclosure was required.

“Generative AI-based tools combine the cost-effectiveness of technological solutions and the flexibility of human review…”
Summarization and Text Enrichment of Scanned Documents
“Flat” PDFs like those in the example above lack text and require OCR. What else lacks text? Scans. What is hard to read? Scans! Even worse with handwriting! Anyone who has worked with older productions knows the pain of trying to extract meaningful information from page after page of grainy, skewed, or otherwise compromised scans.
On a recent matter, we were faced with exactly that challenge: thousands of documents that had been scanned decades ago and generated even earlier. The OCR text was technically present, but in practice it was fragmented, incomplete, and in many cases unusable. Traditional approaches would have required consigning first-level reviewers to the slog of deciphering faint text.
Instead, we began by logically unitizing massive, multi-hundred-page PDFs into their constituent documents. Once unitized, each document was passed through a generative AI-enabled workflow (sketched after this list) that produced:
- Clean Document Summaries – a coherent, two-to-three sentence narrative of the contents, regardless of OCR quality.
- Logical Separation – boundaries clearly marked so that what had once been a giant unsearchable block became discrete, review-ready documents.
- Readable Document-Level Text – a reconstructed text body that far outstripped the “Swiss cheese” output of OCR alone.
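For the technically inclined, here is a sketch of the per-document enrichment step, again assuming an OpenAI-style API and that unitization has already split the multi-hundred-page PDFs into per-document OCR text; the prompt, model, and function names are illustrative rather than the workflow actually deployed.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ENRICHMENT_PROMPT = """The text below is noisy OCR from an old scanned document.
Return a JSON object with two keys:
- "summary": a coherent two-to-three sentence summary of the document
- "clean_text": the document text reconstructed as faithfully as possible,
  correcting obvious OCR errors but inventing nothing the source cannot support

OCR text:
{ocr_text}
"""

def enrich_document(ocr_text: str) -> dict:
    """Turn 'Swiss cheese' OCR into a summary plus readable body text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{
            "role": "user",
            "content": ENRICHMENT_PROMPT.format(ocr_text=ocr_text),
        }],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

# Illustrative driver: one call per unitized document, so cost scales
# linearly and predictably with document count.
def enrich_collection(documents: dict[str, str]) -> dict[str, dict]:
    return {doc_id: enrich_document(text) for doc_id, text in documents.items()}
```

In a workflow like this, the reconstructed `clean_text` would be what gets indexed for search, while the summary gives reviewers an at-a-glance sense of each document before opening it.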
The result was a practical breakthrough. The case team could quickly understand the thrust of each document without relying on unreliable OCR output. They could search across the enriched text with far greater accuracy, prioritize which categories of documents to escalate, and reduce the amount of manual deciphering needed.
From a cost perspective, the workflow was both lean and impactful. Because summarization ran at the document level, the expense stayed predictable (again, pennies per document), and the time saved—particularly on documents that would otherwise have been almost unreadable—was dramatic. As with the metadata extraction example, this workflow was used strictly for internal review and case assessment, so no disclosure obligations were triggered.
Most importantly, the team went from “staring at gray fuzz” to actually working with substantive, searchable text. That shift not only accelerated review but materially reduced the risk of missing key content buried in poor scans.
Bringing It Home
You may have noticed that the two examples above focus on information extraction rather than synthesis, classification, or opinion. This is by design. They represent some of the lowest-risk and highest-reward workflows where the return on time invested is immediate, measurable, and defensible.
Once teams are comfortable with these foundational use cases, the logical next steps are easy to imagine. Level-ups could include:
- Key Document Enrichment – generating richer summaries or extracting quotable passages from documents already identified as important, giving case teams quick insight into exactly what matters.
- Topic Surfacing – identifying the key themes running across communications between major custodians, helping teams map out storylines and prepare for depositions.
- Pattern Recognition – spotting recurring concepts, entities, or issues that may not be obvious when reviewing documents one at a time.
Each of these builds on the same principles demonstrated in the extraction use cases: leverage generative AI to reduce friction, accelerate review, and expand visibility without immediately wading into the thornier territory of disclosure or production.
The bottom line: starting “small” with generative AI doesn’t mean starting trivial. Even modest applications can materially change how review teams work, giving them back time, reducing cost, and lowering risk, all while laying the groundwork for more advanced AI-enabled workflows down the line.
*Disclaimer: No AI was harmed in the writing of this article.