Litigation Forensics

How TAR and Predictive Coding Reduce Document Review Costs by 30-50%

Cole Popkin
February 8, 2026
12 min read

Document review traditionally accounts for 60-70% of total eDiscovery costs, with attorney review rates ranging from $50-$350 per hour. For cases involving hundreds of thousands or millions of documents, review costs can easily exceed $500,000. Technology Assisted Review (TAR) and predictive coding offer dramatic cost reductions while maintaining or exceeding human review accuracy.

What is Technology Assisted Review (TAR)?

TAR uses machine learning algorithms to identify relevant documents, dramatically reducing the volume requiring human review. Instead of attorneys examining every document linearly, TAR:

1. Trains on human-coded "seed set" documents 2. Predicts relevance for remaining documents 3. Prioritizes likely-relevant documents for attorney review 4. Continuously improves as more documents are coded

Result: Attorneys review a fraction of the document set while the algorithm handles low-value, likely non-responsive documents.

TAR vs. Traditional Linear Review

Traditional Linear Review

Process: Attorneys review every document sequentially

Costs: 100,000 documents at 50 docs/hour = 2,000 hours × $150/hour = $300,000

Timeline: Months of attorney time

Issues: - Reviewer fatigue decreases accuracy - Inconsistent coding across reviewers - Expensive junior attorneys performing repetitive tasks - No prioritization—hot documents found randomly

TAR Review

Process: Algorithm prioritizes relevant documents, attorneys review sample set

Costs: 25,000 documents reviewed (75% reduction) = 500 hours × $150/hour = $75,000

Timeline: Weeks instead of months

Benefits: - Consistent algorithmic standards - Hot documents surfaced early - Senior attorneys review high-value documents - Defensible methodology with validation metrics

Savings: $225,000 (75% reduction) in this example

Types of TAR

TAR 1.0 (Simple Passive Learning)

How it works: 1. Attorneys review a "seed set" of randomly selected documents (1,500-3,000 docs) 2. Algorithm trains on this coded set 3. Algorithm ranks all remaining documents by relevance probability 4. Multiple training rounds refine accuracy

Limitations: - Requires large seed set - Multiple training rounds time-consuming - Doesn't adapt to new issues discovered during review

Court Acceptance: Established in Da Silva Moore v. Publicis Groupe (2012) and Rio Tinto v. Vale (2015)
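The train-then-rank workflow above can be sketched in a few lines of Python. This is a toy illustration only, not any vendor's actual algorithm: real TAR platforms use far richer feature extraction and classifiers, and every document text and score here is hypothetical.

```python
from collections import Counter

def tokenize(text):
    return [w for w in text.lower().split() if w.isalpha()]

def train_seed_model(seed_set):
    """Build term-frequency profiles from a human-coded seed set.
    seed_set: list of (text, is_responsive) pairs."""
    relevant, non_relevant = Counter(), Counter()
    for text, is_responsive in seed_set:
        (relevant if is_responsive else non_relevant).update(tokenize(text))
    return relevant, non_relevant

def relevance_score(text, relevant, non_relevant):
    """Score by net overlap with responsive vs. non-responsive vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(relevant[t] - non_relevant[t] for t in tokens) / len(tokens)

# Hypothetical seed set as coded by attorneys (illustrative texts only).
seed = [
    ("merger pricing terms confidential", True),
    ("quarterly pricing agreement draft", True),
    ("office holiday party schedule", False),
    ("lunch menu and parking notice", False),
]
rel, nonrel = train_seed_model(seed)

# Rank the remaining corpus by predicted relevance, highest first.
corpus = ["pricing terms for the merger", "parking validation for visitors"]
ranked = sorted(corpus, key=lambda d: relevance_score(d, rel, nonrel), reverse=True)
print(ranked[0])
```

In a real TAR 1.0 deployment, steps 2-4 repeat over multiple training rounds until the rankings stabilize; this sketch shows only a single pass.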

TAR 2.0 (Continuous Active Learning - CAL)

How it works: 1. Attorney begins reviewing highest-ranked documents 2. Algorithm continuously learns from each coding decision 3. Document rankings update in real time 4. No separate training phase; review and training happen simultaneously

Advantages: - Faster startup (no large seed set) - Continuous improvement throughout review - Adaptive to emerging issues - Reaches target recall with fewer reviewed documents

Tools: Relativity Active Learning, Disco AI
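The difference between TAR 1.0 and CAL is the feedback loop: there is no separate training phase, and each coding decision immediately re-ranks what remains. A minimal sketch of that loop, with toy stand-ins for both the scoring model and the attorney (neither resembles a production system):

```python
def cal_review(corpus, score, code_document):
    """Minimal continuous-active-learning loop (a sketch of TAR 2.0):
    review the current top-ranked unreviewed document, fold the attorney's
    coding decision back into the model, then re-rank what remains."""
    coded = []
    remaining = list(corpus)
    while remaining:
        remaining.sort(key=lambda d: score(d, coded), reverse=True)
        doc = remaining.pop(0)                    # highest-ranked unreviewed doc
        coded.append((doc, code_document(doc)))   # model learns from each decision
    return coded

# Toy stand-ins (both hypothetical): 'score' counts words shared with documents
# already coded responsive; 'code_document' plays the attorney, flagging
# anything that mentions "contract".
def score(doc, coded):
    responsive_words = {w for d, responsive in coded if responsive for w in d.split()}
    return sum(w in responsive_words for w in doc.split())

decisions = cal_review(
    ["contract amendment draft", "contract pricing", "cafeteria hours"],
    score,
    lambda d: "contract" in d,
)
print(decisions)
```

Note how the second contract document jumps ahead of the cafeteria notice as soon as the first responsive coding lands; that early surfacing of hot documents is the core CAL advantage described above.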

TAR 3.0 (LLM-based Predictive Coding)

Latest Innovation: Large Language Models (LLMs) like Relativity aiR

How it works: 1. Natural language processing understands document meaning 2. Contextual understanding beyond keyword matching 3. Transfer learning from vast training data 4. Generates useful results almost immediately

Advantages: - Near-instant results - Understanding of complex concepts and synonyms - Multilingual capabilities - Identifies relevant documents humans might miss - Effective on small document sets and scales to collections of 100,000+ documents

2026 Cutting Edge: Most advanced firms now using LLM-based TAR for complex matters

Court Acceptance of TAR

Landmark Cases

Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012): First case approving TAR for document production. Court held computer-assisted review "at least as accurate, if not more so" than manual review.

Rio Tinto PLC v. Vale S.A. (S.D.N.Y. 2015): Court approved TAR over objecting party's insistence on manual review, noting "technology-assisted review is an acceptable way to search for relevant ESI."

Hyles v. New York City (S.D.N.Y. 2016): Court praised TAR as "an efficient and proper way to identify responsive documents," while declining to compel an objecting party to use TAR in place of its preferred review method.

Judicial Perspectives (2026)

TAR is no longer novel or experimental. Federal and state courts routinely approve TAR protocols. Many judges now question parties who don't use TAR for large document sets.

Sedona Conference Best Practices (2023 Update): "The case law has consistently supported the use of TAR as a reasonable method to identify responsive documents."

Key Requirements for Court Approval

1. Transparent Methodology: Disclose algorithm type, training process, validation approach 2. Cooperation: Meet and confer with opposing counsel on TAR protocol 3. Validation: Statistical metrics proving reliability (precision, recall, F1 scores) 4. Quality Control: Human oversight, sample testing, continuous monitoring

TAR Implementation Process

Phase 1: Preparation

A. Data Collection and Processing - Collect ESI using forensically sound methods - Process data: deduplication, de-NISTing, threading - Load into review platform (Relativity, Disco, Everlaw)

B. Meet and Confer Negotiate TAR protocol with opposing counsel including: - Algorithm selection (TAR 1.0, 2.0, or LLM-based) - Seed set size and selection method - Training rounds and stopping criteria - Validation methodology - Privilege handling - Transparency (will opposing counsel see algorithm internals?)

C. Define Relevance Criteria Precisely define what constitutes "responsive" document: - Issues in case - Date ranges - Custodians - Document types - Privilege considerations

Phase 2: Training

A. Seed Set Selection (TAR 1.0) - Random sampling OR purposive sampling - Size: 1,500-3,000 documents (statistical significance) - Senior attorney review for consistency
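The 1,500-3,000 figure tracks the standard sample-size formula for estimating a proportion; the exact statistical design varies by platform and protocol, so treat this as a sketch of the underlying arithmetic only:

```python
import math

def seed_set_size(margin_of_error=0.02, confidence_z=1.96, p=0.5):
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / e^2.
    p = 0.5 is the conservative choice that maximizes the required n."""
    return math.ceil(confidence_z ** 2 * p * (1 - p) / margin_of_error ** 2)

# 95% confidence with a +/-2% margin of error lands at roughly 2,401 documents,
# inside the 1,500-3,000 range cited above; wider margins need far fewer docs.
print(seed_set_size())
print(seed_set_size(margin_of_error=0.05))
```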

B. Initial Training - Feed coded seed set to algorithm - Algorithm generates relevance scores for all documents - Review metrics: precision, recall, F1 score

C. Iterative Training Rounds - Review additional document batches - Re-train algorithm with expanded coding set - Repeat until stability metrics reached

D. Continuous Active Learning (TAR 2.0) - Begin reviewing top-ranked documents immediately - Algorithm updates rankings with each decision - No separate training phase

Phase 3: Review

A. Prioritized Review - Review high-ranked documents first (70%+ relevance probability) - Algorithm surfaces hot documents early - Senior attorneys handle high-value documents

B. Threshold Determination - Set relevance score cutoff (e.g., 50% probability = responsive) - Documents below threshold presumed non-responsive - May require validation sampling

C. Human Quality Control - Random sampling of low-scored documents validates algorithm accuracy - Continuous monitoring for concept drift - Expert review of borderline documents

Phase 4: Validation

A. Statistical Metrics

Precision: Percentage of algorithm-identified relevant docs that are actually relevant - Formula: True Positives / (True Positives + False Positives) - Target: 70-80%+

Recall: Percentage of all relevant docs that algorithm identified - Formula: True Positives / (True Positives + False Negatives) - Target: 70-85%+ (case dependent)

F1 Score: Harmonic mean of precision and recall - Formula: 2 × (Precision × Recall) / (Precision + Recall) - Target: 0.70+
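The three metrics follow directly from the formulas above; the counts in this example are hypothetical:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Validation metrics exactly as defined in the formulas above."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Hypothetical validation counts: the algorithm flagged 1,000 documents,
# 800 truly responsive (TP) and 200 not (FP), while missing 150 (FN).
p, r, f1 = precision_recall_f1(800, 200, 150)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With these counts, precision is 0.80 and recall roughly 0.84, both clearing the targets listed above.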

B. Control Set Testing - Set aside random sample not seen by algorithm - After training, predict relevance for control set - Compare predictions to human coding - Validates algorithm performance on unseen data

C. Sample Testing - Randomly sample low-scored documents - Human review determines if truly non-responsive - Calculate error rates - Acceptable miss rate: <5%
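Sample testing of low-scored documents amounts to estimating a proportion from a random sample. A simple sketch using a normal-approximation confidence bound (a negotiated protocol may specify a Wilson or exact binomial interval instead; the sample numbers are hypothetical):

```python
import math

def elusion_estimate(sample_size, misses_found, z=1.96):
    """Estimate the miss (elusion) rate among low-scored documents from a
    random sample, with a normal-approximation 95% upper confidence bound."""
    rate = misses_found / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, min(1.0, rate + margin)

# Hypothetical check: 500 low-scored documents sampled, 10 turn out responsive.
rate, upper = elusion_estimate(500, 10)
print(f"point estimate {rate:.1%}, 95% upper bound {upper:.1%}")
```

Here the point estimate is 2% with an upper bound around 3.2%, comfortably under the <5% acceptable miss rate noted above.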

Phase 5: Production

A. Threshold Application - Produce high-scored documents (above relevance threshold) - Document low-scored documents excluded with rationale

B. Privilege Review - TAR-identified relevant documents undergo privilege review - Privilege log creation for withheld documents - Some platforms offer predictive privilege models

C. Quality Control - Final review of production set - Random sampling for quality assurance - Bates numbering and load file creation

D. Defensibility Documentation - TAR protocol document - Training statistics and rounds - Validation results (precision, recall, F1) - Sampling methodology and results - Quality control measures

Cost Savings Analysis

Real-World Example: 1 Million Document Case

Traditional Linear Review Costs: - 1,000,000 documents - Average review rate: 50 docs/hour - Total review hours: 20,000 hours - Average hourly rate: $175 - Total Cost: $3,500,000 - Timeline: 12-18 months with team of reviewers

TAR Review Costs: - Seed set training: 2,500 documents at 50 docs/hour = 50 hours × $250/hour (senior attorney) = $12,500 - Algorithm scoring: platform costs ~$50/GB = $25,000 - Review of top-ranked 300,000 documents (30% review rate): 6,000 hours × $175 = $1,050,000 - Validation and quality control: $50,000 - Total Cost: $1,137,500 - Savings: $2,362,500 (67% reduction) - Timeline: 4-6 months
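The arithmetic behind both estimates is one formula: cost = (documents ÷ docs per hour) × hourly rate, plus fixed costs. A quick calculator reproducing the figures above:

```python
def review_cost(num_docs, docs_per_hour, rate_per_hour, fixed_costs=0):
    """Review cost = (documents / throughput) * hourly rate + fixed costs."""
    return num_docs / docs_per_hour * rate_per_hour + fixed_costs

# Linear review of the full 1M-document set (all figures from the text above).
linear = review_cost(1_000_000, 50, 175)

# TAR: seed set at senior-attorney rates, platform scoring, prioritized
# review of the top 300,000 documents, plus validation and QC.
tar = (review_cost(2_500, 50, 250)      # seed set training: $12,500
       + 25_000                         # algorithm scoring ($50/GB implies ~500 GB)
       + review_cost(300_000, 50, 175)  # prioritized review: $1,050,000
       + 50_000)                        # validation and quality control
print(f"linear ${linear:,.0f}  TAR ${tar:,.0f}  savings ${linear - tar:,.0f}")
```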

Cost Factors

TAR Setup Costs: - Platform licensing/fees - Data processing and indexing - Attorney training on TAR workflows - Meet and confer time

Offset by: - Reduced review hours (30-50% fewer documents reviewed) - Earlier access to hot documents (better case strategy) - Faster resolution (compressed timelines) - Consistent quality (reduces re-review costs)

When TAR Makes Financial Sense

Document Volume Thresholds: - Under 50,000 documents: Manual review often sufficient - 50,000-200,000 documents: TAR beneficial but not essential - 200,000+ documents: TAR strongly recommended - 1,000,000+ documents: TAR virtually required for cost management

Case Economics: - Amount in controversy justifies investment - Client budget can absorb setup costs - Multiple custodians and data sources - Complex issues requiring prioritization

Addressing Common Objections

"TAR is a Black Box - We Don't Trust Algorithms"

Reality: TAR is more transparent and consistent than human review. Validation metrics (precision, recall, F1 scores) provide objective accuracy measurements impossible with manual review. Human reviewers have inconsistency rates of 20-40% even with training.

"Manual Review is More Accurate"

Studies Show: TAR accuracy matches or exceeds human review - Grossman & Cormack (2011): TAR achieved 70-80% recall vs. 60-70% for manual review - TREC Legal Track studies: TAR consistently outperforms manual methods - Human reviewers suffer fatigue, inconsistency, and miss rate of 20-50%

"Opposing Counsel Will Challenge TAR"

Court Response: Challenges to TAR are increasingly unsuccessful. Opposing parties must show TAR process was unreasonable, not merely different from manual review.

Defense: Comprehensive documentation of TAR protocol, validation metrics, and quality control measures defeats most challenges.

"TAR Costs Too Much Upfront"

Analysis: TAR setup costs ($25,000-$100,000) are offset by review savings ($500,000-$3,000,000+) in cases with 200,000+ documents.

Proportionality: Failing to use TAR may be disproportionate under Rule 26(b)(1) when traditional review costs exceed case value.

Best Practices for TAR Success

1. Engage Early

Begin TAR planning during preservation phase, not after collection complete. Early engagement ensures proper data collection and processing.

2. Cooperate with Opposing Counsel

Transparency and cooperation reduce disputes. Share TAR protocol, consider joint validation, negotiate in good faith.

3. Document Everything

Maintain detailed records of: - TAR protocol and methodology - Training rounds and decisions - Validation results and sampling - Quality control measures - Deviations from protocol and reasons

4. Use Experienced Vendors

TAR requires expertise. Engage service providers or consultants experienced in TAR implementation and defensibility.

5. Validate Thoroughly

Statistical validation through control sets and sample testing provides objective proof of TAR effectiveness. Courts scrutinize validation methodology.

6. Monitor Continuously

Track algorithm performance throughout review. Concept drift (changing understanding of relevance) requires retraining.

7. Maintain Human Oversight

TAR assists human review, doesn't replace it. Senior attorneys must oversee process, review high-value documents, and make final judgment calls.

The Future of TAR (2026 and Beyond)

LLM Integration: Large Language Models revolutionizing TAR with near-instant results and deeper understanding.

Multilingual TAR: Cross-language capabilities essential for international litigation.

Generative AI: Automated document summarization, issue spotting, and relationship mapping.

Real-Time TAR: Continuous active learning during data collection and preservation phases.

Industry Shift: TAR becoming standard practice rather than cutting-edge technology. Courts and clients increasingly expect TAR for large document sets.

Conclusion

Technology Assisted Review transforms document review from an expensive, time-consuming burden into a manageable, cost-effective process. With 30-50% cost reductions, improved accuracy, and broad court acceptance, TAR is now the gold standard for large-scale eDiscovery.

Small and large firms alike can leverage TAR through cloud platforms, vendor partnerships, and consultants. The question is no longer whether to use TAR, but which TAR methodology best fits your case needs.

Ready to Reduce eDiscovery Costs? Our eDiscovery team specializes in TAR implementation, validation, and defensibility. Contact us for a confidential consultation on your next large document case.

Article Contributors

Cole Popkin
Senior Digital Forensics Analyst

Cole Popkin is a court-qualified digital forensics expert specializing in the analysis of mobile phones, computers, cell towers, video and audio files, emails, OSINT, and metadata. A former analyst for the U.S. Department of Homeland Security and Michigan State Police, Cole provides expert witness testimony in both criminal and civil proceedings.

Reviewed By
Laura Pompeu
Content Editor

Laura Pompeu is a marketing professional with 10+ years of experience in digital marketing and content strategy. She oversees content quality and editorial direction for the Litigation Forensics blog.

Approved By
Bogdan Glushko
Founder & CEO

Founder & CEO of Litigation Forensics. Expert in digital forensics strategy and litigation support.
