Case Study: Intelligent Document Classification

Key Result: 65% reduction in manual triage time, 40% faster citizen response times, 91% AI classification accuracy with human oversight for edge cases.
Agency Type: Service Delivery
Domain: Citizen Services
Challenge: Processing high volumes of citizen correspondence
AI Approach: Multi-class text classification (fine-tuned DistilBERT)

Executive Summary

A service delivery agency implemented an AI-powered document classification system to automatically categorize incoming correspondence, reducing manual triage time by 65% and improving response times to citizens by 40%.


The Challenge

Situation

  • 500,000+ pieces of correspondence received annually
  • Mix of emails, letters, forms, and scanned documents
  • 15 different categories requiring specialized handling
  • Average 3-day delay in initial triage
  • High workload on frontline staff

Problems

  • Manual classification was time-consuming and inconsistent
  • Urgent matters were sometimes delayed
  • Staff frustration from repetitive categorization work
  • Citizen complaints about response times
  • Difficulty tracking correspondence patterns

Business Impact

  • 12 FTE dedicated to triage function
  • Citizen satisfaction scores declining
  • Ministerial complaints about delays
  • Staff turnover in correspondence team

The Solution

AI Approach

Model Type: Multi-class text classification
Architecture: Fine-tuned transformer (DistilBERT)
Integration: Email gateway and document management system
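
In outline, the classification call is compact. The sketch below assumes the Hugging Face transformers library and PyTorch; the fine-tuning step on agency data is omitted, and the helper name is illustrative rather than the agency's production code.

```python
# Minimal sketch of the classification call, assuming Hugging Face
# `transformers` and PyTorch. Fine-tuning on agency data is omitted;
# names are illustrative, not the agency's production code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased"  # 66M-parameter base model
NUM_CATEGORIES = 15                     # the routing taxonomy listed below

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CATEGORIES)

def classify(text: str) -> tuple[int, float]:
    """Return (predicted category index, confidence) for one document."""
    # Truncate to the 512-token input limit noted under Technical Details.
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    confidence, category = torch.max(probs, dim=-1)
    return int(category), float(confidence)
```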

System Design

```mermaid
flowchart LR
    subgraph IN["<strong>Input</strong>"]
        I1[Email/Letter]
        I2[Scanned Document]
    end

    subgraph OCR["<strong>OCR/Text Extraction</strong>"]
        O1[Clean Text]
    end

    subgraph CLASS["<strong>AI Classification</strong>"]
        C1[15 Category Classifier]
        C2[Confidence Score]
    end

    subgraph ROUTE["<strong>Routing</strong>"]
        R1[Queues by category]
    end

    IN --> OCR --> CLASS --> ROUTE
    CLASS --> HIGH[✓ High confidence<br/>Auto-route]
    CLASS --> LOW[? Low confidence<br/>Human review]

    style IN fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style OCR fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style CLASS fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style ROUTE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style HIGH fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style LOW fill:#ffcc80,stroke:#ef6c00,stroke-width:2px
```

Categories Classified

  1. New claims
  2. Claim updates
  3. Change of circumstances
  4. Payment enquiries
  5. Complaints
  6. Appeals
  7. Document submissions
  8. General enquiries
  9. Feedback (positive)
  10. Urgent/hardship
  11. Fraud reports
  12. Death notifications
  13. Address changes
  14. Representative requests
  15. Other/Unclear

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Model architecture | DistilBERT | Balance of accuracy and speed |
| Confidence threshold | 85% for auto-routing | Minimize misroutes |
| Human review | Below 85% confidence | Maintain quality |
| Priority detection | Keyword + model | Catch urgent matters |
| Retraining frequency | Monthly | Adapt to changes |
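
Taken together, the threshold and priority rules combine roughly as in the sketch below. The queue names, keyword list, and urgent-category index are assumptions for illustration, not the agency's configuration.

```python
# Illustrative routing rule combining the 85% threshold with priority
# detection. Queue names, URGENT_KEYWORDS, and URGENT_CATEGORY are
# hypothetical stand-ins.
AUTO_ROUTE_THRESHOLD = 0.85          # chosen to minimize misroutes
URGENT_CATEGORY = 9                  # e.g. the Urgent/hardship class
URGENT_KEYWORDS = {"urgent", "hardship", "eviction", "emergency"}

def route(text: str, category: int, confidence: float) -> str:
    # Priority detection: a keyword hit or the urgent class escalates
    # regardless of model confidence.
    if category == URGENT_CATEGORY or any(
            kw in text.lower() for kw in URGENT_KEYWORDS):
        return "urgent-review-queue"
    # High confidence: auto-route with no human touch.
    if confidence >= AUTO_ROUTE_THRESHOLD:
        return f"category-queue-{category}"
    # Low confidence: human review maintains quality.
    return "human-review-queue"
```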

Implementation

Timeline

| Phase | Duration | Activities |
|---|---|---|
| Discovery | 6 weeks | Requirements, data assessment, stakeholder engagement |
| Data preparation | 8 weeks | Labeling, cleaning, quality assurance |
| Model development | 10 weeks | Training, testing, iteration |
| Integration | 6 weeks | System integration, workflow design |
| Pilot | 8 weeks | Limited rollout, monitoring, refinement |
| Full rollout | 4 weeks | Agency-wide deployment |
| Total | 42 weeks | |

Team

| Role | FTE | Responsibility |
|---|---|---|
| Product Owner | 0.5 | Requirements, stakeholder management |
| Data Scientist | 2.0 | Model development |
| Data Engineer | 1.0 | Data pipelines |
| Integration Engineer | 1.0 | System integration |
| Business Analyst | 1.0 | Process design |
| Change Manager | 0.5 | Training, adoption |

Data Preparation

Training Data:

  • 100,000 labeled historical documents
  • Manual quality review of 10,000 samples
  • Inter-rater reliability testing (92% agreement)
  • Augmentation for underrepresented categories

Challenges:

  • Historical labels were inconsistent
  • Some categories had limited samples
  • OCR quality varied for scanned documents

Solutions:

  • Re-labeled 20,000 documents with new taxonomy
  • Synthetic oversampling for rare categories
  • OCR preprocessing pipeline with quality scoring
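
One common way to oversample rare categories is sketched below, assuming pandas and scikit-learn; the column name is illustrative, and in practice text augmentation (e.g. paraphrasing) may supplement simple resampling.

```python
# Sketch of oversampling underrepresented categories, assuming pandas
# and scikit-learn; the "category" column name is illustrative.
import pandas as pd
from sklearn.utils import resample

def oversample(df: pd.DataFrame, label_col: str = "category") -> pd.DataFrame:
    """Upsample every class to the size of the largest class."""
    target = df[label_col].value_counts().max()
    balanced = [
        resample(group, replace=True, n_samples=target, random_state=42)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so training batches mix categories.
    return pd.concat(balanced).sample(frac=1, random_state=42)
```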


Results

Performance Metrics

| Metric | Before | After | Improvement |
|---|---|---|---|
| Triage turnaround (average) | 3 days | 4 hours | 94% faster |
| Classification accuracy | 78% (human) | 91% (AI) | +13 points |
| Auto-routed (no human touch) | 0% | 72% | New capability |
| Urgent matters missed | 5% | 0.5% | 90% reduction |
| Staff satisfaction | 3.2/5 | 4.1/5 | +28% |

Business Impact

| Metric | Value |
|---|---|
| FTE reduction in triage | 8 FTE (67%) |
| Cost savings (annual) | $960,000 |
| Response time improvement | 40% faster |
| Citizen satisfaction increase | +12 points |
| Complaint reduction | 35% fewer |

Accuracy by Category

| Category | Precision | Recall | F1 | Volume |
|---|---|---|---|---|
| New claims | 0.94 | 0.92 | 0.93 | High |
| Payment enquiries | 0.93 | 0.95 | 0.94 | High |
| Complaints | 0.89 | 0.91 | 0.90 | Medium |
| Appeals | 0.91 | 0.88 | 0.89 | Low |
| Urgent/hardship | 0.96 | 0.93 | 0.94 | Low |
| Other/Unclear | 0.78 | 0.82 | 0.80 | Medium |
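
Figures like these are typically produced with scikit-learn's classification_report on a held-out evaluation set; the toy labels below only stand in for the real data.

```python
# How per-category precision/recall/F1 tables are typically produced,
# assuming scikit-learn. The labels are toy stand-ins for the real
# held-out evaluation set.
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 2, 2]   # reviewer-assigned categories
y_pred = [0, 0, 1, 2, 2, 2]   # model predictions
print(classification_report(
    y_true, y_pred,
    target_names=["New claims", "Payment enquiries", "Complaints"],
    digits=2))
```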

Challenges and Lessons Learned

Challenge 1: Data Quality

Issue: Historical labels were inconsistent across teams.
Solution: Invested 8 weeks in data cleaning and relabeling.
Lesson: Budget significant time for data preparation (30% of this project).

Challenge 2: Staff Concerns

Issue: Staff worried about job security.
Solution: Clear communication that AI handles triage, not decisions.
Lesson: Involve staff early and show how AI supports them rather than replacing them.

Challenge 3: Edge Cases

Issue: Multi-topic correspondence confused the model.
Solution: Added multi-label capability and human-review routing.
Lesson: Design for edge cases from the start.

Challenge 4: Model Drift

Issue: Accuracy dropped 5% after 3 months.
Solution: Implemented monthly retraining with recent data.
Lesson: Plan for ongoing model maintenance.
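
In practice, this fix reduces to a simple monitoring rule. A minimal sketch follows; the baseline and tolerance values are assumptions, not the agency's monitoring configuration.

```python
# Illustrative drift check only; baseline and tolerance are assumed.
def needs_retraining(sample_accuracy: float,
                     baseline: float = 0.91,
                     tolerance: float = 0.02) -> bool:
    """Flag retraining when accuracy on a human-reviewed sample drifts."""
    return (baseline - sample_accuracy) > tolerance
```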

Challenge 5: Privacy Concerns

Issue: Questions about AI reading citizen correspondence.
Solution: Clear privacy documentation and data minimization.
Lesson: Address privacy proactively with stakeholders.


Governance and Compliance

Governance Structure

  • Executive sponsor: Director of Service Delivery
  • Governance board oversight: Quarterly review
  • Ethics review: Completed prior to deployment
  • Risk tier: Tier 2 (Medium)

Compliance Measures

  • Privacy Impact Assessment: Completed
  • Security assessment: Completed
  • Model card: Published internally
  • Audit trail: All classifications logged
  • Human oversight: Staff can override any classification

Fairness Considerations

  • Tested for geographic bias: None detected
  • Tested for language patterns: Minor adjustments made
  • Complaint routing accuracy equal across demographics

Technical Details

Model Specifications

  • Base model: DistilBERT (66M parameters)
  • Fine-tuned on: Agency-specific data
  • Input: Document text (max 512 tokens)
  • Output: 15 class probabilities
  • Confidence calibration: Platt scaling
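
Platt scaling fits a logistic curve that maps raw scores to calibrated probabilities. A one-dimensional sketch using scikit-learn and toy numbers follows; in production the curve would be fitted per class on a held-out calibration set.

```python
# One-dimensional Platt scaling sketch, assuming scikit-learn. The
# scores and outcomes are toy values, not the agency's data.
import numpy as np
from sklearn.linear_model import LogisticRegression

raw_scores = np.array([[2.1], [0.3], [1.7], [-0.4], [0.9], [2.5]])
was_correct = np.array([1, 0, 1, 0, 1, 1])  # was each prediction right?

platt = LogisticRegression()        # fits P(correct) = sigmoid(a*s + b)
platt.fit(raw_scores, was_correct)
calibrated = platt.predict_proba(raw_scores)[:, 1]  # calibrated confidence
```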

Infrastructure

  • Training: AWS SageMaker
  • Serving: API on agency cloud (AWS)
  • Throughput: 1,000 documents/minute
  • Latency: <500ms per document
  • Availability: 99.9%

Integration Points

  • Email gateway (incoming mail)
  • Document management system
  • Case management system
  • Reporting dashboard

Recommendations for Similar Projects

Do

  • Invest heavily in data quality
  • Engage staff and unions early
  • Design for human oversight
  • Plan for ongoing retraining
  • Start with pilot before full rollout
  • Measure and communicate benefits

Don't

  • Underestimate data preparation time
  • Deploy without monitoring
  • Ignore edge cases
  • Skip change management
  • Assume one-time model training is sufficient

Cost-Benefit Summary

Costs (First Year)

| Item | Cost |
|---|---|
| Discovery & planning | $75,000 |
| Data preparation | $120,000 |
| Model development | $180,000 |
| Integration | $90,000 |
| Change management | $45,000 |
| Infrastructure | $36,000 |
| Total Year 1 | $546,000 |

Ongoing Costs (Annual)

| Item | Cost |
|---|---|
| Infrastructure | $36,000 |
| Model maintenance | $80,000 |
| Support | $40,000 |
| Total Annual | $156,000 |

Benefits (Annual)

| Item | Value |
|---|---|
| FTE savings | $960,000 |
| Improved response times | Qualitative |
| Staff satisfaction | Qualitative |
| Annual Net Benefit | $804,000 |

ROI: 147% (annual net benefit of $804,000 against the $546,000 Year 1 investment) | Payback: ~8 months
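
The headline figures follow directly from the tables above; a worked version of the arithmetic:

```python
# Worked arithmetic behind the headline ROI and payback figures.
year1_cost = 546_000
annual_cost = 156_000
annual_benefit = 960_000                        # FTE savings

net_annual = annual_benefit - annual_cost       # $804,000
roi_year1 = net_annual / year1_cost             # ~1.47, i.e. 147%
payback_months = year1_cost / net_annual * 12   # ~8.1 months
```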


Contact

For more information about this case study, contact the AI Toolkit team.


Related documents: AI Use Case Template | ROI Calculator