Case Study: Intelligent Document Classification

Key Result: 65% reduction in manual triage time, 40% faster citizen response times, 91% AI classification accuracy with human oversight for edge cases.
Agency Type: Service Delivery
Domain: Citizen Services
Challenge: Processing high volumes of citizen correspondence
AI Approach: Multi-class text classification (fine-tuned DistilBERT)

Executive Summary

A service delivery agency implemented an AI-powered document classification system to automatically categorize incoming correspondence, reducing manual triage time by 65% and improving response times to citizens by 40%.


The Challenge

Situation

  • 500,000+ pieces of correspondence received annually
  • Mix of emails, letters, forms, and scanned documents
  • 15 different categories requiring specialized handling
  • Average 3-day delay in initial triage
  • High workload on frontline staff

Problems

  • Manual classification was time-consuming and inconsistent
  • Urgent matters were sometimes delayed
  • Staff frustration from repetitive categorization work
  • Citizen complaints about response times
  • Difficulty tracking correspondence patterns

Business Impact

  • 12 FTE dedicated to triage function
  • Citizen satisfaction scores declining
  • Ministerial complaints about delays
  • Staff turnover in correspondence team

The Solution

AI Approach

Model Type: Multi-class text classification
Architecture: Fine-tuned transformer (DistilBERT)
Integration: Email gateway and document management system
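
In outline, the classification call is compact. The sketch below assumes the Hugging Face transformers library and PyTorch; the fine-tuning step on agency data is omitted, and the helper name is illustrative rather than the agency's production code.

```python
# Minimal sketch of the classification call, assuming Hugging Face
# `transformers` and PyTorch. Fine-tuning on agency data is omitted;
# names are illustrative, not the agency's production code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased"  # 66M-parameter base model
NUM_CATEGORIES = 15                     # the routing taxonomy listed below

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CATEGORIES)

def classify(text: str) -> tuple[int, float]:
    """Return (predicted category index, confidence) for one document."""
    # Truncate to the 512-token input limit noted under Technical Details.
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    confidence, category = torch.max(probs, dim=-1)
    return int(category), float(confidence)
```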

System Design

```mermaid
flowchart LR
    subgraph IN["<strong>Input</strong>"]
        I1[Email/Letter]
        I2[Scanned Document]
    end

    subgraph OCR["<strong>OCR/Text Extraction</strong>"]
        O1[Clean Text]
    end

    subgraph CLASS["<strong>AI Classification</strong>"]
        C1[15 Category Classifier]
        C2[Confidence Score]
    end

    subgraph ROUTE["<strong>Routing</strong>"]
        R1[Queues by category]
    end

    IN --> OCR --> CLASS --> ROUTE
    CLASS --> HIGH[✓ High confidence<br/>Auto-route]
    CLASS --> LOW[? Low confidence<br/>Human review]

    style IN fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style OCR fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style CLASS fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style ROUTE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style HIGH fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style LOW fill:#ffcc80,stroke:#ef6c00,stroke-width:2px
```

Categories Classified

  1. New claims
  2. Claim updates
  3. Change of circumstances
  4. Payment enquiries
  5. Complaints
  6. Appeals
  7. Document submissions
  8. General enquiries
  9. Feedback (positive)
  10. Urgent/hardship
  11. Fraud reports
  12. Death notifications
  13. Address changes
  14. Representative requests
  15. Other/Unclear

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Model architecture | DistilBERT | Balance of accuracy and speed |
| Confidence threshold | 85% for auto-routing | Minimize misroutes |
| Human review | Below 85% confidence | Maintain quality |
| Priority detection | Keyword + model | Catch urgent matters |
| Retraining frequency | Monthly | Adapt to changes |
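
Taken together, the threshold and priority rules combine roughly as in the sketch below. The queue names, keyword list, and urgent-category index are assumptions for illustration, not the agency's configuration.

```python
# Illustrative routing rule combining the 85% threshold with priority
# detection. Queue names, URGENT_KEYWORDS, and URGENT_CATEGORY are
# hypothetical stand-ins.
AUTO_ROUTE_THRESHOLD = 0.85          # chosen to minimize misroutes
URGENT_CATEGORY = 9                  # e.g. the Urgent/hardship class
URGENT_KEYWORDS = {"urgent", "hardship", "eviction", "emergency"}

def route(text: str, category: int, confidence: float) -> str:
    # Priority detection: a keyword hit or the urgent class escalates
    # regardless of model confidence.
    if category == URGENT_CATEGORY or any(
            kw in text.lower() for kw in URGENT_KEYWORDS):
        return "urgent-review-queue"
    # High confidence: auto-route with no human touch.
    if confidence >= AUTO_ROUTE_THRESHOLD:
        return f"category-queue-{category}"
    # Low confidence: human review maintains quality.
    return "human-review-queue"
```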

Implementation

Timeline

| Phase | Duration | Activities |
|---|---|---|
| Discovery | 6 weeks | Requirements, data assessment, stakeholder engagement |
| Data preparation | 8 weeks | Labeling, cleaning, quality assurance |
| Model development | 10 weeks | Training, testing, iteration |
| Integration | 6 weeks | System integration, workflow design |
| Pilot | 8 weeks | Limited rollout, monitoring, refinement |
| Full rollout | 4 weeks | Agency-wide deployment |
| Total | 42 weeks | |

Team

| Role | FTE | Responsibility |
|---|---|---|
| Product Owner | 0.5 | Requirements, stakeholder management |
| Data Scientist | 2.0 | Model development |
| Data Engineer | 1.0 | Data pipelines |
| Integration Engineer | 1.0 | System integration |
| Business Analyst | 1.0 | Process design |
| Change Manager | 0.5 | Training, adoption |

Data Preparation

Training Data:

  • 100,000 labeled historical documents
  • Manual quality review of 10,000 samples
  • Inter-rater reliability testing (92% agreement)
  • Augmentation for underrepresented categories

Challenges:

  • Historical labels were inconsistent
  • Some categories had limited samples
  • OCR quality varied for scanned documents

Solutions:

  • Re-labeled 20,000 documents with new taxonomy
  • Synthetic oversampling for rare categories
  • OCR preprocessing pipeline with quality scoring
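
One common way to oversample rare categories is sketched below, assuming pandas and scikit-learn; the column name is illustrative, and in practice text augmentation (e.g. paraphrasing) may supplement simple resampling.

```python
# Sketch of oversampling underrepresented categories, assuming pandas
# and scikit-learn; the "category" column name is illustrative.
import pandas as pd
from sklearn.utils import resample

def oversample(df: pd.DataFrame, label_col: str = "category") -> pd.DataFrame:
    """Upsample every class to the size of the largest class."""
    target = df[label_col].value_counts().max()
    balanced = [
        resample(group, replace=True, n_samples=target, random_state=42)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so training batches mix categories.
    return pd.concat(balanced).sample(frac=1, random_state=42)
```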


Results

Performance Metrics

| Metric | Before | After | Improvement |
|---|---|---|---|
| Triage turnaround (average) | 3 days | 4 hours | 94% faster |
| Classification accuracy | 78% (human) | 91% (AI) | +13 points |
| Auto-routed (no human touch) | 0% | 72% | New capability |
| Urgent matters missed | 5% | 0.5% | 90% reduction |
| Staff satisfaction | 3.2/5 | 4.1/5 | +28% |

Business Impact

| Metric | Value |
|---|---|
| FTE reduction in triage | 8 FTE (67%) |
| Cost savings (annual) | $960,000 |
| Response time improvement | 40% faster |
| Citizen satisfaction increase | +12 points |
| Complaint reduction | 35% fewer |

Accuracy by Category

| Category | Precision | Recall | F1 | Volume |
|---|---|---|---|---|
| New claims | 0.94 | 0.92 | 0.93 | High |
| Payment enquiries | 0.93 | 0.95 | 0.94 | High |
| Complaints | 0.89 | 0.91 | 0.90 | Medium |
| Appeals | 0.91 | 0.88 | 0.89 | Low |
| Urgent/hardship | 0.96 | 0.93 | 0.94 | Low |
| Other/Unclear | 0.78 | 0.82 | 0.80 | Medium |
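
Figures like these are typically produced with scikit-learn's classification_report on a held-out evaluation set; the toy labels below only stand in for the real data.

```python
# How per-category precision/recall/F1 tables are typically produced,
# assuming scikit-learn. The labels are toy stand-ins for the real
# held-out evaluation set.
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 2, 2]   # reviewer-assigned categories
y_pred = [0, 0, 1, 2, 2, 2]   # model predictions
print(classification_report(
    y_true, y_pred,
    target_names=["New claims", "Payment enquiries", "Complaints"],
    digits=2))
```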

Challenges and Lessons Learned

Challenge 1: Data Quality

Issue: Historical labels were inconsistent across teams.
Solution: Invested 8 weeks in data cleaning and relabeling.
Lesson: Budget significant time for data preparation (30% of this project).

Challenge 2: Staff Concerns

Issue: Staff worried about job security.
Solution: Clear communication that AI handles triage, not decisions.
Lesson: Involve staff early and show how AI supports them rather than replacing them.

Challenge 3: Edge Cases

Issue: Multi-topic correspondence confused the model.
Solution: Added multi-label capability and human-review routing.
Lesson: Design for edge cases from the start.

Challenge 4: Model Drift

Issue: Accuracy dropped 5% after 3 months.
Solution: Implemented monthly retraining with recent data.
Lesson: Plan for ongoing model maintenance.
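
In practice, this fix reduces to a simple monitoring rule. A minimal sketch follows; the baseline and tolerance values are assumptions, not the agency's monitoring configuration.

```python
# Illustrative drift check only; baseline and tolerance are assumed.
def needs_retraining(sample_accuracy: float,
                     baseline: float = 0.91,
                     tolerance: float = 0.02) -> bool:
    """Flag retraining when accuracy on a human-reviewed sample drifts."""
    return (baseline - sample_accuracy) > tolerance
```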

Challenge 5: Privacy Concerns

Issue: Questions about AI reading citizen correspondence.
Solution: Clear privacy documentation and data minimization.
Lesson: Address privacy proactively with stakeholders.


Governance and Compliance

Governance Structure

  • Executive sponsor: Director of Service Delivery
  • Governance board oversight: Quarterly review
  • Ethics review: Completed prior to deployment
  • Risk tier: Tier 2 (Medium)

Compliance Measures

  • Privacy Impact Assessment: Completed
  • Security assessment: Completed
  • Model card: Published internally
  • Audit trail: All classifications logged
  • Human oversight: Staff can override any classification

Fairness Considerations

  • Tested for geographic bias: None detected
  • Tested for language patterns: Minor adjustments made
  • Complaint routing accuracy equal across demographics

Technical Details

Model Specifications

  • Base model: DistilBERT (66M parameters)
  • Fine-tuned on: Agency-specific data
  • Input: Document text (max 512 tokens)
  • Output: 15 class probabilities
  • Confidence calibration: Platt scaling
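
Platt scaling fits a logistic curve that maps raw scores to calibrated probabilities. A one-dimensional sketch using scikit-learn and toy numbers follows; in production the curve would be fitted per class on a held-out calibration set.

```python
# One-dimensional Platt scaling sketch, assuming scikit-learn. The
# scores and outcomes are toy values, not the agency's data.
import numpy as np
from sklearn.linear_model import LogisticRegression

raw_scores = np.array([[2.1], [0.3], [1.7], [-0.4], [0.9], [2.5]])
was_correct = np.array([1, 0, 1, 0, 1, 1])  # was each prediction right?

platt = LogisticRegression()        # fits P(correct) = sigmoid(a*s + b)
platt.fit(raw_scores, was_correct)
calibrated = platt.predict_proba(raw_scores)[:, 1]  # calibrated confidence
```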

Infrastructure

  • Training: AWS SageMaker
  • Serving: API on agency cloud (AWS)
  • Throughput: 1,000 documents/minute
  • Latency: <500ms per document
  • Availability: 99.9%

Integration Points

  • Email gateway (incoming mail)
  • Document management system
  • Case management system
  • Reporting dashboard

Recommendations for Similar Projects

Do

  • Invest heavily in data quality
  • Engage staff and unions early
  • Design for human oversight
  • Plan for ongoing retraining
  • Start with pilot before full rollout
  • Measure and communicate benefits

Don't

  • Underestimate data preparation time
  • Deploy without monitoring
  • Ignore edge cases
  • Skip change management
  • Assume one-time model training is sufficient

Cost-Benefit Summary

Costs (First Year)

| Item | Cost |
|---|---|
| Discovery & planning | $75,000 |
| Data preparation | $120,000 |
| Model development | $180,000 |
| Integration | $90,000 |
| Change management | $45,000 |
| Infrastructure | $36,000 |
| Total Year 1 | $546,000 |

Ongoing Costs (Annual)

| Item | Cost |
|---|---|
| Infrastructure | $36,000 |
| Model maintenance | $80,000 |
| Support | $40,000 |
| Total Annual | $156,000 |

Benefits (Annual)

| Item | Value |
|---|---|
| FTE savings | $960,000 |
| Improved response times | Qualitative |
| Staff satisfaction | Qualitative |
| Annual Net Benefit | $804,000 |

ROI: 147% (annual net benefit of $804,000 against the $546,000 Year 1 investment) | Payback: ~8 months
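
The headline figures follow directly from the tables above; a worked version of the arithmetic:

```python
# Worked arithmetic behind the headline ROI and payback figures.
year1_cost = 546_000
annual_cost = 156_000
annual_benefit = 960_000                        # FTE savings

net_annual = annual_benefit - annual_cost       # $804,000
roi_year1 = net_annual / year1_cost             # ~1.47, i.e. 147%
payback_months = year1_cost / net_annual * 12   # ~8.1 months
```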


Contact

For more information about this case study, contact the AI Toolkit team.


Related documents: AI Use Case Template | ROI Calculator