Case Study: Intelligent Document Classification¶
| Attribute | Detail |
|---|---|
| Agency Type | Service Delivery |
| Domain | Citizen Services |
| Challenge | Processing high volumes of citizen correspondence |
| AI Approach | Multi-class text classification (fine-tuned DistilBERT) |
Executive Summary¶
A service delivery agency implemented an AI-powered document classification system to automatically categorize incoming correspondence, reducing manual triage time by 65% and improving response times to citizens by 40%.
The Challenge¶
Situation¶
- 500,000+ pieces of correspondence received annually
- Mix of emails, letters, forms, and scanned documents
- 15 different categories requiring specialized handling
- Average 3-day delay in initial triage
- High workload on frontline staff
Problems¶
- Manual classification was time-consuming and inconsistent
- Urgent matters were sometimes delayed
- Staff frustration from repetitive categorization work
- Citizen complaints about response times
- Difficulty tracking correspondence patterns
Business Impact¶
- 12 FTE dedicated to triage function
- Citizen satisfaction scores declining
- Ministerial complaints about delays
- Staff turnover in correspondence team
The Solution¶
AI Approach¶
- Model Type: Multi-class text classification
- Architecture: Fine-tuned transformer (DistilBERT)
- Integration: Email gateway and document management system
System Design¶
```mermaid
flowchart LR
    subgraph IN["<strong>Input</strong>"]
        I1[Email/Letter]
        I2[Scanned Document]
    end
    subgraph OCR["<strong>OCR/Text Extraction</strong>"]
        O1[Clean Text]
    end
    subgraph CLASS["<strong>AI Classification</strong>"]
        C1[15 Category Classifier]
        C2[Confidence Score]
    end
    subgraph ROUTE["<strong>Routing</strong>"]
        R1[Queues by category]
    end
    IN --> OCR --> CLASS --> ROUTE
    CLASS --> HIGH[✓ High confidence<br/>Auto-route]
    CLASS --> LOW[? Low confidence<br/>Human review]
    style IN fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style OCR fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style CLASS fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style ROUTE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style HIGH fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style LOW fill:#ffcc80,stroke:#ef6c00,stroke-width:2px
```
Categories Classified¶
- New claims
- Claim updates
- Change of circumstances
- Payment enquiries
- Complaints
- Appeals
- Document submissions
- General enquiries
- Feedback (positive)
- Urgent/hardship
- Fraud reports
- Death notifications
- Address changes
- Representative requests
- Other/Unclear
Key Design Decisions¶
| Decision | Choice | Rationale |
|---|---|---|
| Model architecture | DistilBERT | Balance of accuracy and speed |
| Confidence threshold | 85% for auto-routing | Minimize misroutes |
| Human review | Below 85% confidence | Maintain quality |
| Priority detection | Keyword + model | Catch urgent matters |
| Retraining frequency | Monthly | Adapt to changes |
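The routing decisions in the table above can be sketched as a small function. The 85% auto-routing threshold and the keyword-plus-model priority check come from the design table; the function name, queue names, and keyword list are hypothetical, and a production system would drive these from configuration rather than constants.

```python
# Illustrative sketch of threshold-based routing with priority detection.
# The 85% cutoff reflects the design decision above; keyword list and
# queue names are hypothetical examples.

URGENT_KEYWORDS = {"urgent", "hardship", "eviction", "emergency"}
AUTO_ROUTE_THRESHOLD = 0.85

def route(text: str, category: str, confidence: float) -> str:
    """Return the queue a classified document should be sent to."""
    # Priority detection combines keyword matching with the model's category,
    # so urgent matters are caught even when the classifier misses them.
    lowered = text.lower()
    if category == "Urgent/hardship" or any(k in lowered for k in URGENT_KEYWORDS):
        return "urgent-review"
    # High-confidence predictions are auto-routed to the category queue.
    if confidence >= AUTO_ROUTE_THRESHOLD:
        return f"queue:{category}"
    # Everything below the threshold falls back to human triage.
    return "human-review"
```

The keyword check runs before the confidence check so that a low-confidence but urgent-sounding document is never parked in a general review queue.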
Implementation¶
Timeline¶
| Phase | Duration | Activities |
|---|---|---|
| Discovery | 6 weeks | Requirements, data assessment, stakeholder engagement |
| Data preparation | 8 weeks | Labeling, cleaning, quality assurance |
| Model development | 10 weeks | Training, testing, iteration |
| Integration | 6 weeks | System integration, workflow design |
| Pilot | 8 weeks | Limited rollout, monitoring, refinement |
| Full rollout | 4 weeks | Agency-wide deployment |
| Total | 42 weeks | |
Team¶
| Role | FTE | Responsibility |
|---|---|---|
| Product Owner | 0.5 | Requirements, stakeholder management |
| Data Scientist | 2.0 | Model development |
| Data Engineer | 1.0 | Data pipelines |
| Integration Engineer | 1.0 | System integration |
| Business Analyst | 1.0 | Process design |
| Change Manager | 0.5 | Training, adoption |
Data Preparation¶
Training Data:

- 100,000 labeled historical documents
- Manual quality review of 10,000 samples
- Inter-rater reliability testing (92% agreement)
- Augmentation for underrepresented categories

Challenges:

- Historical labels were inconsistent
- Some categories had limited samples
- OCR quality varied for scanned documents

Solutions:

- Re-labeled 20,000 documents with new taxonomy
- Synthetic oversampling for rare categories
- OCR preprocessing pipeline with quality scoring
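The simplest form of oversampling for rare categories is to sample with replacement until every label matches the largest class. The case study does not describe the agency's exact augmentation method, so the sketch below is a standard-library baseline; the `oversample` helper is hypothetical.

```python
# Minimal random-oversampling sketch (standard library only). Rare labels
# are duplicated by sampling with replacement up to the majority count;
# real pipelines often add paraphrasing or other synthetic augmentation.
import random
from collections import defaultdict

def oversample(examples: list[tuple[str, str]], seed: int = 0) -> list[tuple[str, str]]:
    """Balance (text, label) pairs so every label matches the largest class."""
    rng = random.Random(seed)
    by_label: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    target = max(len(items) for items in by_label.values())
    balanced: list[tuple[str, str]] = []
    for items in by_label.values():
        balanced.extend(items)
        # Fill the gap for rare labels by sampling with replacement.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced
```

Duplication only rebalances the loss; it adds no new information, which is why the team also re-labeled 20,000 documents rather than relying on oversampling alone.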
Results¶
Performance Metrics¶
| Metric | Before | After | Improvement |
|---|---|---|---|
| Triage time (average) | 3 days | 4 hours | 94% faster |
| Classification accuracy | 78% (human) | 91% (AI) | +13 points |
| Auto-routed (no human touch) | 0% | 72% | New capability |
| Urgent matters missed | 5% | 0.5% | 90% reduction |
| Staff satisfaction | 3.2/5 | 4.1/5 | +28% |
Business Impact¶
| Metric | Value |
|---|---|
| FTE reduction in triage | 8 FTE (67%) |
| Cost savings (annual) | $960,000 |
| Response time improvement | 40% faster |
| Citizen satisfaction increase | +12 points |
| Complaint reduction | 35% fewer |
Accuracy by Category¶
| Category | Precision | Recall | F1 | Volume |
|---|---|---|---|---|
| New claims | 0.94 | 0.92 | 0.93 | High |
| Payment enquiries | 0.93 | 0.95 | 0.94 | High |
| Complaints | 0.89 | 0.91 | 0.90 | Medium |
| Appeals | 0.91 | 0.88 | 0.89 | Low |
| Urgent/hardship | 0.96 | 0.93 | 0.94 | Low |
| Other/Unclear | 0.78 | 0.82 | 0.80 | Medium |
Challenges and Lessons Learned¶
Challenge 1: Data Quality¶
Issue: Historical labels were inconsistent across teams

Solution: Invested 8 weeks in data cleaning and relabeling

Lesson: Budget significant time for data preparation (30% of project)
Challenge 2: Staff Concerns¶
Issue: Staff worried about job security

Solution: Clear communication that AI handles triage, not decisions

Lesson: Involve staff early and show how AI assists them rather than replacing them
Challenge 3: Edge Cases¶
Issue: Multi-topic correspondence confused the model

Solution: Added multi-label capability and human review routing

Lesson: Design for edge cases from the start
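With multi-label capability, each category gets an independent per-class score, and every category above a threshold is kept. The sketch below shows one way that selection step could look; the `select_labels` helper and the 0.5 threshold are illustrative assumptions, not details from the case study.

```python
# Illustrative multi-label selection for multi-topic correspondence.
# Each category carries its own score; all categories above the
# threshold are kept, and documents matching none fall back to
# "Other/Unclear" (which routes to human review).
def select_labels(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every category whose score clears the threshold."""
    chosen = [category for category, score in scores.items() if score >= threshold]
    return chosen or ["Other/Unclear"]
```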
Challenge 4: Model Drift¶
Issue: Accuracy dropped 5% after 3 months

Solution: Implemented monthly retraining with recent data

Lesson: Plan for ongoing model maintenance
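A drift check like the one that surfaced this drop can be as simple as comparing accuracy on a recent labeled sample against the deployment baseline. The sketch below uses the 91% accuracy from the results table as the baseline; the 3-point tolerance and the `needs_retraining` helper are illustrative assumptions.

```python
# Sketch of a drift check that would trigger retraining. The baseline
# is the deployment accuracy from the results table; the tolerance is
# an illustrative value, not one stated in the case study.

BASELINE_ACCURACY = 0.91  # accuracy at deployment
MAX_DROP = 0.03           # illustrative tolerance before retraining

def needs_retraining(predictions: list[str], labels: list[str]) -> bool:
    """True when accuracy on a recent labeled batch has drifted too far."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    return (BASELINE_ACCURACY - accuracy) > MAX_DROP
```

The 5-point drop described above would trip this check; monthly retraining with recent data then restores the baseline.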
Challenge 5: Privacy Concerns¶
Issue: Questions about AI reading citizen correspondence

Solution: Clear privacy documentation, data minimization

Lesson: Address privacy proactively with stakeholders
Governance and Compliance¶
Governance Structure¶
- Executive sponsor: Director of Service Delivery
- Governance board oversight: Quarterly review
- Ethics review: Completed prior to deployment
- Risk tier: Tier 2 (Medium)
Compliance Measures¶
- Privacy Impact Assessment: Completed
- Security assessment: Completed
- Model card: Published internally
- Audit trail: All classifications logged
- Human oversight: Staff can override any classification
Fairness Considerations¶
- Tested for geographic bias: None detected
- Tested for language patterns: Minor adjustments made
- Complaint routing accuracy equal across demographics
Technical Details¶
Model Specifications¶
- Base model: DistilBERT (66M parameters)
- Fine-tuned on: Agency-specific data
- Input: Document text (max 512 tokens)
- Output: 15 class probabilities
- Confidence calibration: Platt scaling
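Platt scaling, mentioned above, maps a raw classifier score `s` to a calibrated probability `1 / (1 + exp(A*s + B))`, where `A` and `B` are fitted on held-out validation data. The sketch below applies an already-fitted sigmoid; the parameter values are illustrative placeholders, not the agency's fitted values.

```python
# Applying a fitted Platt sigmoid to a raw classifier score. In practice
# A and B come from logistic regression on held-out validation scores;
# the defaults here are illustrative placeholders only.
import math

def platt_calibrate(score: float, a: float = -2.0, b: float = 0.5) -> float:
    """Map a raw score to a calibrated probability via 1 / (1 + exp(a*s + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))
```

Calibration matters here because the 85% auto-routing threshold is only meaningful if the model's reported confidence actually tracks its real-world accuracy.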
Infrastructure¶
- Training: AWS SageMaker
- Serving: API on agency cloud (AWS)
- Throughput: 1,000 documents/minute
- Latency: <500ms per document
- Availability: 99.9%
Integration Points¶
- Email gateway (incoming mail)
- Document management system
- Case management system
- Reporting dashboard
Recommendations for Similar Projects¶
Do¶
- Invest heavily in data quality
- Engage staff and unions early
- Design for human oversight
- Plan for ongoing retraining
- Start with pilot before full rollout
- Measure and communicate benefits
Don't¶
- Underestimate data preparation time
- Deploy without monitoring
- Ignore edge cases
- Skip change management
- Assume one-time model training is sufficient
Cost-Benefit Summary¶
Costs (First Year)¶
| Item | Cost |
|---|---|
| Discovery & planning | $75,000 |
| Data preparation | $120,000 |
| Model development | $180,000 |
| Integration | $90,000 |
| Change management | $45,000 |
| Infrastructure | $36,000 |
| Total Year 1 | $546,000 |
Ongoing Costs (Annual)¶
| Item | Cost |
|---|---|
| Infrastructure | $36,000 |
| Model maintenance | $80,000 |
| Support | $40,000 |
| Total Annual | $156,000 |
Benefits (Annual)¶
| Item | Value |
|---|---|
| FTE savings | $960,000 |
| Improved response times | Qualitative |
| Staff satisfaction | Qualitative |
| Annual Net Benefit | $804,000 |
ROI: 147% (Year 1) | Payback: 8 months¶
Contact¶
For more information about this case study, contact the AI Toolkit team.
Related documents: AI Use Case Template | ROI Calculator