
Case Study: AI-Assisted Grants Assessment


Key Result: 40% improvement in assessment consistency, 50% reduction in processing time, full human oversight maintained for all funding decisions.

| Attribute | Detail |
| --- | --- |
| Agency Type | Grants Administration |
| Domain | Funding Programs |
| Challenge | Efficient and consistent assessment of grant applications |
| AI Approach | Multi-component (NLP + scoring + anomaly detection) |

Executive Summary

A federal grants administration agency implemented an AI-assisted assessment system to support human assessors in evaluating grant applications. The system improved assessment consistency by 40%, reduced processing time by 50%, and enabled faster funding decisions while maintaining full human oversight.


The Challenge

Situation

  • 25,000+ grant applications annually across 15 programs
  • $500M in annual grants administered
  • 120 assessors across multiple locations
  • 8-12 week average assessment time
  • Inconsistent assessment quality across assessors

Problems

  • Assessment variability between assessors
  • Long processing times delayed funding
  • Assessors spent excessive time on administrative tasks
  • Difficulty identifying high-potential applications quickly
  • Limited capacity for thorough due diligence

Business Impact

  • Applicant satisfaction declining
  • Ministerial pressure on processing times
  • Concerns about assessment fairness
  • Staff overwhelmed during peak periods
  • Audit findings on consistency issues

The Solution

AI Approach

  • Model Type: Multi-component system (NLP + scoring + anomaly detection)
  • Architecture: Transformer-based text analysis + rule-based scoring
  • Integration: Grants management system

System Design

```mermaid
flowchart LR
    subgraph APP["<strong>Application</strong>"]
        A1[Documents]
        A2[Budget]
        A3[Attachments]
        A4[History]
    end

    subgraph DOC["<strong>Document Processing</strong>"]
        D1[Extract Sections]
        D2[Parse Numbers]
        D3[Validate Complete]
    end

    subgraph AI["<strong>AI Analysis</strong>"]
        AI1[Criteria Matching]
        AI2[Risk Flags]
        AI3[Similar Apps]
    end

    subgraph OUT["<strong>Assessor Interface</strong>"]
        O1[Assessment Workbench]
        O2[AI Insights]
        O3[Suggested Questions]
    end

    APP --> DOC --> AI --> OUT
    OUT --> DEC[Human Decision<br/>Required]

    style APP fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style DOC fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style AI fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style OUT fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style DEC fill:#ffcc80,stroke:#ef6c00,stroke-width:2px
```

AI Components

1. Document Processing
  • OCR for scanned documents
  • Section extraction and parsing
  • Budget parsing and validation
  • Completeness checking

2. Criteria Matching
  • NLP analysis of application against criteria
  • Evidence extraction for each criterion
  • Strength/weakness identification
  • Gap detection

3. Risk Flagging
  • Financial viability indicators
  • Applicant history analysis
  • Budget reasonableness checks
  • Duplication detection

4. Similar Application Matching
  • Find similar historical applications
  • Show outcomes of similar applications
  • Identify potential duplicates
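The criteria-matching component above can be sketched in miniature. This is an illustrative stand-in only: it scores each criterion with bag-of-words cosine similarity rather than the production transformer models, and the criterion names, threshold, and sample text are hypothetical.

```python
from collections import Counter
import math

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a toy stand-in for transformer embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_criteria(application_text: str, criteria: dict,
                   threshold: float = 0.1) -> dict:
    """Score the application against each criterion and flag apparent gaps."""
    app_vec = vectorize(application_text)
    results = {}
    for name, description in criteria.items():
        score = cosine(app_vec, vectorize(description))
        results[name] = {"score": round(score, 3), "gap": score < threshold}
    return results

# Hypothetical criteria and application text for illustration.
criteria = {
    "community_benefit": "project delivers benefit to the local community",
    "financial_viability": "applicant demonstrates sound budget and financial viability",
}
app = "Our project delivers lasting benefit to the local community through training."
report = match_criteria(app, criteria)
```

In the real system the gap flags feed the assessor workbench as prompts, never as rejections.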

Key Design Principles

| Principle | Implementation |
| --- | --- |
| Human-in-the-loop | All decisions made by human assessors |
| Transparency | AI provides evidence, not just scores |
| Consistency | Same application → same AI output |
| Explainability | Clear reasoning for all flags and suggestions |
| Auditability | Full log of AI contributions |
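The auditability and consistency principles can be illustrated with a small helper that logs every AI contribution alongside a hash of its inputs, so identical applications verifiably yield identical records. A sketch under assumed names; a production log would also carry timestamps, model versions, and assessor overrides.

```python
import hashlib
import json

def audit_record(application_id: str, component: str,
                 inputs: dict, outputs: dict) -> dict:
    """Build one audit entry for an AI contribution.

    The input hash lets auditors confirm that the same inputs produced
    the same AI outputs (the consistency principle above).
    """
    payload = json.dumps(inputs, sort_keys=True).encode()
    return {
        "application_id": application_id,
        "component": component,
        "input_hash": hashlib.sha256(payload).hexdigest(),
        "outputs": outputs,
    }

# Hypothetical entry for a criteria-matching run.
log = []
entry = audit_record("APP-1042", "criteria_matching",
                     {"text": "community project"},
                     {"score": 0.81, "gap": False})
log.append(entry)
```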

Implementation

Timeline

| Phase | Duration | Activities |
| --- | --- | --- |
| Discovery | 10 weeks | Requirements, ethics review, data assessment |
| Design | 8 weeks | Workflow design, AI component design |
| Development | 16 weeks | Build and train AI components |
| Integration | 10 weeks | Grants system integration |
| Pilot | 12 weeks | Two programs, assessor feedback |
| Rollout | 12 weeks | All programs phased |
| Total | 68 weeks | |

Team

| Role | FTE | Responsibility |
| --- | --- | --- |
| Product Owner | 1.0 | Requirements, stakeholder management |
| Data Scientist | 2.0 | AI model development |
| NLP Specialist | 1.0 | Text analysis components |
| Data Engineer | 1.0 | Data pipelines |
| UX Designer | 0.5 | Assessor interface |
| Change Manager | 0.5 | Assessor adoption |
| Program Expert | 0.5 | Domain expertise |

Model Training

Data Sources:
  • 75,000 historical applications (5 years)
  • Assessment reports and decisions
  • Criteria guidelines for each program
  • Financial reports of funded applicants

Labeling:
  • Successful/unsuccessful decisions
  • Assessor scores by criterion
  • Risk flags from historical reviews
  • Due diligence outcomes

Validation:
  • Expert assessor review of AI outputs
  • A/B testing with assessor panels
  • Accuracy testing against historical decisions
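At its simplest, accuracy testing against historical decisions reduces to an agreement rate between AI flags and recorded outcomes. A minimal sketch with made-up data:

```python
def agreement_rate(ai_flags: list, historical_flags: list) -> float:
    """Fraction of applications where the AI flag matches the historical outcome."""
    if len(ai_flags) != len(historical_flags):
        raise ValueError("flag lists must be the same length")
    matches = sum(a == h for a, h in zip(ai_flags, historical_flags))
    return matches / len(ai_flags)

# Illustrative data: AI risk flags vs. flags from historical reviews.
ai = [True, False, True, True, False]
hist = [True, False, False, True, False]
rate = agreement_rate(ai, hist)  # 4 of 5 match
```

The real validation was richer (per-criterion scores, A/B panels), but this is the metric those comparisons bottom out in.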


Results

Assessment Quality

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Inter-assessor consistency | 68% | 89% | +31% |
| Criteria coverage | 78% | 96% | +23% |
| Risk identification rate | 45% | 82% | +82% |
| Assessment completeness | 82% | 98% | +20% |

Efficiency Gains

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Average assessment time | 4.2 hours | 2.1 hours | -50% |
| Time to decision | 58 days | 32 days | -45% |
| Administrative tasks | 40% of time | 15% of time | -63% |
| Applications assessed per assessor | 180/year | 290/year | +61% |

Quality Indicators

| Metric | Before | After |
| --- | --- | --- |
| Appeals upheld | 12% | 6% |
| Audit findings | 8 per audit | 2 per audit |
| Applicant satisfaction | 3.4/5 | 4.2/5 |
| Assessor satisfaction | 3.1/5 | 4.0/5 |

Fairness Outcomes

| Applicant Type | Success Rate Change | Status |
| --- | --- | --- |
| First-time applicants | +2.1% | Improved |
| Small organizations | +1.8% | Improved |
| Regional applicants | +0.9% | Improved |
| Large organizations | -0.5% | Acceptable |
| Indigenous organizations | +2.4% | Improved |
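A fairness check of this kind can be automated by comparing per-group success rates before and after rollout and flagging any group whose rate falls beyond a tolerance. A sketch with illustrative numbers; the one-percentage-point tolerance is an assumption, not the agency's actual threshold.

```python
def fairness_check(before: dict, after: dict, tolerance: float = 1.0) -> dict:
    """Classify each applicant group by its success-rate change (in % points).

    Groups whose rate fell by more than `tolerance` points are flagged
    for review; small declines are 'Acceptable', gains are 'Improved'.
    """
    status = {}
    for group, rate_before in before.items():
        delta = after[group] - rate_before
        if delta > 0:
            status[group] = "Improved"
        elif delta >= -tolerance:
            status[group] = "Acceptable"
        else:
            status[group] = "Review required"
    return status

# Hypothetical success rates (%) before and after rollout.
before = {"First-time applicants": 22.0, "Large organizations": 41.0}
after = {"First-time applicants": 24.1, "Large organizations": 40.5}
result = fairness_check(before, after)
```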

Challenges and Lessons Learned

Challenge 1: Assessor Concerns

Issue: Assessors worried that AI would replace them.

Solution:
  • Clear communication: AI assists, humans decide
  • Assessors can override or ignore AI suggestions
  • AI handles administration; assessors focus on judgment

Lesson: Position AI as a tool, not a replacement.

Challenge 2: Varied Program Criteria

Issue: Different programs had different criteria styles.

Solution:
  • Program-specific criteria models
  • Common framework with program adaptations
  • Easy update mechanism for criteria changes

Lesson: Build a flexible architecture that accommodates variety.

Challenge 3: Historical Bias

Issue: Historical decisions might embed bias.

Solution:
  • Fairness testing across applicant types
  • Removal of biased features (e.g., organization name)
  • Human decision required for all outcomes

Lesson: AI surfaces insights; it does not make decisions.

Challenge 4: Explainability for Applicants

Issue: Applicants wanted to understand assessments.

Solution:
  • AI evidence used in feedback letters
  • Clear mapping to criteria
  • Improvement suggestions generated

Lesson: AI can improve applicant communication.

Challenge 5: Gaming Prevention

Issue: Applicants might optimize for the AI rather than for quality.

Solution:
  • AI criteria matching not visible to applicants
  • Human judgment required for approval
  • Regular model updates

Lesson: Keep some AI logic confidential.


Governance and Compliance

Governance Structure

  • Executive sponsor: Branch Head, Grants Administration
  • Program governance: Program managers committee
  • Ethics oversight: Ethics committee review
  • Risk tier: Tier 3 (High) - Affects funding decisions

Human Oversight Requirements

| Stage | Human Role | AI Role |
| --- | --- | --- |
| Application receipt | Monitor | Process documents |
| Initial screening | Approve | Flag issues |
| Detailed assessment | Assess and decide | Provide insights |
| Recommendation | Recommend | Support with evidence |
| Final decision | Decide | Not involved |
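The decision gate in this oversight model can be enforced in code: AI components may populate insights, but an assessment is final only once a human decision is recorded. A minimal sketch with hypothetical field names:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Assessment:
    """One application in the workbench (illustrative structure)."""
    application_id: str
    ai_insights: dict = field(default_factory=dict)  # AI contributions only
    assessor_score: Optional[float] = None           # human input
    decision: Optional[str] = None                   # human input

def record_decision(assessment: Assessment, score: float,
                    decision: str) -> Assessment:
    """Attach the human assessor's score and decision."""
    assessment.assessor_score = score
    assessment.decision = decision
    return assessment

def is_final(assessment: Assessment) -> bool:
    """Final only once a human decision exists; AI insights never suffice."""
    return assessment.decision is not None

# AI has contributed insights, but no human has decided yet.
wb = Assessment("APP-2031", ai_insights={"risk_flags": ["budget_variance"]})
```

Making finality depend solely on the human-decision field keeps the "Not involved" cell of the final-decision row structurally true, not just procedurally.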

Compliance Measures

  • Grants CPGs (Commonwealth Grants Rules and Guidelines)
  • Public Governance Framework
  • Anti-discrimination legislation
  • Privacy Act (applicant data)
  • Administrative law (fair process)

Transparency

To Applicants:
  • Notification that AI assists assessment
  • Human makes all decisions
  • Right to appeal
  • Feedback includes evidence-based reasoning

To Assessors:
  • Full visibility of AI reasoning
  • Ability to override any AI output
  • Training on AI capabilities and limitations


Technical Details

AI Components

Document Processing:
  • OCR: Tesseract + Azure Form Recognizer
  • Section extraction: Custom NER model
  • Budget parsing: Rule-based + ML validation

Criteria Matching:
  • Base model: DistilBERT fine-tuned
  • Evidence extraction: Named entity recognition
  • Semantic similarity: Sentence transformers
  • Coverage analysis: Rule-based

Risk Flagging:
  • Financial risk: Gradient Boosted Trees
  • History analysis: Database queries + rules
  • Anomaly detection: Isolation Forest

Similar Applications:
  • Embedding: Sentence transformers
  • Search: Approximate nearest neighbors (FAISS)
  • Threshold: Human-tuned similarity cutoff
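Similar-application search reduces to nearest-neighbour lookup over embeddings. The production system uses FAISS for approximate search at scale; the exhaustive cosine scan below is an illustrative stand-in, with toy three-dimensional vectors in place of real sentence-transformer embeddings and an assumed 0.8 cutoff.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similar_applications(query_vec, index, threshold=0.8, top_k=3):
    """Return the historical applications most similar to the query.

    `index` maps application IDs to embedding vectors. A brute-force
    scan stands in here for FAISS approximate nearest-neighbour search.
    """
    scored = [(app_id, cosine(query_vec, vec)) for app_id, vec in index.items()]
    scored = [(a, s) for a, s in scored if s >= threshold]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

# Toy historical index; real embeddings would have hundreds of dimensions.
index = {
    "HIST-1": [1.0, 0.0, 0.2],
    "HIST-2": [0.0, 1.0, 0.0],
    "HIST-3": [0.9, 0.1, 0.3],
}
matches = similar_applications([1.0, 0.0, 0.25], index)
```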

Infrastructure

  • Training: AWS SageMaker
  • Serving: Agency cloud (AWS)
  • Integration: API to grants management system
  • Storage: Application data in existing system
  • Monitoring: Custom dashboard

Performance

  • Document processing: <2 minutes per application
  • Analysis: <30 seconds per application
  • Availability: 99.5%
  • Model refresh: Quarterly

Recommendations for Similar Projects

Do

  • Design for human decision-making throughout
  • Involve assessors in design and testing
  • Build explainability from the start
  • Test for fairness across applicant types
  • Maintain complete audit trail
  • Plan for criteria changes

Don't

  • Let AI make funding decisions
  • Ignore assessor concerns
  • Rely solely on historical data patterns
  • Reveal AI details to applicants (gaming risk)
  • Skip administrative law review
  • Assume one model fits all programs

Cost-Benefit Summary

Costs (First Year)

| Item | Cost |
| --- | --- |
| Discovery & design | $150,000 |
| AI development | $320,000 |
| Integration | $180,000 |
| Pilot | $100,000 |
| Change management | $80,000 |
| Infrastructure | $70,000 |
| Total Year 1 | $900,000 |

Ongoing Costs (Annual)

| Item | Cost |
| --- | --- |
| Infrastructure | $80,000 |
| Model maintenance | $120,000 |
| Support | $60,000 |
| Total Annual | $260,000 |

Benefits (Annual)

| Item | Value |
| --- | --- |
| Assessor efficiency gains | $1,400,000 |
| Reduced appeals costs | $120,000 |
| Faster decisions (applicant value) | $300,000 |
| Quality improvements (est.) | $200,000 |
| Annual Benefit | $2,020,000 |

ROI: 124% | Payback: 7 months
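The headline figures follow from the tables above; a quick arithmetic check (on a straight-line basis, payback lands between six and seven months):

```python
# Figures taken directly from the cost-benefit tables above.
year1_costs = 900_000        # Total Year 1
annual_costs = 260_000       # Total Annual (ongoing)
annual_benefit = 2_020_000   # Annual Benefit

# First-year ROI: (benefit - year-1 cost) / year-1 cost.
roi = (annual_benefit - year1_costs) / year1_costs * 100

# Straight-line payback: months to recover year-1 costs from net benefit.
net_monthly_benefit = (annual_benefit - annual_costs) / 12
payback_months = year1_costs / net_monthly_benefit
```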


Contact

For more information about this case study, contact the AI Toolkit team.


Related documents: AI Governance Framework | How-to: Explainability