Improve a Struggling Model¶
Your AI model isn't performing as expected. This journey helps you diagnose the problem and get it back on track.
The journey has five steps:
1. Diagnose
2. Analyse
3. Fix
4. Validate
5. Prevent
Step 1: Diagnose the Problem¶
First, understand exactly what's going wrong.
Define "Struggling"¶
| Symptom | Possible Issues |
|---|---|
| Accuracy dropped | Model drift, data drift, concept drift |
| Biased outcomes | Training data issues, feature problems |
| Slow responses | Infrastructure, model complexity, load |
| Inconsistent results | Randomness, data quality, edge cases |
| User complaints | Expectation mismatch, UI issues, actual errors |
| High costs | Inefficiency, over-engineering, usage patterns |
Quantify the Problem¶
Before fixing, measure:
- What's the current performance? (Use Data Quality Analyzer if relevant)
- What was the expected/historical performance?
- When did the problem start?
- Is it getting worse, stable, or intermittent?
- Who/what is affected most?
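To pinpoint when the problem started and whether it is worsening, one option is to track the metric day by day against the historical baseline. A minimal sketch, assuming a pandas DataFrame of predictions with hypothetical `timestamp` (datetime), `y_true`, and `y_pred` columns:

```python
import pandas as pd

def find_degradation_start(predictions: pd.DataFrame, baseline_accuracy: float,
                           tolerance: float = 0.02) -> pd.Timestamp | None:
    """Return the first day whose accuracy fell below baseline minus tolerance."""
    daily = (
        predictions
        .assign(correct=lambda df: df["y_true"] == df["y_pred"])
        .set_index("timestamp")
        .resample("D")["correct"]
        .mean()  # daily accuracy
    )
    breaches = daily[daily < baseline_accuracy - tolerance]
    return breaches.index[0] if not breaches.empty else None
```

Plotting `daily` against the baseline also shows whether the drop is worsening, stable, or intermittent.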
Key diagnostic metrics (predictive models):
- Accuracy, precision, recall, F1
- Prediction distribution
- Feature importance changes
- Input data statistics
- Latency and throughput
Key diagnostic metrics (GenAI systems):
- Output quality scores
- User satisfaction ratings
- Guardrail trigger rates
- Task completion rates
- Hallucination frequency
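For the predictive-model metrics above, a quick snapshot can be pulled together with scikit-learn. A minimal sketch, assuming `y_true` and `y_pred` are arrays of class labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def diagnostic_snapshot(y_true, y_pred):
    """Summarise core classification metrics and the prediction distribution."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    labels, counts = np.unique(y_pred, return_counts=True)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A shifting prediction distribution can signal drift before accuracy moves.
        "prediction_distribution": dict(zip(labels.tolist(),
                                            (counts / counts.sum()).round(3).tolist())),
    }
```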
Step 2: Analyse Root Causes¶
Dig into why performance has degraded.
Common Causes by Category¶
Data Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Data drift | Input distributions changed | Compare recent vs training data stats |
| Data quality | More nulls, errors, outliers | Run Data Quality Analyzer |
| Label drift | Ground truth meanings changed | Review labeling process, sample recent labels |
| Data leakage | Training looked too good | Check for future data in training |
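To compare recent vs training data stats, one lightweight approach is a two-sample Kolmogorov-Smirnov test per numeric feature. A minimal sketch, assuming SciPy and two pandas DataFrames with the same schema:

```python
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(train_df: pd.DataFrame, recent_df: pd.DataFrame,
                 p_threshold: float = 0.01) -> pd.DataFrame:
    """Flag numeric features whose recent distribution differs from training."""
    rows = []
    for col in train_df.select_dtypes("number").columns:
        result = ks_2samp(train_df[col].dropna(), recent_df[col].dropna())
        rows.append({
            "feature": col,
            "ks_stat": result.statistic,
            "p_value": result.pvalue,
            "drifted": result.pvalue < p_threshold,
        })
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)
```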
Model Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Concept drift | Relationship between inputs and outputs changed | Compare predictions across time periods |
| Overfitting | Training was good, production isn't | Check train vs test vs production performance |
| Underfitting | Never worked well | Review model complexity, features |
| Stale model | World changed, model didn't | Check when last retrained |
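To separate overfitting from drift, the same metric can be scored on the training set, the held-out test set, and recent production slices: a large train-to-test gap points to overfitting, while a test-to-production gap that grows over time points to concept or data drift. A minimal sketch, assuming labelled production data and a scikit-learn-style model:

```python
from sklearn.metrics import f1_score

def performance_gaps(model, splits: dict) -> dict:
    """Score the model on each named split, e.g. train / test / recent production."""
    return {
        name: f1_score(y, model.predict(X), average="weighted")
        for name, (X, y) in splits.items()
    }

# Hypothetical usage:
# performance_gaps(model, {
#     "train": (X_train, y_train),
#     "test": (X_test, y_test),
#     "prod_last_30d": (X_prod, y_prod),
# })
```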
System Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Infrastructure | Latency, timeouts, errors | Check system metrics, logs |
| Integration | Data not flowing correctly | Validate pipeline end-to-end |
| Configuration | Recent changes broke things | Review deployment history |
| Dependencies | Library/service changes | Check version changes |
GenAI-Specific Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Prompt degradation | Quality dropped after changes | Compare prompt versions |
| Context limitations | Long inputs losing information | Check token usage, context window |
| Model updates | Vendor changed underlying model | Check vendor announcements |
| Guardrail over-filtering | Too many blocked responses | Review filter logs |
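For context limitations, a quick check is how close prompts are getting to the model's context window. A minimal sketch, assuming the tiktoken tokenizer is available and an assumed 8,192-token limit (adjust for your provider and model):

```python
import tiktoken

CONTEXT_WINDOW = 8_192  # assumed limit; confirm against your provider's documentation

def context_usage(prompt: str, model: str = "gpt-4") -> dict:
    """Report how much of the context window a prompt consumes."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = len(encoding.encode(prompt))
    return {
        "tokens": tokens,
        "fraction_of_window": tokens / CONTEXT_WINDOW,
        # Near the limit, retrieved context or conversation history may be truncated.
        "near_limit": tokens > 0.9 * CONTEXT_WINDOW,
    }
```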
Step 3: Implement Fixes¶
Address the root cause, not just the symptoms.
Fix Strategies by Cause¶
Data Fixes¶
| Cause | Fix Options |
|---|---|
| Data drift | Retrain on recent data, add monitoring |
| Data quality | Improve pipelines, add validation |
| Label drift | Update labeling guidelines, relabel |
| Missing data | Add data sources, improve collection |
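As an example of the "add validation" options above, lightweight null-rate and range checks can run before data reaches the model. A minimal sketch with hypothetical column rules:

```python
import pandas as pd

# Hypothetical rules: allowed null rate and value ranges per column.
RULES = {
    "age":    {"max_null_rate": 0.01, "min": 0, "max": 120},
    "income": {"max_null_rate": 0.05, "min": 0, "max": None},
}

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of validation failures; an empty list means the batch passes."""
    failures = []
    for col, rule in RULES.items():
        if col not in df.columns:
            failures.append(f"missing column: {col}")
            continue
        null_rate = df[col].isna().mean()
        if null_rate > rule["max_null_rate"]:
            failures.append(f"{col}: null rate {null_rate:.1%} exceeds limit")
        if rule["min"] is not None and (df[col] < rule["min"]).any():
            failures.append(f"{col}: values below {rule['min']}")
        if rule["max"] is not None and (df[col] > rule["max"]).any():
            failures.append(f"{col}: values above {rule['max']}")
    return failures
```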
Model Fixes¶
| Cause | Fix Options |
|---|---|
| Concept drift | Retrain, adjust features, change algorithm |
| Overfitting | Regularisation, more data, simpler model |
| Underfitting | More features, more complex model, more data |
| Stale model | Establish retraining cadence |
System Fixes¶
| Cause | Fix Options |
|---|---|
| Infrastructure | Scale resources, optimise queries |
| Integration | Fix pipelines, add validation |
| Configuration | Rollback, fix config, add testing |
| Dependencies | Update, pin versions, add tests |
GenAI-Specific Fixes¶
| Cause | Fix Options |
|---|---|
| Prompt issues | Refine prompts, add examples |
| Context issues | Chunk inputs, summarise, prioritise |
| Model changes | Adapt prompts, switch providers |
| Guardrails | Tune thresholds, refine rules |
Before Deploying Fixes¶
- Fix tested in non-production environment
- Rollback plan in place
- Stakeholders aware of changes
- Monitoring ready to validate
Step 4: Validate Improvements¶
Confirm the fix worked.
Validation Approach¶
1. Baseline - Document pre-fix performance
2. Deploy - Roll out the fix (ideally gradually)
3. Monitor - Watch key metrics closely
4. Compare - Measure against the baseline
5. Confirm - Check for statistical significance
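For the Confirm step, a one-sided two-proportion z-test on error rates before and after the fix is often enough. A minimal sketch using SciPy, with hypothetical request counts:

```python
from math import sqrt
from scipy.stats import norm

def error_rate_improved(errors_before: int, n_before: int,
                        errors_after: int, n_after: int,
                        alpha: float = 0.05) -> bool:
    """One-sided two-proportion z-test: did the error rate genuinely drop?"""
    p_before = errors_before / n_before
    p_after = errors_after / n_after
    pooled = (errors_before + errors_after) / (n_before + n_after)
    se = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_before - p_after) / se
    p_value = norm.sf(z)  # chance of seeing a drop this large if nothing changed
    return p_value < alpha

# Hypothetical usage: 120 errors in 2,000 requests before the fix, 80 after.
# error_rate_improved(120, 2_000, 80, 2_000)  # True: significant at alpha=0.05
```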
What to Measure¶
Predictive models:
- Accuracy/precision/recall on recent data
- Prediction distribution
- Error patterns
- Latency
- User feedback
GenAI systems:
- Output quality ratings
- User satisfaction
- Task completion
- Guardrail patterns
- Cost metrics
If Fix Didn't Work¶
- Don't panic; roll back if needed
- Revisit the diagnosis (was the root cause correct?)
- Consider whether the problem is more fundamental
- You may need the Worried About a Project journey
Step 5: Prevent Recurrence¶
Set up systems to catch problems earlier.
Monitoring Improvements¶
Read: Monitoring Guide
Implement:
- Data drift detection
- Performance threshold alerts
- Input distribution monitoring
- User feedback collection
- Automated testing in production
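Data drift detection and input distribution monitoring are commonly implemented with the population stability index (PSI): bin recent inputs against a reference sample and alert when the index crosses a threshold. A minimal sketch with NumPy:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference sample (e.g. training data) and recent inputs."""
    # Bin edges come from the reference sample so both distributions share them.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) and division by zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Commonly cited rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift,
# > 0.2 a significant shift worth an alert.
```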
Process Improvements¶
- Regular model performance reviews
- Retraining triggers defined
- Escalation paths documented
- Lessons learned captured
- Runbook updated
Retraining Strategy¶
| Approach | When to Use |
|---|---|
| Scheduled | Stable environment, predictable drift |
| Triggered | Performance threshold breached |
| Continuous | High volume, fast-changing domain |
| Manual | Low volume, high stakes, complex |
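A triggered strategy can be paired with a scheduled fallback: retrain when performance breaches a threshold, or when the model has simply aged past its agreed cadence. A minimal sketch with hypothetical thresholds:

```python
from datetime import datetime, timedelta

MAX_MODEL_AGE = timedelta(days=90)   # scheduled fallback cadence (hypothetical)
F1_FLOOR = 0.80                      # triggered condition (hypothetical)

def should_retrain(last_trained: datetime, recent_f1: float,
                   now: datetime | None = None) -> tuple[bool, str]:
    """Decide whether to retrain and record the reason."""
    now = now or datetime.now()
    if recent_f1 < F1_FLOOR:
        return True, f"triggered: recent F1 {recent_f1:.2f} below {F1_FLOOR}"
    if now - last_trained > MAX_MODEL_AGE:
        return True, f"scheduled: model older than {MAX_MODEL_AGE.days} days"
    return False, "no retraining needed"
```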
When to Escalate¶
Consider more drastic action if:
- Multiple fix attempts have failed
- Root cause can't be identified
- Fundamental assumptions are wrong
- Cost to fix exceeds value
- Users have lost confidence
See: Worried About a Project
See: Shut Down an AI System
Related Journeys¶
- Check for Bias - if fairness is the issue
- Respond to an Incident - if this is urgent
- Prepare for an Audit - if explaining performance issues