Improve a Struggling Model¶
Your AI model isn't performing as expected. This journey helps you diagnose the problem and get it back on track.
The journey has five steps:
1. Diagnose
2. Analyse
3. Fix
4. Validate
5. Prevent
Step 1: Diagnose the Problem¶
First, understand exactly what's going wrong.
Define "Struggling"¶
| Symptom | Possible Issues |
|---|---|
| Accuracy dropped | Model drift, data drift, concept drift |
| Biased outcomes | Training data issues, feature problems |
| Slow responses | Infrastructure, model complexity, load |
| Inconsistent results | Randomness, data quality, edge cases |
| User complaints | Expectation mismatch, UI issues, actual errors |
| High costs | Inefficiency, over-engineering, usage patterns |
Quantify the Problem¶
Before fixing, measure:
- What's the current performance? (Use Data Quality Analyzer if relevant)
- What was the expected/historical performance?
- When did the problem start?
- Is it getting worse, stable, or intermittent?
- Who/what is affected most?
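To pinpoint when the problem started and whether it is worsening, one option is to track the metric day by day against the historical baseline. A minimal sketch, assuming a pandas DataFrame of predictions with hypothetical `timestamp` (datetime), `y_true`, and `y_pred` columns:

```python
import pandas as pd

def find_degradation_start(predictions: pd.DataFrame, baseline_accuracy: float,
                           tolerance: float = 0.02) -> pd.Timestamp | None:
    """Return the first day whose accuracy fell below baseline minus tolerance."""
    daily = (
        predictions
        .assign(correct=lambda df: df["y_true"] == df["y_pred"])
        .set_index("timestamp")
        .resample("D")["correct"]
        .mean()  # daily accuracy
    )
    breaches = daily[daily < baseline_accuracy - tolerance]
    return breaches.index[0] if not breaches.empty else None
```

Plotting `daily` against the baseline also shows whether the drop is worsening, stable, or intermittent.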
Key diagnostic metrics (predictive models):
- Accuracy, precision, recall, F1
- Prediction distribution
- Feature importance changes
- Input data statistics
- Latency and throughput
Key diagnostic metrics (GenAI systems):
- Output quality scores
- User satisfaction ratings
- Guardrail trigger rates
- Task completion rates
- Hallucination frequency
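For the predictive-model metrics above, a quick snapshot can be pulled together with scikit-learn. A minimal sketch, assuming `y_true` and `y_pred` are arrays of class labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def diagnostic_snapshot(y_true, y_pred):
    """Summarise core classification metrics and the prediction distribution."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    labels, counts = np.unique(y_pred, return_counts=True)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A shifting prediction distribution can signal drift before accuracy moves.
        "prediction_distribution": dict(zip(labels.tolist(),
                                            (counts / counts.sum()).round(3).tolist())),
    }
```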
Step 2: Analyse Root Causes¶
Dig into why performance has degraded.
Common Causes by Category¶
Data Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Data drift | Input distributions changed | Compare recent vs training data stats |
| Data quality | More nulls, errors, outliers | Run Data Quality Analyzer |
| Label drift | Ground truth meanings changed | Review labeling process, sample recent labels |
| Data leakage | Training looked too good | Check for future data in training |
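To compare recent vs training data stats, one lightweight approach is a two-sample Kolmogorov-Smirnov test per numeric feature. A minimal sketch, assuming SciPy and two pandas DataFrames with the same schema:

```python
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(train_df: pd.DataFrame, recent_df: pd.DataFrame,
                 p_threshold: float = 0.01) -> pd.DataFrame:
    """Flag numeric features whose recent distribution differs from training."""
    rows = []
    for col in train_df.select_dtypes("number").columns:
        result = ks_2samp(train_df[col].dropna(), recent_df[col].dropna())
        rows.append({
            "feature": col,
            "ks_stat": result.statistic,
            "p_value": result.pvalue,
            "drifted": result.pvalue < p_threshold,
        })
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)
```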
Model Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Concept drift | Relationship between inputs and outputs changed | Compare predictions across time periods |
| Overfitting | Training was good, production isn't | Check train vs test vs production performance |
| Underfitting | Never worked well | Review model complexity, features |
| Stale model | World changed, model didn't | Check when last retrained |
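To separate overfitting from drift, the same metric can be scored on the training set, the held-out test set, and recent production slices: a large train-to-test gap points to overfitting, while a test-to-production gap that grows over time points to concept or data drift. A minimal sketch, assuming labelled production data and a scikit-learn-style model:

```python
from sklearn.metrics import f1_score

def performance_gaps(model, splits: dict) -> dict:
    """Score the model on each named split, e.g. train / test / recent production."""
    return {
        name: f1_score(y, model.predict(X), average="weighted")
        for name, (X, y) in splits.items()
    }

# Hypothetical usage:
# performance_gaps(model, {
#     "train": (X_train, y_train),
#     "test": (X_test, y_test),
#     "prod_last_30d": (X_prod, y_prod),
# })
```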
System Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Infrastructure | Latency, timeouts, errors | Check system metrics, logs |
| Integration | Data not flowing correctly | Validate pipeline end-to-end |
| Configuration | Recent changes broke things | Review deployment history |
| Dependencies | Library/service changes | Check version changes |
GenAI-Specific Issues¶
| Problem | Signs | Investigation |
|---|---|---|
| Prompt degradation | Quality dropped after changes | Compare prompt versions |
| Context limitations | Long inputs losing information | Check token usage, context window |
| Model updates | Vendor changed underlying model | Check vendor announcements |
| Guardrail over-filtering | Too many blocked responses | Review filter logs |
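For context limitations, a quick check is how close prompts are getting to the model's context window. A minimal sketch, assuming the tiktoken tokenizer is available and an assumed 8,192-token limit (adjust for your provider and model):

```python
import tiktoken

CONTEXT_WINDOW = 8_192  # assumed limit; confirm against your provider's documentation

def context_usage(prompt: str, model: str = "gpt-4") -> dict:
    """Report how much of the context window a prompt consumes."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = len(encoding.encode(prompt))
    return {
        "tokens": tokens,
        "fraction_of_window": tokens / CONTEXT_WINDOW,
        # Near the limit, retrieved context or conversation history may be truncated.
        "near_limit": tokens > 0.9 * CONTEXT_WINDOW,
    }
```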
Step 3: Implement Fixes¶
Address the root cause, not just the symptoms.
Fix Strategies by Cause¶
Data Fixes¶
| Cause | Fix Options |
|---|---|
| Data drift | Retrain on recent data, add monitoring |
| Data quality | Improve pipelines, add validation |
| Label drift | Update labeling guidelines, relabel |
| Missing data | Add data sources, improve collection |
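As an example of the "add validation" options above, lightweight null-rate and range checks can run before data reaches the model. A minimal sketch with hypothetical column rules:

```python
import pandas as pd

# Hypothetical rules: allowed null rate and value ranges per column.
RULES = {
    "age":    {"max_null_rate": 0.01, "min": 0, "max": 120},
    "income": {"max_null_rate": 0.05, "min": 0, "max": None},
}

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of validation failures; an empty list means the batch passes."""
    failures = []
    for col, rule in RULES.items():
        if col not in df.columns:
            failures.append(f"missing column: {col}")
            continue
        null_rate = df[col].isna().mean()
        if null_rate > rule["max_null_rate"]:
            failures.append(f"{col}: null rate {null_rate:.1%} exceeds limit")
        if rule["min"] is not None and (df[col] < rule["min"]).any():
            failures.append(f"{col}: values below {rule['min']}")
        if rule["max"] is not None and (df[col] > rule["max"]).any():
            failures.append(f"{col}: values above {rule['max']}")
    return failures
```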
Model Fixes¶
| Cause | Fix Options |
|---|---|
| Concept drift | Retrain, adjust features, change algorithm |
| Overfitting | Regularisation, more data, simpler model |
| Underfitting | More features, more complex model, more data |
| Stale model | Establish retraining cadence |
System Fixes¶
| Cause | Fix Options |
|---|---|
| Infrastructure | Scale resources, optimise queries |
| Integration | Fix pipelines, add validation |
| Configuration | Rollback, fix config, add testing |
| Dependencies | Update, pin versions, add tests |
GenAI-Specific Fixes¶
| Cause | Fix Options |
|---|---|
| Prompt issues | Refine prompts, add examples |
| Context issues | Chunk inputs, summarise, prioritise |
| Model changes | Adapt prompts, switch providers |
| Guardrails | Tune thresholds, refine rules |
Before Deploying Fixes¶
- Fix tested in non-production environment
- Rollback plan in place
- Stakeholders aware of changes
- Monitoring ready to validate
Step 4: Validate Improvements¶
Confirm the fix worked.
Validation Approach¶
1. Baseline - Document pre-fix performance
2. Deploy - Roll out the fix (ideally gradually)
3. Monitor - Watch key metrics closely
4. Compare - Measure against the baseline
5. Confirm - Check for statistical significance
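For the Confirm step, a one-sided two-proportion z-test on error rates before and after the fix is often enough. A minimal sketch using SciPy, with hypothetical request counts:

```python
from math import sqrt
from scipy.stats import norm

def error_rate_improved(errors_before: int, n_before: int,
                        errors_after: int, n_after: int,
                        alpha: float = 0.05) -> bool:
    """One-sided two-proportion z-test: did the error rate genuinely drop?"""
    p_before = errors_before / n_before
    p_after = errors_after / n_after
    pooled = (errors_before + errors_after) / (n_before + n_after)
    se = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_before - p_after) / se
    p_value = norm.sf(z)  # chance of seeing a drop this large if nothing changed
    return p_value < alpha

# Hypothetical usage: 120 errors in 2,000 requests before the fix, 80 after.
# error_rate_improved(120, 2_000, 80, 2_000)  # True: significant at alpha=0.05
```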
What to Measure¶
Predictive models:
- Accuracy/precision/recall on recent data
- Prediction distribution
- Error patterns
- Latency
- User feedback
GenAI systems:
- Output quality ratings
- User satisfaction
- Task completion
- Guardrail patterns
- Cost metrics
If Fix Didn't Work¶
- Don't panic; roll back if needed
- Revisit the diagnosis (was the root cause correct?)
- Consider whether the problem is more fundamental
- You may need the Worried About a Project journey
Step 5: Prevent Recurrence¶
Set up systems to catch problems earlier.
Monitoring Improvements¶
Read: Monitoring Guide
Implement:
- Data drift detection
- Performance threshold alerts
- Input distribution monitoring
- User feedback collection
- Automated testing in production
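Data drift detection and input distribution monitoring are commonly implemented with the population stability index (PSI): bin recent inputs against a reference sample and alert when the index crosses a threshold. A minimal sketch with NumPy:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference sample (e.g. training data) and recent inputs."""
    # Bin edges come from the reference sample so both distributions share them.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) and division by zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Commonly cited rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift,
# > 0.2 a significant shift worth an alert.
```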
Process Improvements¶
- Regular model performance reviews
- Retraining triggers defined
- Escalation paths documented
- Lessons learned captured
- Runbook updated
Retraining Strategy¶
| Approach | When to Use |
|---|---|
| Scheduled | Stable environment, predictable drift |
| Triggered | Performance threshold breached |
| Continuous | High volume, fast-changing domain |
| Manual | Low volume, high stakes, complex |
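A triggered strategy can be paired with a scheduled fallback: retrain when performance breaches a threshold, or when the model has simply aged past its agreed cadence. A minimal sketch with hypothetical thresholds:

```python
from datetime import datetime, timedelta

MAX_MODEL_AGE = timedelta(days=90)   # scheduled fallback cadence (hypothetical)
F1_FLOOR = 0.80                      # triggered condition (hypothetical)

def should_retrain(last_trained: datetime, recent_f1: float,
                   now: datetime | None = None) -> tuple[bool, str]:
    """Decide whether to retrain and record the reason."""
    now = now or datetime.now()
    if recent_f1 < F1_FLOOR:
        return True, f"triggered: recent F1 {recent_f1:.2f} below {F1_FLOOR}"
    if now - last_trained > MAX_MODEL_AGE:
        return True, f"scheduled: model older than {MAX_MODEL_AGE.days} days"
    return False, "no retraining needed"
```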
When to Escalate¶
Consider more drastic action if:
- Multiple fix attempts have failed
- Root cause can't be identified
- Fundamental assumptions are wrong
- Cost to fix exceeds value
- Users have lost confidence
See: Worried About a Project
See: Shut Down an AI System
Related Journeys¶
- Check for Bias - if fairness is the issue
- Respond to an Incident - if this is urgent
- Prepare for an Audit - if explaining performance issues