
Improve a Struggling Model

Your AI model isn't performing as expected. This journey helps you diagnose the problem and get it back on track.

The journey has five steps:

  1. Diagnose
  2. Analyse
  3. Fix
  4. Validate
  5. Prevent

Step 1: Diagnose the Problem

First, understand exactly what's going wrong.

Define "Struggling"

Symptom              | Possible Issues
Accuracy dropped     | Model drift, data drift, concept drift
Biased outcomes      | Training data issues, feature problems
Slow responses       | Infrastructure, model complexity, load
Inconsistent results | Randomness, data quality, edge cases
User complaints      | Expectation mismatch, UI issues, actual errors
High costs           | Inefficiency, over-engineering, usage patterns

Quantify the Problem

Before fixing, measure:

  • What's the current performance? (Use Data Quality Analyzer if relevant)
  • What was the expected/historical performance?
  • When did the problem start?
  • Is it getting worse, stable, or intermittent?
  • Who/what is affected most?

Key diagnostic metrics for predictive models (a computation sketch follows this list):

  • Accuracy, precision, recall, F1
  • Prediction distribution
  • Feature importance changes
  • Input data statistics
  • Latency and throughput
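
As a minimal sketch of how these could be computed, assuming a binary classifier whose predictions and labels are logged to CSV files (the file paths and the y_true/y_pred column names are illustrative):

```python
# Sketch: quantify current vs historical performance for a binary classifier.
# File paths and the y_true / y_pred column names are illustrative.
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def diagnostic_metrics(df: pd.DataFrame) -> dict:
    """Core classification metrics plus the positive-prediction rate."""
    y_true, y_pred = df["y_true"], df["y_pred"]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # A shift here is a cheap first signal of prediction-distribution change.
        "positive_rate": y_pred.mean(),
    }

baseline = diagnostic_metrics(pd.read_csv("predictions_baseline.csv"))  # historical window
current = diagnostic_metrics(pd.read_csv("predictions_recent.csv"))     # recent window
for name in baseline:
    print(f"{name:15s} baseline={baseline[name]:.3f}  current={current[name]:.3f}")
```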

Key diagnostic metrics for GenAI systems:

  • Output quality scores
  • User satisfaction ratings
  • Guardrail trigger rates
  • Task completion rates
  • Hallucination frequency

Step 2: Analyse Root Causes

Dig into why performance has degraded.

Common Causes by Category

Data Issues

Problem      | Signs                          | Investigation
Data drift   | Input distributions changed    | Compare recent vs training data stats
Data quality | More nulls, errors, outliers   | Run Data Quality Analyzer
Label drift  | Ground truth meanings changed  | Review labeling process, sample recent labels
Data leakage | Training looked too good       | Check for future data in training
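
One way to run the "compare recent vs training data stats" investigation is a per-feature two-sample Kolmogorov-Smirnov test; a minimal sketch, assuming the training and recent feature sets are available as pandas DataFrames loaded from illustrative parquet files:

```python
# Sketch: flag numeric features whose recent distribution differs from training.
# File paths are illustrative; both frames are assumed to share column names.
import pandas as pd
from scipy.stats import ks_2samp

def detect_drift(training_df: pd.DataFrame, recent_df: pd.DataFrame, alpha: float = 0.01):
    """Return (feature, KS statistic, p-value) for features failing the two-sample KS test."""
    drifted = []
    for col in training_df.select_dtypes(include="number").columns:
        stat, p_value = ks_2samp(training_df[col].dropna(), recent_df[col].dropna())
        if p_value < alpha:
            drifted.append((col, stat, p_value))
    return drifted

training_df = pd.read_parquet("training_features.parquet")
recent_df = pd.read_parquet("recent_features.parquet")
for col, stat, p in detect_drift(training_df, recent_df):
    print(f"Possible drift in {col}: KS statistic={stat:.3f}, p={p:.2g}")
```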

Model Issues

Problem       | Signs                                           | Investigation
Concept drift | Relationship between inputs and outputs changed | Compare predictions across time periods
Overfitting   | Training was good, production isn't             | Check train vs test vs production performance
Underfitting  | Never worked well                               | Review model complexity, features
Stale model   | World changed, model didn't                     | Check when last retrained
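
To compare predictions across time periods (the concept-drift and stale-model checks above), one option is to compute accuracy per month from a production prediction log; a minimal sketch, with illustrative file and column names:

```python
# Sketch: track production accuracy by month to separate gradual drift from a
# one-off incident. The log path and column names are illustrative.
import pandas as pd

logs = pd.read_csv("production_predictions.csv", parse_dates=["timestamp"])
logs["correct"] = (logs["y_true"] == logs["y_pred"]).astype(int)

monthly = logs.set_index("timestamp").resample("MS")["correct"].agg(["mean", "count"])
monthly.columns = ["accuracy", "n_predictions"]
print(monthly)  # a steady downward trend points to drift or a stale model
```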

System Issues

Problem        | Signs                        | Investigation
Infrastructure | Latency, timeouts, errors    | Check system metrics, logs
Integration    | Data not flowing correctly   | Validate pipeline end-to-end
Configuration  | Recent changes broke things  | Review deployment history
Dependencies   | Library/service changes      | Check version changes

GenAI-Specific Issues

Problem                  | Signs                           | Investigation
Prompt degradation       | Quality dropped after changes   | Compare prompt versions
Context limitations      | Long inputs losing information  | Check token usage, context window
Model updates            | Vendor changed underlying model | Check vendor announcements
Guardrail over-filtering | Too many blocked responses      | Review filter logs
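
For the context-limitation row, a quick check is how close prompts run to the model's context window. A minimal sketch using the tiktoken tokenizer; the encoding name and 8,192-token window are assumptions to match to the actual model in use:

```python
# Sketch: estimate how much of the context window each prompt consumes.
# Requires the tiktoken package; the encoding name and 8,192-token window are
# illustrative and should match the deployed model.
import tiktoken

CONTEXT_WINDOW = 8192
encoding = tiktoken.get_encoding("cl100k_base")

def context_usage(prompt: str) -> float:
    """Fraction of the assumed context window used by this prompt."""
    return len(encoding.encode(prompt)) / CONTEXT_WINDOW

for prompt in ["short question", "a very long document " * 2000]:
    usage = context_usage(prompt)
    flag = "  <-- likely truncation or lost context" if usage > 0.9 else ""
    print(f"Prompt uses {usage:.0%} of the window{flag}")
```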

Step 3: Implement Fixes

Address the root cause, not just the symptoms.

Fix Strategies by Cause

Data Issues

Cause        | Fix Options
Data drift   | Retrain on recent data, add monitoring
Data quality | Improve pipelines, add validation
Label drift  | Update labeling guidelines, relabel
Missing data | Add data sources, improve collection

Model Issues

Cause         | Fix Options
Concept drift | Retrain, adjust features, change algorithm
Overfitting   | Regularisation, more data, simpler model
Underfitting  | More features, more complex model, more data
Stale model   | Establish retraining cadence

System Issues

Cause          | Fix Options
Infrastructure | Scale resources, optimise queries
Integration    | Fix pipelines, add validation
Configuration  | Rollback, fix config, add testing
Dependencies   | Update, pin versions, add tests

GenAI Issues

Cause          | Fix Options
Prompt issues  | Refine prompts, add examples
Context issues | Chunk inputs, summarise, prioritise
Model changes  | Adapt prompts, switch providers
Guardrails     | Tune thresholds, refine rules
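
As one example of a data-drift or stale-model fix, retraining on a recent window can be as simple as filtering the training set by date before fitting; a minimal sketch with an illustrative 90-day window, feature list, and scikit-learn model:

```python
# Sketch: retrain on a recent data window to address data drift or a stale model.
# The 90-day window, file path, feature list, and model choice are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("training_data.csv", parse_dates=["timestamp"])
cutoff = data["timestamp"].max() - pd.Timedelta(days=90)
recent = data[data["timestamp"] >= cutoff]            # keep only the last 90 days

features = ["feature_a", "feature_b", "feature_c"]    # illustrative feature set
model = LogisticRegression(max_iter=1000)
model.fit(recent[features], recent["label"])
print(f"Retrained on {len(recent)} rows from the last 90 days")
```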

Before Deploying Fixes

  • Fix tested in non-production environment
  • Rollback plan in place
  • Stakeholders aware of changes
  • Monitoring ready to validate

Step 4: Validate Improvements

Confirm the fix worked.

Validation Approach

  1. Baseline - Document pre-fix performance
  2. Deploy - Roll out fix (ideally gradually)
  3. Monitor - Watch key metrics closely
  4. Compare - Measure against baseline
  5. Confirm - Statistical significance check (see the sketch after this list)
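
For the significance check in step 5, a two-proportion z-test comparing the baseline and post-fix success rates is one simple option; a minimal sketch with illustrative counts:

```python
# Sketch: two-proportion z-test comparing baseline vs post-fix success rates.
# The counts are illustrative; in practice they come from labelled samples of
# pre-fix and post-fix predictions.
from math import sqrt, erf

def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Return the two-sided p-value for H0: the two success rates are equal."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided p-value

p_value = two_proportion_z_test(success_a=812, n_a=1000,   # baseline: 81.2% correct
                                success_b=861, n_b=1000)   # post-fix: 86.1% correct
print(f"p-value = {p_value:.4f}; below 0.05 suggests a real improvement, not noise")
```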

What to Measure

For predictive models:

  • Accuracy/precision/recall on recent data
  • Prediction distribution
  • Error patterns
  • Latency
  • User feedback

For GenAI systems:

  • Output quality ratings
  • User satisfaction
  • Task completion
  • Guardrail patterns
  • Cost metrics

If Fix Didn't Work

  • Don't panic—roll back if needed
  • Revisit diagnosis (was root cause correct?)
  • Consider if problem is more fundamental
  • May need Worried About a Project journey

Step 5: Prevent Recurrence

Set up systems to catch problems earlier.

Monitoring Improvements

Read: Monitoring Guide

Implement:

  • Data drift detection
  • Performance threshold alerts (a sketch follows this list)
  • Input distribution monitoring
  • User feedback collection
  • Automated testing in production
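
A minimal sketch of a performance-threshold alert, assuming metric values are gathered elsewhere and passed in as a dictionary; the thresholds, metric names, and logging-based alert are illustrative:

```python
# Sketch: a scheduled monitoring check that alerts when key metrics cross thresholds.
# Thresholds, metric names, and the logging-based alert are illustrative.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_monitor")

THRESHOLDS = {
    "accuracy": ("min", 0.85),        # alert if accuracy falls below 0.85
    "positive_rate": ("max", 0.40),   # alert if the prediction distribution shifts high
    "p95_latency_ms": ("max", 500),   # alert if latency degrades
}

def check_metrics(current: dict) -> list[str]:
    """Compare current metric values against thresholds and return alert messages."""
    alerts = []
    for name, (direction, limit) in THRESHOLDS.items():
        value = current.get(name)
        if value is None:
            continue
        breached = value < limit if direction == "min" else value > limit
        if breached:
            alerts.append(f"{name}={value} breached {direction} threshold {limit}")
    return alerts

for alert in check_metrics({"accuracy": 0.81, "positive_rate": 0.22, "p95_latency_ms": 640}):
    logger.warning(alert)
```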

Process Improvements

  • Regular model performance reviews
  • Retraining triggers defined
  • Escalation paths documented
  • Lessons learned captured
  • Runbook updated

Retraining Strategy

Approach   | When to Use
Scheduled  | Stable environment, predictable drift
Triggered  | Performance threshold breached
Continuous | High volume, fast-changing domain
Manual     | Low volume, high stakes, complex
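
Scheduled and triggered approaches can be combined in a single retraining decision; a minimal sketch, where the 90-day cadence, accuracy floor, and metric are illustrative assumptions:

```python
# Sketch: combine scheduled and triggered retraining in one decision.
# The 90-day cadence, accuracy floor, and metric are illustrative assumptions.
from datetime import date, timedelta
from typing import Optional

RETRAIN_EVERY = timedelta(days=90)   # scheduled cadence
ACCURACY_FLOOR = 0.85                # triggered threshold

def should_retrain(last_trained: date, current_accuracy: float,
                   today: Optional[date] = None) -> bool:
    """Retrain if the cadence has lapsed or accuracy has breached its floor."""
    today = today or date.today()
    overdue = today - last_trained >= RETRAIN_EVERY
    degraded = current_accuracy < ACCURACY_FLOOR
    return overdue or degraded

# Cadence lapsed even though accuracy is still acceptable -> True
print(should_retrain(last_trained=date(2024, 1, 15), current_accuracy=0.88,
                     today=date(2024, 6, 1)))
```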

When to Escalate

Consider more drastic action if:

  • Multiple fix attempts have failed
  • Root cause can't be identified
  • Fundamental assumptions are wrong
  • Cost to fix exceeds value
  • Users have lost confidence

See: Worried About a Project
See: Shut Down an AI System