
How to Implement AI Explainability

Ready to Use

Quick Reference
  • Legal basis: Administrative law, FOI, natural justice require transparency
  • Three levels: Global (model behavior), Local (individual decisions), Feature importance
  • Audience matters: Technical vs. citizen-facing explanations differ
  • Technique depends on model: Interpretable models vs. black-box with SHAP/LIME

Purpose

This guide provides practical approaches to making AI systems explainable, a critical requirement for government AI systems that affect citizens and must remain accountable.


Why Explainability Matters

  • Administrative law: Citizens have a right to understand decisions affecting them
  • Freedom of Information: Agencies must explain decision-making processes
  • Natural justice: Procedural fairness requires transparency
  • AI Ethics Framework: Australian Government requires explainable AI

Practical Benefits

  • Builds public trust
  • Enables error detection and correction
  • Supports appeals and reviews
  • Helps identify bias
  • Improves model debugging

Explainability Levels

Level 1: Global Explainability

Explains the overall model behavior - what the model has learned.

Use when:
  • Documenting the model for stakeholders
  • Audit and compliance reporting
  • Understanding general model behavior

Level 2: Local Explainability

Explains individual predictions - why this specific decision was made.

Use when:
  • Citizen-facing explanations
  • Case review and appeals
  • Debugging specific errors

Level 3: Feature Importance

Identifies which inputs matter most.

Use when:
  • Model validation
  • Identifying key decision factors
  • Simplifying explanations
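
All three levels can be produced from the same underlying attribution values. A minimal sketch, assuming `shap_values` is a 2-D array with one row of per-feature contributions per prediction (as computed in Step 2) and `feature_names` lists the matching columns:

import numpy as np

# Level 1 (global): average magnitude of each feature's contribution across all predictions
global_importance = np.abs(shap_values).mean(axis=0)

# Level 2 (local): the contributions behind one specific decision
local_explanation = dict(zip(feature_names, shap_values[0]))

# Level 3 (feature importance): rank features by their overall influence
ranking = sorted(zip(feature_names, global_importance), key=lambda x: x[1], reverse=True)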


Approach by Model Type

Inherently Interpretable Models

Some models are explainable by design:

| Model | Explainability | Pros | Cons |
|-------|----------------|------|------|
| Decision Trees | Very High | Visual rules | May overfit |
| Logistic Regression | High | Coefficient interpretation | Linear only |
| Rule-Based Systems | Very High | Clear logic | Manual maintenance |
| Scoring Models | High | Point-based | Limited complexity |

Recommendation: For high-stakes government decisions, prefer inherently interpretable models when accuracy requirements allow.

Post-Hoc Explainability for Complex Models

When using neural networks, ensemble models, or other complex models:

| Technique | Type | Works With | Output |
|-----------|------|------------|--------|
| SHAP | Local/Global | Any model | Feature contributions |
| LIME | Local | Any model | Local approximation |
| Permutation Importance | Global | Any model | Feature ranking |
| Partial Dependence Plots | Global | Any model | Feature effects |
| Attention Visualization | Local | Transformers | Token importance |
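
Permutation importance and partial dependence plots are available directly in scikit-learn. A minimal sketch, assuming a fitted `model`, held-out `X_test`/`y_test` and a `feature_names` list (all illustrative names):

from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Global feature ranking: shuffle each feature and measure how much the score drops
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, importance in sorted(zip(feature_names, result.importances_mean),
                               key=lambda x: x[1], reverse=True):
    print(f"{name}: {importance:.3f}")

# Global feature effects: how predictions change as the first two features vary
PartialDependenceDisplay.from_estimator(model, X_test, features=[0, 1],
                                        feature_names=feature_names)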

Step 1: Choose Your Approach

Decision Framework

flowchart TB
    START([Start]) --> Q1{Accuracy requirement<br/>< 90%?}

    Q1 -->|Yes| INT["<strong>Interpretable Model</strong><br/>Decision Tree, Logistic Regression<br/>Built-in explainability"]

    Q1 -->|No| Q2{Is model a<br/>black box?}

    Q2 -->|No - Tree-based| TREE["<strong>TreeExplainer</strong><br/>SHAP<br/>Feature importance"]

    Q2 -->|Yes - Neural network| Q3{How many predictions<br/>need explaining?}

    Q3 -->|Individual cases| LIME["<strong>LIME or KernelSHAP</strong><br/>Local explanations"]

    Q3 -->|All predictions| KERNEL["<strong>KernelSHAP</strong><br/>or Surrogate Model"]

    style START fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style INT fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style TREE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style LIME fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style KERNEL fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
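
The surrogate-model branch in the flowchart above is not covered in Step 2, so here is a minimal sketch: train an interpretable tree on the black-box model's own predictions and report how faithfully it mimics them. The names `complex_model`, `X_train` and `feature_names` are assumptions.

from sklearn.tree import DecisionTreeClassifier, export_text

# Learn the black box's behaviour, not the original labels
black_box_predictions = complex_model.predict(X_train)
surrogate = DecisionTreeClassifier(max_depth=4)
surrogate.fit(X_train, black_box_predictions)

# Fidelity: how often the surrogate agrees with the black box
fidelity = surrogate.score(X_train, black_box_predictions)
print(f"Surrogate fidelity: {fidelity:.1%}")
print(export_text(surrogate, feature_names=feature_names))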

Step 2: Implement Explainability

For Decision Trees

from sklearn.tree import DecisionTreeClassifier, export_text

# Train model
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_train, y_train)

# Generate text explanation
tree_rules = export_text(model, feature_names=feature_names)
print(tree_rules)

# Visualize
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))
plot_tree(model, feature_names=feature_names, class_names=['Deny', 'Approve'],
          filled=True, rounded=True)
plt.savefig('decision_tree.png', dpi=150, bbox_inches='tight')

For Logistic Regression

from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Get feature coefficients
coef_df = pd.DataFrame({
    'feature': feature_names,
    'coefficient': model.coef_[0],
    'odds_ratio': np.exp(model.coef_[0])
})
coef_df = coef_df.sort_values('coefficient', key=abs, ascending=False)
print(coef_df)

# Explain individual prediction
def explain_prediction(model, X_instance, feature_names):
    """Generate human-readable explanation for logistic regression."""
    contributions = X_instance * model.coef_[0]
    baseline = model.intercept_[0]

    explanation = f"Starting score: {baseline:.2f}\n"
    for feature, contrib in sorted(zip(feature_names, contributions),
                                   key=lambda x: abs(x[1]), reverse=True)[:5]:
        direction = "increases" if contrib > 0 else "decreases"
        explanation += f"  {feature} {direction} score by {abs(contrib):.2f}\n"

    final_score = baseline + contributions.sum()
    probability = 1 / (1 + np.exp(-final_score))
    explanation += f"Final probability: {probability:.2%}"

    return explanation
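
Example usage, assuming `X_test` is a NumPy array with the same feature order and scaling used in training:

print(explain_prediction(model, X_test[0], feature_names))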

Using SHAP (SHapley Additive exPlanations)

import shap

# For tree-based models (fast)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# For any model (slower)
explainer = shap.KernelExplainer(model.predict_proba, X_train[:100])
shap_values = explainer.shap_values(X_test[:10])

# Global explanation: Feature importance
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Local explanation: single prediction
# Note: for classifiers, expected_value and shap_values may be per-class lists;
# select the class of interest (e.g. index 1 for the positive class) first.
shap.force_plot(explainer.expected_value, shap_values[0], X_test[0],
                feature_names=feature_names)

# Waterfall plot for a single prediction (uses the newer Explanation API;
# for multi-output models, select a single class's explanation first)
shap.plots.waterfall(explainer(X_test)[0])

Using LIME (Local Interpretable Model-agnostic Explanations)

from lime.lime_tabular import LimeTabularExplainer

# Create explainer
explainer = LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=['Deny', 'Approve'],
    mode='classification'
)

# Explain single prediction
explanation = explainer.explain_instance(
    X_test[0],
    model.predict_proba,
    num_features=10
)

# Show explanation
explanation.show_in_notebook()

# Get as text
print(explanation.as_list())

Step 3: Generate Human-Readable Explanations

Template for Citizen-Facing Explanations

def generate_citizen_explanation(prediction, shap_values, feature_names,
                                  feature_values, threshold=0.1):
    """
    Generate a human-readable explanation for a citizen.

    Returns explanation suitable for inclusion in a decision letter.
    """
    # Sort features by importance
    importance = list(zip(feature_names, shap_values, feature_values))
    importance.sort(key=lambda x: abs(x[1]), reverse=True)

    # Filter to significant factors
    significant = [(f, s, v) for f, s, v in importance if abs(s) > threshold]

    # Generate explanation
    lines = []
    lines.append("Factors considered in this decision:\n")

    for feature, shap_val, value in significant[:5]:
        # Translate feature names to plain English
        plain_name = FEATURE_TRANSLATIONS.get(feature, feature)

        if shap_val > 0:
            lines.append(f"• {plain_name}: Your {plain_name.lower()} ({value}) "
                        f"supported a positive outcome")
        else:
            lines.append(f"• {plain_name}: Your {plain_name.lower()} ({value}) "
                        f"was a factor against approval")

    lines.append("\nThis assessment was made using an automated system. "
                "If you believe this decision is incorrect, you have the right "
                "to request a human review.")

    return "\n".join(lines)

# Feature name translations
FEATURE_TRANSLATIONS = {
    'income_annual': 'Annual income',
    'employment_status': 'Employment status',
    'application_history': 'Previous applications',
    'residency_years': 'Years at current address',
}
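
Example usage, assuming `shap_values_row` holds the SHAP values for one application (the variable names are illustrative):

letter_text = generate_citizen_explanation(
    prediction=1,
    shap_values=shap_values_row,
    feature_names=feature_names,
    feature_values=X_test[0],
)
print(letter_text)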

Explanation Templates by Use Case

Eligibility Decision

Your application for [SERVICE] has been [APPROVED/DENIED].

Key factors in this decision:
• [Factor 1]: [Plain English explanation]
• [Factor 2]: [Plain English explanation]
• [Factor 3]: [Plain English explanation]

The automated assessment considered [N] factors in total.
[If denied]: You may request a review by contacting [CONTACT].

Risk Assessment

Risk Level: [LOW/MEDIUM/HIGH]

This assessment is based on:
• [Factor 1 - weight %]
• [Factor 2 - weight %]
• [Factor 3 - weight %]

Note: This is an initial assessment. A human reviewer will make
the final determination based on all available information.

Recommendation System

Recommended: [OPTION]

This recommendation considers:
• Your stated preference for [X]
• Similar users have found [Y] helpful
• [Z] is available in your area

This is a suggestion only. You may choose any available option.
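
These templates can be filled programmatically. A minimal sketch using standard string formatting (placeholder and variable names are illustrative):

ELIGIBILITY_TEMPLATE = (
    "Your application for {service} has been {outcome}.\n\n"
    "Key factors in this decision:\n{factors}\n\n"
    "The automated assessment considered {n_factors} factors in total."
)

factor_lines = "\n".join(f"• {name}: {reason}" for name, reason in [
    ("Annual income", "within the eligible range"),
    ("Years at current address", "meets the residency requirement"),
])

print(ELIGIBILITY_TEMPLATE.format(service="Rental Assistance", outcome="APPROVED",
                                  factors=factor_lines, n_factors=12))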

Step 4: Document Model Behavior

Create a Model Card

Document how the model works:

## Model Behavior Summary

### What the model predicts
[Clear description of the prediction task]

### Key factors the model considers
1. [Factor 1] - [How it affects predictions]
2. [Factor 2] - [How it affects predictions]
3. [Factor 3] - [How it affects predictions]

### What the model does NOT consider
- [Explicitly excluded factors]

### Limitations
- [Known limitations]
- [Edge cases where model may be less accurate]

### Human oversight
[Description of human review process]

Step 5: Test Your Explanations

Explanation Quality Checklist

  • Accurate: Does the explanation reflect actual model behavior?
  • Complete: Are all significant factors included?
  • Comprehensible: Can a non-technical person understand it?
  • Consistent: Do similar cases get similar explanations?
  • Actionable: Can the person understand what would change the outcome?

User Testing

Test explanations with real users:

1. Show user the prediction and explanation
2. Ask: "Why do you think you received this outcome?"
3. Compare their understanding to actual factors
4. Ask: "What could you do differently?"
5. Verify their answer matches model behavior
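
Consistency can also be spot-checked automatically: near-identical inputs should receive near-identical explanations. A minimal sketch, assuming a single-output SHAP explainer and NumPy feature rows:

import numpy as np

def explanation_consistency(explainer, case_a, case_b):
    """Largest difference in SHAP contributions between two similar cases."""
    sv_a = explainer.shap_values(case_a.reshape(1, -1))[0]
    sv_b = explainer.shap_values(case_b.reshape(1, -1))[0]
    return float(np.max(np.abs(np.asarray(sv_a) - np.asarray(sv_b))))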

Implementation Patterns

Pattern 1: Explanation API

class ExplainableModel:
    """Wrapper that adds explainability to any model."""

    def __init__(self, model, feature_names, explainer_type='shap'):
        self.model = model
        self.feature_names = feature_names  # needed for citizen-facing explanations
        self.explainer_type = explainer_type
        self.explainer = None

    def fit(self, X, y):
        self.model.fit(X, y)
        if self.explainer_type == 'shap':
            self.explainer = shap.TreeExplainer(self.model)
        return self

    def predict(self, X):
        return self.model.predict(X)

    def explain(self, X, format='technical'):
        """Generate explanation for predictions."""
        shap_values = self.explainer.shap_values(X)

        if format == 'technical':
            return shap_values
        elif format == 'citizen':
            return [generate_citizen_explanation(
                self.predict(x.reshape(1, -1))[0],
                sv,
                self.feature_names,
                x
            ) for x, sv in zip(X, shap_values)]
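
Example usage, a sketch assuming NumPy training and test arrays; the estimator is illustrative and must be one that SHAP's TreeExplainer supports:

from sklearn.ensemble import GradientBoostingClassifier

wrapped = ExplainableModel(GradientBoostingClassifier(), feature_names=feature_names)
wrapped.fit(X_train, y_train)
citizen_explanations = wrapped.explain(X_test[:5], format='citizen')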

Pattern 2: Logging Explanations

import json
from datetime import datetime

def log_prediction_with_explanation(prediction_id, model, X, prediction,
                                    shap_values, user_id=None):
    """Log prediction with explanation for audit trail."""
    log_entry = {
        'prediction_id': prediction_id,
        'timestamp': datetime.now().isoformat(),
        'user_id': user_id,
        'input_features': X.tolist(),
        'prediction': prediction,
        'model_version': model.version,
        'explanation': {
            'shap_values': shap_values.tolist(),
            'feature_importance': dict(sorted(
                zip(model.feature_names, (abs(v) for v in shap_values.tolist())),
                key=lambda x: x[1], reverse=True))
        }
    }

    # Store in audit log
    with open('prediction_log.jsonl', 'a') as f:
        f.write(json.dumps(log_entry) + '\n')

    return log_entry

Quick Reference

When to Use Each Technique

| Technique | Best For | Avoid When |
|-----------|----------|------------|
| SHAP | Accurate importance, any model | Very large datasets |
| LIME | Quick local explanations | You need a global view |
| Decision Tree | Full transparency | You need high accuracy |
| Feature Importance | Quick overview | You need case-by-case explanations |
| Attention | NLP models | Non-transformer models |

Explainability Checklist

Before Deployment:

- [ ] Choose explainability approach
- [ ] Implement explanation generation
- [ ] Test explanations with users
- [ ] Document model behavior
- [ ] Create citizen-facing templates
- [ ] Set up explanation logging

After Deployment:

- [ ] Monitor explanation quality
- [ ] Collect user feedback
- [ ] Update explanations as the model changes
- [ ] Audit explanation accuracy


Resources

Libraries

  • SHAP: pip install shap
  • LIME: pip install lime
  • InterpretML: pip install interpret
  • ELI5: pip install eli5

Further Reading

  • Christoph Molnar's "Interpretable Machine Learning" (online book)
  • OECD Principles on AI Transparency
  • Australian Government AI Ethics Framework