How to Implement AI Explainability¶
Ready to Use
- Legal basis: Administrative law, FOI, natural justice require transparency
- Three levels: Global (model behavior), Local (individual decisions), Feature importance
- Audience matters: Technical vs. citizen-facing explanations differ
- Technique depends on model: Interpretable models vs. black-box with SHAP/LIME
Purpose¶
This guide provides practical approaches for making AI systems explainable. Explainability is a critical requirement for government AI systems that affect citizens and must remain accountable.
Why Explainability Matters¶
Legal and Policy Requirements¶
- Administrative law: Citizens have a right to understand decisions affecting them
- Freedom of Information: Agencies must explain decision-making processes
- Natural justice: Procedural fairness requires transparency
- AI Ethics Framework: Australian Government requires explainable AI
Practical Benefits¶
- Builds public trust
- Enables error detection and correction
- Supports appeals and reviews
- Helps identify bias
- Improves model debugging
Explainability Levels¶
Level 1: Global Explainability¶
Explains the overall model behavior - what the model has learned.
Use when:
- Documenting model for stakeholders
- Audit and compliance reporting
- Understanding general model behavior
Level 2: Local Explainability¶
Explains individual predictions - why this specific decision was made.
Use when:
- Citizen-facing explanations
- Case review and appeals
- Debugging specific errors
Level 3: Feature Importance¶
Identifies which inputs matter most.
Use when:
- Model validation
- Identifying key decision factors
- Simplifying explanations
Approach by Model Type¶
Inherently Interpretable Models¶
Some models are explainable by design:
| Model | Explainability | Pros | Cons |
|---|---|---|---|
| Decision Trees | Very High | Visual rules | May overfit |
| Logistic Regression | High | Coefficient interpretation | Linear only |
| Rule-Based Systems | Very High | Clear logic | Manual maintenance |
| Scoring Models | High | Point-based | Limited complexity |
Recommendation: For high-stakes government decisions, prefer inherently interpretable models when accuracy requirements allow.
Post-Hoc Explainability for Complex Models¶
When using neural networks, ensemble models, or other complex models:
| Technique | Type | Works With | Output |
|---|---|---|---|
| SHAP | Local/Global | Any model | Feature contributions |
| LIME | Local | Any model | Local approximation |
| Permutation Importance | Global | Any model | Feature ranking |
| Partial Dependence Plots | Global | Any model | Feature effects |
| Attention Visualization | Local | Transformers | Token importance |
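Permutation importance and partial dependence from the table above are available directly in scikit-learn. A minimal sketch, assuming a fitted model plus the X_test and feature_names variables used elsewhere in this guide, and a y_test array of true labels (not shown earlier):
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
import matplotlib.pyplot as plt
# Permutation importance: how much does shuffling each feature hurt performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
ranked = sorted(zip(feature_names, result.importances_mean, result.importances_std),
                key=lambda x: x[1], reverse=True)
for name, mean, std in ranked:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
# Partial dependence: average effect of individual features on the prediction
PartialDependenceDisplay.from_estimator(model, X_test, features=[0, 1],
                                        feature_names=feature_names)
plt.savefig('partial_dependence.png', dpi=150, bbox_inches='tight')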
Step 1: Choose Your Approach¶
Decision Framework¶
flowchart TB
START([Start]) --> Q1{Accuracy requirement<br/>< 90%?}
Q1 -->|Yes| INT["<strong>Interpretable Model</strong><br/>Decision Tree, Logistic Regression<br/>Built-in explainability"]
Q1 -->|No| Q2{Is model a<br/>black box?}
Q2 -->|No - Tree-based| TREE["<strong>TreeExplainer</strong><br/>SHAP<br/>Feature importance"]
Q2 -->|Yes - Neural network| Q3{How many predictions<br/>need explaining?}
Q3 -->|Individual cases| LIME["<strong>LIME or KernelSHAP</strong><br/>Local explanations"]
Q3 -->|All predictions| KERNEL["<strong>KernelSHAP</strong><br/>or Surrogate Model"]
style START fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style INT fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
style TREE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
style LIME fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style KERNEL fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
Step 2: Implement Explainability¶
For Decision Trees¶
from sklearn.tree import DecisionTreeClassifier, export_text
# Train model
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_train, y_train)
# Generate text explanation
tree_rules = export_text(model, feature_names=feature_names)
print(tree_rules)
# Visualize
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 10))
plot_tree(model, feature_names=feature_names, class_names=['Deny', 'Approve'],
filled=True, rounded=True)
plt.savefig('decision_tree.png', dpi=150, bbox_inches='tight')
For Logistic Regression¶
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
# Train model
model = LogisticRegression()
model.fit(X_train, y_train)
# Get feature coefficients
coef_df = pd.DataFrame({
'feature': feature_names,
'coefficient': model.coef_[0],
'odds_ratio': np.exp(model.coef_[0])
})
coef_df = coef_df.sort_values('coefficient', key=abs, ascending=False)
print(coef_df)
# Explain individual prediction
def explain_prediction(model, X_instance, feature_names):
"""Generate human-readable explanation for logistic regression."""
contributions = X_instance * model.coef_[0]
baseline = model.intercept_[0]
explanation = f"Starting score: {baseline:.2f}\n"
for feature, contrib in sorted(zip(feature_names, contributions),
key=lambda x: abs(x[1]), reverse=True)[:5]:
direction = "increases" if contrib > 0 else "decreases"
explanation += f" {feature} {direction} score by {abs(contrib):.2f}\n"
final_score = baseline + contributions.sum()
probability = 1 / (1 + np.exp(-final_score))
explanation += f"Final probability: {probability:.2%}"
return explanation
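A brief usage example for the helper above, assuming X_test is a NumPy array with the same feature order used in training:
# Explain the first test case in plain terms
print(explain_prediction(model, X_test[0], feature_names))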
Using SHAP (SHapley Additive exPlanations)¶
import shap
# Option A - tree-based models (fast, exact)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Option B - any model (slower, approximate); use one explainer or the other
# explainer = shap.KernelExplainer(model.predict_proba, X_train[:100])
# shap_values = explainer.shap_values(X_test[:10])
# Note: for binary classifiers, shap_values and expected_value may be lists with
# one entry per class; if so, use the positive class (e.g. shap_values[1])
# Global explanation: feature importance across the test set
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
# Local explanation: single prediction
shap.force_plot(explainer.expected_value, shap_values[0], X_test[0],
                feature_names=feature_names)
# Waterfall plot for a single prediction (newer Explanation API; for binary
# classifiers slice the positive class, e.g. explainer(X_test)[0, :, 1])
shap.plots.waterfall(explainer(X_test)[0])
Using LIME (Local Interpretable Model-agnostic Explanations)¶
from lime.lime_tabular import LimeTabularExplainer
# Create explainer
explainer = LimeTabularExplainer(
training_data=X_train,
feature_names=feature_names,
class_names=['Deny', 'Approve'],
mode='classification'
)
# Explain single prediction
explanation = explainer.explain_instance(
X_test[0],
model.predict_proba,
num_features=10
)
# Show explanation
explanation.show_in_notebook()
# Get as text
print(explanation.as_list())
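The (condition, weight) pairs returned by as_list() can be turned into plain-language lines for a decision record; a minimal sketch, with illustrative wording only:
# Each LIME item is a (condition, weight) pair, e.g. ('income_annual > 52000', 0.23)
for condition, weight in explanation.as_list():
    direction = "supported approval" if weight > 0 else "counted against approval"
    print(f"• {condition}: {direction} (weight {abs(weight):.2f})")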
Step 3: Generate Human-Readable Explanations¶
Template for Citizen-Facing Explanations¶
def generate_citizen_explanation(prediction, shap_values, feature_names,
feature_values, threshold=0.1):
"""
Generate a human-readable explanation for a citizen.
Returns explanation suitable for inclusion in a decision letter.
"""
# Sort features by importance
importance = list(zip(feature_names, shap_values, feature_values))
importance.sort(key=lambda x: abs(x[1]), reverse=True)
# Filter to significant factors
significant = [(f, s, v) for f, s, v in importance if abs(s) > threshold]
# Generate explanation
lines = []
lines.append("Factors considered in this decision:\n")
for feature, shap_val, value in significant[:5]:
# Translate feature names to plain English
plain_name = FEATURE_TRANSLATIONS.get(feature, feature)
if shap_val > 0:
lines.append(f"• {plain_name}: Your {plain_name.lower()} ({value}) "
f"supported a positive outcome")
else:
lines.append(f"• {plain_name}: Your {plain_name.lower()} ({value}) "
f"was a factor against approval")
lines.append("\nThis assessment was made using an automated system. "
"If you believe this decision is incorrect, you have the right "
"to request a human review.")
return "\n".join(lines)
# Feature name translations
FEATURE_TRANSLATIONS = {
'income_annual': 'Annual income',
'employment_status': 'Employment status',
'application_history': 'Previous applications',
'residency_years': 'Years at current address',
}
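A brief usage example tying this template to the SHAP output from Step 2, assuming shap_values is a 2-D array of per-feature contributions (for binary classifiers, the positive-class slice):
text = generate_citizen_explanation(
    prediction=model.predict(X_test[:1])[0],
    shap_values=shap_values[0],        # contributions for the first test case
    feature_names=feature_names,
    feature_values=X_test[0],
)
print(text)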
Explanation Templates by Use Case¶
Eligibility Decision¶
Your application for [SERVICE] has been [APPROVED/DENIED].
Key factors in this decision:
• [Factor 1]: [Plain English explanation]
• [Factor 2]: [Plain English explanation]
• [Factor 3]: [Plain English explanation]
The automated assessment considered [N] factors in total.
[If denied]: You may request a review by contacting [CONTACT].
Risk Assessment¶
Risk Level: [LOW/MEDIUM/HIGH]
This assessment is based on:
• [Factor 1 - weight %]
• [Factor 2 - weight %]
• [Factor 3 - weight %]
Note: This is an initial assessment. A human reviewer will make
the final determination based on all available information.
Recommendation System¶
Recommended: [OPTION]
This recommendation considers:
• Your stated preference for [X]
• Similar users have found [Y] helpful
• [Z] is available in your area
This is a suggestion only. You may choose any available option.
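These templates can be filled programmatically; a minimal sketch for the eligibility template, where all names and values are hypothetical placeholders:
def render_eligibility_letter(service_name, outcome, factors, total_factors, contact):
    """Fill the eligibility template with plain-English factors (hypothetical fields)."""
    lines = [f"Your application for {service_name} has been {outcome}.",
             "Key factors in this decision:"]
    lines += [f"• {name}: {reason}" for name, reason in factors]
    lines.append(f"The automated assessment considered {total_factors} factors in total.")
    if outcome == "DENIED":
        lines.append(f"You may request a review by contacting {contact}.")
    return "\n".join(lines)
print(render_eligibility_letter(
    "Rent Assistance", "DENIED",
    [("Annual income", "above the eligibility threshold")],
    12, "the contact point listed in your letter"))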
Step 4: Document Model Behavior¶
Create a Model Card¶
Document how the model works:
## Model Behavior Summary
### What the model predicts
[Clear description of the prediction task]
### Key factors the model considers
1. [Factor 1] - [How it affects predictions]
2. [Factor 2] - [How it affects predictions]
3. [Factor 3] - [How it affects predictions]
### What the model does NOT consider
- [Explicitly excluded factors]
### Limitations
- [Known limitations]
- [Edge cases where model may be less accurate]
### Human oversight
[Description of human review process]
Step 5: Test Your Explanations¶
Explanation Quality Checklist¶
- Accurate: Does the explanation reflect actual model behavior?
- Complete: Are all significant factors included?
- Comprehensible: Can a non-technical person understand it?
- Consistent: Do similar cases get similar explanations?
- Actionable: Can the person understand what would change the outcome?
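The "Consistent" item can be partly automated: similar cases should surface similar top factors. A minimal sketch, assuming shap_values is the 2-D array of per-feature contributions computed in Step 2:
import numpy as np
from sklearn.metrics import pairwise_distances
def top_features(shap_row, names, k=3):
    """Return the k features with the largest absolute contribution."""
    order = np.argsort(np.abs(shap_row))[::-1][:k]
    return {names[i] for i in order}
# Find the two most similar test cases and check that their top factors overlap
distances = pairwise_distances(X_test)
np.fill_diagonal(distances, np.inf)
i, j = np.unravel_index(np.argmin(distances), distances.shape)
overlap = (top_features(shap_values[i], feature_names)
           & top_features(shap_values[j], feature_names))
print(f"Closest pair ({i}, {j}) shares {len(overlap)} of their top 3 factors")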
User Testing¶
Test explanations with real users:
1. Show user the prediction and explanation
2. Ask: "Why do you think you received this outcome?"
3. Compare their understanding to actual factors
4. Ask: "What could you do differently?"
5. Verify their answer matches model behavior
Implementation Patterns¶
Pattern 1: Explanation API¶
import shap

class ExplainableModel:
    """Wrapper that adds explainability to any model."""
    def __init__(self, model, feature_names, explainer_type='shap'):
        self.model = model
        self.feature_names = feature_names
        self.explainer_type = explainer_type
        self.explainer = None
    def fit(self, X, y):
        self.model.fit(X, y)
        if self.explainer_type == 'shap':
            # TreeExplainer assumes a tree-based model (random forest, GBM, etc.)
            self.explainer = shap.TreeExplainer(self.model)
        return self
    def predict(self, X):
        return self.model.predict(X)
    def explain(self, X, format='technical'):
        """Generate explanations for predictions."""
        shap_values = self.explainer.shap_values(X)
        if isinstance(shap_values, list):
            # Binary classifiers return one array per class; use the positive class
            shap_values = shap_values[1]
        if format == 'technical':
            return shap_values
        elif format == 'citizen':
            return [generate_citizen_explanation(
                self.predict(x.reshape(1, -1))[0],
                sv,
                self.feature_names,
                x
            ) for x, sv in zip(X, shap_values)]
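Example usage of the wrapper, assuming a tree-based model so that TreeExplainer applies; it reuses generate_citizen_explanation and FEATURE_TRANSLATIONS from Step 3:
from sklearn.ensemble import RandomForestClassifier
wrapped = ExplainableModel(RandomForestClassifier(n_estimators=100, random_state=42),
                           feature_names=feature_names)
wrapped.fit(X_train, y_train)
print(wrapped.explain(X_test[:1], format='citizen')[0])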
Pattern 2: Logging Explanations¶
import json
from datetime import datetime
def log_prediction_with_explanation(prediction_id, model, X, prediction,
shap_values, user_id=None):
"""Log prediction with explanation for audit trail."""
log_entry = {
'prediction_id': prediction_id,
'timestamp': datetime.now().isoformat(),
'user_id': user_id,
'input_features': X.tolist(),
'prediction': prediction,
'model_version': model.version,
        'explanation': {
            'shap_values': shap_values.tolist(),
            # Ranked mapping of feature name -> absolute contribution
            'feature_importance': {
                name: float(value)
                for name, value in sorted(zip(model.feature_names, abs(shap_values)),
                                          key=lambda x: x[1], reverse=True)
            }
        }
}
# Store in audit log
with open('prediction_log.jsonl', 'a') as f:
f.write(json.dumps(log_entry) + '\n')
return log_entry
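A matching retrieval helper so a logged explanation can be pulled back for a review or FOI request; a minimal sketch over the same JSONL file:
import json
def load_explanation(prediction_id, log_path='prediction_log.jsonl'):
    """Return the logged entry for a prediction, or None if not found."""
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry['prediction_id'] == prediction_id:
                return entry
    return None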
Quick Reference¶
When to Use Each Technique¶
| Technique | Best For | Avoid When |
|---|---|---|
| SHAP | Accurate importance, any model | Very large datasets |
| LIME | Quick local explanations | Needs global view |
| Decision Tree | Full transparency | Need high accuracy |
| Feature Importance | Quick overview | Need case-by-case explanation |
| Attention | NLP models | Non-transformer models |
Explainability Checklist¶
Before Deployment:
- [ ] Choose explainability approach
- [ ] Implement explanation generation
- [ ] Test explanations with users
- [ ] Document model behavior
- [ ] Create citizen-facing templates
- [ ] Set up explanation logging
After Deployment:
- [ ] Monitor explanation quality
- [ ] Collect user feedback
- [ ] Update explanations as the model changes
- [ ] Audit explanation accuracy
Resources¶
Libraries¶
- SHAP: pip install shap
- LIME: pip install lime
- InterpretML: pip install interpret
- ELI5: pip install eli5
Further Reading¶
- Christoph Molnar's "Interpretable Machine Learning" (online book)
- OECD Principles on AI Transparency
- Australian Government AI Ethics Framework