How to Implement AI Explainability¶
Ready to Use
- Legal basis: Administrative law, FOI, natural justice require transparency
- Three levels: Global (model behavior), Local (individual decisions), Feature importance
- Audience matters: Technical vs. citizen-facing explanations differ
- Technique depends on model: Interpretable models vs. black-box with SHAP/LIME
Purpose¶
This guide provides practical approaches for making AI systems explainable. Explainability is a critical requirement for government AI systems that affect citizens and must remain accountable.
Why Explainability Matters¶
Legal and Policy Requirements¶
- Administrative law: Citizens have a right to understand decisions affecting them
- Freedom of Information: Agencies must explain decision-making processes
- Natural justice: Procedural fairness requires transparency
- AI Ethics Framework: Australian Government requires explainable AI
Practical Benefits¶
- Builds public trust
- Enables error detection and correction
- Supports appeals and reviews
- Helps identify bias
- Improves model debugging
Explainability Levels¶
Level 1: Global Explainability¶
Explains the overall model behavior - what the model has learned.
Use when:
- Documenting model for stakeholders
- Audit and compliance reporting
- Understanding general model behavior
Level 2: Local Explainability¶
Explains individual predictions - why this specific decision was made.
Use when:
- Citizen-facing explanations
- Case review and appeals
- Debugging specific errors
Level 3: Feature Importance¶
Identifies which inputs matter most.
Use when:
- Model validation
- Identifying key decision factors
- Simplifying explanations
Approach by Model Type¶
Inherently Interpretable Models¶
Some models are explainable by design:
| Model | Explainability | Pros | Cons |
|---|---|---|---|
| Decision Trees | Very High | Visual rules | May overfit |
| Logistic Regression | High | Coefficient interpretation | Linear only |
| Rule-Based Systems | Very High | Clear logic | Manual maintenance |
| Scoring Models | High | Point-based | Limited complexity |
Recommendation: For high-stakes government decisions, prefer inherently interpretable models when accuracy requirements allow.
Post-Hoc Explainability for Complex Models¶
When using neural networks, ensemble models, or other complex models:
| Technique | Type | Works With | Output |
|---|---|---|---|
| SHAP | Local/Global | Any model | Feature contributions |
| LIME | Local | Any model | Local approximation |
| Permutation Importance | Global | Any model | Feature ranking |
| Partial Dependence Plots | Global | Any model | Feature effects |
| Attention Visualization | Local | Transformers | Token importance |
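Permutation importance and partial dependence from the table above are available directly in scikit-learn. A minimal sketch, assuming a fitted model plus the X_test and feature_names variables used elsewhere in this guide, and a y_test array of true labels (not shown earlier):
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
import matplotlib.pyplot as plt
# Permutation importance: how much does shuffling each feature hurt performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
ranked = sorted(zip(feature_names, result.importances_mean, result.importances_std),
                key=lambda x: x[1], reverse=True)
for name, mean, std in ranked:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
# Partial dependence: average effect of individual features on the prediction
PartialDependenceDisplay.from_estimator(model, X_test, features=[0, 1],
                                        feature_names=feature_names)
plt.savefig('partial_dependence.png', dpi=150, bbox_inches='tight')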
Step 1: Choose Your Approach¶
Decision Framework¶
flowchart TB
START([Start]) --> Q1{Accuracy requirement<br/>< 90%?}
Q1 -->|Yes| INT["<strong>Interpretable Model</strong><br/>Decision Tree, Logistic Regression<br/>Built-in explainability"]
Q1 -->|No| Q2{Is model a<br/>black box?}
Q2 -->|No - Tree-based| TREE["<strong>TreeExplainer</strong><br/>SHAP<br/>Feature importance"]
Q2 -->|Yes - Neural network| Q3{How many predictions<br/>need explaining?}
Q3 -->|Individual cases| LIME["<strong>LIME or KernelSHAP</strong><br/>Local explanations"]
Q3 -->|All predictions| KERNEL["<strong>KernelSHAP</strong><br/>or Surrogate Model"]
style START fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style INT fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
style TREE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
style LIME fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style KERNEL fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
Step 2: Implement Explainability¶
For Decision Trees¶
from sklearn.tree import DecisionTreeClassifier, export_text
# Train model
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_train, y_train)
# Generate text explanation
tree_rules = export_text(model, feature_names=feature_names)
print(tree_rules)
# Visualize
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 10))
plot_tree(model, feature_names=feature_names, class_names=['Deny', 'Approve'],
filled=True, rounded=True)
plt.savefig('decision_tree.png', dpi=150, bbox_inches='tight')
For Logistic Regression¶
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
# Train model
model = LogisticRegression()
model.fit(X_train, y_train)
# Get feature coefficients
coef_df = pd.DataFrame({
'feature': feature_names,
'coefficient': model.coef_[0],
'odds_ratio': np.exp(model.coef_[0])
})
coef_df = coef_df.sort_values('coefficient', key=abs, ascending=False)
print(coef_df)
# Explain individual prediction
def explain_prediction(model, X_instance, feature_names):
"""Generate human-readable explanation for logistic regression."""
contributions = X_instance * model.coef_[0]
baseline = model.intercept_[0]
explanation = f"Starting score: {baseline:.2f}\n"
for feature, contrib in sorted(zip(feature_names, contributions),
key=lambda x: abs(x[1]), reverse=True)[:5]:
direction = "increases" if contrib > 0 else "decreases"
explanation += f" {feature} {direction} score by {abs(contrib):.2f}\n"
final_score = baseline + contributions.sum()
probability = 1 / (1 + np.exp(-final_score))
explanation += f"Final probability: {probability:.2%}"
return explanation
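A brief usage example for the helper above, assuming X_test is a NumPy array with the same feature order used in training:
# Explain the first test case in plain terms
print(explain_prediction(model, X_test[0], feature_names))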
Using SHAP (SHapley Additive exPlanations)¶
import shap
# Option A - tree-based models (fast, exact)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Option B - any model (slower, approximate); use one explainer or the other
# explainer = shap.KernelExplainer(model.predict_proba, X_train[:100])
# shap_values = explainer.shap_values(X_test[:10])
# Note: for binary classifiers, shap_values and expected_value may be lists with
# one entry per class; if so, use the positive class (e.g. shap_values[1])
# Global explanation: feature importance across the test set
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
# Local explanation: single prediction
shap.force_plot(explainer.expected_value, shap_values[0], X_test[0],
                feature_names=feature_names)
# Waterfall plot for a single prediction (newer Explanation API; for binary
# classifiers slice the positive class, e.g. explainer(X_test)[0, :, 1])
shap.plots.waterfall(explainer(X_test)[0])
Using LIME (Local Interpretable Model-agnostic Explanations)¶
from lime.lime_tabular import LimeTabularExplainer
# Create explainer
explainer = LimeTabularExplainer(
training_data=X_train,
feature_names=feature_names,
class_names=['Deny', 'Approve'],
mode='classification'
)
# Explain single prediction
explanation = explainer.explain_instance(
X_test[0],
model.predict_proba,
num_features=10
)
# Show explanation
explanation.show_in_notebook()
# Get as text
print(explanation.as_list())
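The (condition, weight) pairs returned by as_list() can be turned into plain-language lines for a decision record; a minimal sketch, with illustrative wording only:
# Each LIME item is a (condition, weight) pair, e.g. ('income_annual > 52000', 0.23)
for condition, weight in explanation.as_list():
    direction = "supported approval" if weight > 0 else "counted against approval"
    print(f"• {condition}: {direction} (weight {abs(weight):.2f})")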
Step 3: Generate Human-Readable Explanations¶
Template for Citizen-Facing Explanations¶
def generate_citizen_explanation(prediction, shap_values, feature_names,
feature_values, threshold=0.1):
"""
Generate a human-readable explanation for a citizen.
Returns explanation suitable for inclusion in a decision letter.
"""
# Sort features by importance
importance = list(zip(feature_names, shap_values, feature_values))
importance.sort(key=lambda x: abs(x[1]), reverse=True)
# Filter to significant factors
significant = [(f, s, v) for f, s, v in importance if abs(s) > threshold]
# Generate explanation
lines = []
lines.append("Factors considered in this decision:\n")
for feature, shap_val, value in significant[:5]:
# Translate feature names to plain English
plain_name = FEATURE_TRANSLATIONS.get(feature, feature)
if shap_val > 0:
lines.append(f"• {plain_name}: Your {plain_name.lower()} ({value}) "
f"supported a positive outcome")
else:
lines.append(f"• {plain_name}: Your {plain_name.lower()} ({value}) "
f"was a factor against approval")
lines.append("\nThis assessment was made using an automated system. "
"If you believe this decision is incorrect, you have the right "
"to request a human review.")
return "\n".join(lines)
# Feature name translations
FEATURE_TRANSLATIONS = {
'income_annual': 'Annual income',
'employment_status': 'Employment status',
'application_history': 'Previous applications',
'residency_years': 'Years at current address',
}
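A brief usage example tying this template to the SHAP output from Step 2, assuming shap_values is a 2-D array of per-feature contributions (for binary classifiers, the positive-class slice):
text = generate_citizen_explanation(
    prediction=model.predict(X_test[:1])[0],
    shap_values=shap_values[0],        # contributions for the first test case
    feature_names=feature_names,
    feature_values=X_test[0],
)
print(text)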
Explanation Templates by Use Case¶
Eligibility Decision¶
Your application for [SERVICE] has been [APPROVED/DENIED].
Key factors in this decision:
• [Factor 1]: [Plain English explanation]
• [Factor 2]: [Plain English explanation]
• [Factor 3]: [Plain English explanation]
The automated assessment considered [N] factors in total.
[If denied]: You may request a review by contacting [CONTACT].
Risk Assessment¶
Risk Level: [LOW/MEDIUM/HIGH]
This assessment is based on:
• [Factor 1 - weight %]
• [Factor 2 - weight %]
• [Factor 3 - weight %]
Note: This is an initial assessment. A human reviewer will make
the final determination based on all available information.
Recommendation System¶
Recommended: [OPTION]
This recommendation considers:
• Your stated preference for [X]
• Similar users have found [Y] helpful
• [Z] is available in your area
This is a suggestion only. You may choose any available option.
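These templates can be filled programmatically; a minimal sketch for the eligibility template, where all names and values are hypothetical placeholders:
def render_eligibility_letter(service_name, outcome, factors, total_factors, contact):
    """Fill the eligibility template with plain-English factors (hypothetical fields)."""
    lines = [f"Your application for {service_name} has been {outcome}.",
             "Key factors in this decision:"]
    lines += [f"• {name}: {reason}" for name, reason in factors]
    lines.append(f"The automated assessment considered {total_factors} factors in total.")
    if outcome == "DENIED":
        lines.append(f"You may request a review by contacting {contact}.")
    return "\n".join(lines)
print(render_eligibility_letter(
    "Rent Assistance", "DENIED",
    [("Annual income", "above the eligibility threshold")],
    12, "the contact point listed in your letter"))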
Step 4: Document Model Behavior¶
Create a Model Card¶
Document how the model works:
## Model Behavior Summary
### What the model predicts
[Clear description of the prediction task]
### Key factors the model considers
1. [Factor 1] - [How it affects predictions]
2. [Factor 2] - [How it affects predictions]
3. [Factor 3] - [How it affects predictions]
### What the model does NOT consider
- [Explicitly excluded factors]
### Limitations
- [Known limitations]
- [Edge cases where model may be less accurate]
### Human oversight
[Description of human review process]
Step 5: Test Your Explanations¶
Explanation Quality Checklist¶
- Accurate: Does the explanation reflect actual model behavior?
- Complete: Are all significant factors included?
- Comprehensible: Can a non-technical person understand it?
- Consistent: Do similar cases get similar explanations?
- Actionable: Can the person understand what would change the outcome?
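The "Consistent" item can be partly automated: similar cases should surface similar top factors. A minimal sketch, assuming shap_values is the 2-D array of per-feature contributions computed in Step 2:
import numpy as np
from sklearn.metrics import pairwise_distances
def top_features(shap_row, names, k=3):
    """Return the k features with the largest absolute contribution."""
    order = np.argsort(np.abs(shap_row))[::-1][:k]
    return {names[i] for i in order}
# Find the two most similar test cases and check that their top factors overlap
distances = pairwise_distances(X_test)
np.fill_diagonal(distances, np.inf)
i, j = np.unravel_index(np.argmin(distances), distances.shape)
overlap = (top_features(shap_values[i], feature_names)
           & top_features(shap_values[j], feature_names))
print(f"Closest pair ({i}, {j}) shares {len(overlap)} of their top 3 factors")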
User Testing¶
Test explanations with real users:
1. Show user the prediction and explanation
2. Ask: "Why do you think you received this outcome?"
3. Compare their understanding to actual factors
4. Ask: "What could you do differently?"
5. Verify their answer matches model behavior
Implementation Patterns¶
Pattern 1: Explanation API¶
import shap

class ExplainableModel:
    """Wrapper that adds explainability to any model."""
    def __init__(self, model, feature_names, explainer_type='shap'):
        self.model = model
        self.feature_names = feature_names
        self.explainer_type = explainer_type
        self.explainer = None
    def fit(self, X, y):
        self.model.fit(X, y)
        if self.explainer_type == 'shap':
            # TreeExplainer assumes a tree-based model (random forest, GBM, etc.)
            self.explainer = shap.TreeExplainer(self.model)
        return self
    def predict(self, X):
        return self.model.predict(X)
    def explain(self, X, format='technical'):
        """Generate explanations for predictions."""
        shap_values = self.explainer.shap_values(X)
        if isinstance(shap_values, list):
            # Binary classifiers return one array per class; use the positive class
            shap_values = shap_values[1]
        if format == 'technical':
            return shap_values
        elif format == 'citizen':
            return [generate_citizen_explanation(
                self.predict(x.reshape(1, -1))[0],
                sv,
                self.feature_names,
                x
            ) for x, sv in zip(X, shap_values)]
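Example usage of the wrapper, assuming a tree-based model so that TreeExplainer applies; it reuses generate_citizen_explanation and FEATURE_TRANSLATIONS from Step 3:
from sklearn.ensemble import RandomForestClassifier
wrapped = ExplainableModel(RandomForestClassifier(n_estimators=100, random_state=42),
                           feature_names=feature_names)
wrapped.fit(X_train, y_train)
print(wrapped.explain(X_test[:1], format='citizen')[0])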
Pattern 2: Logging Explanations¶
import json
from datetime import datetime
def log_prediction_with_explanation(prediction_id, model, X, prediction,
shap_values, user_id=None):
"""Log prediction with explanation for audit trail."""
log_entry = {
'prediction_id': prediction_id,
'timestamp': datetime.now().isoformat(),
'user_id': user_id,
'input_features': X.tolist(),
'prediction': prediction,
'model_version': model.version,
        'explanation': {
            'shap_values': shap_values.tolist(),
            # Ranked mapping of feature name -> absolute contribution
            'feature_importance': {
                name: float(value)
                for name, value in sorted(zip(model.feature_names, abs(shap_values)),
                                          key=lambda x: x[1], reverse=True)
            }
        }
}
# Store in audit log
with open('prediction_log.jsonl', 'a') as f:
f.write(json.dumps(log_entry) + '\n')
return log_entry
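A matching retrieval helper so a logged explanation can be pulled back for a review or FOI request; a minimal sketch over the same JSONL file:
import json
def load_explanation(prediction_id, log_path='prediction_log.jsonl'):
    """Return the logged entry for a prediction, or None if not found."""
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry['prediction_id'] == prediction_id:
                return entry
    return None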
Quick Reference¶
When to Use Each Technique¶
| Technique | Best For | Avoid When |
|---|---|---|
| SHAP | Accurate importance, any model | Very large datasets |
| LIME | Quick local explanations | Needs global view |
| Decision Tree | Full transparency | Need high accuracy |
| Feature Importance | Quick overview | Need case-by-case explanation |
| Attention | NLP models | Non-transformer models |
Explainability Checklist¶
Before Deployment:
- [ ] Choose explainability approach
- [ ] Implement explanation generation
- [ ] Test explanations with users
- [ ] Document model behavior
- [ ] Create citizen-facing templates
- [ ] Set up explanation logging
After Deployment:
- [ ] Monitor explanation quality
- [ ] Collect user feedback
- [ ] Update explanations as the model changes
- [ ] Audit explanation accuracy
Resources¶
Libraries¶
- SHAP: pip install shap
- LIME: pip install lime
- InterpretML: pip install interpret
- ELI5: pip install eli5
Further Reading¶
- Christoph Molnar's "Interpretable Machine Learning" (online book)
- OECD Principles on AI Transparency
- Australian Government AI Ethics Framework