Data Governance for AI
Purpose: Establish data governance practices specific to AI and machine learning projects. Covers data ownership, quality standards, security controls, ethical use policies, and lifecycle management.
At a Glance
- When to use: Before AI development begins
- Key roles: Data Owner, Data Steward, Data Custodian
- Covers: Training data, production data, outputs, and logs
- Review frequency: Annually or on significant changes
AI Projects Need Enhanced Data Governance
AI amplifies data quality issues. Poor governance leads to biased models, unreliable predictions, and compliance violations. Invest in governance before you invest in models.
Document Control
| Field | Value |
|-------|-------|
| Project Name | |
| Version | 1.0 |
| Author | |
| Date | |
| Status | Draft / Under Review / Approved |
| Next Review Date | |
1. Data Governance Overview
1.1 Scope
AI System Name: [Name]
Data Domains Covered:

- [ ] Training data
- [ ] Validation data
- [ ] Test data
- [ ] Production input data
- [ ] Model outputs/predictions
- [ ] Feedback/ground truth data
- [ ] Monitoring data
- [ ] Audit logs
Data Sources:

| Source Name | Source Type | Data Owner | Classification |
|-------------|-------------|------------|----------------|
| | Internal DB / External API / File / Stream | | |
| | | | |
| | | | |
1.2 Governance Objectives
| Objective | Priority | Success Measure |
|-----------|----------|-----------------|
| Data quality | High/Med/Low | |
| Data security | | |
| Privacy compliance | | |
| Ethical use | | |
| Accessibility | | |
| Auditability | | |
2. Roles and Responsibilities
2.1 Data Governance Roles
| Role | Name | Responsibilities |
|------|------|------------------|
| Data Owner | | Accountable for data quality and appropriate use; approves access |
| Data Steward | | Day-to-day data management; implements policies |
| Data Custodian | | Technical management; security controls |
| AI/ML Lead | | Ensures data fit for ML purposes |
| Privacy Officer | | Privacy compliance oversight |
| Security Officer | | Security control implementation |
| Ethics Lead | | Ethical use oversight |
2.2 RACI for Data Governance Activities
| Activity | Data Owner | Data Steward | Data Custodian | AI/ML Lead | Privacy | Security |
|----------|------------|--------------|----------------|------------|---------|----------|
| Define data requirements | A | R | C | R | C | C |
| Approve data access | A | R | I | C | C | C |
| Implement security controls | I | C | R | C | C | A |
| Monitor data quality | A | R | C | R | I | I |
| Conduct privacy review | C | C | I | C | A | C |
| Document data lineage | I | R | C | R | I | I |
| Respond to data incidents | A | R | R | C | R | R |
R = Responsible, A = Accountable, C = Consulted, I = Informed
3. Data Inventory and Classification
3.1 Data Asset Register
| Dataset ID | Dataset Name | Description | Owner | Classification | PII? | Retention |
|------------|--------------|-------------|-------|----------------|------|-----------|
| DS-001 | | | | | Yes/No | |
| DS-002 | | | | | | |
| DS-003 | | | | | | |
3.2 Data Classification
Classification Scheme:
| Level | Description | Handling Requirements | Examples |
|-------|-------------|-----------------------|----------|
| OFFICIAL | General business data | Standard controls | Aggregated statistics |
| OFFICIAL: Sensitive | Sensitive business data | Enhanced controls | Personal information |
| PROTECTED | High-value government data | Strict controls | Sensitive PII, financial data |
| SECRET+ | National security data | Maximum controls | Classified information |
Dataset Classifications:

| Dataset | Classification | Justification | Reviewer | Date |
|---------|----------------|---------------|----------|------|
| | | | | |
| | | | | |

Personal Information Inventory:

| Dataset | PI Categories | Sensitivity | Consent Basis | Collection Purpose |
|---------|---------------|-------------|---------------|--------------------|
| | Names, DOB, Address, etc. | Low/Med/High | Consent/Contract/Legal | |
| | | | | |
4. Data Quality Framework
4.1 Quality Dimensions
| Dimension | Definition | Measurement Method | Target | Current |
|-----------|------------|--------------------|--------|---------|
| Completeness | Required fields populated | % non-null values | 98% | |
| Accuracy | Data matches reality | Validation checks | 99% | |
| Consistency | Data consistent across sources | Cross-source validation | 100% | |
| Timeliness | Data current for use | Age of data | <24hrs | |
| Uniqueness | No unintended duplicates | Duplicate detection | 99.9% | |
| Validity | Data conforms to rules | Format/range validation | 99% | |
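The dimensions above can be measured mechanically in the data pipeline. A minimal sketch, assuming simple dict-shaped records and illustrative field names (`id`, `email`) that are not part of this template:

```python
# Sketch: measuring three quality dimensions (completeness, uniqueness,
# validity) over a list of records. Field names and sample data are
# illustrative assumptions, not prescribed by this template.
import re

def completeness(records, field):
    """% of records where `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * filled / len(records)

def uniqueness(records, field):
    """% of distinct values for `field` (100% = no duplicates)."""
    values = [r.get(field) for r in records]
    return 100.0 * len(set(values)) / len(values)

def validity(records, field, pattern):
    """% of records where `field` matches a format rule (regex)."""
    ok = sum(1 for r in records if re.fullmatch(pattern, str(r.get(field, ""))))
    return 100.0 * ok / len(records)

records = [
    {"id": 1, "email": "a@example.gov.au"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.gov.au"},
]
print(round(completeness(records, "email"), 1))  # 66.7
print(round(uniqueness(records, "id"), 1))       # 66.7 (duplicate id 2)
```

Results like these feed the "Current" column of the table and can be compared against the targets automatically.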
4.2 Quality Rules
| Rule ID | Dataset | Rule Description | Check Frequency | Action if Failed |
|---------|---------|------------------|-----------------|------------------|
| QR-001 | | | Daily/Weekly/Monthly | |
| QR-002 | | | | |
| QR-003 | | | | |
4.3 ML-Specific Quality Requirements
| Requirement | Description | Measurement | Target | Status |
|-------------|-------------|-------------|--------|--------|
| Representativeness | Training data reflects production distribution | Distribution analysis | N/A | |
| Label quality | Labels are accurate and consistent | Label audit sampling | 99% accuracy | |
| Feature coverage | Features available for all records | Feature completeness | 95%+ | |
| Temporal validity | Data not stale for prediction | Data recency check | Per use case | |
| Bias indicators | Data not systematically biased | Bias analysis | Documented | |
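The "representativeness" requirement is typically checked by comparing feature distributions between training and production data. A minimal sketch using the population stability index (PSI); the 0.2 alert threshold is a common rule of thumb, used here as an illustrative assumption rather than a template requirement:

```python
# Sketch: distribution comparison for the representativeness check.
# PSI near 0 means the production distribution matches training;
# values above ~0.2 are commonly treated as significant drift.
import math
from collections import Counter

def psi(train, prod):
    """Population stability index over categorical values (higher = more drift)."""
    categories = set(train) | set(prod)
    t_counts, p_counts = Counter(train), Counter(prod)
    total = 0.0
    for c in categories:
        # Small floor avoids log/zero issues for categories unseen in one set.
        t = max(t_counts[c] / len(train), 1e-6)
        p = max(p_counts[c] / len(prod), 1e-6)
        total += (p - t) * math.log(p / t)
    return total

train      = ["A"] * 50 + ["B"] * 50
prod_ok    = ["A"] * 50 + ["B"] * 50
prod_drift = ["A"] * 90 + ["B"] * 10
print(psi(train, prod_ok) < 0.2, psi(train, prod_drift) > 0.2)  # True True
```

Each monitored feature gets its own PSI; any breach of the threshold would feed the quality alerting described in section 4.4.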
4.4 Quality Monitoring
Monitoring Approach:

- [ ] Automated quality checks in data pipeline
- [ ] Regular quality reports to stakeholders
- [ ] Alert thresholds for quality degradation
- [ ] Remediation procedures documented
Quality Dashboard Location: [Link]
5. Data Lineage and Provenance
5.1 Data Flow Diagram
```mermaid
flowchart LR
    subgraph SRC["<strong>Source Systems</strong>"]
        S1[Source 1]
        S2[Source 2]
        S3[Source 3]
    end
    subgraph PROC["<strong>Processing</strong>"]
        ETL[ETL/Pipeline]
        DQ[Data Quality Checks]
    end
    subgraph AI["<strong>AI/ML System</strong>"]
        FS[Feature Store]
        MT[Model Training]
        MR[Model Registry]
        INF[Inference]
        OUT[Predictions/Output]
    end
    S1 --> ETL
    S2 --> ETL
    S3 --> ETL
    ETL --> DQ
    ETL --> FS
    FS --> MT
    MT --> MR
    FS --> INF
    INF --> OUT
    style SRC fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style PROC fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style AI fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
```
5.2 Lineage Documentation
| Output | Source(s) | Transformations | Pipeline | Last Updated |
|--------|-----------|-----------------|----------|--------------|
| Feature: [name] | | | | |
| Model input | | | | |
| Prediction output | | | | |

| Transformation | Description | Business Rule | Owner | Version |
|----------------|-------------|---------------|-------|---------|
| | | | | |
| | | | | |
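Lineage entries are easiest to keep current when the pipeline emits them automatically alongside each output. A minimal sketch mirroring the table columns above; the example values and pipeline name are illustrative:

```python
# Sketch: a structured lineage record a pipeline could emit for each
# output. Fields mirror the lineage table columns; values are examples.
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class LineageRecord:
    output: str           # the feature, input, or prediction produced
    sources: list         # upstream dataset IDs from the asset register
    transformations: list # ordered transformation descriptions
    pipeline: str         # pipeline or job identifier
    last_updated: str     # ISO date of the last refresh

record = LineageRecord(
    output="feature:customer_tenure_days",
    sources=["DS-001"],
    transformations=["date diff: today - signup_date"],
    pipeline="daily-feature-build",
    last_updated=date.today().isoformat(),
)
print(asdict(record)["sources"])  # ['DS-001']
```

Serialising these records (e.g. to a lineage store or catalogue) gives auditors a machine-readable trail from prediction back to source.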
6. Data Access and Security
6.1 Access Control
Access Principles:

- Least privilege access
- Role-based access control (RBAC)
- Just-in-time access for sensitive data
- Regular access reviews
Access Levels:
| Level | Description | Approval Required | Review Frequency |
|-------|-------------|-------------------|------------------|
| Read | View data only | Data Steward | Quarterly |
| Write | Modify data | Data Owner | Monthly |
| Admin | Full control | Senior Executive | Monthly |
| ML Training | Use in model training | Data Owner + Ethics | Per project |
| Production | Use in live predictions | Data Owner + Security | Per deployment |
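The approval rules in the table above can be encoded directly as an RBAC policy check. A minimal sketch, assuming a simple approvals list per request; level and role names follow the table, but the data structures are illustrative:

```python
# Sketch: RBAC policy check for the access levels above. Levels that list
# two approvers (ML Training, Production) require both sign-offs.
APPROVERS = {
    "read": {"data_steward"},
    "write": {"data_owner"},
    "admin": {"senior_executive"},
    "ml_training": {"data_owner", "ethics_lead"},
    "production": {"data_owner", "security_officer"},
}

def access_granted(level, approvals):
    """True only if every required approver for `level` has signed off."""
    required = APPROVERS.get(level)
    if required is None:
        return False  # unknown level: deny by default (least privilege)
    return required <= set(approvals)

assert access_granted("read", ["data_steward"])
assert not access_granted("ml_training", ["data_owner"])  # ethics sign-off missing
```

Denying unknown levels by default keeps the check aligned with the least-privilege principle.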
6.2 Access Request Process
```mermaid
flowchart LR
    subgraph S1["1. Request submitted"]
        RF[Request form]
    end
    subgraph S2["2. Steward review"]
        CP[Check policy]
    end
    subgraph S3["3. Owner approval"]
        RR[Risk review]
    end
    subgraph S4["4. Access granted"]
        AL[Audit log]
    end
    S1 --> S2 --> S3 --> S4
    style S1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style S2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style S3 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style S4 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
```
Access Request Requirements:

- [ ] Business justification
- [ ] Data to be accessed
- [ ] Purpose of access
- [ ] Duration required
- [ ] Security clearance confirmation
6.3 Security Controls
| Control Type | Control | Implementation Status | Owner |
|--------------|---------|-----------------------|-------|
| Encryption | At rest | Implemented/Planned/N/A | |
| Encryption | In transit | | |
| Authentication | Multi-factor | | |
| Logging | Access logging | | |
| Monitoring | Anomaly detection | | |
| DLP | Data loss prevention | | |
6.4 Third-Party Access
| Third Party | Purpose | Data Shared | Controls | Agreement | Expiry |
|-------------|---------|-------------|----------|-----------|--------|
| | | | | | |
| | | | | | |
7. Privacy and Ethics
7.1 Privacy Compliance
Applicable Legislation:

- [ ] Privacy Act 1988
- [ ] Australian Privacy Principles (APPs)
- [ ] Agency-specific privacy legislation
- [ ] State/Territory privacy legislation
Privacy Impact Assessment:
| Requirement | Status | Reference |
|-------------|--------|-----------|
| PIA completed | Yes/No/In Progress | [Link to PIA] |
| PIA reviewed | | |
| Privacy notice updated | | |
| Consent mechanisms | | |
7.2 Privacy by Design
| Principle | Implementation |
|-----------|----------------|
| Minimize collection | Only collect data necessary for purpose |
| Limit use | Use only for specified purpose |
| Limit disclosure | Restrict sharing to approved parties |
| Data quality | Keep data accurate and current |
| Security | Protect from unauthorized access |
| Transparency | Clear privacy notices |
| Access and correction | Enable individual access rights |
7.3 De-identification Requirements
| Dataset | De-identification Method | Verification | Re-identification Risk |
|---------|--------------------------|--------------|------------------------|
| | Anonymization / Pseudonymization / Aggregation | | Low/Med/High |
| | | | |
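One of the methods listed above, pseudonymisation, can be sketched with a keyed hash: the direct identifier is replaced by a token that is stable (so joins still work) but cannot be reversed without the secret key. Key management and the re-identification risk review are outside this sketch, and the key value shown is a placeholder:

```python
# Sketch: keyed pseudonymisation via HMAC-SHA256. The key must come from
# a managed secret store in practice; the literal below is a placeholder.
import hmac
import hashlib

SECRET_KEY = b"replace-with-managed-key"

def pseudonymise(identifier: str) -> str:
    """Deterministic, non-reversible token for a direct identifier."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token, stable per identifier

token_a = pseudonymise("jane.citizen@example.gov.au")
token_b = pseudonymise("jane.citizen@example.gov.au")
assert token_a == token_b      # deterministic: the same person maps to one token
assert "jane" not in token_a   # the direct identifier does not survive
```

A plain unkeyed hash would be weaker: an attacker who can guess candidate identifiers could confirm them by hashing, which the secret key prevents.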
7.4 Ethical Data Use
| Principle | Commitment | Evidence |
|-----------|------------|----------|
| Lawful purpose | Data used only for lawful government purposes | |
| Fairness | Data use does not discriminate | Bias testing |
| Transparency | Data use is documented and explainable | Model cards |
| Proportionality | Data collection proportionate to purpose | PIA |
| Accountability | Clear ownership and oversight | This document |
8. Data Lifecycle Management
8.1 Lifecycle Stages
```mermaid
flowchart LR
    subgraph C["<strong>Collection</strong>"]
        C1[Consent]
        C2[Purpose]
    end
    subgraph ST["<strong>Storage</strong>"]
        ST1[Security]
        ST2[Controls]
    end
    subgraph P["<strong>Processing</strong>"]
        P1[Validation]
        P2[Transform]
    end
    subgraph U["<strong>Use</strong>"]
        U1[Access]
        U2[Control]
    end
    subgraph R["<strong>Retention</strong>"]
        R1[Archival]
        R2[Policies]
    end
    subgraph D["<strong>Disposal</strong>"]
        D1[Destruction]
        D2[Verification]
    end
    C --> ST --> P --> U --> R --> D
    style C fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style ST fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style P fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style U fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style R fill:#e0f2f1,stroke:#00796b,stroke-width:2px
    style D fill:#eceff1,stroke:#607d8b,stroke-width:2px
```
8.2 Retention Schedule
| Dataset | Retention Period | Basis | Archive Location | Disposal Method |
|---------|------------------|-------|------------------|-----------------|
| Training data | | Legal/Business/Regulatory | | |
| Model inputs | | | | |
| Predictions | | | | |
| Audit logs | 7 years | Regulatory | | |
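A retention schedule like the one above can drive an automated disposal-due check. A minimal sketch: the 7-year audit-log period comes from the table, while the day-based arithmetic and dataset keys are illustrative simplifications (real schedules would follow the agency's records authority):

```python
# Sketch: computing when disposal should be scheduled from the retention
# table. Only the audit-log period is filled in, matching the template;
# unscheduled datasets return None so they can be escalated.
from datetime import date, timedelta

RETENTION_DAYS = {"audit_logs": 7 * 365}  # 7 years, approximated as days

def disposal_due(dataset, created):
    """Date from which disposal should be scheduled, or None if unscheduled."""
    days = RETENTION_DAYS.get(dataset)
    if days is None:
        return None  # no retention period recorded: escalate to the Data Steward
    return created + timedelta(days=days)

due = disposal_due("audit_logs", date(2020, 1, 1))
print(due.year)  # 2026
```

A nightly job over the asset register using a check like this would surface datasets awaiting Data Owner approval for disposal.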
8.3 Disposal Procedures
Secure Disposal Requirements:

- [ ] Data catalogued before disposal
- [ ] Approval obtained from Data Owner
- [ ] Secure deletion method appropriate to classification
- [ ] Disposal verified and documented
- [ ] Backup/archive copies included
Disposal Methods by Classification:
| Classification | Disposal Method | Verification |
|----------------|-----------------|--------------|
| OFFICIAL | Secure delete | Confirmation log |
| OFFICIAL: Sensitive | Cryptographic erase | Certificate |
| PROTECTED+ | Physical destruction | Witness + certificate |
9. Data Sharing
9.1 Sharing Principles
- Share data to improve services and outcomes
- Share only with appropriate controls
- Document all sharing arrangements
- Review sharing arrangements regularly
9.2 Sharing Agreements
| Partner | Agreement Type | Data Shared | Purpose | Review Date |
|---------|----------------|-------------|---------|-------------|
| | MOU / Contract / DATA scheme | | | |
| | | | | |
9.3 Data Availability and Transparency Act (DATA)
If sharing under DATA scheme:
| Requirement | Status |
|-------------|--------|
| Data sharing purpose identified | |
| Data sharing principles applied | |
| Data sharing agreement in place | |
| Accredited data service provider (if applicable) | |
10. Incident Management
10.1 Data Incident Types
| Type | Description | Severity | Response Time |
|------|-------------|----------|---------------|
| Data breach | Unauthorized access/disclosure | Critical | Immediate |
| Data loss | Data unavailable or destroyed | High | 4 hours |
| Quality failure | Data quality below threshold | Medium | 24 hours |
| Access violation | Unauthorized access attempt | Medium | 24 hours |
| Bias detected | Unfair outcomes identified | High | 24 hours |
10.2 Incident Response
Immediate Response (Data Breach):

1. Contain the incident
2. Assess scope and impact
3. Notify Data Owner and Privacy Officer
4. Document incident details
5. Assess whether the breach is notifiable (within 30 days) and report to the OAIC as soon as practicable if it is
Incident Register Location: [Link]
10.3 Notifiable Data Breach Scheme
| Question | Answer |
|----------|--------|
| Is personal information involved? | Yes/No |
| Is serious harm likely? | Yes/No |
| Remedial action taken? | |
| OAIC notification required? | Yes/No |
| Affected individuals notified? | Yes/No |
11. Compliance and Audit
11.1 Compliance Requirements
| Requirement | Source | Status | Evidence | Next Review |
|-------------|--------|--------|----------|-------------|
| Privacy Act compliance | Legislation | | | |
| PSPF INFOSEC-3 | Policy | | | |
| ISM controls | Framework | | | |
| AI Ethics Framework | Policy | | | |
| Agency data policy | Internal | | | |
11.2 Audit Schedule
| Audit Type | Frequency | Last Completed | Next Due | Owner |
|------------|-----------|----------------|----------|-------|
| Data quality audit | Quarterly | | | |
| Access review | Quarterly | | | |
| Security audit | Annual | | | |
| Privacy audit | Annual | | | |
| Ethics review | Annual | | | |
11.3 Audit Findings Tracker
| Finding ID | Date | Finding | Severity | Status | Due Date | Owner |
|------------|------|---------|----------|--------|----------|-------|
| | | | High/Med/Low | Open/In Progress/Closed | | |
| | | | | | | |
12. Governance Metrics
12.1 Key Performance Indicators
| KPI | Definition | Target | Current | Trend |
|-----|------------|--------|---------|-------|
| Data quality score | Average across dimensions | 95% | | |
| Access request SLA | % processed in 5 days | 90% | | |
| Incident response | % resolved in SLA | 95% | | |
| Training completion | Staff trained in data governance | 100% | | |
| Audit findings closed | % closed within SLA | 90% | | |
12.2 Reporting
| Report | Audience | Frequency | Owner |
|--------|----------|-----------|-------|
| Data quality dashboard | Operations | Real-time | Data Steward |
| Governance scorecard | Leadership | Monthly | Data Owner |
| Compliance report | Audit Committee | Quarterly | Privacy Officer |
| Incident summary | Executive | As needed | Security Officer |
13. Training and Awareness
13.1 Training Requirements
| Role | Training | Frequency | Status |
|------|----------|-----------|--------|
| All staff | Data governance basics | Annual | |
| Data handlers | Data handling procedures | Annual | |
| AI/ML team | ML data ethics | Annual | |
| Data stewards | Governance procedures | Quarterly | |
13.2 Awareness Activities
| Activity | Target Audience | Frequency |
|----------|-----------------|-----------|
| Data governance newsletter | All staff | Monthly |
| Privacy awareness campaign | All staff | Annual |
| Governance intranet updates | All staff | Ongoing |
14. Continuous Improvement
14.1 Improvement Process
- Collect feedback: from users, audits, and incidents
- Analyze trends: identify patterns and root causes
- Propose improvements: document change proposals
- Implement changes: update policies and procedures
- Monitor effectiveness: measure improvement impact
14.2 Improvement Register
| ID | Improvement | Source | Status | Benefit | Owner |
|----|-------------|--------|--------|---------|-------|
| | | Audit/Incident/Feedback | | | |
| | | | | | |
15. Approvals
| Role | Name | Signature | Date |
|------|------|-----------|------|
| Data Owner | | | |
| AI/ML Lead | | | |
| Privacy Officer | | | |
| Security Officer | | | |
| Executive Sponsor | | | |
Appendices
Appendix A: Data Dictionary
[Link to data dictionary or attach]
Appendix B: Data Flow Diagrams
[Attach detailed data flow diagrams]
Appendix C: Access Request Form
[Attach or link to access request form]
Appendix D: Incident Report Template
[Attach or link to incident report template]
Appendix E: Glossary
| Term | Definition |
|------|------------|
| Data Owner | Senior person accountable for the data asset |
| Data Steward | Person responsible for day-to-day data management |
| Data Custodian | Person responsible for technical data storage and security |
| PII | Personally Identifiable Information |
| DATA | Data Availability and Transparency Act 2022 |