Data Governance for AI

Purpose: Establish data governance practices specific to AI and machine learning projects. Covers data ownership, quality standards, security controls, ethical use policies, and lifecycle management.
At a Glance
  • When to use: Before AI development begins
  • Key roles: Data Owner, Data Steward, Data Custodian
  • Covers: Training data, production data, outputs, and logs
  • Review frequency: Annually or on significant changes

AI Projects Need Enhanced Data Governance

AI amplifies data quality issues. Poor governance leads to biased models, unreliable predictions, and compliance violations. Invest in governance before you invest in models.


Document Control

| Field | Value |
|-------|-------|
| Project Name | |
| Version | 1.0 |
| Author | |
| Date | |
| Status | Draft / Under Review / Approved |
| Next Review Date | |

1. Data Governance Overview

1.1 Scope

AI System Name: [Name]

Data Domains Covered:

- [ ] Training data
- [ ] Validation data
- [ ] Test data
- [ ] Production input data
- [ ] Model outputs/predictions
- [ ] Feedback/ground truth data
- [ ] Monitoring data
- [ ] Audit logs

Data Sources:

| Source Name | Source Type | Data Owner | Classification |
|-------------|-------------|------------|----------------|
| | Internal DB / External API / File / Stream | | |
| | | | |
| | | | |

1.2 Governance Objectives

| Objective | Priority | Success Measure |
|-----------|----------|-----------------|
| Data quality | High/Med/Low | |
| Data security | | |
| Privacy compliance | | |
| Ethical use | | |
| Accessibility | | |
| Auditability | | |

2. Roles and Responsibilities

2.1 Data Governance Roles

| Role | Name | Responsibilities |
|------|------|------------------|
| Data Owner | | Accountable for data quality and appropriate use; approves access |
| Data Steward | | Day-to-day data management; implements policies |
| Data Custodian | | Technical management; security controls |
| AI/ML Lead | | Ensures data is fit for ML purposes |
| Privacy Officer | | Privacy compliance oversight |
| Security Officer | | Security control implementation |
| Ethics Lead | | Ethical use oversight |

2.2 RACI for Data Governance Activities

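| Activity | Data Owner | Data Steward | Data Custodian | AI/ML Lead | Privacy | Security |
|----------|------------|--------------|----------------|------------|---------|----------|
| Define data requirements | A | R | C | R | C | C |
| Approve data access | A | R | I | C | C | C |
| Implement security controls | I | C | R | C | C | A |
| Monitor data quality | A | R | C | R | I | I |
| Conduct privacy review | C | C | I | C | A | C |
| Document data lineage | I | R | C | R | I | I |
| Respond to data incidents | A | R | R | C | R | R |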

R = Responsible, A = Accountable, C = Consulted, I = Informed
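A RACI matrix like the one above can be checked mechanically before sign-off. The sketch below assumes the common convention that each activity has exactly one Accountable role; that convention, the `RACI` dictionary layout, and the `raci_problems` helper are illustrative assumptions, not part of the template.

```python
# Role columns in the order used by the RACI table above.
ROLES = ["Data Owner", "Data Steward", "Data Custodian", "AI/ML Lead", "Privacy", "Security"]

# One letter per role, copied row by row from the table.
RACI = {
    "Define data requirements": "ARCRCC",
    "Approve data access": "ARICCC",
    "Implement security controls": "ICRCCA",
    "Monitor data quality": "ARCRII",
    "Conduct privacy review": "CCICAC",
    "Document data lineage": "IRCRII",
    "Respond to data incidents": "ARRCRR",
}


def raci_problems(matrix):
    """Return (activity, number_of_A) for rows that do not have exactly one A."""
    problems = []
    for activity, codes in matrix.items():
        if len(codes) != len(ROLES):
            raise ValueError(f"{activity}: expected {len(ROLES)} codes")
        accountable = codes.count("A")
        if accountable != 1:
            problems.append((activity, accountable))
    return problems


problems = raci_problems(RACI)
```

Run against the matrix as written, `problems` contains one entry, `("Document data lineage", 0)`, so that row needs an Accountable role assigned when the template is filled in.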


3. Data Inventory and Classification

3.1 Data Asset Register

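| Dataset ID | Dataset Name | Description | Owner | Classification | PII? | Retention |
|------------|--------------|-------------|-------|----------------|------|-----------|
| DS-001 | | | | | Yes/No | |
| DS-002 | | | | | | |
| DS-003 | | | | | | |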

3.2 Data Classification

Classification Scheme:

| Level | Description | Handling Requirements | Examples |
|-------|-------------|-----------------------|----------|
| OFFICIAL | General business data | Standard controls | Aggregated statistics |
| OFFICIAL: Sensitive | Sensitive business data | Enhanced controls | Personal information |
| PROTECTED | High-value government data | Strict controls | Sensitive PII, financial data |
| SECRET+ | National security data | Maximum controls | Classified information |

Dataset Classifications:

| Dataset | Classification | Justification | Reviewer | Date |
|---------|----------------|---------------|----------|------|
| | | | | |

3.3 Personal Information Inventory

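| Dataset | PI Categories | Sensitivity | Consent Basis | Collection Purpose |
|---------|---------------|-------------|---------------|--------------------|
| | Names, DOB, Address, etc. | Low/Med/High | Consent/Contract/Legal | |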

4. Data Quality Framework

4.1 Quality Dimensions

| Dimension | Definition | Measurement Method | Target | Current |
|-----------|------------|--------------------|--------|---------|
| Completeness | Required fields populated | % non-null values | 98% | |
| Accuracy | Data matches reality | Validation checks | 99% | |
| Consistency | Data consistent across sources | Cross-source validation | 100% | |
| Timeliness | Data current for use | Age of data | <24 hrs | |
| Uniqueness | No unintended duplicates | Duplicate detection | 99.9% | |
| Validity | Data conforms to rules | Format/range validation | 99% | |
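Three of the dimensions above (completeness, uniqueness, validity) reduce to simple ratios that can be computed directly from records. The following is a minimal sketch using plain dictionaries; the function names, the sample records, and the 0 to 120 age rule are hypothetical and would be replaced by the project's own fields and rules.

```python
def completeness(records, required_fields):
    """Share of required field slots that are populated (not None or empty)."""
    total = len(records) * len(required_fields)
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    return filled / total if total else 1.0


def uniqueness(records, key):
    """Share of records whose key value is distinct."""
    keys = [r.get(key) for r in records]
    return len(set(keys)) / len(keys) if keys else 1.0


def validity(records, field, rule):
    """Share of populated values that pass a format or range rule."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    return sum(1 for v in values if rule(v)) / len(values) if values else 1.0


# Hypothetical sample: one missing age, one duplicate id, one out-of-range age.
records = [
    {"id": 1, "age": 34, "postcode": "2600"},
    {"id": 2, "age": None, "postcode": "2601"},
    {"id": 2, "age": 151, "postcode": "2602"},
    {"id": 4, "age": 58, "postcode": "3000"},
]

scores = {
    "completeness": completeness(records, ["id", "age", "postcode"]),
    "uniqueness": uniqueness(records, "id"),
    "validity": validity(records, "age", lambda a: 0 <= a <= 120),
}
```

Each score is a fraction between 0 and 1, so it can be compared against the Target column after converting the percentage (for example, 98% becomes 0.98).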

4.2 Quality Rules

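| Rule ID | Dataset | Rule Description | Check Frequency | Action if Failed |
|---------|---------|------------------|-----------------|------------------|
| QR-001 | | | Daily/Weekly/Monthly | |
| QR-002 | | | | |
| QR-003 | | | | |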

4.3 ML-Specific Quality Requirements

| Requirement | Description | Measurement | Target | Status |
|-------------|-------------|-------------|--------|--------|
| Representativeness | Training data reflects production distribution | Distribution analysis | N/A | |
| Label quality | Labels are accurate and consistent | Label audit sampling | 99% accuracy | |
| Feature coverage | Features available for all records | Feature completeness | 95%+ | |
| Temporal validity | Data not stale for prediction | Data recency check | Per use case | |
| Bias indicators | Data not systematically biased | Bias analysis | Documented | |
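The representativeness requirement calls for a distribution analysis but does not name a method. One simple option for a categorical feature is the total variation distance between the training and production distributions; the sketch below uses it purely as an illustration, and the `region` values and sample sizes are made up.

```python
from collections import Counter


def category_shares(values):
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(train_values, prod_values):
    """Half the sum of absolute share differences: 0 = identical, 1 = disjoint."""
    p = category_shares(train_values)
    q = category_shares(prod_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical "region" feature: remote areas are under-represented in training.
train = ["metro"] * 70 + ["regional"] * 25 + ["remote"] * 5
prod = ["metro"] * 50 + ["regional"] * 30 + ["remote"] * 20

tvd = total_variation_distance(train, prod)
```

Here `tvd` is 0.2, meaning a fifth of the probability mass would have to move for the two distributions to match. What counts as acceptable is a per-project decision, which is consistent with the table leaving the target as N/A.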

4.4 Quality Monitoring

Monitoring Approach:

- [ ] Automated quality checks in data pipeline
- [ ] Regular quality reports to stakeholders
- [ ] Alert thresholds for quality degradation
- [ ] Remediation procedures documented

Quality Dashboard Location: [Link]


5. Data Lineage and Provenance

5.1 Data Flow Diagram

flowchart LR
    subgraph SRC["<strong>Source Systems</strong>"]
        S1[Source 1]
        S2[Source 2]
        S3[Source 3]
    end

    subgraph PROC["<strong>Processing</strong>"]
        ETL[ETL/Pipeline]
        DQ[Data Quality Checks]
    end

    subgraph AI["<strong>AI/ML System</strong>"]
        FS[Feature Store]
        MT[Model Training]
        MR[Model Registry]
        INF[Inference]
        OUT[Predictions/Output]
    end

    S1 --> ETL
    S2 --> ETL
    S3 --> ETL
    ETL --> DQ
    DQ --> FS
    FS --> MT
    MT --> MR
    FS --> INF
    INF --> OUT

    style SRC fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style PROC fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style AI fill:#e8f5e9,stroke:#388e3c,stroke-width:2px

5.2 Lineage Documentation

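| Output | Source(s) | Transformations | Pipeline | Last Updated |
|--------|-----------|-----------------|----------|--------------|
| Feature: [name] | | | | |
| Model input | | | | |
| Prediction output | | | | |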

5.3 Transformation Documentation

| Transformation | Description | Business Rule | Owner | Version |
|----------------|-------------|---------------|-------|---------|
| | | | | |

6. Data Access and Security

6.1 Access Control

Access Principles:

- Least privilege access
- Role-based access control (RBAC)
- Just-in-time access for sensitive data
- Regular access reviews

Access Levels:

| Level | Description | Approval Required | Review Frequency |
|-------|-------------|-------------------|------------------|
| Read | View data only | Data Steward | Quarterly |
| Write | Modify data | Data Owner | Monthly |
| Admin | Full control | Senior Executive | Monthly |
| ML Training | Use in model training | Data Owner + Ethics | Per project |
| Production | Use in live predictions | Data Owner + Security | Per deployment |
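The access levels above can be held as configuration so that approval checks are driven by the table rather than hard-coded. This is a minimal sketch: the `ACCESS_LEVELS` structure and the `is_approved` helper are hypothetical, and a real deployment would normally delegate this to the organisation's identity and access management tooling.

```python
# Approvers and review cadence per access level, copied from the table above.
ACCESS_LEVELS = {
    "Read": {"approvers": ["Data Steward"], "review": "Quarterly"},
    "Write": {"approvers": ["Data Owner"], "review": "Monthly"},
    "Admin": {"approvers": ["Senior Executive"], "review": "Monthly"},
    "ML Training": {"approvers": ["Data Owner", "Ethics"], "review": "Per project"},
    "Production": {"approvers": ["Data Owner", "Security"], "review": "Per deployment"},
}


def is_approved(level, signoffs):
    """True only when every approver required for the level has signed off."""
    required = set(ACCESS_LEVELS[level]["approvers"])
    return required <= set(signoffs)


partial = is_approved("ML Training", ["Data Owner"])
complete = is_approved("ML Training", ["Data Owner", "Ethics"])
```

With only the Data Owner's sign-off, `partial` is `False`; once Ethics has also signed, `complete` is `True`. Requesting an unknown level raises `KeyError`, which fails closed rather than granting access.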

6.2 Access Request Process

flowchart LR
    subgraph S1["1. Request submitted"]
        RF[Request form]
    end
    subgraph S2["2. Steward review"]
        CP[Check policy]
    end
    subgraph S3["3. Owner approval"]
        RR[Risk review]
    end
    subgraph S4["4. Access granted"]
        AL[Audit log]
    end

    S1 --> S2 --> S3 --> S4

    style S1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style S2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style S3 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style S4 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px

Access Request Requirements:

- [ ] Business justification
- [ ] Data to be accessed
- [ ] Purpose of access
- [ ] Duration required
- [ ] Security clearance confirmation
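The checklist above can be enforced at submission time, before a request reaches the steward. The sketch below assumes requests arrive as dictionaries; the field names are hypothetical renderings of the five checklist items.

```python
# Hypothetical field names, one per checklist item above.
REQUIRED_FIELDS = (
    "business_justification",
    "data_to_be_accessed",
    "purpose_of_access",
    "duration_required",
    "security_clearance_confirmed",
)


def missing_fields(request):
    """Return required fields that are absent or empty, in checklist order."""
    return [field for field in REQUIRED_FIELDS if not request.get(field)]


# Example request with an empty duration and no clearance confirmation.
request = {
    "business_justification": "Retrain the model on the latest quarter of data",
    "data_to_be_accessed": "DS-001",
    "purpose_of_access": "ML training",
    "duration_required": "",
}

gaps = missing_fields(request)
```

For this request, `gaps` is `["duration_required", "security_clearance_confirmed"]`, so the form would be returned to the requester instead of moving to steward review.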

6.3 Security Controls

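| Control Type | Control | Implementation | Status | Owner |
|--------------|---------|----------------|--------|-------|
| Encryption | At rest | | Implemented/Planned/N/A | |
| Encryption | In transit | | | |
| Authentication | Multi-factor | | | |
| Logging | Access logging | | | |
| Monitoring | Anomaly detection | | | |
| DLP | Data loss prevention | | | |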

6.4 Third-Party Access

| Third Party | Purpose | Data Shared | Controls | Agreement Expiry |
|-------------|---------|-------------|----------|------------------|
| | | | | |

7. Privacy and Ethics

7.1 Privacy Compliance

Applicable Legislation:

- [ ] Privacy Act 1988
- [ ] Australian Privacy Principles (APPs)
- [ ] Agency-specific privacy legislation
- [ ] State/Territory privacy legislation

Privacy Impact Assessment:

| Requirement | Status | Reference |
|-------------|--------|-----------|
| PIA completed | Yes/No/In Progress | [Link to PIA] |
| PIA reviewed | | |
| Privacy notice updated | | |
| Consent mechanisms | | |

7.2 Privacy by Design

| Principle | Implementation |
|-----------|----------------|
| Minimize collection | Only collect data necessary for purpose |
| Limit use | Use only for specified purpose |
| Limit disclosure | Restrict sharing to approved parties |
| Data quality | Keep data accurate and current |
| Security | Protect from unauthorized access |
| Transparency | Clear privacy notices |
| Access and correction | Enable individual access rights |

7.3 De-identification Requirements

| Dataset | De-identification Method | Verification | Re-identification Risk |
|---------|--------------------------|--------------|------------------------|
| | Anonymization/Pseudonymization/Aggregation | | Low/Med/High |
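Of the three methods listed, pseudonymization is the one most often implemented in code. A common approach, shown here as an assumption rather than a method the template prescribes, is a keyed hash (HMAC-SHA256) so that the same identifier always maps to the same token but cannot be recomputed without the secret key. The key value and the normalization step are placeholders.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 token (64 hex characters)."""
    # Normalize so that trivial formatting differences map to the same token.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Placeholder key for illustration only; a real key belongs in a secrets manager.
key = b"example-key-do-not-use"

token_a = pseudonymize("Jane.Citizen@example.gov.au", key)
token_b = pseudonymize("  jane.citizen@example.gov.au ", key)
token_other_key = pseudonymize("Jane.Citizen@example.gov.au", b"a-different-key")
```

`token_a` and `token_b` are identical, so records can still be joined after pseudonymization, while `token_other_key` differs because the key changed. Pseudonymized data can still be re-identified by anyone holding the key, which is why the table asks for a re-identification risk rating alongside the method.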

7.4 Ethical Data Use

| Principle | Commitment | Evidence |
|-----------|------------|----------|
| Lawful purpose | Data used only for lawful government purposes | |
| Fairness | Data use does not discriminate | Bias testing |
| Transparency | Data use is documented and explainable | Model cards |
| Proportionality | Data collection proportionate to purpose | PIA |
| Accountability | Clear ownership and oversight | This document |

8. Data Lifecycle Management

8.1 Lifecycle Stages

flowchart LR
    subgraph C["<strong>Collection</strong>"]
        C1[Consent]
        C2[Purpose]
    end
    subgraph ST["<strong>Storage</strong>"]
        ST1[Security]
        ST2[Controls]
    end
    subgraph P["<strong>Processing</strong>"]
        P1[Validation]
        P2[Transform]
    end
    subgraph U["<strong>Use</strong>"]
        U1[Access]
        U2[Control]
    end
    subgraph R["<strong>Retention</strong>"]
        R1[Archival]
        R2[Policies]
    end
    subgraph D["<strong>Disposal</strong>"]
        D1[Destruction]
        D2[Verification]
    end

    C --> ST --> P --> U --> R --> D

    style C fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style ST fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style P fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style U fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style R fill:#e0f2f1,stroke:#00796b,stroke-width:2px
    style D fill:#eceff1,stroke:#607d8b,stroke-width:2px

8.2 Retention Schedule

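| Dataset | Retention Period | Basis | Archive Location | Disposal Method |
|---------|------------------|-------|------------------|-----------------|
| Training data | | Legal/Business/Regulatory | | |
| Model inputs | | | | |
| Predictions | | | | |
| Audit logs | 7 years | Regulatory | | |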
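A retention schedule becomes actionable once each dataset has a computed disposal date. The sketch below derives that date from a creation date and a retention period in years; the `disposal_due` helper and the rule of rolling 29 February forward to 1 March are assumptions for illustration.

```python
from datetime import date


def disposal_due(created, retention_years):
    """Earliest disposal date: the creation date plus the retention period."""
    try:
        return created.replace(year=created.year + retention_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; roll to 1 March.
        return created.replace(year=created.year + retention_years, month=3, day=1)


def is_due_for_disposal(created, retention_years, today):
    """True once the retention period has fully elapsed."""
    return today >= disposal_due(created, retention_years)


# Audit logs are retained for 7 years in the schedule above.
audit_log_due = disposal_due(date(2020, 6, 30), 7)
leap_day_due = disposal_due(date(2024, 2, 29), 7)
```

`audit_log_due` is 30 June 2027 and `leap_day_due` is 1 March 2031. Rolling forward rather than back means data is never disposed of before the full period has passed.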

8.3 Disposal Procedures

Secure Disposal Requirements:

- [ ] Data catalogued before disposal
- [ ] Approval obtained from Data Owner
- [ ] Secure deletion method appropriate to classification
- [ ] Disposal verified and documented
- [ ] Backup/archive copies included

Disposal Methods by Classification:

| Classification | Disposal Method | Verification |
|----------------|-----------------|--------------|
| OFFICIAL | Secure delete | Confirmation log |
| OFFICIAL: Sensitive | Cryptographic erase | Certificate |
| PROTECTED+ | Physical destruction | Witness + certificate |

9. Data Sharing

9.1 Sharing Principles

  • Share data to improve services and outcomes
  • Share only with appropriate controls
  • Document all sharing arrangements
  • Review sharing arrangements regularly

9.2 Sharing Agreements

| Partner | Agreement Type | Data Shared | Purpose | Review Date |
|---------|----------------|-------------|---------|-------------|
| | MOU/Contract/DAT | | | |

9.3 Data Availability and Transparency Act (DATA)

If sharing under DATA scheme:

| Requirement | Status |
|-------------|--------|
| Data sharing purpose identified | |
| Data sharing principles applied | |
| Data sharing agreement in place | |
| Accredited data service provider (if applicable) | |

10. Incident Management

10.1 Data Incident Types

| Type | Description | Severity | Response Time |
|------|-------------|----------|---------------|
| Data breach | Unauthorized access/disclosure | Critical | Immediate |
| Data loss | Data unavailable or destroyed | High | 4 hours |
| Quality failure | Data quality below threshold | Medium | 24 hours |
| Access violation | Unauthorized access attempt | Medium | 24 hours |
| Bias detected | Unfair outcomes identified | High | 24 hours |
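The response times above translate directly into deadlines that an incident register or alerting tool can track. This is a minimal sketch; the `RESPONSE_TIME` mapping and the treatment of "Immediate" as a zero-length window are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Response windows from the incident type table; "Immediate" is modelled as zero.
RESPONSE_TIME = {
    "Data breach": timedelta(0),
    "Data loss": timedelta(hours=4),
    "Quality failure": timedelta(hours=24),
    "Access violation": timedelta(hours=24),
    "Bias detected": timedelta(hours=24),
}


def response_deadline(incident_type, detected_at):
    """Latest time by which the response should have started."""
    return detected_at + RESPONSE_TIME[incident_type]


def is_overdue(incident_type, detected_at, now):
    """True when the response window has passed."""
    return now > response_deadline(incident_type, detected_at)


detected = datetime(2025, 3, 10, 9, 30)
data_loss_deadline = response_deadline("Data loss", detected)
```

For a data loss detected at 09:30, `data_loss_deadline` is 13:30 the same day. An incident type missing from the mapping raises `KeyError`, which prompts the table to be extended instead of silently applying a default.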

10.2 Incident Response

Immediate Response (Data Breach):

1. Contain the incident
2. Assess scope and impact
3. Notify Data Owner and Privacy Officer
4. Document incident details
5. If the breach is notifiable, report to the OAIC as soon as practicable (the assessment of a suspected breach should be completed within 30 days)

Incident Register Location: [Link]

10.3 Notifiable Data Breach Scheme

| Question | Answer |
|----------|--------|
| Is personal information involved? | Yes/No |
| Is serious harm likely? | Yes/No |
| Remedial action taken? | |
| OAIC notification required? | Yes/No |
| Affected individuals notified? | Yes/No |
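The first three questions above feed the fourth. The sketch below encodes one simplified reading of that logic: notification is required when personal information is involved, serious harm is likely, and remedial action has not removed that likelihood. The function and parameter names are hypothetical, and the real assessment is a judgment for the Privacy Officer, not a boolean.

```python
def oaic_notification_required(
    personal_info_involved,
    serious_harm_likely,
    remediation_removed_risk,
):
    """Simplified triage of the questions above; not a substitute for assessment."""
    return (
        personal_info_involved
        and serious_harm_likely
        and not remediation_removed_risk
    )


# Breach of personal information, harm likely, remediation incomplete.
must_notify = oaic_notification_required(True, True, False)

# Same breach, but remedial action removed the likelihood of serious harm.
remediated = oaic_notification_required(True, True, True)
```

`must_notify` is `True` and `remediated` is `False`. Recording the three inputs alongside the outcome in the incident register keeps the reasoning auditable.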

11. Compliance and Audit

11.1 Compliance Requirements

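| Requirement | Source | Status | Evidence | Next Review |
|-------------|--------|--------|----------|-------------|
| Privacy Act compliance | Legislation | | | |
| PSPF INFOSEC-3 | Policy | | | |
| ISM controls | Framework | | | |
| AI Ethics Framework | Policy | | | |
| Agency data policy | Internal | | | |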

11.2 Audit Schedule

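| Audit Type | Frequency | Last Completed | Next Due | Owner |
|------------|-----------|----------------|----------|-------|
| Data quality audit | Quarterly | | | |
| Access review | Quarterly | | | |
| Security audit | Annual | | | |
| Privacy audit | Annual | | | |
| Ethics review | Annual | | | |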

11.3 Audit Findings Tracker

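| Finding ID | Date | Finding | Severity | Status | Due Date | Owner |
|------------|------|---------|----------|--------|----------|-------|
| | | | High/Med/Low | Open/In Progress/Closed | | |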

12. Governance Metrics

12.1 Key Performance Indicators

| KPI | Definition | Target | Current | Trend |
|-----|------------|--------|---------|-------|
| Data quality score | Average across dimensions | 95% | | |
| Access request SLA | % processed in 5 days | 90% | | |
| Incident response | % resolved in SLA | 95% | | |
| Training completion | Staff trained in data governance | 100% | | |
| Audit findings closed | % closed within SLA | 90% | | |
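Two of the KPIs above are defined precisely enough to compute: the data quality score (an average across dimensions) and the access request SLA (share processed within 5 days). The sketch below assumes an unweighted average and whole-day processing times; both choices, and the sample figures, are illustrative.

```python
from statistics import mean


def quality_score(dimension_scores):
    """Unweighted average of per-dimension scores, in percent."""
    return mean(dimension_scores.values())


def pct_within_sla(processing_days, sla_days=5):
    """Percentage of requests processed within the SLA window."""
    on_time = sum(1 for days in processing_days if days <= sla_days)
    return 100 * on_time / len(processing_days)


# Hypothetical monthly figures.
score = quality_score({"completeness": 98, "accuracy": 99, "validity": 94})
sla = pct_within_sla([1, 3, 5, 6, 9])
```

With these inputs `score` is 97, above the 95% target, and `sla` is 60.0, well below the 90% target. If some dimensions matter more than others for a given model, a weighted average is a reasonable substitute, provided the weights are documented with the KPI definition.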

12.2 Reporting

| Report | Audience | Frequency | Owner |
|--------|----------|-----------|-------|
| Data quality dashboard | Operations | Real-time | Data Steward |
| Governance scorecard | Leadership | Monthly | Data Owner |
| Compliance report | Audit Committee | Quarterly | Privacy Officer |
| Incident summary | Executive | As needed | Security Officer |

13. Training and Awareness

13.1 Training Requirements

| Role | Training | Frequency | Status |
|------|----------|-----------|--------|
| All staff | Data governance basics | Annual | |
| Data handlers | Data handling procedures | Annual | |
| AI/ML team | ML data ethics | Annual | |
| Data stewards | Governance procedures | Quarterly | |

13.2 Awareness Activities

| Activity | Target Audience | Frequency |
|----------|-----------------|-----------|
| Data governance newsletter | All staff | Monthly |
| Privacy awareness campaign | All staff | Annual |
| Governance intranet updates | All staff | Ongoing |

14. Continuous Improvement

14.1 Improvement Process

  1. Collect feedback - From users, audits, incidents
  2. Analyze trends - Identify patterns and root causes
  3. Propose improvements - Document change proposals
  4. Implement changes - Update policies and procedures
  5. Monitor effectiveness - Measure improvement impact

14.2 Improvement Register

| ID | Improvement | Source | Status | Benefit | Owner |
|----|-------------|--------|--------|---------|-------|
| | | Audit/Incident/Feedback | | | |

15. Approvals

| Role | Name | Signature | Date |
|------|------|-----------|------|
| Data Owner | | | |
| AI/ML Lead | | | |
| Privacy Officer | | | |
| Security Officer | | | |
| Executive Sponsor | | | |

Appendices

Appendix A: Data Dictionary

[Link to data dictionary or attach]

Appendix B: Data Flow Diagrams

[Attach detailed data flow diagrams]

Appendix C: Access Request Form

[Attach or link to access request form]

Appendix D: Incident Report Template

[Attach or link to incident report template]

Appendix E: Glossary

| Term | Definition |
|------|------------|
| Data Owner | Senior person accountable for the data asset |
| Data Steward | Person responsible for day-to-day data management |
| Data Custodian | Person responsible for technical data storage and security |
| PII | Personally Identifiable Information |
| DATA | Data Availability and Transparency Act 2022 |