Most organisations discover they have a data governance problem in one of three ways. A data breach that turns out to have been made worse by nobody knowing who owned the affected data set. An AI project that stalls because nobody can certify the training data is accurate or appropriately sourced. A regulatory audit that reveals different departments are reporting the same metric differently because there's no single authoritative definition.
In each case, the root cause is the same: data exists in the organisation, but nobody is formally accountable for it, nobody can vouch for its quality, and nobody has thought carefully about who should have access to it and under what conditions. Data governance is the set of practices that closes this gap.
The four things data governance actually does
It assigns ownership. Every significant data asset — a customer database, a transaction log, a model training set — should have a named owner. The data owner is accountable for the quality of the data, for ensuring it's used appropriately, and for managing access. Without a named owner, accountability is diffuse and data quality decays silently.
It defines what the data means. In most organisations, the same term means different things in different systems. "Active customer" in the CRM means something different from "active customer" in the finance system. "Revenue" in the board pack may include items excluded from the management accounts. A data dictionary — a shared, authoritative definition of what each significant data element means — is the foundation of reliable reporting. Without it, every meeting that discusses data can devolve into an argument about which system to believe.
It sets access controls. Not everyone should have access to everything. Data governance defines who can access which data, under what conditions, for what purposes. This matters for regulatory compliance — GDPR, the EU AI Act, sector regulations — and for commercial and operational security. The access control model doesn't need to be complex, but it needs to exist and be enforced.
It establishes quality standards. Data quality has multiple dimensions: accuracy, completeness, consistency, timeliness, and fitness for purpose. Data governance defines what quality means for each significant data asset, sets minimum standards, and creates a process for monitoring and remediating quality issues. This is particularly critical for AI, where data quality directly determines model quality.
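To make the four elements above concrete, here is a minimal sketch in Python of how they might sit together in one record per data asset. It is an illustration under stated assumptions, not a reference implementation: the `GovernedAsset` class, its field names, the role and purpose names, and the 95% completeness threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GovernedAsset:
    """Hypothetical governance record for one data asset."""
    name: str
    owner: str                          # named, accountable person or role
    definitions: dict[str, str]         # data dictionary: term -> agreed meaning
    allowed_roles: dict[str, set[str]]  # access policy: purpose -> roles permitted
    min_completeness: float = 0.95      # illustrative quality standard (assumed)

    def can_access(self, role: str, purpose: str) -> bool:
        """Allow access only for a purpose the policy names and a role it lists."""
        return role in self.allowed_roles.get(purpose, set())

    def completeness(self, rows: list[dict], required: list[str]) -> float:
        """Share of rows in which every required field is present and non-empty."""
        if not rows:
            return 0.0
        complete = sum(
            1 for row in rows
            if all(row.get(f) not in (None, "") for f in required)
        )
        return complete / len(rows)

    def meets_standard(self, rows: list[dict], required: list[str]) -> bool:
        """Compare measured completeness against the agreed minimum."""
        return self.completeness(rows, required) >= self.min_completeness


# Example data: all names and values below are made up for illustration.
customers = GovernedAsset(
    name="customer_database",
    owner="Head of Customer Operations",
    definitions={"active customer": "Placed at least one order in the last 12 months"},
    allowed_roles={"reporting": {"analyst", "finance"}, "marketing": {"marketing"}},
)

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
]

print(customers.can_access("analyst", "reporting"))     # True
print(customers.can_access("analyst", "marketing"))     # False
print(customers.completeness(rows, ["id", "email"]))    # 0.5
print(customers.meets_standard(rows, ["id", "email"]))  # False
```

The point of the sketch is that each governance decision becomes something a system can check: an access request names a purpose, and a quality standard is a number someone agreed to rather than a sentiment. Who fills in the `owner` field, and who settles a disputed definition, remain organisational decisions.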
"If your data governance exists only in a policy document, it doesn't exist. Governance is only real when it changes what people do day to day."
What good data governance is not
It's not a data catalogue project. Many organisations confuse data governance with building an inventory of their data assets. A catalogue is useful, but it's a tool — not a governance framework. You can have a detailed catalogue and still have no clarity about ownership, no agreed definitions, and no quality standards.
It's not primarily a technology problem. There are excellent data governance platforms — Collibra, Alation, Microsoft Purview, and others. These tools help at scale. But the organisational design questions — who owns which data, how disputes about definitions are resolved, how access decisions are made — are human and process questions. Technology implements governance decisions; it doesn't make them.
It's not a project with an end date. Data governance is an ongoing operating model, not a one-time implementation. The data landscape changes as systems change, as regulations change, and as the organisation's use of data evolves. Governance that was fit for purpose 18 months ago may not be fit for purpose today.
Where to start if you don't have it
The most effective starting point is not a comprehensive framework but a narrow, high-value problem. Choose the data domain that is causing the most pain: the inconsistent customer data behind reporting disagreements, the product data holding up the AI project, or the financial data under regulatory scrutiny. Fix the governance for that domain first: assign an owner, agree definitions, establish access controls, and set quality standards. Make it work in one place before you try to scale it.
This approach has two advantages. It produces a visible outcome quickly, which builds organisational credibility for the broader programme. And it surfaces the process and organisational design questions in a context where the stakes are understood — making it much easier to get the decisions made.
The AI dimension
If your organisation is serious about using AI, data governance is not optional. AI systems are trained on data and operated on data. The quality of that data directly determines the quality of the AI output. More critically, the provenance of that data — where it came from, whether it was collected with appropriate consent, whether it contains biases that will propagate into the model — is increasingly a regulatory matter under the EU AI Act and sector-specific regulation.
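One way to make provenance checkable is to record it alongside each training source and hold back anything that lacks it. The following is a hedged sketch under assumed names: `SourceRecord`, its fields, and `training_blockers` are hypothetical, and the checks are illustrative rather than a statement of what the EU AI Act or any other regulation requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceRecord:
    """Hypothetical provenance record for one training data source."""
    name: str
    origin: Optional[str]     # where the data came from, if known
    owner: Optional[str]      # named data owner, if one is assigned
    consent_documented: bool  # whether a consent or lawful basis is on file
    bias_review_done: bool    # whether the source has been reviewed for known biases


def training_blockers(source: SourceRecord) -> list[str]:
    """Return the reasons, if any, this source should not yet be used for training."""
    blockers = []
    if not source.origin:
        blockers.append("origin unknown")
    if not source.owner:
        blockers.append("no named owner")
    if not source.consent_documented:
        blockers.append("consent not documented")
    if not source.bias_review_done:
        blockers.append("bias review missing")
    return blockers


# Example sources: invented for illustration only.
sources = [
    SourceRecord("crm_export", "internal CRM", "Head of Sales Ops", True, True),
    SourceRecord("scraped_reviews", None, None, False, False),
]

for s in sources:
    problems = training_blockers(s)
    status = "ok" if not problems else "blocked: " + ", ".join(problems)
    print(f"{s.name}: {status}")
```

A record like this also gives an auditor something concrete to inspect: for each source, either the provenance fields are filled in or the reasons it was held back are listed.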
Organisations that build AI on ungoverned data are building on an unstable foundation. When the model behaves unexpectedly, when a regulator asks to audit the training data, or when a breach exposes training data that should never have been used, the absence of governance becomes the story. Done properly, data governance is what trustworthy AI is built on.
