The majority of organisations approaching AI adoption start in the wrong place. They start with the model: which AI tool to use, which vendor to partner with, what use cases to prioritise. This is understandable. The model is the exciting part. The data is the unglamorous part that nobody wants to fund or talk about. But getting the order wrong is why most AI projects deliver so little.
The order matters more than the technology
AI, whether you're using foundation models, building custom systems, or deploying off-the-shelf tools, depends entirely on data. The quality of your AI output is a direct function of the quality, completeness, and structure of your data. A world-class model with mediocre data produces mediocre results. A well-designed AI application with clean, consistent, well-governed data can produce remarkable results.
Most organisations don't have good data. They have data spread across disconnected systems, with inconsistent definitions, poor governance, and significant gaps. They've never had to confront this because the previous generation of reporting tools could be configured to work around these problems. AI can't. It exposes them immediately.
What data readiness actually means
Data readiness is not just about volume. Bigger datasets don't automatically produce better AI. What matters is:
- Completeness: Do you have the data that the AI use case actually requires? Missing fields, sparse records, and historical gaps all limit what's possible.
- Consistency: Is the same concept represented the same way across different systems? Customer IDs that don't match across databases, product categories that differ between sales and operations, dates in incompatible formats: all of these create problems that are surprisingly hard to fix at scale.
- Accuracy: Does your data actually reflect reality? Data that was accurate three years ago may not be now. Data that was accurate in one context may not be in another.
- Accessibility: Can the data actually be accessed by the systems that need it? Many organisations have valuable data locked in systems that don't expose APIs, or in formats that require significant transformation before use.
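The consistency problems above can be made concrete with a short sketch. This is an illustrative example, not a prescription: the record layouts, field names, and date formats are all hypothetical, standing in for two real systems whose exports disagree on how to write the same customer.

```python
from datetime import datetime

# Hypothetical extracts from two systems. Note the same customer is
# written three different ways, and dates appear in three formats.
crm_records = [
    {"customer_id": "C-001", "signup": "2023-01-15"},   # ISO format
    {"customer_id": "c001",  "signup": "15/01/2023"},   # duplicate, day-first
]
erp_records = [
    {"cust_id": "C001", "created": "01/15/2023"},       # US month-first
]

def normalise_id(raw: str) -> str:
    """Strip punctuation and case so 'C-001', 'c001', 'C001' compare equal."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())

def parse_date(raw: str) -> datetime:
    """Try each format known to occur in the source systems; fail loudly
    on anything else. Order matters: '01/02/2023' is ambiguous between
    day-first and month-first, which is exactly the at-scale problem."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {raw!r}")

crm_ids = {normalise_id(r["customer_id"]) for r in crm_records}
erp_ids = {normalise_id(r["cust_id"]) for r in erp_records}

# Customers present in one system but not the other are a gap to fix
# before any AI use case that needs a unified view can work.
unmatched = crm_ids.symmetric_difference(erp_ids)
```

The normalisation rules here are trivially simple; the point is that every pair of systems needs an explicit, tested mapping like this, and the ambiguous-date case shows why guessing doesn't scale.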
The four data problems that kill AI projects
Across multiple AI implementations, the same failure patterns recur. Understanding them early saves significant time and money.
Siloed systems with no single truth. The most common problem: customer data in CRM, financial data in ERP, operational data in a separate system, and no reliable way to connect them. Every AI project that requires a unified view of the business (and most of the valuable ones do) hits this wall immediately.
Data that's collected but not structured for use. Organisations often have vast quantities of data they can't use because it was collected without thinking about downstream applications. Unstructured notes, inconsistent tagging, free-text fields where structured fields were needed. Cleaning this data is laborious and expensive.
Governance gaps that create legal and ethical risk. AI that uses personal data creates compliance obligations that many organisations haven't addressed. Understanding what data you hold, where it came from, whether you have the right to use it for AI, and how to manage it appropriately is a legal requirement, not just good practice.
Data pipelines that don't exist yet. Some AI use cases require data that simply isn't being collected. Before you can build the AI, you have to build the infrastructure to capture the data, which may take months and require changes to operational systems.
What building a data foundation looks like
A data strategy for AI doesn't need to be comprehensive from day one. It needs to be sequenced correctly. Start with the use cases you're planning to pursue first, and work backwards to understand what data those use cases require. This is a much more tractable problem than trying to fix all your data before starting.
The practical steps: audit what data you have and where it lives; identify the gaps relative to your priority AI use cases; establish clear data ownership and governance; build or procure the infrastructure to move, clean, and serve data to AI systems; and define the data quality standards the organisation will hold itself to.
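The gap-identification step above can be sketched in a few lines. Everything here is an assumption for illustration: the required fields, the sample records, and the 90% threshold would all come from your actual priority use case, but the shape of the check is the same.

```python
# Fields a hypothetical priority use case requires, and the minimum
# share of populated values we'll accept. Both are illustrative.
REQUIRED_FIELDS = ["customer_id", "email", "last_purchase_date"]
COMPLETENESS_THRESHOLD = 0.9

# Stand-in for the real extract produced by the data audit.
records = [
    {"customer_id": "C001", "email": "a@example.com", "last_purchase_date": "2024-03-01"},
    {"customer_id": "C002", "email": None,            "last_purchase_date": "2024-05-12"},
    {"customer_id": "C003", "email": None,            "last_purchase_date": None},
]

def completeness(records: list[dict], field: str) -> float:
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

# Map each under-populated required field to its completeness score.
# These are the data problems that block this use case.
gaps = {
    f: completeness(records, f)
    for f in REQUIRED_FIELDS
    if completeness(records, f) < COMPLETENESS_THRESHOLD
}
```

Running this per use case turns "fix the data" into a ranked, finite list, which is what makes working backwards from use cases tractable.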
How long it takes — and why that matters
This is where most AI roadmaps underestimate the timeline. Data foundation work, depending on the complexity of your systems landscape, takes months, not weeks. Organisations that haven't done this work and are promising AI outcomes in 60 days are either working on very narrow, low-risk use cases or setting themselves up for disappointment.
The honest planning assumption for a meaningful AI implementation is: 3–6 months of data foundation work before you can reliably build on top of it. That doesn't mean you can't run experiments or learn during that period. But it does mean that the AI use cases that will matter to the business will take longer than the vendor demos suggest.
Starting before you're fully ready
The data foundation work and the AI strategy work don't have to be sequential. The most effective approach is to run them in parallel: start scoping your priority use cases while simultaneously auditing your data. Use the use case scoping to inform which data problems to fix first. Use the data audit to inform which use cases are actually viable in a realistic timeframe.
The key is to be honest about what's a proof of concept and what's a production system. Proofs of concept can work with imperfect data. Production systems can't. Build the data foundation before you commit to production timelines, and include the data investment in your AI budget from the start.