The pattern is depressingly consistent. A CEO reads about AI transforming an industry. The CTO is tasked with launching a machine learning initiative. A data science team is hired. Six months and significant budget later, the pilot is struggling: the data is not ready, the infrastructure cannot support model training, the business team does not trust the outputs, and nobody established governance for model decisions. The initiative is quietly shelved, and the organization becomes cynical about AI.
This failure mode is not about the technology. Machine learning algorithms are mature, well-documented, and increasingly accessible. The failure is almost always about organizational readiness — the data foundations, infrastructure, talent, governance, and cultural conditions that must be in place before any ML project can succeed.
This checklist provides 20 practical questions, organized into five categories, that every organization should answer honestly before committing resources to a machine learning initiative. These are not theoretical — they are drawn from real patterns we see in organizations that succeed with AI and those that do not. If you can answer "yes" to fewer than 16 of these questions, you are not fully ready. That is not a failure. It is valuable information that tells you exactly where to invest before launching your first ML project.
For a structured, quantitative version of this assessment, see our Data & AI Readiness Framework.
Category 1: Data Readiness (Questions 1-5)
Data is the raw material of machine learning. Without sufficient, quality data in an accessible format, no algorithm — however sophisticated — will produce useful results.
Question 1: Do you have at least 12 months of historical data for your target use case?
Most ML models need historical data to learn patterns. For predictive use cases (churn prediction, demand forecasting, risk scoring), 12 months is a reasonable minimum — and 24 to 36 months is better, especially if your business has seasonal patterns. If your data only goes back 3 months, or if it is trapped in systems that were not designed for extraction, you have a data availability problem that must be solved first.
Red flag: "We have lots of data, but it is in 14 different systems and nobody has consolidated it." Availability is not just about existence — it is about accessibility.
Question 2: Is your data quality sufficient for model training?
ML models amplify the patterns in their training data — including the noise, biases, and errors. If your customer data has 25% duplicates, your churn model will learn from phantom customers. If your transaction amounts contain systematic rounding errors, your forecasting model will learn those errors as features.
Assess accuracy, completeness, and consistency for the specific datasets your model will use. If accuracy is below 90% or completeness below 85%, invest in data quality remediation before model development. The model can wait. Bad data cannot be fixed by better algorithms.
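A quality assessment like this can start very simply. The sketch below, using only the Python standard library, computes a duplicate rate and per-field completeness for a customer dataset; the field names and sample data are illustrative, and accuracy against ground truth would require a separate reference source.

```python
# Minimal pre-training data quality check (stdlib only).
# Field names and thresholds are illustrative assumptions.
from collections import Counter

def quality_report(rows, key_field, required_fields):
    """Compute duplicate rate and per-field completeness for a list of dicts."""
    total = len(rows)
    key_counts = Counter(r[key_field] for r in rows)
    duplicates = sum(c - 1 for c in key_counts.values())
    completeness = {
        f: sum(1 for r in rows if r.get(f) not in (None, "")) / total
        for f in required_fields
    }
    return {"duplicate_rate": duplicates / total, "completeness": completeness}

customers = [
    {"id": 1, "email": "a@x.com", "plan": "pro"},
    {"id": 2, "email": "", "plan": "basic"},
    {"id": 2, "email": "b@x.com", "plan": "basic"},  # duplicate id
    {"id": 3, "email": "c@x.com", "plan": None},
]

report = quality_report(customers, "id", ["email", "plan"])
print(report)  # duplicate_rate 0.25; email and plan completeness 0.75 each
```

Running a report like this against the datasets your model will use makes the "below 90% accuracy / 85% completeness" conversation concrete rather than anecdotal.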
Question 3: Do you have clearly defined labels or outcomes for supervised learning?
Most business ML use cases are supervised learning problems: you need labeled examples to train the model. For churn prediction, you need a clear, consistent definition of what constitutes "churned." For fraud detection, you need labeled historical transactions (fraud vs. legitimate). For demand forecasting, you need reliable historical demand data — not just order data, which may exclude stockouts and lost sales.
The labeling problem is often underestimated. If your labels are noisy, inconsistent, or biased, the model will learn the wrong patterns. Spend time getting labels right before touching an algorithm.
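A consistent label definition can be pinned down in a few lines of code before any modeling begins. This sketch derives a churn label from a last-activity date; the 90-day inactivity window and field names are illustrative assumptions, not a universal definition.

```python
# Sketch: deriving a consistent churn label from activity data.
# The 90-day window is an illustrative assumption.
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)

def label_churn(last_activity: date, as_of: date) -> int:
    """1 = churned (no activity within the window), 0 = active."""
    return int(as_of - last_activity > CHURN_WINDOW)

as_of = date(2025, 1, 1)
print(label_churn(date(2024, 12, 20), as_of))  # 0: active 12 days ago
print(label_churn(date(2024, 9, 1), as_of))    # 1: inactive for ~4 months
```

Writing the definition down as executable code forces the business and data science teams to agree on one answer to "what counts as churned" before the model ever sees a label.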
Question 4: Can you access your data programmatically through APIs or structured exports?
If the only way to get data out of a system is through manual CSV exports, screenshot parsing, or asking someone in IT to run a query, you are not ready for ML. Machine learning requires automated, repeatable data pipelines that can extract, transform, and load data into a training environment without human intervention.
The minimum requirement is programmatic access to your source systems — either through APIs, database connections, or structured file exports that can be scheduled and automated.
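The difference between a manual export and programmatic access can be sketched in a few lines. Here sqlite3 from the standard library stands in for a real source system; table and column names are illustrative, and a production pipeline would add scheduling, incremental state, and a load step.

```python
# Sketch: a repeatable, scriptable extract -- no human in the loop.
# sqlite3 is a stand-in for a real source system; names are illustrative.
import csv
import io
import sqlite3

def extract_orders(conn, since: str) -> str:
    """Pull new orders as CSV text through a parameterized query."""
    rows = conn.execute(
        "SELECT id, amount, created_at FROM orders WHERE created_at >= ?",
        (since,),
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "amount", "created_at"])
    writer.writerows(rows)
    return buf.getvalue()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 19.9, "2025-01-02"), (2, 5.0, "2024-11-30")],
)
print(extract_orders(conn, "2025-01-01"))  # only the 2025 order
```

The point is not the specific tooling — it is that the extract runs the same way every time, on a schedule, with no one clicking "export."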
Question 5: Do you have a data catalog or documentation that describes what each dataset contains?
Data scientists spend an estimated 60-80% of their time understanding and preparing data. A data catalog that documents dataset contents, business definitions, quality scores, and lineage dramatically reduces this overhead. Without documentation, your data science team will spend months reverse-engineering what tables mean, which fields are reliable, and how different datasets relate to each other.
If you do not have a catalog, create documentation for at least the datasets relevant to your first ML use case. This is not optional — it is foundational.
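Even a code-level catalog entry beats no documentation at all. The sketch below mirrors the fields recommended above (contents, business definition, quality score, lineage); in practice this would live in a dedicated catalog tool, and all names here are illustrative.

```python
# Sketch: a minimal data catalog entry. Fields mirror the text's
# recommendations; names and values are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    description: str          # business definition, not just a table name
    owner: str                # who to ask when a field looks wrong
    quality_score: float      # e.g. completeness from the last profiling run
    upstream: list = field(default_factory=list)  # lineage

catalog = {
    "customers_clean": CatalogEntry(
        name="customers_clean",
        description="Deduplicated customer master; one row per customer.",
        owner="data-eng@example.com",
        quality_score=0.94,
        upstream=["crm_raw", "billing_raw"],
    )
}
entry = catalog["customers_clean"]
print(entry.name, entry.quality_score, entry.upstream)
```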
Category 2: Infrastructure Readiness (Questions 6-10)
ML workloads have different infrastructure requirements than traditional analytics. Training models requires compute capacity, experiment tracking, model versioning, and deployment pipelines that most organizations do not have.
Question 6: Do you have a compute environment that can handle model training workloads?
Training even a moderately complex model on a meaningful dataset requires more compute than a standard analyst laptop or shared database server can provide. You need either cloud-based compute resources (AWS SageMaker, Google Vertex AI, Azure ML) or on-premise GPU/CPU clusters dedicated to ML workloads.
The key question is whether your data science team can access sufficient compute without going through a multi-week procurement process every time they need to train a model.
Question 7: Do you have a development environment where data scientists can experiment safely?
Data scientists need a sandbox environment where they can experiment with data, test hypotheses, and iterate on models without risking production systems. This environment needs access to representative data (ideally production data with appropriate anonymization), version control for code and notebooks, and isolation from production workloads.
If your data scientists are experimenting on their laptops with data extracts they downloaded manually, you have an environment problem that will become a security, reproducibility, and collaboration problem.
Question 8: Do you have a path to deploy models into production?
Training a model is only half the job. The model needs to be deployed where business processes can consume its predictions — whether that is a real-time API, a batch scoring pipeline, or an embedded component in an existing application. This requires MLOps capabilities: model packaging, deployment automation, monitoring, and rollback procedures.
Many organizations discover this gap only after building a promising model. "We have a great model in a Jupyter notebook" is not deployment. Plan the production path before starting model development.
Question 9: Can you monitor model performance in production?
Models degrade over time as the real world changes. A fraud detection model trained on 2023 patterns may miss new fraud tactics in 2025. A demand forecasting model trained on pre-pandemic data will produce wildly inaccurate forecasts in a disrupted supply chain. You need monitoring infrastructure that tracks model accuracy, detects drift, and alerts when performance drops below acceptable thresholds.
If you cannot monitor, you cannot maintain. And an unmaintained model is a ticking time bomb — it will produce increasingly wrong predictions with decreasing visibility.
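The core of drift detection can be illustrated with a simple heuristic: compare the live feature distribution against the training baseline. The z-score rule and threshold below are illustrative; production monitoring typically uses PSI or Kolmogorov-Smirnov tests with per-feature alerting.

```python
# Sketch: a minimal drift check against the training baseline.
# The z-score heuristic and threshold are illustrative assumptions.
from statistics import mean, stdev

def drifted(baseline, live, threshold=3.0):
    """Flag drift when the live mean is far from the training mean,
    measured in baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(live) - mu) / sigma > threshold

baseline = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature values at training time
stable   = [10.2, 9.8, 10.1]               # similar distribution
shifted  = [25.0, 26.0, 24.0]              # the world has changed

print(drifted(baseline, stable))   # False
print(drifted(baseline, shifted))  # True
```

Wiring a check like this to an alert is the minimum viable version of "we will notice when the model's inputs no longer look like its training data."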
Question 10: Do you have version control for data, code, and models?
Reproducibility is essential in ML. If you cannot recreate the exact conditions under which a model was trained — the data version, the code version, the hyperparameters, the training environment — you cannot debug problems, reproduce results, or satisfy audit requirements. Version control for code (Git) is table stakes. Version control for data and models (DVC, MLflow, or equivalent) is increasingly essential.
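The idea behind data versioning can be shown without any tooling: fingerprint the training data and record the hash alongside the model and the Git commit. Real setups use DVC or MLflow for this; the sketch below is stdlib-only and all names are illustrative.

```python
# Sketch: content-hashing a dataset so a training run records exactly
# which data it used. DVC/MLflow do this properly; names are illustrative.
import hashlib
import json

def dataset_fingerprint(rows) -> str:
    """Deterministic hash of the training data, stored with the model."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

rows_v1 = [{"id": 1, "spend": 100}, {"id": 2, "spend": 250}]
rows_v2 = [{"id": 1, "spend": 100}, {"id": 2, "spend": 251}]  # one value changed

run_record = {
    "model": "churn_rf_v3",             # illustrative model name
    "data_version": dataset_fingerprint(rows_v1),
    "git_commit": "abc1234",            # code version from Git
}
# Any change to the data produces a different fingerprint:
print(run_record["data_version"] != dataset_fingerprint(rows_v2))  # True
```

With the fingerprint, commit hash, and hyperparameters stored per run, "which data trained this model?" becomes an answerable audit question rather than archaeology.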
Category 3: Talent Readiness (Questions 11-14)
Technology is only as effective as the people operating it. ML requires a specific combination of skills that many organizations lack.
Question 11: Do you have at least one experienced data scientist (or access to one)?
ML projects need someone who understands statistical modeling, feature engineering, model selection, and evaluation metrics — not just someone who can run a tutorial notebook. An experienced data scientist has built models that went into production, dealt with messy real-world data, and understands the difference between a model that performs well on test data and one that creates business value.
If you do not have this person in-house, consider engaging an experienced consulting partner for your first project. But plan to build internal capability — long-term dependence on external data science is not sustainable.
Question 12: Do you have data engineering capability to build and maintain ML data pipelines?
Data scientists build models. Data engineers build the pipelines that feed those models with production data and serve predictions back to business systems. Without data engineering, your data science team will spend 80% of their time on plumbing instead of modeling. This is the most common bottleneck we see in early ML programs.
The minimum viable team for an ML initiative is one data scientist and one data engineer. Trying to do both with one person creates an unsustainable workload.
Question 13: Does your business team include someone who can translate business problems into ML-solvable problems?
The gap between "we want to reduce churn" and "we need a binary classification model predicting 90-day churn probability using customer behavior features" is enormous. Someone needs to bridge that gap — translating business objectives into problem formulations that ML can address, and translating model outputs back into actionable business recommendations.
This person might be a technically-minded business analyst, a product manager with data experience, or a data scientist with strong business acumen. Without this translator, the data science team will build technically impressive models that solve the wrong problem.
Question 14: Is your team prepared for the iterative, experimental nature of ML development?
ML development is not like traditional software development. There is no clear specification at the start. The first model will likely be mediocre. Progress comes through iteration: try an approach, evaluate, learn, adjust. A team accustomed to waterfall delivery, fixed timelines, and guaranteed outcomes will be frustrated by this uncertainty.
Set expectations explicitly: the first ML project is a learning investment. Define success criteria in terms of learning outcomes ("we will know whether ML can meaningfully predict churn for our business") rather than performance guarantees ("the model will achieve 95% accuracy").
Category 4: Governance Readiness (Questions 15-18)
AI governance is not optional. Models make decisions that affect customers, employees, and business outcomes. Without governance, you are deploying automated decision-making with no oversight.
Question 15: Have you defined ethical guidelines for AI use in your organization?
Before building any model, your organization needs to articulate which use cases are acceptable, which data can be used for model training, and which decisions can be automated versus which require human oversight. This does not need to be a 50-page policy — a one-page set of principles, endorsed by leadership, is a strong start.
At minimum, address: fairness (will the model discriminate against protected groups?), transparency (can we explain why the model made a specific prediction?), accountability (who is responsible when the model makes a wrong decision?), and privacy (are we using personal data in ways that customers have consented to?).
Question 16: Do you have a process for validating model decisions before deployment?
No model should go directly from development to production without validation. Validation includes testing on holdout data, bias testing across demographic groups, stress testing with edge cases, and business validation where domain experts review a sample of model decisions for reasonableness.
Define a model validation checklist that every model must pass before deployment. This checklist should be approved by both the data science team and the business owner of the use case.
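A validation checklist is most effective when it is an executable gate, not a document. The sketch below shows the idea; the specific checks and thresholds are illustrative assumptions that each organization would define with its business owners.

```python
# Sketch: a pre-deployment validation gate expressed as code.
# Checks and thresholds are illustrative assumptions.
def validate_for_deployment(metrics: dict) -> list:
    """Return the list of failed checks; an empty list means cleared to deploy."""
    checks = {
        "holdout_auc >= 0.75":    metrics["holdout_auc"] >= 0.75,
        "group_gap <= 0.05":      metrics["group_gap"] <= 0.05,  # bias check
        "edge_case_pass":         metrics["edge_case_pass"],
        "business_review_signed": metrics["business_review_signed"],
    }
    return [name for name, passed in checks.items() if not passed]

candidate = {
    "holdout_auc": 0.81,
    "group_gap": 0.09,          # accuracy gap between demographic groups
    "edge_case_pass": True,
    "business_review_signed": False,
}
print(validate_for_deployment(candidate))
# ['group_gap <= 0.05', 'business_review_signed']
```

Because the gate returns named failures, the deployment conversation shifts from "is the model good?" to "these two specific checks failed — fix them."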
Question 17: Do you have a plan for handling model failures and escalations?
Models will make wrong predictions. What happens when they do? If a fraud detection model flags a legitimate high-value transaction, is there a process for rapid review and override? If a recommendation engine suggests inappropriate content, can it be corrected immediately? If a pricing model produces anomalous prices, does someone notice before customers are affected?
Define escalation procedures for each use case before deployment. The question is not whether the model will fail — it is whether your organization can detect and respond to failures before they cause damage.
Question 18: Can you explain to regulators and auditors how your models make decisions?
Depending on your industry, regulators may require model explainability. Financial services, healthcare, and insurance are particularly scrutinized. Even in unregulated industries, being able to explain model decisions builds trust with stakeholders and protects against reputational risk.
If you are building models in a regulated industry, ensure you have access to explainability tools (SHAP, LIME, or equivalent) and that your team understands how to generate and interpret explanations. This is not an afterthought — it is a deployment requirement.
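The intuition behind these tools can be shown with permutation importance: shuffle one feature and measure how much predictions change. This is a simplified stand-in for SHAP/LIME, not their algorithm; the toy model and data below are illustrative.

```python
# Sketch: permutation importance on a toy scorer -- a simplified stand-in
# for the idea behind SHAP/LIME. Model and data are illustrative.
import random

def model(row):
    """Toy risk score: depends heavily on 'debt', weakly on 'age'."""
    return 0.9 * row["debt"] + 0.1 * row["age"]

def permutation_importance(rows, feature, trials=50, seed=0):
    """How much do predictions change, on average, when one feature is shuffled?"""
    rng = random.Random(seed)
    base = [model(r) for r in rows]
    total = 0.0
    for _ in range(trials):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        perturbed = [model({**r, feature: v}) for r, v in zip(rows, shuffled)]
        total += sum(abs(a - b) for a, b in zip(base, perturbed)) / len(rows)
    return total / trials

rows = [{"debt": d, "age": a} for d, a in [(1, 30), (5, 40), (9, 50), (3, 60)]]
# 'debt' dominates the score, so shuffling it moves predictions more:
print(permutation_importance(rows, "debt") > permutation_importance(rows, "age"))  # True
```

A ranking like this is often enough to answer a stakeholder's first question — "what is driving this prediction?" — while full SHAP values attribute each individual prediction feature by feature.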
Category 5: Culture and Organizational Readiness (Questions 19-20)
The most underrated dimension of AI readiness is culture. Technical capability without cultural readiness produces models that nobody uses.
Question 19: Does your leadership team genuinely understand what ML can and cannot do?
If your CEO expects ML to "solve" problems the way software solves problems — write the code, deploy, done — your project will be set up for disappointment. ML produces probabilistic outputs with error rates. It requires ongoing maintenance. It needs clean data that the organization may not have. It cannot replace human judgment in complex, contextual decisions.
Before launching an ML initiative, invest in leadership education. Not a two-hour vendor demo — a genuine session where leaders understand the mechanics, the limitations, the time horizons, and the organizational requirements. Leaders who understand these realities set appropriate expectations and provide the sustained support that ML programs need.
Question 20: Are the business teams who will use model outputs willing to change their decision-making processes?
The most technically brilliant model is worthless if the business team ignores its outputs and continues making decisions the way they always have. Adoption requires that business teams trust the model, understand how to interpret its outputs, and are willing to modify their workflows to incorporate predictions.
This is a change management challenge, not a technology challenge. Engage the business team from day one. Involve them in problem formulation, show them intermediate results, let them validate outputs, and co-design the workflow integration. Models built in isolation and thrown over the wall to the business team have an adoption rate near zero.
Scoring Your Readiness
Count your "yes" answers across all 20 questions. Here is our assessment guide:
16-20 yes: You are ready. Choose a well-scoped use case and start building. Focus your energy on execution rather than further preparation.
11-15 yes: You are conditionally ready. You can start an ML initiative, but identify the gaps and address them in parallel. Choose a forgiving first use case that does not depend on the areas where you are weakest.
6-10 yes: You are not ready for ML but you are ready for a focused readiness program. Invest 6 to 12 months in building the foundations — data quality, infrastructure, talent, governance — before committing to an ML project. This is not wasted time. It is the investment that makes future ML projects successful.
0-5 yes: You need foundational data capabilities before thinking about ML. Focus on basic data management: consolidate key datasets, establish quality baselines, build a small data team, and create a data-informed culture. ML is a future-state aspiration, not a near-term initiative.
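The scoring bands above can be captured as a small function, with boundaries taken directly from the guide:

```python
# The scoring guide above as code; band boundaries come from the text.
def readiness_tier(yes_count: int) -> str:
    if not 0 <= yes_count <= 20:
        raise ValueError("score must be between 0 and 20")
    if yes_count >= 16:
        return "ready"
    if yes_count >= 11:
        return "conditionally ready"
    if yes_count >= 6:
        return "readiness program first"
    return "foundational data work first"

print(readiness_tier(17))  # ready
print(readiness_tier(13))  # conditionally ready
print(readiness_tier(8))   # readiness program first
print(readiness_tier(3))   # foundational data work first
```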
The Most Common Readiness Gaps
Across the organizations we work with, certain gaps appear disproportionately:
Data quality (Questions 2-3): The single most common blocker. Organizations consistently overestimate the quality and usability of their data. The gap between "we have data" and "we have ML-ready data" is almost always larger than expected.
MLOps and deployment (Questions 8-9): Many organizations can train a model but cannot deploy it to production or monitor its performance. The "last mile" problem is real and chronically underinvested.
Business translation (Question 13): The gap between business problems and ML problem formulations is a persistent blind spot. Organizations hire data scientists but forget to create the connective tissue between data science and business domains.
Cultural readiness (Questions 19-20): Leadership expectations are often misaligned with ML realities, and business teams are often not prepared to change their processes. This is the gap that derails the most projects — not because the technology fails, but because the organization is not ready to use it.
From Checklist to Action
This checklist is a diagnostic tool, not a destination. Its value lies in identifying specific, actionable gaps that you can address before investing in ML. Use it to build a readiness roadmap that sequences foundational investments — data quality, infrastructure, talent, governance — in a way that progressively unlocks your ability to execute ML initiatives successfully.
The organizations that succeed with AI are not the ones that start first. They are the ones that start ready. Readiness is not about perfection — it is about having sufficient foundations in place so that when you invest in ML, the investment has a credible path to production value rather than becoming another abandoned pilot.
Take the assessment honestly. Address the gaps systematically. And when you do start building, start with a use case that is well-scoped, well-supported, and aligned with a genuine business need. That first success — even if modest — builds the organizational confidence and capability that makes every subsequent AI initiative easier.