The NAIC AI Evaluation Tool Is Live in 12 States. Here's What It Actually Asks.

On March 2, 2026, the NAIC launched its AI Systems Evaluation Tool pilot across 12 states. The insurance industry had formally objected. Trade groups argued the program was "voluntary for regulators while compulsory for companies": states could opt in or out, but carriers in participating states had no choice. Regulators heard the objections, released a revised version of the tool, and launched the pilot on schedule.

The pilot is now active. States are sending inquiries to carriers and holding monthly coordination calls. If your carrier writes business in California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, or Wisconsin, an inquiry from the evaluation tool may already be in transit.

What the Tool Actually Asks

The Big Data and Artificial Intelligence (H) Working Group developed the AI Systems Evaluation Tool to give examiners a standardized framework for assessing how carriers deploy AI. It's organized into four exhibits, each progressively more detailed.

Exhibit A: AI Usage Inventory. Pilot states are focusing here first. Exhibit A asks carriers to quantify how extensively they use AI: how many systems, in which functions, affecting which decisions. Before regulators can assess governance quality, they need a map of the territory. A carrier with a current model registry can respond to Exhibit A in days. Without one, the exercise gets complicated fast, because "AI systems" includes vendor-embedded models, automated decision components inside larger platforms, and machine learning features that nobody in the organization categorizes as AI.

Exhibit B: Governance Risk Assessment. How is AI governed inside your organization? Exhibit B examines oversight structures, risk management policies, accountability chains, and documentation practices. For carriers who built governance infrastructure around their AI deployments, this is a documentation exercise. A policy document and a quarterly committee meeting, by contrast, are not governance infrastructure, and Exhibit B will make that distinction painfully clear.

Exhibit C: High-Risk AI System Details. This is where the specificity requirement jumps. Exhibit C focuses on AI systems that regulators classify as high-risk: claims decisions, underwriting, pricing, fraud detection. For each high-risk system, carriers need detailed documentation on model design, training data, validation procedures, performance metrics, and bias testing results.

Exhibit D: AI Data Details. What data feeds your AI, and where does it come from? Exhibit D examines data sources, quality controls, representativeness, and potential for proxy discrimination. The NAIC is particularly focused on data used in rate setting, variables that may serve as proxies for race and ethnicity, social media data, and aerial imagery that may correlate with protected characteristics.

How Pilot States Are Using the Tool

Each participating state decides independently which carriers to examine and how to integrate the tool into existing workflows. Some states are embedding it in market conduct reviews; others are folding it into financial examinations. The pilot tests how the tool performs across different regulatory contexts rather than prescribing a single methodology.

Companies selected span property/casualty, life, and health insurance, and include carriers of various sizes. Regional and mid-sized carriers in pilot states should not assume they are too small to receive an inquiry.

Regulators have stated they will prioritize AI systems most likely to produce consumer harm. Internal workflow tools and back-office automation will draw less scrutiny than systems that deny claims, set premiums, or flag policyholders for fraud investigation. The NAIC's stated position: existing insurance laws apply to AI-driven decisions the same way they apply to decisions made by human adjusters. Where the AI touches the policyholder, the regulatory interest follows.

"We Bought It From a Vendor" Is Not an Answer

One provision warrants specific attention. Carriers are required to take full responsibility for AI platforms purchased from third-party vendors. A carrier using a vendor's claims triage model needs to answer Exhibit C and Exhibit D questions about that model: its design, its training data, its performance characteristics, its bias testing history.

Many carrier vendor contracts pre-date the NAIC's AI governance expectations. Provisions for model documentation access, bias testing transparency, and performance data sharing often weren't negotiated because nobody anticipated regulators would ask for them. Discovering that gap during an examination inquiry leaves no good options on a regulatory timeline. Vendor negotiations and contract amendments take months, and the clock on a regulatory response does not pause for procurement.

The Path to November

The pilot runs through September 2026. The tool will be updated based on feedback and re-exposed for public comment through October. The NAIC expects adoption at its Fall National Meeting in November 2026. After adoption, every state insurance department in the country can deploy it.

Separately, the NAIC appointed a Market Conduct Regulation Modernization Working Group at its Spring 2026 meeting. Illinois DOI Director Ann Gillespie framed the scope: "Artificial intelligence, new distribution models, national scale vendors, and other technological advances are significantly changing both consumers' expectations and insurers' business models and practices." Recommendations are expected by year-end.

Between the evaluation tool providing standardized questions and the modernization working group overhauling examination procedures, state regulators will have both the instrument and the methodology by early 2027.

What Carriers Should Build Before November

Carriers in pilot states may already be receiving inquiries. Everyone else has roughly six months before the tool reaches their domestic regulators.

Start with the AI system inventory. If you cannot produce a complete list of every AI and machine learning system in your operations, that's the first gap to close. Exhibit A asks exactly this, and pilot states are focused there first. Include vendor-embedded models. Include ML features inside larger platforms that nobody calls "AI." Classify each system by risk tier based on policyholder impact.

Audit vendor contracts against Exhibit C and D requirements. For every third-party AI system, verify whether your contract provides access to model design documentation, training data details, validation procedures, and bias testing results. Where those provisions are absent, start the amendment process. These negotiations take time you may not have later.
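The audit step above can be reduced to a simple gap check: compare what each contract provides against the provisions an Exhibit C and D response would draw on. The provision labels here are illustrative shorthand for the four documentation areas the article names, not contract language.

```python
# Provisions an Exhibit C/D response would draw on (labels are
# illustrative shorthand, not NAIC or contract terminology).
REQUIRED_PROVISIONS = {
    "model_design_docs",
    "training_data_details",
    "validation_procedures",
    "bias_testing_results",
}

def contract_gaps(provisions: set[str]) -> set[str]:
    """Return the required provisions missing from one vendor contract."""
    return REQUIRED_PROVISIONS - provisions

# Hypothetical contract that grants design docs and validation access
# but says nothing about training data or bias testing.
gaps = contract_gaps({"model_design_docs", "validation_procedures"})
print(sorted(gaps))
```

Each non-empty result is a contract amendment to start now, since the negotiation takes months and a regulatory response timeline will not wait for it.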

Close the gap between governance policies and operational evidence. An AI governance policy and an oversight committee are the starting point, not the finish line. Examiners will ask for what those structures actually produce: bias testing results with dates, performance monitoring data across segments, decision audit logs, incident response records. Automated evaluation and continuous supervision generate this evidence as a byproduct of normal operations, which is both more efficient and more convincing to an examiner than documentation assembled under pressure.

Prioritize Exhibit C readiness for high-risk systems. Identify every AI system that directly affects policyholders: claims triage, underwriting models, pricing engines, fraud detection. For each, you need bias testing results, performance metrics, and validation documentation. Swept AI's evaluation framework produces exactly this evidence, and Trust Reports package it into the audit-ready format examiners expect.

The 12-state pilot is a calibration exercise. The NAIC is refining how the tool works in practice, not debating whether to deploy it. Six months is enough time to build governance infrastructure that produces evidence continuously, but not to reconstruct the historical evidence trail that examiners have already learned to look for.

If your carrier operates in a pilot state or needs to be ready before November, we can show you what examination-ready AI governance looks like in practice.
