The trade groups lost the procedural fight. The next fight is the findings, and those start showing up in the September report.
In December 2025, a coalition of trade associations representing life, health, property and casualty, mutual, and reinsurance insurers sent a joint letter to the NAIC objecting to the planned AI Systems Evaluation Tool pilot. The letter raised five concerns. The headline phrase, the one that made it into trade press coverage and that carrier executives have repeated to each other ever since, was that the pilot was "one-sided, voluntary for regulators while compulsory for companies." States could opt in or out. Carriers in states that opted in could not.
The other concerns were specific. The pilot duration was undefined. It was unclear whether the tool would be used in financial exams, market conduct exams, or both. There was a possibility of penalties tied to "negative" findings from pilot data. And the pilot might launch before the final tool version went through public comment.
The pilot launched on March 2, 2026, across twelve states. The trade groups got a revised version of the tool. They did not get a delay.
That outcome is the tell. Read it correctly, because the implications run further than the procedural complaint suggests.
The Asymmetry Is Not Going Away. It Is the Whole Point.
Trade groups objected to the asymmetry as if it were an oversight in the pilot's design. It wasn't. The asymmetry is structural, and it reflects a regulatory posture the NAIC has been moving toward for several years. Regulators want a tool they can adopt, deploy, and keep using on their own timeline. They do not want a tool whose deployment is contingent on industry consent at every stage.
The "voluntary for regulators, compulsory for companies" framing reads in the trade groups' letter as an unfairness argument. From the regulator's seat, it reads as a basic feature of how examinations work. State DOIs do not ask carriers' permission to issue market conduct examinations. They issue them under their own statutory authority. The AI Systems Evaluation Tool is, functionally, an examination instrument. Of course the carrier cannot opt out. Carriers do not get to opt out of examinations.
What is genuinely new is that the tool is standardized across states, will produce comparable findings, and will be used to populate a shared regulatory understanding of how the industry deploys AI. That is the part the trade groups understood and that drove the letter. Standardized findings travel. A single state's exam report on a single carrier rarely affects how examiners in other states approach their work. A standardized tool, applied across twelve pilot states, produces a corpus of findings that the rest of the country's examiners will read and use to calibrate their own questions before the tool gets formally adopted in November.
Treat the asymmetry as the operating reality rather than a problem to solve. The problem to solve is what the findings will say about your carrier, and whether the data the pilot collects will reflect a defensible AI governance posture or an unprepared one.
What Regulators Have Signaled and What They Have Not
The public regulatory response to the trade group letter has been measured. Iowa Commissioner Doug Ommen, filling in for the Big Data and Artificial Intelligence Working Group's leadership at a public session, discussed the pilot's structure without addressing the "voluntary/compulsory" complaint head-on. The substantive response from the working group came in the form of the revised tool itself: minor procedural adjustments, no concession on the underlying authority structure, no delay.
Read the silence. Regulators did not engage the asymmetry argument because they did not need to. The pilot proceeded. The states picked their carriers. The inquiries went out. The trade groups' letter became, retroactively, a position document for the lobbying that will surround the November adoption vote, not a working argument that altered the pilot's trajectory.
What regulators did not signal, and this matters, is any willingness to limit the use of pilot findings in subsequent regulatory action. The trade groups asked for assurances that "negative" findings from pilot data would not be used punitively. The revised tool does not contain those assurances. The pilot's data, by the working group's own structuring, will inform the September report and the October re-exposure, and there is no mechanism that prevents specific findings from being cited in subsequent market conduct or financial examinations of pilot participants.
A carrier that produces a weak Exhibit B governance narrative in the pilot now has that narrative on file with a state regulator. That document does not become irrelevant when the pilot ends. It becomes a baseline. The next examination cycle, in 2027 or 2028, will start from that baseline. Carriers who treated the pilot as a one-time exercise rather than as the establishment of a durable regulatory record are setting themselves up for a re-examination posture they will not enjoy.
The September Report Is the Inflection Point
The pilot runs through September 2026. The NAIC working group has stated it will produce a report on findings, and that report will be the basis for the October re-exposure of the tool for public comment. Adoption is targeted for the Fall National Meeting in November.
That sequence has a feature carriers should be planning around. The September report will, almost certainly, identify governance gaps the pilot uncovered. It will not name carriers, but it will name patterns. If forty percent of pilot respondents could not produce a current AI inventory, the report will say so. If a meaningful share of high-risk model documentation was incomplete or inaccessible, the report will say so. Those findings will travel. They will appear in trade press coverage in October. They will inform the October public comments, which will themselves inform the final tool version. They will inform every state DOI's planning for AI examinations in 2027.
For carriers in the twelve pilot states, the practical implication is that the response to the pilot inquiry is also a contribution to the September report. A weak response does not just affect that carrier's standing with that state regulator. It contributes to a national pattern that will, by year-end, harden into the regulatory baseline. Carriers participating in the pilot have an unusually direct opportunity to shape that baseline by responding well, with documented governance, current model inventories, and substantive evidence of continuous AI supervision. They also have an unusually direct exposure to shaping it badly.
For carriers in the thirty-eight non-pilot states, the response interval is different. They do not have an inquiry on their desk. They have approximately seven months before the November adoption vote, after which any state may begin issuing inquiries. That is enough time to do a serious self-assessment using the public version of the tool. It is not enough time to build, from scratch, a governance program that will survive an examination using it.
Self-Assessment Is the Only Hedge That Still Closes
The carriers least exposed to the September report's findings are the ones who are running the tool against themselves now. A non-pilot carrier that walks the four exhibits as a self-assessment in Q2 and Q3 has time to address the gaps the exercise reveals before any state regulator runs the tool against them.
The exercise produces three concrete artifacts. The first is a current AI model inventory that maps every AI system in use across the carrier's operations, with vendor attribution, business function, decision-impact classification, and governance owner. Most carriers do not have this inventory. Building it surfaces vendor-embedded models, decision-component AI inside larger platforms, and machine learning features that no internal team has classified as AI. The pilot will demand this inventory in Exhibit A. Carriers who build it in self-assessment have it ready.
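To make the shape of that artifact concrete, here is a minimal sketch of what a single inventory row might capture, assuming a simple in-house schema. The field names, impact tiers, and example systems are hypothetical illustrations, not Exhibit A's actual format.

```python
from dataclasses import dataclass, field
from enum import Enum


class DecisionImpact(Enum):
    """Illustrative impact tiers; the tool's actual risk taxonomy may differ."""
    HIGH = "high"      # directly determines an underwriting, pricing, or claims outcome
    MEDIUM = "medium"  # informs a human decision without determining it
    LOW = "low"        # operational or back-office use only


@dataclass
class AIInventoryEntry:
    """One row of a carrier's AI model inventory (hypothetical schema)."""
    system_name: str                 # internal or vendor product name
    vendor: str | None               # None for in-house builds; surfaces vendor-embedded models
    business_function: str           # e.g. "claims triage", "underwriting", "fraud scoring"
    decision_impact: DecisionImpact  # drives remediation priority later
    governance_owner: str            # the named individual or role accountable for the system
    embedded_in_platform: str | None = None  # larger platform the model ships inside, if any
    notes: list[str] = field(default_factory=list)


# Example rows, including the kind that self-assessment typically surfaces:
inventory = [
    AIInventoryEntry("ClaimSort", vendor="AcmeInsurTech", business_function="claims triage",
                     decision_impact=DecisionImpact.HIGH, governance_owner="VP Claims Ops"),
    AIInventoryEntry("doc-ocr", vendor="PolicyAdminSuite", business_function="document intake",
                     decision_impact=DecisionImpact.LOW, governance_owner="IT Shared Services",
                     embedded_in_platform="PolicyAdminSuite",
                     notes=["ML feature inside a larger platform; never classified as AI internally"]),
]
```

Even in this toy form, the exercise of filling in governance_owner for every row tends to surface the systems nobody owns, which is exactly the gap an examiner will find first.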
The second is a documented governance framework that survives Exhibit B's scrutiny. A policy document is not a framework. A committee meeting is not a framework. A framework is a documented set of accountability chains, oversight mechanisms, risk classification procedures, and incident response protocols, with evidence that each one is actually executed. Carriers running self-assessment now find out where their stated governance and their actual operations diverge. They have months to close the gap.
The third is a documented audit trail and bias-testing record for high-risk systems, the kind that satisfies Exhibit C and Exhibit D. This is the artifact that takes longest to build, because it requires retroactive documentation of model design decisions, training data sources, validation procedures, and ongoing performance monitoring. Carriers without this record cannot create it under an examination deadline. They can only create it if they start before the deadline arrives.
The carriers running this self-assessment in Q2 and Q3 will go into Q4 with a documented baseline of where they stand against the tool. They will not be guessing at the November adoption. They will know, from their own internal exercise, which of their AI systems would survive the tool's scrutiny and which would not. That knowledge converts into a Q3 governance playbook that prioritizes remediation by examination exposure rather than by management instinct.
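One way to make "prioritize by examination exposure" concrete: score each inventory entry by its decision impact and by the severity of the gaps the self-assessment found against each exhibit, then remediate in descending order. The sketch below continues the hypothetical schema above; the weights and severity scale are illustrative assumptions, not anything the tool prescribes.

```python
# Continues the hypothetical inventory schema sketched earlier.
IMPACT_WEIGHT = {DecisionImpact.HIGH: 3, DecisionImpact.MEDIUM: 2, DecisionImpact.LOW: 1}

# Gap severity per exhibit, as found in self-assessment:
# 0 = documented and executed, 1 = documented but unevidenced, 2 = missing entirely.
gaps = {
    "ClaimSort": {"A": 0, "B": 1, "C": 2, "D": 2},  # inventoried, but no audit trail or bias tests
    "doc-ocr":   {"A": 1, "B": 1, "C": 0, "D": 0},
}


def examination_exposure(entry: AIInventoryEntry) -> int:
    """Exposure = impact weight x total gap severity across the four exhibits."""
    return IMPACT_WEIGHT[entry.decision_impact] * sum(gaps[entry.system_name].values())


# Remediation queue: worst exposure first, not management instinct first.
for entry in sorted(inventory, key=examination_exposure, reverse=True):
    print(entry.system_name, examination_exposure(entry))
```

The weights are the least important part. What matters is that the remediation queue is driven by the same classifications an examiner will apply, not by which business unit asks loudest.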
The procedural fight over the pilot is over. The September report is coming. Carriers who treat the trade groups' letter as a vindication of their right to wait are reading the wrong document. The right document is the calendar. The calendar says self-assessment now, remediation through Q3, defensible posture by November. There is no version of this story where the carriers who waited come out ahead of the carriers who moved.
