Most of the AI inside a mid-sized mutual was built by someone else. The underwriting score comes from a vendor. So do the fraud screen, the claims-triage tool, and most of what sits between a first notice of loss and a payment. That arrangement is sound. A 300-person cooperative has no reason to staff a data-science team to rebuild models that already exist and work. The complication shows up at the examination, where the carrier, not the vendor, has to answer for how those models behave.
Regulators have settled the question of where responsibility sits. The NAIC Model Bulletin, now adopted in roughly two dozen states and the District of Columbia, treats a carrier as accountable for any AI that touches a regulated decision, including AI the carrier licensed rather than built. The NAIC's Third-Party Data and Models Working Group has gone further and adopted a broad definition of "third party" that covers any outside entity supplying data, models, or model outputs for insurance activities. A model law on third-party oversight is expected later in 2026, possibly including licensing requirements for the vendors that sell into the industry.
The blind spot in the numbers
The industry has not absorbed this yet. In one 2026 survey of insurance executives, only about 18 percent named third-party model risk as a concern, while roughly 68 percent said outside vendors were a source of their AI. Around 40 percent of the models in use came from more than 70 different vendors. A carrier can be accountable for dozens of models it did not build, cannot fully inspect, and in many cases cannot explain, and still not record that as a risk worth tracking.
The exam is where the gap becomes concrete. An examiner asks for the inventory of AI systems in production, then the validation records for a model that helps set rates, then the change history showing who approved the last version and what testing preceded it, then the trail from a specific input to a specific decision. If the vendor cannot supply those records, and most vendors will not, the missing documentation sits with the carrier, in the room, during the examination. The vendor's business is unaffected, while the examination finding belongs entirely to the carrier.
The trade-secret wall makes this harder. Ask a vendor for the training data, the validation methodology, or the feature weights behind a fraud-scoring model, and the usual answer is that the information is proprietary. The carrier is left accountable for a decision engine it is contractually barred from inspecting. That is a defensible position for the vendor and an untenable one for the carrier standing in front of an examiner, which is why the carrier has to generate its own evidence about how the model behaves, independent of anything the vendor is willing to share.
Why this falls hardest on mutuals
The gap is widest at mutuals, because mutuals depend on vendors the most. A national carrier with an internal data-science group can reconstruct a vendor's validation work, document how the model behaves, and assemble the records an examiner wants. A regional mutual usually cannot. It licensed the model precisely because it lacks that bench. The obligation to prove the model is sound lands on the organization least equipped to satisfy it from internal resources, unless that organization builds the oversight layer on purpose.
A mutual carries one more weight a stock carrier does not. The policyholders affected by a mispriced rate or a wrongly routed claim are the company's owners. A vendor model that discriminates without anyone noticing is more than an exam exposure; it breaks the equitable-treatment promise that defines the cooperative. The accountability a regulator assigns and the accountability a mutual already owes its members point at the same work.
Three things to require before a vendor model goes live
Verify the model on your book, not the vendor's benchmark
A vendor demo runs on the vendor's data. It shows the model can work somewhere. It says nothing about how the model performs on your policyholders, your geographies, your claim types, and your historical mix. Before a licensed model reaches production, evaluate it on your own data against thresholds you set for accuracy, consistency, and hallucination rate, then keep measuring after go-live so drift surfaces before a policyholder or a regulator finds it. Reliability is something a carrier confirms, not something a contract asserts.
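A minimal sketch of such an acceptance gate, in Python. Everything named here is hypothetical: the `EvalCase` fields, the three metrics as defined in the comments, and the threshold values a carrier would choose for itself. It assumes each test case comes from the carrier's own historical book, has been run through the vendor model twice, and carries a reviewer flag for unsupported output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Limits the carrier sets for itself (illustrative, not a standard)."""
    min_accuracy: float          # share of cases matching the known outcome
    min_consistency: float       # share of cases where two runs agree
    max_unsupported_rate: float  # share of outputs a reviewer flagged as unsupported


@dataclass(frozen=True)
class EvalCase:
    """One record from the carrier's own book (hypothetical shape)."""
    expected: str      # outcome already known from the carrier's history
    first_run: str     # vendor model output, first call
    second_run: str    # vendor model output, repeated call on the same input
    unsupported: bool  # reviewer flag: output asserted something not in the record


def evaluate(cases, limits):
    """Return (metrics, failures); an empty failures list means the gate passed."""
    if not cases:
        raise ValueError("no evaluation cases supplied")
    n = len(cases)
    metrics = {
        "accuracy": sum(c.first_run == c.expected for c in cases) / n,
        "consistency": sum(c.first_run == c.second_run for c in cases) / n,
        "unsupported_rate": sum(c.unsupported for c in cases) / n,
    }
    failures = []
    if metrics["accuracy"] < limits.min_accuracy:
        failures.append("accuracy")
    if metrics["consistency"] < limits.min_consistency:
        failures.append("consistency")
    if metrics["unsupported_rate"] > limits.max_unsupported_rate:
        failures.append("unsupported_rate")
    return metrics, failures
```

Running the same function on a fresh sample of production cases at a regular interval after go-live is one simple way to let drift show up as a failed threshold rather than as a complaint.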
Keep policyholder data inside your boundary
Many vendor models, particularly anything built on a large language model, send data outward to return an answer. For a mutual, that data is a member's claim history, medical detail, or financial record. Private AI access keeps that information inside an environment the carrier controls while still letting the team use whichever model fits the task. The data stays in. Nothing feeds a third party's training set. Access maps to the roles already defined in the organization, so the underwriting team sees underwriting data and nothing more.
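As an illustration of mapping access to existing roles, the sketch below filters a record down to the fields a role is allowed to see before anything is passed to a model. The role names, field names, and the `ROLE_FIELDS` table are assumptions made for the example; a real deployment would draw them from the carrier's own identity and access system.

```python
# Hypothetical allow-list: which record fields each role may send to a model.
ROLE_FIELDS = {
    "underwriting": frozenset({"policy_id", "coverage_limit", "prior_losses"}),
    "claims": frozenset({"policy_id", "claim_id", "loss_description"}),
}


def view_for_role(record: dict, role: str) -> dict:
    """Return only the fields the role is permitted to see.

    Roles with no entry are refused outright instead of receiving an
    empty or default view, so a misconfigured role fails loudly.
    """
    allowed = ROLE_FIELDS.get(role)
    if allowed is None:
        raise PermissionError(f"no data access defined for role {role!r}")
    return {key: value for key, value in record.items() if key in allowed}
```

The choice of an allow-list over a block-list is deliberate in this sketch: a new sensitive field added to the record stays hidden from every role until someone grants it.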
Produce the documentation the vendor will not
When the examiner asks for the inventory, the validation records, the change approvals, and the data-to-decision trail, the vendor is rarely the party that answers. Governance tooling logs every input, output, and configuration change from pre-launch testing onward, holds a live inventory of every AI application in production, and turns that record into a board-ready Trust Report. The result is the evidence that closes the distance between what the vendor handed over and what the regulator requires.
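The kind of record described above can be sketched as an append-only log in which each entry stores the model version, configuration, inputs, and output, and is chained to the previous entry by a hash so later edits are detectable. This is not any particular governance product; the `DecisionLog` class, its field names, and the hash-chain design are assumptions chosen for illustration, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the entry."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class DecisionLog:
    """Hypothetical append-only record of model decisions."""

    def __init__(self):
        self._entries = []

    def record(self, decision_id, model, version, config, inputs, output):
        """Append one decision, chained to the previous entry's hash."""
        entry = {
            "decision_id": decision_id,
            "model": model,
            "version": version,
            "config": config,
            "inputs": inputs,
            "output": output,
            "logged_at": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._entries[-1]["entry_hash"] if self._entries else "",
        }
        entry["entry_hash"] = _digest(entry)  # hash is computed before the key is added
        self._entries.append(entry)
        return entry

    def trace(self, decision_id):
        """Data-to-decision trail: every logged entry for one decision."""
        return [e for e in self._entries if e["decision_id"] == decision_id]

    def inventory(self):
        """Distinct (model, version) pairs that have produced decisions."""
        return sorted({(e["model"], e["version"]) for e in self._entries})

    def verify(self):
        """Recompute the chain; False means an entry was altered or reordered."""
        prev = ""
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev or _digest(body) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True
```

With a log of this shape, a request such as "show the trail for this claim decision" becomes a call to `trace`, and the list of models in production is derived from what actually ran instead of from what a contract says was licensed.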
The discipline is buying without buying on faith
None of this argues that a mutual should build its own models. The 300-person cooperative is right to license. The discipline is in refusing to license on faith. We have seen carriers automate the majority of routine inquiries using vendor and frontier models while holding customer-facing hallucinations at zero, because the models ran inside a governed perimeter, were measured on real data before launch, and left a record an examiner could read. The models came from outside vendors, but the proof that they behaved was the carrier's own to assemble, and the carrier was able to assemble it.
A mutual can outsource the model. It cannot outsource the answer it owes an examiner, a board, and the members whose policies those models touch. The oversight layer is what lets a small carrier give that answer with the same confidence as a company ten times its size.