Explainable AI in Insurance Underwriting: What Regulators Actually Want to See in a Rate Filing

A SHAP plot is not an explanation. It is a graph.

That distinction is the source of more rate-filing rejections, deficiency notices, and supplemental data requests than any other single artifact in the modern insurance regulatory file. Carriers submit color-coded feature attribution charts that show which inputs moved which predictions, and examiners send back questions that the chart was never designed to answer. Why does the model assign weight to ZIP code? What proxies for protected class were tested? How does this feature attribution map to the rate factors filed under the actuarial memorandum?

The NAIC Model Bulletin on the Use of Algorithms, Predictive Models, and Artificial Intelligence Systems by Insurers, now adopted in some form by more than half of state insurance departments, sets the expectation directly. Insurers must be able to explain how AI systems reach decisions, demonstrate that those decisions are consistent with filed assumptions, and produce evidence that protected-class proxies were identified and excluded. None of those three requirements is satisfied by handing an examiner a SHAP plot.

What examiners want is a narrative artifact. A document they can read, adopt as their working understanding of the model, and reference in a deficiency response or market conduct exam. The SHAP plot is an input to that artifact. It is not the artifact itself.

What an Examiner-Grade Explanation Contains

An explanation that satisfies a state insurance examiner has three load-bearing components. Each one corresponds to a question the examiner is required to answer before approving a rate filing.

Data lineage. The first question an examiner asks is where the training data came from and how it was constructed. The expected answer covers source systems, extraction dates, the rules used to define the modeling sample, the treatment of missing values, and any transformations applied to inputs before model training. A rate filing that says "the model was trained on five years of policy and claims data" will not survive the first round of questions. A rate filing that documents the specific date range, the inclusion and exclusion rules, the volume of records at each stage of preparation, and the rationale for each transformation gives the examiner what they need to evaluate whether the modeling sample is representative of the rated population.
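The shape of that documentation varies by carrier, but the skeleton is consistent. A minimal sketch follows, assuming a Python-based preparation pipeline; every field name and value here is illustrative, not a regulatory schema:

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical lineage record -- field names are illustrative.
@dataclass
class LineageStage:
    name: str            # e.g. "exclude commercial lines"
    record_count: int    # volume remaining after this stage
    rule: str            # inclusion/exclusion or transformation applied
    rationale: str       # why the rule exists

@dataclass
class DataLineage:
    source_systems: list[str]
    extraction_start: date
    extraction_end: date
    missing_value_policy: str
    stages: list[LineageStage] = field(default_factory=list)

lineage = DataLineage(
    source_systems=["policy_admin", "claims_core"],
    extraction_start=date(2019, 1, 1),
    extraction_end=date(2023, 12, 31),
    missing_value_policy="median imputation for continuous rating inputs",
)
lineage.stages.append(LineageStage(
    name="exclude commercial lines",
    record_count=1_842_117,
    rule="line_of_business == 'personal_auto'",
    rationale="model scope limited to personal auto rating",
))
```

Recording the volume at every stage is what lets an examiner verify that the modeling sample was not quietly narrowed in a way that distorts representativeness.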

Feature attribution mapped to filed factors. This is the component most carriers get wrong. A SHAP plot ranks features by their contribution to model predictions. A rate filing requires that contribution to be reconciled against the specific rating factors filed in the actuarial memorandum. If the model uses 47 features and the rate filing names 12 rating factors, the carrier must show how the 47 features roll up into, support, or modify the 12 filed factors. Examiners are looking for the seam between the technical model and the filed rate. Where features do not map cleanly to filed factors, the carrier must explain why the feature is permissible under state rating law and what its actuarial justification is.
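One way to produce that reconciliation, sketched below on the assumption that per-prediction SHAP attributions already exist, is to maintain an explicit feature-to-factor table and aggregate mean absolute attributions by filed factor. Every name in the mapping is hypothetical, and unmapped features land in their own bucket, which is precisely the set the filing must justify separately:

```python
import numpy as np
import pandas as pd

# Hypothetical carrier-maintained mapping; names are illustrative.
feature_to_factor = {
    "garaging_zip_density":      "territory",
    "territory_loss_cost_index": "territory",
    "years_licensed":            "driver_experience",
    "prior_at_fault_claims":     "claims_history",
    # ... one entry per model feature
}

def attribution_by_filed_factor(shap_values: np.ndarray,
                                feature_names: list[str]) -> pd.Series:
    """Roll mean absolute SHAP attributions up to filed rating factors."""
    per_feature = pd.Series(np.abs(shap_values).mean(axis=0),
                            index=feature_names)
    # Features absent from the mapping surface under "UNMAPPED" -- the
    # set that needs standalone actuarial justification in the filing.
    factors = [feature_to_factor.get(f, "UNMAPPED") for f in feature_names]
    return per_feature.groupby(factors).sum().sort_values(ascending=False)
```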

Fairness test results. The third component is the documented evidence that the model was tested for disparate impact across protected classes and that proxies for protected class were identified and excluded. The Buchanan Ingersoll & Rooney analysis of state regulator activity finds that explainable AI requirements increasingly include affirmative documentation of bias testing methodology, not just an attestation that bias testing occurred. The expected artifact specifies which protected classes were tested, what proxy variables were evaluated, what statistical tests were applied, what thresholds were used, and what remediation was applied where disparities were found. This bias-testing evidence belongs in the rate filing alongside the explanation; for the underlying approach, see our companion piece on bias testing methodology for market conduct exams.
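The statistical core of that artifact is often modest; the discipline is in documenting it. As a minimal sketch, assuming a scored portfolio with an inferred protected-class column (for example, from a BISG-style method), two common checks are an adverse impact ratio against a reference group and a rough proxy screen. The column names, groups, and the four-fifths threshold shown are assumptions standing in for whatever the filed testing plan specifies:

```python
import pandas as pd

def adverse_impact_ratio(df: pd.DataFrame, group_col: str,
                         favorable_col: str, reference_group: str) -> pd.Series:
    """Selection-rate ratio of each group against the reference group."""
    rates = df.groupby(group_col)[favorable_col].mean()
    return rates / rates[reference_group]

def proxy_screen(df: pd.DataFrame, candidate: str, group_col: str) -> float:
    """Crude proxy check: approximate share of a feature's variance
    explained by group membership (closer to 1 = stronger proxy)."""
    within = df.groupby(group_col)[candidate].var().mean()
    return 1.0 - within / df[candidate].var()

# Hypothetical usage on a scored portfolio:
# air = adverse_impact_ratio(scored, "inferred_group",
#                            "offered_preferred_tier", reference_group="white")
# flagged = air[air < 0.8]   # the four-fifths rule is one common threshold
```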

Technical Explainability vs. Regulatory Explainability

The gap between what data science teams produce and what regulators accept is, more often than not, a vocabulary problem.

Technical explainability is a body of methods. SHAP values quantify each feature's marginal contribution to a model output. LIME builds a local linear approximation of a complex model around a specific prediction. Partial dependence plots show how predictions change as one feature varies while others are held constant. Counterfactual explanations describe the minimum input change required to flip a prediction. These methods produce mathematically rigorous descriptions of model behavior, and they are useful for model validation and debugging.
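For readers who have not used these tools, a minimal sketch follows, using the open-source shap library and scikit-learn's partial dependence display on a synthetic dataset; the feature names and the toy model are invented for illustration:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Synthetic stand-in for a prepared modeling sample.
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "years_licensed":  rng.integers(0, 50, 500),
    "prior_claims":    rng.poisson(0.3, 500),
    "territory_index": rng.normal(1.0, 0.2, 500),
})
y_train = (100 + 20 * X_train["prior_claims"]
           - 0.5 * X_train["years_licensed"]
           + rng.normal(0, 5, 500))

model = GradientBoostingRegressor().fit(X_train, y_train)

# SHAP: per-prediction marginal contribution of every feature.
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_train)
shap.plots.beeswarm(shap_values)        # the familiar summary chart

# Partial dependence: average prediction as one feature varies.
PartialDependenceDisplay.from_estimator(model, X_train,
                                        features=["years_licensed"])
```

Each call produces a technically rigorous artifact. None of them produces the filing narrative.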

Regulatory explainability is a different deliverable. It is a written narrative that an examiner can adopt as their working understanding of the model, reference in correspondence with the carrier, and use as the basis for a recommendation to approve, deny, or request additional information on a rate filing. The narrative incorporates outputs from technical methods, but it speaks the examiner's language: filed factors, rating variables, actuarial justification, protected class, disparate impact, market conduct.

A SHAP plot delivered in isolation forces the examiner to do translation work. They have to read the chart, hypothesize about which filed factors each feature might support, formulate questions about the relationship, and send those questions back to the carrier. Each round of correspondence delays the filing.

A regulatory explanation does the translation in advance. It opens with a plain-language description of the model's purpose and scope. It documents the data lineage. It maps every model feature to a filed rating factor and provides actuarial justification for each. It presents the fairness test results, including the specific proxy variables tested. It closes with a description of the human review process governing model outputs and the conditions under which the model's recommendations would be overridden.
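Stripped to a skeleton, that narrative is an ordered set of sections, each backed by specific upstream evidence. The outline below illustrates the structure; the section names echo the paragraph above, and the evidence sources are assumptions about where each section's content originates:

```python
# Illustrative outline only -- not a prescribed filing format.
EXPLANATION_OUTLINE = [
    ("Purpose and scope",            ["model charter", "rating plan summary"]),
    ("Data lineage",                 ["ETL logs", "sample construction spec"]),
    ("Feature-to-filed-factor map",  ["SHAP rollup", "actuarial memorandum"]),
    ("Fairness test results",        ["bias test report", "proxy screen"]),
    ("Human review and overrides",   ["underwriting guidelines"]),
]
```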

The carrier that produces this artifact submits a rate filing once. The carrier that produces a SHAP plot submits a rate filing, responds to deficiency notices for three months, and then produces this artifact under deadline pressure.

The Examiner Checklist

State insurance department examiners reviewing AI rate filings under the NAIC Model Bulletin framework apply a consistent set of evaluation criteria. The checklist below reflects the questions that surface most frequently in deficiency notices and pre-filing meetings across jurisdictions that have adopted some version of the bulletin.

  1. Does the filing identify every model that contributes to the rate, including vendor-supplied components?
  2. Does the data lineage documentation specify source systems, sample construction rules, and transformation logic in enough detail that an examiner could reconstruct the modeling sample?
  3. Are all model features mapped to specific filed rating factors, with actuarial justification provided where the mapping is not one-to-one?
  4. Is there documented evidence of bias testing, including the specific protected classes evaluated, the proxy variables tested, the statistical methods applied, and the thresholds used?
  5. Where disparities were identified during bias testing, is there documentation of the remediation applied and post-remediation test results?
  6. Does the filing describe the human review process for model outputs, including the conditions that trigger human override?
  7. Is there a model inventory entry for this model that includes version, training date, validation date, deployment scope, and ownership? (A minimal example of such an entry appears after this list.)
  8. Does the filing commit to ongoing monitoring for drift and disparate impact, with a defined cadence and escalation process?
  9. Are vendor models accompanied by documentation that the vendor has provided sufficient technical detail to satisfy these requirements?
  10. Does the filing specify retention periods for model artifacts, training data, and decision logs that align with the state's bad-faith statute of limitations?
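To make item 7 concrete, here is a minimal sketch of what an inventory entry can capture; the field names mirror the question and the values are invented:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative inventory entry -- not a prescribed NAIC schema.
@dataclass(frozen=True)
class ModelInventoryEntry:
    model_id: str
    version: str
    training_date: date
    validation_date: date
    deployment_scope: str   # e.g. "personal auto rating, new business"
    owner: str              # accountable business owner, not only the builder

entry = ModelInventoryEntry(
    model_id="pa-rating-gbm",        # hypothetical identifier
    version="3.2.0",
    training_date=date(2024, 6, 30),
    validation_date=date(2024, 8, 15),
    deployment_scope="personal auto rating, new business",
    owner="personal-lines-actuarial",
)
```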

A filing that answers all ten questions before the examiner asks them moves through review materially faster than one that answers them in response to deficiency notices. The cost of building the explanation artifact correctly the first time is recovered many times over in reduced cycle time and avoided remediation work.

Where the Explanation Lives

The regulatory explanation is not a one-time document. It is a living artifact that must be regenerated each time the model is retrained, each time features are added or removed, and each time the rating plan is refiled. Carriers that treat explanation as a documentation task assign it to a single person who manually compiles the narrative from data science outputs, actuarial memoranda, and validation reports. That approach works for one filing. It does not scale across a portfolio of dozens of models and hundreds of rate filings per year.

Certification infrastructure automates the generation of the regulatory explanation from the underlying model artifacts, the bias test results, the model inventory entry, and the actuarial filing materials. It produces a document an examiner can read, in the format examiners expect, with the specific evidence each section requires. The data science team continues to use SHAP, LIME, and partial dependence in model validation. The regulatory explanation is generated downstream from those validation outputs and incorporates them where they are useful evidence.
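Mechanically, that downstream generation step can be as simple as validating and concatenating rendered fragments of each upstream artifact. A minimal sketch follows, assuming each section has already been rendered to a markdown fragment by the system that owns it; the paths, file names, and section order are illustrative:

```python
from pathlib import Path

# Illustrative section order for the assembled filing narrative.
SECTIONS = [
    "purpose_and_scope.md",
    "data_lineage.md",
    "factor_mapping.md",
    "fairness_results.md",
    "human_review.md",
]

def assemble_explanation(artifact_dir: Path, out_file: Path) -> None:
    """Concatenate validated artifact fragments into one filing narrative."""
    parts = []
    for name in SECTIONS:
        fragment = artifact_dir / name
        if not fragment.exists():
            # A missing artifact blocks generation -- better to fail here
            # than to file an incomplete explanation.
            raise FileNotFoundError(f"missing required section: {name}")
        parts.append(fragment.read_text())
    out_file.write_text("\n\n".join(parts))

# Hypothetical usage:
# assemble_explanation(Path("artifacts/pa-rating-gbm/v3.2.0"),
#                      Path("filing/regulatory_explanation.md"))
```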

The carriers that have built this capability are not the ones that produce the most sophisticated SHAP plots. They are the ones that have stopped confusing the technical artifact with the regulatory artifact. They produce both, they understand the role of each, and they know which one the examiner reads.

A rate filing is approved or denied on the strength of the narrative the examiner can construct from the materials submitted. The carrier that writes that narrative for the examiner controls the timeline. The carrier that hands over a SHAP plot and waits for questions has surrendered it.
