Predictive Model Regulation Is Coming for Insurance. Rate Filings Will Never Be the Same.

A top-20 personal auto carrier submitted a rate filing in Colorado last year that included a gradient-boosted decision tree for territorial risk segmentation. The model outperformed their prior generalized linear model on every actuarial metric: better Gini coefficient, tighter residual distributions, lower combined ratio on holdout data. The state's Division of Insurance rejected the filing. The rejection letter cited a single issue: the carrier could not demonstrate, to the regulator's satisfaction, that the model's rating variables did not serve as proxies for race or income.

The model was actuarially superior. It was also not explainable in the specific way that regulators require. That gap between actuarial performance and regulatory explainability defines the central tension in predictive model regulation for insurance.

Rating Statutes Were Written for a Different World

Every state's insurance rating statute rests on three principles: rates must be adequate, not excessive, and not unfairly discriminatory. These principles were codified decades ago, and the models they were designed to govern were straightforward. Generalized linear models produce coefficients: a specific multiplicative factor for each rating variable. A regulator reviewing a GLM-based rate filing can examine each variable, understand its directional effect, assess its magnitude, and evaluate whether it correlates with protected class characteristics.

The transparency is inherent to the method. A GLM that uses credit score as a rating variable produces a coefficient that a regulator can isolate. The regulator can test whether credit score's effect on predicted loss correlates with race by analyzing the variable's distribution across demographic groups. If it does, the regulator can require the carrier to demonstrate actuarial justification or remove the variable. The process is well-established, and the tools for conducting it exist at every state department of insurance.

Predictive models, specifically gradient-boosted trees, random forests, and neural networks, do not produce coefficients in the same way. A gradient-boosted tree may use 200 features, with interactions between features driving predictions in ways that no single coefficient can summarize. Feature importance scores indicate which variables contribute most to predictions in aggregate, but they do not reveal how a specific variable affects the rate for a specific policyholder. Two policyholders with the same credit score may receive different rate impacts because the model's treatment of credit score depends on its interaction with dozens of other variables.

The NAIC recognized this gap. Its Predictive Analytics Working Group has been developing guidance on how state regulators should evaluate predictive models within existing rate-filing frameworks. The draft white paper, circulated among state insurance departments, addresses the fundamental question: how do regulators apply "not unfairly discriminatory" to a model whose internal logic resists the variable-by-variable inspection that GLMs allow?

The Explainability Problem Is Not Abstract

The regulatory challenge is concrete, and it has specific dimensions that carriers must address.

Variable-level effects versus system-level effects. GLMs allow regulators to isolate the effect of each variable. Predictive models produce system-level effects where variables interact. A regulator asking "what is the effect of credit score on this policyholder's rate?" expects a number. A gradient-boosted model produces a different answer depending on the values of every other input variable.
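
The contrast can be made concrete with a toy example. The sketch below is illustrative only: the relativities, the rate values, and the function names (`glm_rate`, `interaction_rate`, `credit_effect`) are hypothetical and do not come from any filed model. It compares the credit effect in a multiplicative rating plan with the credit effect in a stand-in for a tree-style model.

```python
# Hypothetical relativities for a toy multiplicative (log-link GLM style) plan.
BASE_RATE = 1000.0
CREDIT_FACTOR = {"good": 0.90, "poor": 1.30}
VEHICLE_AGE_FACTOR = {"new": 1.10, "old": 0.95}


def glm_rate(credit: str, vehicle_age: str) -> float:
    """Multiplicative plan: each variable contributes one fixed factor."""
    return BASE_RATE * CREDIT_FACTOR[credit] * VEHICLE_AGE_FACTOR[vehicle_age]


def interaction_rate(credit: str, vehicle_age: str) -> float:
    """Stand-in for a tree-style model: the credit effect depends on vehicle age."""
    if vehicle_age == "new":
        return 1400.0 if credit == "poor" else 1000.0
    return 1050.0 if credit == "poor" else 950.0


def credit_effect(rate_fn, vehicle_age: str) -> float:
    """Poor-credit rate divided by good-credit rate, other inputs held fixed."""
    return rate_fn("poor", vehicle_age) / rate_fn("good", vehicle_age)


for age in ("new", "old"):
    print(
        age,
        round(credit_effect(glm_rate, age), 4),
        round(credit_effect(interaction_rate, age), 4),
    )
```

In the multiplicative plan the credit effect is the same ratio (1.30 / 0.90, roughly 1.44) for both vehicle ages, which is the single number a regulator expects. In the interaction model the same question returns 1.40 for new vehicles and roughly 1.11 for old ones.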

Explainability tools like SHAP values and partial dependence plots can help bridge this gap. In plain terms: a SHAP value measures how much a single variable pushed a specific prediction above or below the model's average prediction, while a partial dependence plot shows how the average prediction across the dataset changes as one variable is varied. These are useful approximations, but they remain approximations. A SHAP value for credit score does not tell the regulator what would happen if credit score were removed from the model entirely, because refitting a non-linear model without a variable changes how the model uses every other variable.
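
To show what a SHAP-style attribution computes, here is a minimal brute-force Shapley calculation on a toy model. It is a sketch under stated assumptions: `toy_rate`, the `BACKGROUND` sample, and both policyholders are hypothetical, and missing features are filled in by averaging over background rows. Production libraries use much faster algorithms; this version only makes the definition visible.

```python
from itertools import combinations
from math import factorial


def toy_rate(x):
    """Hypothetical model: poor credit adds 200, plus 300 more if the vehicle
    is new (an interaction); a long commute adds 50."""
    rate = 1000.0
    if x["credit"] == "poor":
        rate += 200.0
        if x["vehicle_age"] == "new":
            rate += 300.0
    if x["commute"] == "long":
        rate += 50.0
    return rate


# Hypothetical background sample standing in for the book of business.
BACKGROUND = [
    {"credit": "good", "vehicle_age": "new", "commute": "short"},
    {"credit": "good", "vehicle_age": "old", "commute": "long"},
    {"credit": "poor", "vehicle_age": "new", "commute": "long"},
    {"credit": "poor", "vehicle_age": "old", "commute": "short"},
]


def coalition_value(known, x, background):
    """Average prediction with features in `known` fixed to x's values and
    all other features taken from each background row."""
    total = 0.0
    for row in background:
        mixed = {k: (x[k] if k in known else row[k]) for k in x}
        total += toy_rate(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley attribution by enumerating every feature coalition."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = coalition_value(set(subset) | {f}, x, background)
                without_f = coalition_value(set(subset), x, background)
                contribution += weight * (with_f - without_f)
        phi[f] = contribution
    return phi


# Two policyholders with the same credit score but different vehicle ages.
holder_a = {"credit": "poor", "vehicle_age": "new", "commute": "short"}
holder_b = {"credit": "poor", "vehicle_age": "old", "commute": "short"}
phi_a = shapley_values(holder_a, BACKGROUND)
phi_b = shapley_values(holder_b, BACKGROUND)
print(phi_a["credit"], phi_b["credit"])
```

Both policyholders have poor credit, yet the credit attribution is about 212.5 for the first and about 137.5 for the second, because part of the interaction with vehicle age is credited to credit score. In each case the attributions sum to the gap between that policyholder's prediction and the average prediction over the background sample.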

Proxy discrimination in high-dimensional models. A GLM with 15 rating variables can be tested for proxy effects by examining correlations between each variable and protected class membership. A gradient-boosted tree with 200 features creates a combinatorial explosion: the proxy effect may not reside in any single variable but in the interaction between multiple variables that individually show no correlation with protected characteristics. ZIP code alone may not correlate with race at a level that triggers regulatory concern. ZIP code interacted with vehicle age, credit score, and commute distance may produce rating patterns that closely track racial demographics. Detecting this requires testing the model's outputs across demographic groups, not just testing individual inputs.

Reproducibility under model updates. GLMs change slowly. A carrier might update territorial relativities annually and refit the full model every three to five years. Predictive models, particularly those retrained on rolling data windows, can change with every update cycle. A model approved in a rate filing may behave differently six months later after retraining on new data. The filing approved a specific model. The model in production is a different model. Regulators have not yet established whether predictive model approvals attach to the model architecture, the specific trained instance, or the outputs within a defined tolerance band.

What NAIC Guidance Is Shaping

The NAIC's draft guidance addresses several dimensions that will change how carriers prepare rate filings involving predictive models.

Model documentation standards. The guidance contemplates requiring carriers to submit detailed model documentation with rate filings, including training data descriptions, feature engineering processes, model architecture specifications, hyperparameter choices, and validation methodology. For a GLM, this documentation is typically a few pages of coefficient tables and actuarial memoranda. For a gradient-boosted model, the equivalent documentation runs to dozens of pages and requires explaining choices that have no actuarial precedent. Why 500 trees and not 1,000? Why a maximum depth of 6? These are engineering decisions with actuarial consequences, and regulators want to understand the connection.

Disparate impact testing protocols. Multiple states already require or encourage testing predictive models for disparate impact on protected classes. The NAIC guidance is moving toward standardizing the methodology. The emerging consensus favors testing model outputs, the actual rates produced, against protected class distributions rather than testing individual input variables for correlation. This output-based approach addresses the proxy interaction problem: if the model's rates produce significantly different loss ratio patterns across racial groups after controlling for actuarial risk, the model has a disparate impact problem regardless of which variables caused it.
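
A minimal sketch of an output-based check follows. The policy records, the group labels, and the 0.05 tolerance are all hypothetical, and the sketch deliberately omits two things a real protocol would need: controls for actuarial risk and a statistical significance test. It shows only the basic shape of the comparison, which looks at what the model charged relative to losses rather than at any input variable.

```python
from collections import defaultdict


def loss_ratio_by_group(policies):
    """Incurred losses divided by earned premium, aggregated per group label."""
    premium = defaultdict(float)
    losses = defaultdict(float)
    for p in policies:
        premium[p["group"]] += p["premium"]
        losses[p["group"]] += p["loss"]
    return {g: losses[g] / premium[g] for g in premium}


def max_loss_ratio_gap(ratios):
    """Largest difference in loss ratio between any two groups."""
    return max(ratios.values()) - min(ratios.values())


# Hypothetical records; "group" would come from an estimated demographic proxy.
policies = [
    {"group": "A", "premium": 1000.0, "loss": 600.0},
    {"group": "A", "premium": 1200.0, "loss": 700.0},
    {"group": "B", "premium": 1500.0, "loss": 600.0},
    {"group": "B", "premium": 1300.0, "loss": 520.0},
]

HYPOTHETICAL_TOLERANCE = 0.05  # assumed for illustration, not a regulatory value

ratios = loss_ratio_by_group(policies)
gap = max_loss_ratio_gap(ratios)
flagged = gap > HYPOTHETICAL_TOLERANCE
print(ratios, round(gap, 4), flagged)
```

A group with a materially lower loss ratio is paying more premium per dollar of loss than the others. Here group B's ratio is 0.40 against roughly 0.59 for group A, so the toy check flags the model without needing to know which variables produced the gap.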

Ongoing monitoring requirements. The most significant departure from current practice: the guidance contemplates requiring carriers to monitor predictive model performance and fairness metrics on an ongoing basis, not just at the point of filing. A model approved in January must continue to meet fairness standards in June. This creates a persistent compliance obligation that did not exist under GLM-based rate-making. The model does not stand still, and the regulatory requirement does not end at approval.

Materiality thresholds. The guidance is exploring standards for when model changes are material enough to require a new filing. If a model is retrained and the maximum rate change for any individual policyholder exceeds a defined threshold, does the carrier need to refile? If aggregate rate adequacy remains within tolerance but individual rate assignments shift significantly, is that a material change? These questions have no established answers, and the guidance will create the first frameworks for addressing them.
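
The two questions above can pull in opposite directions, as this small sketch shows. The function name `materiality_check` and both threshold values are hypothetical; no regulator has set these numbers.

```python
def materiality_check(old_rates, new_rates,
                      individual_threshold=0.10, aggregate_tolerance=0.02):
    """Compare a retrained model's rates with the filed model's rates.

    Both thresholds are assumed values for illustration only.
    """
    changes = {pid: new_rates[pid] / old_rates[pid] - 1.0 for pid in old_rates}
    max_individual = max(abs(c) for c in changes.values())
    aggregate = sum(new_rates[pid] for pid in old_rates) / sum(old_rates.values()) - 1.0
    return {
        "max_individual_change": max_individual,
        "aggregate_change": aggregate,
        "individual_exceeds": max_individual > individual_threshold,
        "aggregate_exceeds": abs(aggregate) > aggregate_tolerance,
    }


# Hypothetical three-policy book before and after retraining.
old_rates = {"p1": 1000.0, "p2": 1000.0, "p3": 1000.0}
new_rates = {"p1": 1150.0, "p2": 860.0, "p3": 1000.0}

result = materiality_check(old_rates, new_rates)
print(result)
```

In this toy book the aggregate premium moves by about a third of a percent, well inside the assumed tolerance, while one policyholder's rate rises 15 percent. Whether that combination requires a new filing is exactly the question the guidance has yet to settle.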

What Carriers Must Build

Carriers that wait for final NAIC guidance before building compliance tooling will find themselves retrofitting systems under regulatory pressure. The direction is clear enough to act on now.

Explainability as a production system. SHAP values, partial dependence plots, and local interpretable model-agnostic explanations cannot be generated ad hoc for regulatory examinations. They must be computed and stored for every rating decision the model produces. When a regulator asks why a specific policyholder received a specific rate, the carrier must produce the explanation from production records, not regenerate it from a model that may have been retrained since the decision was made. This requires building explainability into the model serving pipeline, not bolting it on after the fact.

Fairness testing pipelines. Disparate impact testing on model outputs requires access to protected class data that carriers often do not collect directly. The emerging practice uses proxy methodologies, Bayesian Improved Surname Geocoding being the most common, to estimate protected class membership for testing purposes. Building this into a certification pipeline that runs automatically on every model update ensures that disparate impact testing is persistent rather than episodic. A model that passes fairness testing at filing time and fails six months later after retraining creates regulatory exposure that periodic testing cannot prevent.
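
At its core, BISG is a Bayes' rule update that combines a surname-based prior with geography. The sketch below shows only that update step, assuming surname and geography are independent given group membership. All probabilities and group labels are made up for illustration; a real implementation would draw them from census surname and geographic tables.

```python
def bisg_posterior(p_group_given_surname, p_geo_given_group):
    """Posterior P(group | surname, geography), proportional to
    P(group | surname) * P(geography | group)."""
    unnormalized = {
        g: p_group_given_surname[g] * p_geo_given_group[g]
        for g in p_group_given_surname
    }
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


# Hypothetical inputs for one policyholder.
surname_prior = {"group_1": 0.6, "group_2": 0.3, "group_3": 0.1}
# Share of each group's population living in this policyholder's area.
geo_likelihood = {"group_1": 0.001, "group_2": 0.004, "group_3": 0.002}

posterior = bisg_posterior(surname_prior, geo_likelihood)
print({g: round(p, 3) for g, p in posterior.items()})
```

The output is a probability for each group, not a label. Fairness tests built on it typically weight each policy by these probabilities rather than assigning every policyholder to a single group.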

Model versioning and audit trails. Every model version that serves production rating decisions must be preserved with its complete training configuration, training data fingerprint, and validation results. When a regulator examines rates from March, the carrier must be able to reproduce the exact model that generated those rates, even if the production model has been retrained five times since. This is standard practice in regulated industries like pharmaceuticals and aerospace. Insurance is arriving at the same requirement through a different path.
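
One way to sketch such a version record uses only the Python standard library. The field names, the model identifier, and the metric value are hypothetical; the hyperparameters echo the 500-tree, depth-6 example used earlier. The fingerprint here depends on row order, so a real pipeline would need to fix an ordering or hash the stored training file directly.

```python
import hashlib
import json


def fingerprint_rows(rows):
    """SHA-256 over a canonical JSON encoding of each training row, in order."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def build_version_record(model_id, hyperparameters, training_rows, validation_metrics):
    """Bundle what is needed to identify one trained model instance."""
    record = {
        "model_id": model_id,
        "hyperparameters": hyperparameters,
        "training_data_sha256": fingerprint_rows(training_rows),
        "validation_metrics": validation_metrics,
    }
    # Hash of the record itself, so later tampering is detectable.
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_sha256"] = hashlib.sha256(encoded).hexdigest()
    return record


# Hypothetical training rows and settings.
training_rows = [
    {"policy_id": 1, "credit": "poor", "loss": 600.0},
    {"policy_id": 2, "credit": "good", "loss": 0.0},
]
record = build_version_record(
    model_id="territorial-gbm-example",
    hyperparameters={"n_trees": 500, "max_depth": 6},
    training_rows=training_rows,
    validation_metrics={"gini": 0.31},
)
print(record["training_data_sha256"][:12], record["record_sha256"][:12])
```

Storing this record next to the serialized model lets an examiner confirm that the model reproduced for a March rate was trained on the same data and configuration as the one that served it.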

Rate impact simulation. Before deploying any model update, carriers must simulate the rate impact across their entire book of business, segmented by geography, demographic proxy, and coverage characteristics. The simulation must quantify how many policyholders experience rate changes, the magnitude distribution of those changes, and whether the changes produce disparate patterns across protected class proxies. This simulation capability is the pre-filing gate that prevents regulatory rejection.
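
A simplified version of that simulation can be sketched as follows. The two models, the five-policy book, and the `proxy_group` field are hypothetical stand-ins; a real run would cover the full book and repeat the summary for each segmentation (geography, coverage, demographic proxy).

```python
from collections import defaultdict
from statistics import mean, median


def simulate_rate_impact(book, old_model, new_model, segment_key):
    """Summarize per-policy rate changes between two models, by segment."""
    by_segment = defaultdict(list)
    for policy in book:
        change = new_model(policy) / old_model(policy) - 1.0
        by_segment[policy[segment_key]].append(change)
    summary = {}
    for segment, changes in by_segment.items():
        summary[segment] = {
            "policies": len(changes),
            "share_changed": sum(1 for c in changes if abs(c) > 1e-9) / len(changes),
            "mean_change": mean(changes),
            "median_change": median(changes),
            "max_increase": max(changes),
        }
    return summary


# Hypothetical models: the update raises rates 8 percent in territory T1 only.
def old_model(policy):
    return 1000.0


def new_model(policy):
    return 1080.0 if policy["territory"] == "T1" else 1000.0


book = [
    {"id": 1, "territory": "T1", "proxy_group": "A"},
    {"id": 2, "territory": "T1", "proxy_group": "A"},
    {"id": 3, "territory": "T2", "proxy_group": "A"},
    {"id": 4, "territory": "T2", "proxy_group": "B"},
    {"id": 5, "territory": "T2", "proxy_group": "B"},
]

impact = simulate_rate_impact(book, old_model, new_model, "proxy_group")
for segment, stats in impact.items():
    print(segment, stats)
```

The update never references the proxy group, yet two of the three group A policyholders see an increase while no one in group B does. That is the kind of pattern the pre-filing gate is meant to surface.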

The Transition Has Already Started

Colorado's rejection of the gradient-boosted territorial model was not an outlier. Several states, including Connecticut, New York, and Illinois, have introduced requirements for predictive model documentation and bias testing in insurance rate filings. Each state is moving at its own pace, but the direction is uniform: predictive models used in rate-making will face scrutiny that GLMs never did, because predictive models create risks that GLMs never could.

The ability to explain what your model does is becoming as important as what the model actually does.
