The term "white box" comes from software engineering. It means software whose internals you can view, compared to a "black box" whose internals you cannot. By this definition, a neural network could be a white box model if you can see the weights.
However, by "white box" people really mean something they can understand: a model whose internals a person can see and reason about. This is subjective, but most would agree that millions of interconnected neural network weights do not describe how the model works in a way we could usefully summarize or predict.
The Power of Understanding
Compare viewing neural network weights to examining a graph showing risk of death from pneumonia as a function of age. The graph shows the impact of one feature on the risk score, with error bars and a best-estimate line.
From this single graph of one model feature, we can observe that risk is flat until about age 50, rises sharply at 65 (possibly due to retirement), has its narrowest error bars between ages 66 and 85, where the most data exists, rises again at 85 with wider error bars, and drops above 100, possibly due to a lack of data.
All this from one graph of one model feature. There are facts about the shape, and then speculation about why it behaves that way. The facts help understand the data. The speculation may suggest further actions: collecting new features, gathering more data for certain age ranges, or conducting new analyses.
These are not simulations of what the model would do. They are the actual internals of the model. This is the power of a white box model.
This power comes with dangers. By seeing everything, we may believe we understand everything and speculate wildly or "fix" inappropriately. We still must exercise judgment to use data properly.
Why Choose White Box Models
Make a white box model to:
Learn about your model: Not from simulations or approximations, but the actual internals.
Improve your model: By giving you ideas of directions to pursue.
Align your model with domain knowledge: By identifying and correcting behaviors that contradict what experts know about the problem.
Satisfy regulatory requirements: When regulations dictate that you must fully describe your model, human-readable internals become essential.
Logistic Regression: The Classic
Logistic regression has been in statistical use since the mid-1900s (the underlying logistic function dates to the 1800s) and remains popular because it solves a common problem (predicting the probability of an event) and is interpretable.
In logistic regression, a unit increase in a feature adds its coefficient to the log odds of the outcome. For example, if a lending model has a coefficient of 0.15 for loan amount, then a unit increase in loan amount multiplies the odds of default by e^0.15 ≈ 1.16, a 16% increase, holding other factors constant.
This statement is what people mean when they say logistic regression is interpretable. The input response terms can be interpreted independently, and the coefficients are in interpretable units.
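The arithmetic behind that interpretation takes only a few lines. This sketch uses the hypothetical loan-amount coefficient from the example above, not a fitted model:

```python
import math

# Hypothetical fitted coefficient for loan amount (per unit increase)
coef_loan_amount = 0.15

# A one-unit increase adds the coefficient to the log odds,
# which multiplies the odds by e^coefficient.
odds_multiplier = math.exp(coef_loan_amount)
percent_increase = (odds_multiplier - 1) * 100

print(f"odds multiply by {odds_multiplier:.3f} (~{percent_increase:.0f}% increase)")
```

The same conversion works for any coefficient, which is why a table of logistic regression coefficients can be read directly as odds ratios.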
Why would we use anything other than logistic regression? If a feature's relationship to the log odds is not linear, the model will fit poorly: a straight line cannot follow a curve.
Generalized Additive Models (GAMs)
GAMs were introduced by Hastie and Tibshirani in the late 1980s and address the linearity limitation. Instead of a linear term for each feature, GAMs fit a function for each feature. This function might be a smooth curve like a cubic spline.
GAMs retain white box properties. The input response terms can be interpreted independently. Instead of reporting log odds as a number, we visualize it with a graph. The pneumonia risk by age graph described earlier is one term in a GAM.
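The additive structure is what makes each term plottable on its own. This minimal sketch uses made-up shape functions (not a fitted model) to show how a GAM sums per-feature functions on the log-odds scale; the feature names and numbers are illustrative assumptions:

```python
import math

def f_age(age):
    # Hypothetical shape function: flat until 50, then rising risk
    return 0.0 if age < 50 else 0.04 * (age - 50)

def f_blood_pressure(bp):
    # Hypothetical shape function for systolic blood pressure
    return 0.02 * (bp - 120)

def gam_risk(age, bp, intercept=-3.0):
    # A GAM adds the shape functions on the log-odds scale...
    log_odds = intercept + f_age(age) + f_blood_pressure(bp)
    # ...and converts to a probability with the logistic link.
    return 1 / (1 + math.exp(-log_odds))

# Each term can be inspected independently, e.g. f_age over a grid
# of ages is exactly the kind of graph described earlier.
shape = [(a, round(f_age(a), 2)) for a in range(30, 91, 10)]
print(shape)
print(round(gam_risk(age=70, bp=140), 3))
```

In a real GAM the shape functions are learned from data (for example as splines), but the interpretation step is the same: plot each function and read off the model's behavior.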
Why would we use anything other than a GAM? If there are interactions between features that the model needs to capture, independent functions for each feature will not suffice.
GA2Ms: GAMs with Interactions
GA2Ms add interaction terms to GAMs. The model equation includes functions that can account for two feature variables at once.
These remain white box models because the shape function for an interaction term can be visualized as a heatmap: the two features lie along the axes, and color shows the function's response.
Heatmaps are harder to reason about than single-feature graphs, and what is plotted is typically the interaction effect with the main effects removed. A full investigation requires looking at examples around the borders. Still, the model's behavior remains inspectable.
In practice, you fit all the single-feature functions, then add N interaction terms, where N is up to you, and picking it is not easy. Interaction terms are worth adding only if they improve accuracy enough to justify the extra complexity of interpreting heatmaps.
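Structurally, a GA2M score is intercept + Σ f_i(x_i) + Σ f_ij(x_i, x_j) for a few chosen pairs. Extending the earlier sketch with one hypothetical interaction term (all names and values are illustrative, not a fitted model):

```python
def f_age(age):
    # Hypothetical main-effect shape function for age
    return 0.0 if age < 50 else 0.04 * (age - 50)

def f_asthma(has_asthma):
    # Hypothetical main-effect term for an asthma indicator
    return -0.5 if has_asthma else 0.0

def f_age_asthma(age, has_asthma):
    # Hypothetical interaction term: asthma shifts the effect of old age
    return 0.3 if (has_asthma and age >= 65) else 0.0

def ga2m_log_odds(age, has_asthma, intercept=-3.0):
    # GA2M score: intercept + main effects + pairwise interaction terms
    return (intercept + f_age(age) + f_asthma(has_asthma)
            + f_age_asthma(age, has_asthma))

# The interaction term alone can be tabulated as the heatmap grid
# described above: one axis per feature, cell value = term response.
for age in (40, 60, 80):
    print(age, [f_age_asthma(age, asthma) for asthma in (False, True)])
```

Inspecting the grid printed at the end is exactly the "heatmap" reading task: the interaction term is zero everywhere except where both conditions hold, which is the kind of boundary worth checking against real examples.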
When to Choose Each Model Type
The models exist on a spectrum from interpretability to modeling feature interactions.
Use GAMs if they are accurate enough. They give the advantages of a white box model: separable terms with interpretable units.
Use GA2Ms if they are significantly more accurate than GAMs, especially if domain knowledge suggests real feature interactions exist that are not too complex.
Try boosted trees (XGBoost or LightGBM) if you do not know much about the data, since they are robust to quirks such as outliers, skewed distributions, and missing values. These are black box models.
Use neural networks when features interact heavily with one another, as pixels do in images or context does in audio. These are deeply black box.
The Trade-Off
For responsible AI, the choice between interpretable and complex models is a governance decision, not just a technical one.
Regulated industries often have strong incentives for interpretability. Financial services regulations require that credit denials be explainable. Healthcare applications may need to justify treatment recommendations. These requirements push toward white box models even when black box alternatives might be more accurate.
The accuracy difference matters. If a black box model is only marginally better than an interpretable one, the interpretability benefits often outweigh the accuracy cost. If the accuracy gap is large, the decision becomes harder.
AI governance frameworks should establish criteria for when interpretability requirements override accuracy considerations. These decisions should be documented and reviewed as part of model validation.
Building for Understanding
The organizations that succeed with AI in regulated environments often start with interpretable models and only move to more complex architectures when the accuracy requirements demand it.
Starting simple has multiple benefits. You learn about your data and problem. You establish baselines for comparison. You build explainability infrastructure that will be useful even if you later deploy more complex models.
White box models are not always the right choice. But they should be the first choice considered, with complexity added only when necessary.
