AI Bias and Fairness: Detection, Metrics & Mitigation

AI bias occurs when models produce systematically unfair outcomes for certain groups, typically based on protected characteristics like race, gender, age, or disability. Fairness is the practice of detecting, measuring, and mitigating these disparities.

Why it matters: Biased AI can cause real harm. It can wrongly deny loans, reject qualified job candidates, misdiagnose patients, and provide worse service to certain populations. Beyond the ethical imperative, regulations increasingly require bias testing and documentation for high-risk AI systems.

Sources of AI Bias

Training Data Bias

Historical bias: Training data reflects past discrimination. A hiring model trained on historical decisions learns to replicate those biases.

Sampling bias: Training data doesn't represent the deployment population. A facial recognition system trained mostly on lighter skin tones performs worse on darker skin tones.

Measurement bias: The labels or outcomes used for training are themselves biased. Using arrest records to predict crime incorporates policing biases.

Aggregation bias: Combining data from different groups obscures important differences. A medical model trained on aggregated data may work well on average but fail for specific populations.

Model and Algorithm Bias

Feature selection: Including or excluding certain features can encode bias. Using zip code as a feature may proxy for race.

Optimization objectives: Models typically optimize for overall accuracy. Because the majority group dominates that average, gains there can come at the expense of accuracy for minority groups.

Architecture choices: Some model architectures amplify small biases in training data into large disparities in outputs.

Deployment Bias

Population shift: The people using the system differ from those in training data.

Feedback loops: Biased outputs influence future training data, amplifying initial disparities over time.

Context mismatch: A model developed for one context performs differently in another.

Fairness Definitions

Fairness is central to AI ethics frameworks. Different fairness definitions capture different intuitions, and they're often mathematically incompatible:

Group Fairness Metrics

Demographic parity: Positive outcomes should occur at equal rates across groups. Problem: ignores differences in underlying qualifications.

Equalized odds: True positive and false positive rates should be equal across groups. Balances benefit (catching qualified candidates) with harm (false positives).

Equal opportunity: True positive rates should be equal across groups. Focuses on ensuring qualified members of each group have equal chances.
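
The three rate-based definitions above can be computed from the same per-group counts. The following is a minimal sketch, assuming binary labels and binary predictions (0 or 1) and a single group label per record; the function names group_rates and max_gap are illustrative, not part of any library.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Return selection rate, TPR, and FPR for each group (binary 0/1 inputs)."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "tp": 0, "fp": 0, "pos": 0, "neg": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp
        if yt == 1:
            c["pos"] += 1
            c["tp"] += yp
        else:
            c["neg"] += 1
            c["fp"] += yp
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates

def max_gap(rates, key):
    """Largest difference in one rate between any two groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)

# Toy data, made up for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
print("demographic parity gap:", max_gap(rates, "selection_rate"))
print("equal opportunity gap:", max_gap(rates, "tpr"))
print("equalized odds gaps:", max_gap(rates, "tpr"), max_gap(rates, "fpr"))
```

A gap of zero means the criterion holds exactly on this sample. In practice a tolerance is chosen, and what counts as an acceptable gap depends on the use case.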

Calibration: Predicted probabilities should mean the same thing across groups. A 70% risk score should have the same meaning for all populations.
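
One way to check calibration by group is to bin the scores and compare the average predicted score with the observed positive rate in each bin. The sketch below assumes scores in the range 0 to 1 and binary outcomes; calibration_by_group and the bin count are illustrative choices.

```python
def calibration_by_group(scores, y_true, groups, n_bins=5):
    """Map (group, bin) to (mean predicted score, observed positive rate, count)."""
    bins = {}
    for s, y, g in zip(scores, y_true, groups):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the top bin
        acc = bins.setdefault((g, b), [0.0, 0, 0])
        acc[0] += s
        acc[1] += y
        acc[2] += 1
    return {key: (total / n, pos / n, n) for key, (total, pos, n) in sorted(bins.items())}

# Toy data: every record gets a 0.7 score, but outcomes differ by group.
scores = [0.7] * 8
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

table = calibration_by_group(scores, y_true, groups)
for (group, bin_index), (mean_score, observed, n) in table.items():
    print(group, bin_index, round(mean_score, 2), observed, n)
```

In this toy data a score of 0.7 corresponds to a 75% positive rate in group a but only 25% in group b, so the same score does not mean the same thing for both groups.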

Individual Fairness

Similarity-based fairness: Similar individuals should receive similar predictions. Challenge: defining "similarity" appropriately.

Counterfactual fairness: Predictions should be the same in a counterfactual world where protected attributes were different.
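
A simple approximation of the counterfactual idea is an attribute flip test: change only the protected attribute and see whether the prediction moves. This is weaker than counterfactual fairness proper, which needs a causal model so that features influenced by the protected attribute change as well. The sketch assumes records are dictionaries and that predict is any callable returning a numeric score; flip_test and toy_score are hypothetical names.

```python
def flip_test(predict, records, attribute, values, tolerance=0.0):
    """Return (record, alternative value) pairs where swapping `attribute` changes the score."""
    flagged = []
    for record in records:
        baseline = predict(record)
        for value in values:
            if value == record[attribute]:
                continue
            variant = {**record, attribute: value}  # copy with only one field changed
            if abs(predict(variant) - baseline) > tolerance:
                flagged.append((record, value))
                break
    return flagged

# Hypothetical scorer that (improperly) uses the protected attribute directly.
def toy_score(record):
    penalty = 0.2 if record["gender"] == "f" else 0.0
    return 0.5 + 0.1 * record["income"] - penalty

records = [{"income": 1.0, "gender": "f"}, {"income": 2.0, "gender": "m"}]
flagged = flip_test(toy_score, records, "gender", ["f", "m"])
print(len(flagged), "of", len(records), "records change under the flip")
```

A model that passes this test can still be unfair through proxies such as zip code, so the flip test is a screening step rather than proof of fairness.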

Impossibility Results

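Formal impossibility results show that several common fairness definitions cannot all hold at once except in special cases. For example, a classifier generally cannot be calibrated across groups and also satisfy equalized odds unless the groups have equal base rates or the classifier is perfect. Organizations must: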

Choose which fairness criteria matter most for their use case
Accept trade-offs with other definitions
Document and justify their choices
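
A small arithmetic example shows why these trade-offs are unavoidable. Suppose two groups have identical true positive and false positive rates (equalized odds holds) but different base rates. The positive predictive value, a binary-decision relative of calibration, then has to differ. The numbers below are made up for illustration.

```python
def ppv(tpr, fpr, base_rate):
    """Positive predictive value implied by TPR, FPR, and a group's base rate."""
    true_pos = tpr * base_rate
    false_pos = fpr * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Same error rates for both groups, different share of true positives.
print(f"{ppv(0.8, 0.2, 0.5):.2f}")  # 0.80 for a group with a 50% base rate
print(f"{ppv(0.8, 0.2, 0.2):.2f}")  # 0.50 for a group with a 20% base rate
```

With equal error rates, a positive prediction is right 80% of the time in one group and 50% of the time in the other. Making the predictive values equal would require giving up equal error rates.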

Advanced Fairness Metrics

Standard metrics like demographic parity and equalized odds are usually computed for one protected attribute at a time and reduce each group to a single summary rate. Advanced approaches address these limitations.

Intersectional Fairness

Traditional metrics examine one protected attribute at a time (gender OR race), but real-world bias often compounds across intersections. Black women, for example, may experience bias that neither an analysis of Black people as a whole nor an analysis of women as a whole would reveal.

Worst-case comparison methods identify the most disadvantaged subgroup across all attribute combinations. Rather than averaging across groups, these methods surface where harm concentrates.

Implementation approaches:

Enumerate all reasonable intersections of protected attributes
Calculate fairness metrics for each subgroup
Report the worst-performing subgroup, not just aggregate statistics
Set thresholds based on the most disadvantaged group

Intersectional analysis gets expensive quickly: the number of subgroups grows exponentially with the number of protected attributes, and small subgroups yield noisy estimates. Focus on the intersections most likely to experience compounded bias.
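
The enumerate, measure, and report-the-worst steps can be sketched as follows. This assumes records are dictionaries and that the metric is "higher is better" (here, selection rate); worst_subgroup, selection_rate, and the min_size cutoff are illustrative choices, and the data is invented.

```python
from collections import defaultdict

def selection_rate(rows):
    return sum(r["selected"] for r in rows) / len(rows)

def worst_subgroup(records, attributes, metric, min_size=1):
    """Score every observed combination of `attributes` and return the lowest one."""
    buckets = defaultdict(list)
    for r in records:
        buckets[tuple(r[a] for a in attributes)].append(r)
    # Skip subgroups too small to estimate reliably.
    scores = {key: metric(rows) for key, rows in buckets.items() if len(rows) >= min_size}
    if not scores:
        return None, scores
    return min(scores, key=scores.get), scores

records = (
    [{"race": "x", "gender": "f", "selected": 0}] * 2
    + [{"race": "x", "gender": "m", "selected": 1}] * 2
    + [{"race": "y", "gender": "f", "selected": 1}] * 2
    + [{"race": "y", "gender": "m", "selected": 1},
       {"race": "y", "gender": "m", "selected": 0}]
)

worst_single, single_scores = worst_subgroup(records, ["race"], selection_rate)
worst_pair, pair_scores = worst_subgroup(records, ["race", "gender"], selection_rate)
print(worst_single, single_scores[worst_single])
print(worst_pair, pair_scores[worst_pair])
```

In this toy data the worst single-attribute group still has a 50% selection rate, while the worst intersection is never selected. Only observed combinations are scored, which keeps the subgroup count bounded by the data rather than by the full cross product.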

Quantile Demographic Drift (QDD)

Rather than comparing group averages, QDD examines how the full distribution of model outputs differs across groups. This catches cases where:

Averages are similar but distributions differ
Bias concentrates at the tails (highest/lowest scores)
Different groups have different variance in outcomes

QDD compares quantile functions across groups, surfacing where the disparities are largest and whether they're at the high end, low end, or throughout the distribution.
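
The quantile comparison can be illustrated with the standard library. This is a sketch in the spirit of QDD, not a reference implementation of the metric; quantile_gaps and the sample scores are assumptions made for the example.

```python
from statistics import mean, quantiles

def quantile_gaps(scores_a, scores_b, n=4):
    """Difference between the two groups' scores at each of the n-1 cut points."""
    qa = quantiles(scores_a, n=n, method="inclusive")
    qb = quantiles(scores_b, n=n, method="inclusive")
    return [a - b for a, b in zip(qa, qb)]

# Same average score, very different spread.
scores_a = [0.1, 0.3, 0.5, 0.7, 0.9]
scores_b = [0.4, 0.45, 0.5, 0.55, 0.6]

print("mean gap:", round(mean(scores_a) - mean(scores_b), 6))
gaps = quantile_gaps(scores_a, scores_b)
print("quartile gaps:", [round(g, 3) for g in gaps])
```

The averages match, so a mean-based comparison reports no disparity. The quartile gaps are negative at the low end and positive at the high end, which shows that group a receives more extreme scores in both directions.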

Subgroup Robustness

A model might perform fairly on average while failing catastrophically for small subgroups:

Rare combinations of feature values
Edge cases not well-represented in training
Populations that emerged after training

Robustness testing probes these corners systematically, using adversarial testing to find where fairness breaks down.

Detecting Bias

Disaggregated Performance Analysis

Break down model performance by protected groups. Look for disparities in:

Accuracy, precision, recall
Error rates and error types
Confidence distributions
Outcomes and recommendations
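
A disaggregated report can be built from per-group confusion counts. The sketch below assumes binary labels and predictions; confusion_by_group is an illustrative name and the data is invented to show groups failing in different ways.

```python
from collections import Counter

def confusion_by_group(y_true, y_pred, groups):
    """Accuracy, precision, recall, and raw error counts for each group."""
    counts = Counter(zip(groups, y_true, y_pred))
    report = {}
    for g in sorted(set(groups)):
        tp, fp = counts[(g, 1, 1)], counts[(g, 0, 1)]
        fn, tn = counts[(g, 1, 0)], counts[(g, 0, 0)]
        report[g] = {
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
            "precision": tp / (tp + fp) if tp + fp else None,  # undefined with no positive predictions
            "recall": tp / (tp + fn) if tp + fn else None,
            "false_positives": fp,
            "false_negatives": fn,
        }
    return report

y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = confusion_by_group(y_true, y_pred, groups)
for group, row in report.items():
    print(group, row)
```

Here group a's errors are false positives while group b's are false negatives. Accuracy alone would show a gap but not that the two groups are being failed in opposite directions.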

Slice Analysis

Examine performance across intersections of attributes (e.g., Black women vs. white men) to catch intersectional bias that aggregate metrics miss.

Adversarial Testing

Test with synthetic inputs designed to surface bias. Include edge cases, counterfactual pairs that differ only in a protected attribute, and adversarial examples.

Production Monitoring

Bias can emerge over time. Monitor for:

Demographic shift in users
Outcome disparities across groups
Feedback loop effects

When bias is detected, AI supervision can act on it: triggering alerts, enforcing fallback behaviors, or routing decisions to human review until the bias is addressed.
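
A production monitor for outcome disparities can be as simple as a sliding window over recent decisions. The following sketch is one possible design, not a description of any particular product; the class name DisparityMonitor and the window, max_gap, and min_per_group parameters are all hypothetical.

```python
from collections import deque

class DisparityMonitor:
    """Track the positive-outcome rate per group over the most recent decisions."""

    def __init__(self, window=1000, max_gap=0.1, min_per_group=30):
        self.events = deque(maxlen=window)  # old decisions fall out automatically
        self.max_gap = max_gap
        self.min_per_group = min_per_group

    def record(self, group, positive):
        self.events.append((group, int(positive)))

    def gap(self):
        totals, positives = {}, {}
        for group, outcome in self.events:
            totals[group] = totals.get(group, 0) + 1
            positives[group] = positives.get(group, 0) + outcome
        # Ignore groups with too few recent decisions to estimate a rate.
        rates = [positives[g] / n for g, n in totals.items() if n >= self.min_per_group]
        return max(rates) - min(rates) if len(rates) >= 2 else None

    def should_alert(self):
        gap = self.gap()
        return gap is not None and gap > self.max_gap

monitor = DisparityMonitor(window=100, max_gap=0.2, min_per_group=5)
for i in range(10):
    monitor.record("a", True)
    monitor.record("b", i % 2 == 0)
print(monitor.gap(), monitor.should_alert())
```

The alert signal could then feed whichever response is appropriate, such as notifying an owner or routing decisions to human review.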

Mitigating Bias

Pre-Processing

Rebalance or resample training data
Remove or transform problematic features
Synthesize data for underrepresented groups
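
One well-known rebalancing technique is reweighing: each training example gets a weight so that, after weighting, the label is statistically independent of the group. The weight for a (group, label) pair is the expected frequency under independence divided by the observed frequency. The sketch below uses invented data; reweighing_weights and weighted_positive_rate are illustrative names.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight = P(group) * P(label) / P(group, label) for each example."""
    n = len(labels)
    group_count = Counter(groups)
    label_count = Counter(labels)
    joint_count = Counter(zip(groups, labels))
    return [
        (group_count[g] * label_count[y]) / (n * joint_count[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(group, groups, labels, weights):
    num = sum(w * y for g, y, w in zip(groups, labels, weights) if g == group)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]  # group a: 75% positive, group b: 25% positive

weights = reweighing_weights(groups, labels)
for g in ("a", "b"):
    print(g, round(weighted_positive_rate(g, groups, labels, weights), 3))
```

After weighting, both groups have the same positive rate in the training data. The weights would then be passed to a learner that accepts per-sample weights.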

In-Processing

Add fairness constraints to optimization objectives
Adjust learning algorithms to reduce disparities
Use adversarial training to remove protected attribute information
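
A fairness constraint is often added as a penalty term on the training loss. The sketch below shows only the objective, not a training loop: standard log loss plus a penalty on the gap in mean predicted score between groups (a soft form of demographic parity). The penalty form and the lam weight are assumptions chosen for illustration; many other formulations exist.

```python
import math

def penalized_loss(scores, labels, groups, lam=1.0):
    """Log loss plus lam * (largest gap in mean score between groups) squared."""
    eps = 1e-12  # avoids log(0) for scores at exactly 0 or 1
    log_loss = -sum(
        y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
        for s, y in zip(scores, labels)
    ) / len(labels)
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    means = [sum(v) / len(v) for v in by_group.values()]
    gap = max(means) - min(means)
    return log_loss + lam * gap ** 2

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
groups = ["a", "a", "b", "b"]

base = penalized_loss(scores, labels, groups, lam=0.0)
penalized = penalized_loss(scores, labels, groups, lam=1.0)
print(round(base, 4), round(penalized, 4))
```

Raising lam pushes the optimizer toward equal average scores across groups at some cost in predictive fit, which makes the accuracy and fairness trade-off explicit and tunable.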

Post-Processing

Adjust thresholds differently for different groups
Apply calibration corrections
Implement disparate impact constraints
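
Group-specific thresholds can be chosen so that each group reaches the same true positive rate. The sketch below picks, for each group, the highest threshold that still meets a target TPR. It assumes a record is predicted positive when its score is at or above the threshold; the function names and the data are illustrative.

```python
import math

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold at which at least target_tpr of true positives pass."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no positive examples to calibrate against")
    needed = max(1, math.ceil(target_tpr * len(positives)))
    return positives[needed - 1]

def group_thresholds(scores, labels, groups, target_tpr):
    thresholds = {}
    for g in sorted(set(groups)):
        s = [x for x, grp in zip(scores, groups) if grp == g]
        y = [x for x, grp in zip(labels, groups) if grp == g]
        thresholds[g] = threshold_for_tpr(s, y, target_tpr)
    return thresholds

def tpr_at(scores, labels, threshold):
    passed = [s >= threshold for s, y in zip(scores, labels) if y == 1]
    return sum(passed) / len(passed)

# Group b's scores run lower overall, so one shared threshold would miss more of its positives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
groups = ["a"] * 5 + ["b"] * 5

thresholds = group_thresholds(scores, labels, groups, target_tpr=0.75)
print(thresholds)
```

With a single threshold of 0.7, group a would reach a 75% true positive rate and group b would reach 0%. The per-group thresholds bring both to 75%.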

Process and Governance

Diverse development teams
Stakeholder input from affected communities
Mandatory bias testing in deployment gates
Ongoing monitoring and remediation

Regulatory Requirements

Bias testing is a key component of AI compliance programs:

EU AI Act: High-risk AI systems must be tested for bias and discriminatory impacts, and the testing methodology and results must be documented.

US Fair Lending: Models used in credit decisions must comply with the Equal Credit Opportunity Act (ECOA) and the Fair Housing Act. Disparate impact testing is required.

NYC Local Law 144: Bias audits are required for automated employment decision tools, and a summary of the results must be published.

Sector-specific: Healthcare, insurance, and housing have additional non-discrimination requirements that apply to AI.
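
Disparate impact testing often starts with an impact ratio: each group's selection rate divided by the highest group's selection rate. The sketch below flags ratios under 0.8, the "four-fifths" rule of thumb from US employment guidelines. That cutoff is a screening heuristic rather than a legal determination, and the function name, input shape, and counts are assumptions made for the example.

```python
def impact_ratios(outcomes):
    """outcomes maps group -> (selected, total); returns group -> ratio to the top rate."""
    rates = {g: selected / total for g, (selected, total) in outcomes.items()}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any selections")
    return {g: rate / top for g, rate in rates.items()}

outcomes = {"a": (50, 100), "b": (30, 100)}
ratios = impact_ratios(outcomes)
flagged = [g for g, ratio in ratios.items() if ratio < 0.8]
print({g: round(r, 2) for g, r in ratios.items()}, flagged)
```

A flagged ratio is a prompt for closer analysis and documentation. Which tests and thresholds apply in a given case depends on the regulation and jurisdiction.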

How Swept AI Addresses Bias and Fairness

Swept AI provides systematic bias detection and monitoring:

Evaluate: Pre-deployment bias testing across demographic groups. Intersectional analysis to catch bias that aggregate metrics miss. Adversarial testing for fairness edge cases.
Supervise: Continuous monitoring of outcome disparities in production. Alert when performance diverges across populations. Track feedback loop effects over time.
Certify: Documentation of bias testing methodology and results for regulatory compliance. Evidence generation for audits and assessments.

Fairness isn't a one-time checkbox. It's continuous vigilance against disparities that can emerge, shift, and compound over time. See also: The Responsibility Gap.
