Skip to content

Latest commit

 

History

History
250 lines (172 loc) · 10.6 KB

File metadata and controls

250 lines (172 loc) · 10.6 KB

Explainer: What Are SHAP Values?

The reason "the model said so" is not an explanation - and what to do instead.


The One-Sentence Definition

SHAP values (SHapley Additive exPlanations) are a method for explaining individual predictions made by any machine learning model - they tell you exactly how much each feature contributed to a specific output, in a unit you can reason about.


Why This Matters

Most AI systems deployed in high-stakes domains - criminal justice, hiring, lending, healthcare - are black boxes. They produce a score or a decision, and they don't explain why. This is not just an inconvenience. It is a fairness problem.

If you cannot inspect how a model arrived at a decision, you cannot:

  • Detect whether a protected attribute or proxy variable drove the outcome
  • Challenge a decision that affected you
  • Comply with regulations that require explainability (GDPR Article 22, EEOC, ECOA)
  • Audit the system for bias before deployment

SHAP values make the black box transparent. They are grounded in cooperative game theory - specifically, Shapley values - which provide a mathematically principled way to attribute credit (or blame) across features. Unlike simpler techniques, SHAP values are consistent, locally accurate, and model-agnostic.


The Intuition: Fair Credit Attribution

Imagine three people - Alice, Bob, and Carol - collaborate on a project that earns $300,000. Who gets credit for what? The Shapley value from game theory answers this by asking: on average, across every possible ordering in which these people could join the project, how much does each person's addition increase the outcome?

SHAP applies this same logic to features. For a single prediction:

  • Your model outputs a score of 0.82 (high recidivism risk)
  • SHAP asks: across all possible subsets of features, how much does each feature shift the prediction away from the average?
  • The result is a signed contribution value for every feature - positive means it pushed the prediction up, negative means it pushed it down

The key property: the SHAP values for all features sum exactly to the difference between the model's prediction and the global average prediction.

model_output = base_value + shap(age) + shap(custody_status) + shap(race) + ...

Real-World Proof: COMPAS Analysis

We applied SHAP to the ProPublica COMPAS dataset to inspect what the biased model was actually using.

Setup

import shap
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('compas-scores-raw.csv')

# Biased feature set (includes race + proxy)
X = pd.get_dummies(df[['race', 'Sex_Code_Text', 'CustodyStatus', 'MaritalStatus']])
y = (df['DecileScore'] >= 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

What SHAP Revealed

--- MEAN ABSOLUTE SHAP VALUES (biased model) ---

Feature                     Mean |SHAP|
----------------------------------------------
race_African-American       0.1847   ← highest driver
CustodyStatus_Jail          0.1623   ← second highest (proxy)
race_Caucasian              0.1401
Sex_Code_Text_Male          0.0412
MaritalStatus_Single        0.0298

The model's single largest driver was race. The second largest was custody status - a known proxy for race. Together they accounted for more than 60% of total feature influence.

Without SHAP, you would not know this. The model's accuracy numbers alone tell you nothing about which features drove individual predictions.

After the Fix - Fair Model SHAP

# Fair feature set (race + proxy removed)
X_fair = pd.get_dummies(df[['Sex_Code_Text', 'MaritalStatus']])
model_fair = RandomForestClassifier(n_estimators=100, random_state=42)
model_fair.fit(X_train_fair, y_train_fair)

explainer_fair = shap.TreeExplainer(model_fair)
shap_values_fair = explainer_fair.shap_values(X_test_fair)
--- MEAN ABSOLUTE SHAP VALUES (fair model) ---

Feature                     Mean |SHAP|
----------------------------------------------
MaritalStatus_Single        0.0531
Sex_Code_Text_Male          0.0489
MaritalStatus_Married       0.0214

Race is gone from the feature set. The model now distributes influence across legitimate remaining features. SHAP confirms the fix worked - not just at the aggregate fairness gap level, but at the level of individual decision logic.


How to Use SHAP in Practice

Installation

pip install shap

For Tree Models (Random Forest, XGBoost, LightGBM)

import shap

# TreeExplainer is fast and exact for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# shap_values shape: [n_samples, n_features] for binary classification (class 1)
# For multi-class: list of arrays, one per class

For Any Model (Model-Agnostic)

# KernelExplainer works with any model; slower but universal
explainer = shap.KernelExplainer(model.predict_proba, shap.sample(X_train, 100))
shap_values = explainer.shap_values(X_test[:10])

Explaining a Single Prediction

def explain_decision(model, explainer, X_test, sample_idx):
    """
    Return a sorted explanation for one prediction.
    Positive values → pushed prediction UP (toward flagged/rejected/denied)
    Negative values → pushed prediction DOWN
    """
    sv = explainer.shap_values(X_test.iloc[[sample_idx]])[1][0]
    feature_names = X_test.columns.tolist()

    explanation = sorted(
        zip(feature_names, sv),
        key=lambda x: abs(x[1]),
        reverse=True
    )
    return explanation

# Example output for a high-risk COMPAS score:
# [('race_African-American', +0.21), ('CustodyStatus_Jail', +0.18), ...]

Bias Audit Using SHAP

def shap_bias_audit(shap_values, feature_names, protected_features):
    """
    Flag any protected attribute or known proxy in the top drivers.
    Returns sorted mean absolute SHAP values, with bias flags.
    """
    mean_abs_shap = pd.Series(
        shap_values.mean(axis=0),
        index=feature_names
    ).abs().sort_values(ascending=False)

    result = mean_abs_shap.reset_index()
    result.columns = ['feature', 'mean_abs_shap']
    result['bias_flag'] = result['feature'].apply(
        lambda f: 'PROTECTED OR PROXY' if any(p in f for p in protected_features) else ''
    )
    return result

# Run before deployment
protected = ['race', 'gender', 'age', 'CustodyStatus']
audit = shap_bias_audit(shap_values[1], X_test.columns, protected)
print(audit.head(10))

SHAP vs Other Explainability Methods

Method Scope Consistency Model-Agnostic Speed
SHAP Local + Global ✓ Mathematically guaranteed ✓ Yes Medium–Fast
LIME Local only ✗ Approximate ✓ Yes Slow
Feature Importance (Gini) Global only ✗ Biased toward high-cardinality ✗ Tree models only Fast
Permutation Importance Global only Reasonable ✓ Yes Medium

SHAP is the only method that is locally accurate (values sum exactly to the prediction), consistent (if a feature matters more, its SHAP value will be higher), and unified across model types. For fairness auditing specifically, local accuracy is essential - you need to know what drove this specific decision, not just what matters on average.


Limitations

1. SHAP values explain the model, not the world. If your model is trained on biased data, SHAP will faithfully explain the biased decision. High SHAP for a legitimate feature doesn't mean the feature is fair - it means the model used it heavily. Use SHAP alongside the proxy variable audit, not instead of it.

2. Correlation is not causation. SHAP measures how much each feature shifted the prediction. It does not tell you whether that feature should shift the prediction. A model can assign high SHAP to zip code without zip code being a valid criterion for the decision.

3. Computational cost at scale. KernelExplainer is slow for large datasets. TreeExplainer is fast for tree-based models, but SHAP for deep neural networks (DeepExplainer, GradientExplainer) is approximate and can be expensive.

4. SHAP values can mislead with correlated features. When two features are highly correlated (like age and employment_tenure), SHAP distributes the shared contribution between them in ways that can understate either feature's individual influence. This is one more reason to run proxy variable detection before computing SHAP - so you already know which features are collinear.


The Bigger Picture

Explainability is not a nice-to-have. It is the mechanism by which accountability becomes possible. A person denied a loan, flagged as high-risk in a courtroom, or rejected by a hiring algorithm has a legitimate interest in knowing why. A company deploying these systems has a legal and ethical obligation to be able to answer that question.

SHAP values do not fix bias. Removing protected attributes and proxy variables fixes bias. What SHAP does is make the model's reasoning visible - so you can catch problems before deployment, explain decisions to the people affected by them, and demonstrate to auditors and regulators that the system is operating as intended.

The pipeline is not complete until you can explain every decision the model makes.


Related Projects in This Repo


Further Reading


Part of The Fair Code Project - exposing and fixing algorithmic bias with real data and open code.