Benchmark XGBoost explanations

These benchmark notebooks compare different types of explainers across a variety of metrics. They are all generated from Jupyter notebooks available on GitHub.

  • Model: XGBoost
  • Dataset: Boston Housing (Tabular)

Build Explainers

# use an independent masker
masker = shap.maskers.Independent(X_train)
pmasker = shap.maskers.Partition(X_train)

# build the explainers
explainers = [
    ("Permutation", shap.explainers.Permutation(model.predict, masker)),
    ("Permutation part.", shap.explainers.Permutation(model.predict, pmasker)),
    ("Partition", shap.explainers.Partition(model.predict, pmasker)),
    ("Tree", shap.explainers.Tree(model, masker)),
    ("Tree approx.", shap.explainers.Tree(model, masker, approximate=True)),
    ("Exact", shap.explainers.Exact(model.predict, masker)),
    ("Random", shap.explainers.other.Random(model.predict, masker)),
]


# shap/maskers/
from ._masker import Masker
from ._tabular import Independent, Partition, Impute
from ._image import Image
from ._text import Text
from ._fixed import Fixed
from ._composite import Composite
from ._fixed_composite import FixedComposite
from ._output_composite import OutputComposite

The two masker types used when building the explainers:

  • masker: Independent masks out tabular features by integrating over the given background dataset.
  • pmasker: Partition, unlike Independent, respects a hierarchical structure of the data.
    • param clustering: string (the distance metric used to create the clustering of the features) or numpy.ndarray (the clustering of the features itself).

The following two masker types are used during benchmarking:

  • cmasker: Composite merges several maskers for different inputs into a single composite masker.
  • Fixed leaves the input unchanged during masking; it is used for things like scoring labels.
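As a rough illustration of what Independent masking does conceptually (a plain numpy sketch, not shap's implementation): features that are masked out are replaced with the corresponding values from each background row, and the model output is averaged over those rows.

```python
import numpy as np

def independent_mask_value(x, mask, background, predict):
    """Replace the masked-out features of x (mask == False) with the
    corresponding values from each background row, then average the
    model output over those rows. Conceptual sketch, not shap's code."""
    samples = np.tile(np.asarray(x, dtype=float), (len(background), 1))
    samples[:, ~mask] = background[:, ~mask]
    return predict(samples).mean()

# toy model: sum of the two features
predict = lambda X: X.sum(axis=1)
background = np.array([[0.0, 10.0], [0.0, 20.0]])
x = np.array([1.0, 2.0])
mask = np.array([True, False])  # keep feature 0, mask out feature 1

value = independent_mask_value(x, mask, background, predict)
```

With feature 1 masked out, its contribution is averaged over the background column (10 and 20), so the prediction becomes 1 + 15 = 16.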


Background: IG and SmoothGrad

Expected gradients combines ideas from Integrated Gradients, SHAP, and SmoothGrad into a single expected value equation.
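Concretely, that expected-value form can be written as (notation assumed here, not quoted from the notebook):

EG_i(x) = E_{x′∼D, α∼U(0,1)} [ (x_i − x′_i) · ∂F(x′ + α(x − x′)) / ∂x_i ]

where D is the background (reference) data distribution: the IG path integral and the expectation over baselines collapse into a single expectation that can be estimated by sampling.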

Integrated Gradients (IG)
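IG attributes the prediction F(x) along a straight path from a baseline x′ to the input x; the attribution for feature i (following Sundararajan et al., 2017) is:

IG_i(x) = (x_i − x′_i) · ∫₀¹ ∂F(x′ + α(x − x′)) / ∂x_i dα

A useful property (completeness) is that the attributions sum to F(x) − F(x′).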




SmoothGrad

The core idea is "removing noise by adding noise". It comes from the observation that adding tiny perturbations to an image makes gradient-based explanations unstable; the remedy is to average the gradients over n perturbed copies of the image.
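The averaging step can be sketched in a few lines of numpy (illustrative names, assuming grad_fn returns dF/dx for one input; not taken from any library):

```python
import numpy as np

def smoothgrad(grad_fn, x, n=50, sigma=0.1, seed=0):
    """Average the gradient over n Gaussian-perturbed copies of x.
    Sketch of the SmoothGrad idea only."""
    rng = np.random.default_rng(seed)
    grads = [grad_fn(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(n)]
    return np.mean(grads, axis=0)

# toy example: F(x) = x**2, so dF/dx = 2x;
# the averaged noisy gradients land near 2x
grad_fn = lambda x: 2.0 * x
x = np.array([1.0, -3.0])
sg = smoothgrad(grad_fn, x, n=2000, sigma=0.1)
```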


Experiment code reference: Interpretability: LIME and SHAP in prose and code


Experiment dataset (Kaggle: Telco Customer Churn)

Telecom customer churn prediction: fields 0–19 are customer attributes, and field 20 is the label (Churn: True means the customer has churned).

Available features:  ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges']

Label Balance - [No Churn, Churn] :  [5163, 1869]

The dataset contains 7,043 customers, roughly 25% of whom churned. Each customer's 20 features cover intrinsic attributes (gender, SeniorCitizen, Partner, etc.), subscribed services (PhoneService, MultipleLines, InternetService, etc.), and account information (Contract, PaperlessBilling, MonthlyCharges, etc.).

  • The features include both continuous and categorical data;
  • Categorical fields can be represented differently depending on the model type. For example, tree-based models can be trained directly on label-encoded categories, while other models (linear regression, neural networks, etc.) usually perform better with one-hot-encoded categorical variables.