
How to use the Shapley Interaction Index (SII) to discover and visualize feature interactions in machine learning models

In this tutorial, we explore how to use the shapiq (Shap-IQ) package to reveal and visualize feature interactions in machine learning models. The package builds on the classical Shapley value foundation.

Shapley values are ideal for explaining individual feature contributions in AI models, but they cannot capture feature interactions. Shapley interactions provide deeper insight by separating a feature's individual influence from its joint effects with other features, such as longitude and latitude jointly affecting housing prices. In this tutorial, we start using the shapiq package to compute and explore Shapley interactions for any model.
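For context, here is the standard definition (not specific to shapiq): the pairwise Shapley Interaction Index for features i and j is

\[
\phi_{ij} = \sum_{S \subseteq N \setminus \{i,j\}} \frac{|S|!\,(n-|S|-2)!}{(n-1)!}\,\Delta_{ij}(S),
\qquad
\Delta_{ij}(S) = v(S \cup \{i,j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S),
\]

where v(S) is the model's output when only the features in S are known and n is the number of features. A positive \phi_{ij} means the two features reinforce each other; a negative value means one offsets the other. The k-SII index used below generalizes this idea to higher-order groups of features.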

Install dependencies

!pip install shapiq overrides scikit-learn pandas numpy
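Optionally, confirm the installation. This assumes shapiq exposes the standard __version__ attribute; the exact number you see depends on when you run this:

import shapiq
print(shapiq.__version__)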

Data loading and preprocessing

In this tutorial, we will use OpenML's bike sharing dataset, loaded via shapiq's built-in helper. After loading the data, we split it into training and test sets to prepare for model training and evaluation.

import shapiq
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np

# Load data
X, y = shapiq.load_bike_sharing(to_numpy=True)

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
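Before training, a quick look at what we loaded helps (a small sanity check, nothing shapiq-specific):

# inspect the split sizes and the target's range
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"Target range: {y.min():.1f} to {y.max():.1f}")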

Model training and performance evaluation

# Train model
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"R² Score: {r2:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")

Setting up the explainer

We set up a TabularExplainer using the shapiq package to compute Shapley interaction values based on the k-SII (k-order Shapley Interaction Index) method. By specifying max_order=4, we allow the explainer to consider interactions of up to 4 features simultaneously, giving deeper insight into how groups of features jointly influence model predictions.

# set up an explainer with k-SII interaction values up to order 4
explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=4
)
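To see why a sampling budget matters later, note that with max_order=4 the explainer assigns a value to every feature subset of size 1 through 4. A quick count (plain Python, independent of shapiq) shows how fast that grows:

from math import comb

n = X.shape[1]  # number of features in the bike sharing data
n_terms = sum(comb(n, k) for k in range(1, 5))
print(f"{n} features -> {n_terms} interaction terms up to order 4")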

Explaining a local instance

We select a specific test instance (index 100) to generate a local explanation. The code below prints the true and predicted values for this instance and then lists its feature values. This helps us understand the exact input passed to the model and sets the context for the Shapley interaction explanations that follow.

# recover the feature names: X is a NumPy array, so we reload the
# pandas version of the dataset (with to_numpy left at its default,
# the loader returns pandas objects) and read the column names
X_df, _ = shapiq.load_bike_sharing()
feature_names = list(X_df.columns)
n_features = len(feature_names)

# select a local instance to be explained
instance_id = 100
x_explain = X_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")
for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")

Analyze interaction values

We use the explainer.explain() method to compute Shapley interaction values for a specific data instance (X[100]) with a budget of 256 model evaluations. This returns an InteractionValues object that captures how individual features and their combinations affect the model's output. Since max_order=4, interactions involving up to 4 features are considered.

interaction_values = explainer.explain(X[100], budget=256)
# analyse interaction values
print(interaction_values)
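To go beyond the printed summary, we can rank the strongest pairwise effects. This is a minimal sketch assuming the InteractionValues object exposes a dict_values mapping from coalition tuples to scores (as in recent shapiq releases); check your version's documentation if the attribute differs:

# keep only order-2 coalitions and sort by absolute interaction strength
pairs = {coal: val for coal, val in interaction_values.dict_values.items() if len(coal) == 2}
top_pairs = sorted(pairs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]
for (i, j), val in top_pairs:
    print(f"{feature_names[i]} x {feature_names[j]}: {val:+.3f}")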

First-order interaction values

To keep things simple, we compute the first-order interaction values, i.e., the standard Shapley values that capture only individual feature contributions (no interactions).

By setting max_order=1 in TreeExplainer, we are saying:

“Tell me how much each feature contributes individually, without taking any interaction effects into account.”

These values are the standard Shapley values. For each feature, they estimate its average marginal contribution over all possible orderings (permutations) of the features.

explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
si_order = explainer.explain(x=x_explain)
si_order
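As a sanity check, Shapley values satisfy the efficiency property: the baseline plus the sum of all attributions should approximately recover the prediction. A sketch, assuming si_order stores the order-1 attributions in .values and the empty-coalition output in .baseline_value, as shapiq's InteractionValues objects generally do:

# efficiency check: baseline + sum of Shapley values ≈ model prediction
reconstructed = si_order.baseline_value + si_order.values.sum()
print(f"baseline + sum(SV) = {reconstructed:.3f}, prediction = {y_pred:.3f}")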

Drawing a waterfall chart

The waterfall chart visually breaks the model's prediction down into individual feature contributions. It starts from the baseline prediction and adds or subtracts each feature's Shapley value to arrive at the final prediction.

In our case, we use max_order=1 (i.e., individual contributions only) to visualize each feature's contribution.

si_order.plot_waterfall(feature_names=feature_names, show=True)

In our case, the baseline value (i.e., the model's expected output when no feature information is available) is 190.717.

When we add the contributions of individual features (order-1 Shapley values), we can observe how each one pushes the prediction up or pulls it down:

  • Features such as weather and humidity contribute positively, pushing the prediction above the baseline.
  • Features such as temperature and year have strong negative effects, pulling the prediction down by 35.4 and 45, respectively.

Overall, the waterfall chart helps us understand which features drive the prediction and in which direction, providing valuable insight into the model's decisions.
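If you prefer a compact single-row view, the same order-1 explanation can be rendered as a force plot. A sketch, assuming InteractionValues also exposes a plot_force helper analogous to plot_waterfall above:

si_order.plot_force(feature_names=feature_names, show=True)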




I am a civil engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I am very interested in data science, especially neural networks and their applications in various fields.