
Tutorial: Exploring shapiq Visualizations – Marktechpost

In this tutorial, we will explore a series of shapiq visualizations that provide insight into how machine learning models reach their predictions. These plots break down complex model behavior into interpretable components, revealing the individual and interaction contributions of features to a specific prediction. Check out the full code here.

Install dependencies

!pip install shapiq overrides scikit-learn pandas numpy seaborn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from tqdm.asyncio import tqdm

import shapiq

print(f"shapiq version: {shapiq.__version__}")

Loading the dataset

In this tutorial, we will use the MPG (miles per gallon) dataset, which we load directly from the seaborn library. This dataset contains information about a variety of car models, including features such as horsepower, weight, and origin. Check out the full code here.

import seaborn as sns
df = sns.load_dataset("mpg")
df

Preprocessing the dataset

We drop rows with missing values and use label encoding to convert the categorical origin column into a numeric format, making it suitable for model training.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Drop rows with missing values
df = df.dropna()

# Encoding the origin column
le = LabelEncoder()
df.loc[:, "origin"] = le.fit_transform(df["origin"])
df['origin'].unique()
for i, label in enumerate(le.classes_):
    print(f"{label} → {i}")

Splitting the data into training and testing subsets

# Select features and target
X = df.drop(columns=["mpg", "name"])
y = df["mpg"]

feature_names = X.columns.tolist()
x_data, y_data = X.values, y.values

# Train-test split
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2, random_state=42)

Model training

We train a Random Forest regressor with a maximum depth of 10 and 10 decision trees (n_estimators=10). A fixed random_state ensures reproducibility.

# Train model
model = RandomForestRegressor(random_state=42, max_depth=10, n_estimators=10)
model.fit(x_train, y_train)

Model evaluation

# Evaluate
mse = mean_squared_error(y_test, model.predict(x_test))
r2 = r2_score(y_test, model.predict(x_test))
print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")

Explaining a local instance

We select a specific test instance (instance_id = 7) to explore how the model arrives at its prediction. We print the true value, the predicted value, and the feature values of this instance. Check out the full code here.

# select a local instance to be explained
instance_id = 7
x_explain = x_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")
for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")

Generating explanations for multiple interaction orders

We use the shapiq package to generate Shapley-based explanations. Specifically, we compute the following (a short side calculation after the list illustrates how many terms each order involves):

  • Order 1 (standard Shapley values): individual feature contributions
  • Order 2 (pairwise interactions): the combined effect of feature pairs
  • Order n (full interactions): all interaction effects, up to the order equal to the total number of features
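
To make these orders concrete, here is a small side calculation (not part of the original tutorial) counting how many attribution terms each maximum order can produce for our seven features:

from math import comb

n_feats = 7  # cylinders, displacement, horsepower, weight, acceleration, model_year, origin

# Order 1 yields one Shapley value per feature; order 2 adds all feature pairs;
# order n covers every non-empty subset of features up to size n.
for order in (1, 2, n_feats):
    n_terms = sum(comb(n_feats, k) for k in range(1, order + 1))
    print(f"max order {order}: {n_terms} attribution terms")
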
# create explanations for different orders
feature_names = list(X.columns)  # get the feature names
n_features = len(feature_names)

si_order: dict[int, shapiq.InteractionValues] = {}
for order in tqdm([1, 2, n_features]):
    index = "k-SII" if order > 1 else "SV"  # will also be set automatically by the explainer
    explainer = shapiq.TreeExplainer(model=model, max_order=order, index=index)
    si_order[order] = explainer.explain(x=x_explain)
si_order
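
As a quick sanity check, the order-1 Shapley values should satisfy the efficiency property: the baseline value plus the sum of all attributions approximately reconstructs the model's prediction for the explained instance. Below is a minimal sketch, assuming shapiq's InteractionValues objects expose a baseline_value attribute and a values array holding only the per-feature attributions:

# Sanity check (sketch): baseline + sum of Shapley values ≈ model prediction.
# Assumes .baseline_value and .values exist on the InteractionValues object
# and that .values holds only the order-1 (per-feature) attributions.
sv_check = si_order[1]
reconstructed = sv_check.baseline_value + sv_check.values.sum()
print(f"baseline + sum(SV): {reconstructed:.2f} | model prediction: {y_pred:.2f}")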

1. Force chart

The force chart is a powerful visualization that helps us understand how a machine learning model arrives at a specific prediction. It displays the baseline prediction (i.e., the expected value of the model before looking at any features) and then shows how each feature “pushes” the prediction higher or lower.

In this picture:

  • Red bars represent features or interactions that increase the prediction.
  • Blue bars represent those that decrease it.
  • The length of each bar corresponds to the magnitude of its effect.

When using Shapley interaction values, the force chart shows not only individual contributions but also interactions between features. This makes it particularly insightful for interpreting complex models, because it visually decomposes how combinations of features collectively affect the result. Check out the full code here.

sv = si_order[1]  # get the SV
si = si_order[2]  # get the 2-SII
mi = si_order[n_features]  # get the Moebius transform

sv.plot_force(feature_names=feature_names, show=True)
si.plot_force(feature_names=feature_names, show=True)
mi.plot_force(feature_names=feature_names, show=True)

Starting from the first chart, we can see that the base value is 23.5. Features such as weight, cylinders, horsepower, and displacement have a positive impact on the prediction, pushing it above the baseline. On the other hand, model year and acceleration have negative effects, pulling the prediction downward.

2. Waterfall chart

Similar to the force chart, the waterfall chart is another popular way of visualizing Shapley values, originally introduced in the shap library. It shows how different features push the prediction higher or lower than the baseline. A major advantage of the waterfall chart is that it automatically groups features with small impact into an “other” category, making the chart cleaner and easier to understand. Check out the full code here.

sv.plot_waterfall(feature_names=feature_names, show=True)
si.plot_waterfall(feature_names=feature_names, show=True)
mi.plot_waterfall(feature_names=feature_names, show=True)

3. Network plot

The network plot shows how features interact with each other using first- and second-order Shapley interactions. Node size reflects the influence of individual features, while edge width and color represent the strength and direction of interactions. It is especially useful when dealing with many features, revealing complex interactions that simpler plots can miss. Check out the full code here.

si.plot_network(feature_names=feature_names, show=True)
mi.plot_network(feature_names=feature_names, show=True)

4. SI graph

SI graphs extend network plots by visualizing all higher-order interactions connecting multiple features. Node size shows the effect of individual features, while edge width, color, and transparency reflect the strength and direction of interactions. They provide a comprehensive view of how features collectively influence the model's prediction. Check out the full code here.

# we abbreviate the feature names since they are plotted inside the nodes
abbrev_feature_names = shapiq.plot.utils.abbreviate_feature_names(feature_names)
sv.plot_si_graph(
    feature_names=abbrev_feature_names,
    show=True,
    size_factor=2.5,
    node_size_scaling=1.5,
    plot_original_nodes=True,
)
si.plot_si_graph(
    feature_names=abbrev_feature_names,
    show=True,
    size_factor=2.5,
    node_size_scaling=1.5,
    plot_original_nodes=True,
)
mi.plot_si_graph(
    feature_names=abbrev_feature_names,
    show=True,
    size_factor=2.5,
    node_size_scaling=1.5,
    plot_original_nodes=True,
)

5. Bar chart

The bar plot is tailored to global interpretation. While the other plots can be used both locally and globally, the bar plot summarizes the overall importance of features (or feature interactions) by showing the mean absolute Shapley (or interaction) values across all instances. In shapiq, it highlights which interactions contribute the most on average. Check out the full code here.

explanations = []
explainer = shapiq.TreeExplainer(model=model, max_order=2, index="k-SII")
for instance_id in tqdm(range(20)):
    x_explain = x_test[instance_id]
    si = explainer.explain(x=x_explain)
    explanations.append(si)
shapiq.plot.bar_plot(explanations, feature_names=feature_names, show=True)

“Displacement” and “Horsepower” are the most influential features overall, meaning they have the largest individual impact on the model's predictions. This is evident from their high mean absolute Shapley interaction values in the bar chart.

Furthermore, when looking at second-order interactions (i.e., how two features interact), the pairs “Horsepower × Weight” and “Displacement × Horsepower” show significant joint effects. Their combined attribution is approximately 1.4, indicating that these interactions play an important role in shaping the model's predictions beyond what each feature contributes individually. This highlights the nonlinear relationships between features in the model.

