Explaining a Machine Learning Model using XAI Methods
Understanding the Factors Behind Airline Passenger Satisfaction through XAI Approaches
Explainable Artificial Intelligence (XAI) aims to provide understandable explanations of AI models and their predictions to individuals without a strong background in AI. In recent years, XAI has become a highly sought-after area of research due to the growing demand for transparency in AI systems. The three key principles of XAI are: transparency, which refers to making the inner workings of a model easily accessible; interpretability, which involves the ability to comprehend the model’s decisions; and explainability, which pertains to the provision of clear, human-understandable explanations of the model’s outputs.
Interpretable machine learning can be achieved through two approaches. One approach involves designing a predictive model that is inherently interpretable, such as linear regression or a decision tree. The other involves using a black-box model and applying a post-training explanation technique, referred to as a model-agnostic method. We will outline some of these methods and illustrate them with a binary classification model as the running example.
In this article, we explore the application of XAI methods to enhance the understanding of a machine learning model designed to predict airline passenger satisfaction. Through the use of XAI techniques, we aim to uncover the key factors that contribute to passenger satisfaction and provide a clear, human-understandable explanation of the model’s predictions.
Airline Passenger Satisfaction Dataset
This dataset contains an airline passenger satisfaction survey. Here, the goal is to predict passenger satisfaction. The data was made public by TJ Klein on Kaggle, and the columns are:
- Gender (male or female).
- Customer type (loyal or disloyal customer).
- Type of travel (personal or business travel).
- Flight class (business, eco, or eco plus).
- Flight distance.
- Arrival delay in minutes.
- Airline satisfaction level (satisfied, or neutral/dissatisfied).
- Satisfaction level with the following services, rated from 0 to 5, where 0 = not applicable and 5 = most satisfied:
| Inflight wifi service | Departure/arrival time | Ease of online booking | Gate location |
|---|---|---|---|
| Food and drink | Online boarding | Seat comfort | Inflight entertainment |
| On-board service | Leg room service | Baggage handling | Check-in service |
Global Model-Agnostic Methods
In XAI, global methods are algorithms that provide a comprehensive explanation of the model's behavior over the entire dataset.
Permutation Feature Importance
- Permutation feature importance measures the increase in the prediction error (or the decrease in the model score) after permuting the values of a feature.
- The permutation breaks the relationship between the feature and the target; an increase in the prediction error indicates that the model depends on that feature.
- We obtain the importance of feature $j$ with the following expression:

$$i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j}$$

where:
- $i_j$ = importance of feature $j$
- $s$ = fitted model score on the training or validation dataset
- $K$ = number of different permutations
- $s_{k,j}$ = model score on the dataset with feature $j$ permuted in the $k$-th repetition
We observe in Figure 1 that, after randomly shuffling the features Personal travel and Inflight wifi service, the recall decreases by 0.193 and 0.189, respectively. The drop in score means that the ML model depends heavily on these features to predict passenger satisfaction.
Figure 1. Permutation Feature Importance Plot.
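The computation above can be sketched with scikit-learn's `permutation_importance`. The dataset and model below are a synthetic stand-in for the article's passenger data and XGBoost classifier, not the actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the passenger data (not the article's dataset).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# K = 10 permutations per feature; importance i_j = s - mean_k(s_kj),
# scored with recall to match the metric discussed in the text.
result = permutation_importance(model, X_val, y_val, scoring="recall",
                                n_repeats=10, random_state=0)
for j in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {j}: mean drop in recall = "
          f"{result.importances_mean[j]:.3f}")
```

Features that the model relies on show the largest score drops; uninformative features hover near zero.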
Partial Dependence Plot
- The partial dependence plot (PDP) shows the marginal effect of a set of features on the outcome.
- This helps to discover the nature of the relationship between the features and the target (e.g., linear, non-linear).
- For regression, the partial dependence function is defined by:

$$\hat{f}_S(x_S) = E_{x_C}\!\left[\hat{f}(x_S, x_C)\right] = \int \hat{f}(x_S, x_C)\, dP(x_C)$$

where:
- $S$ = set of features of interest
- $C$ = set of the other features
- $x_S$ = values of the features of interest
- $x_C$ = values of the other features
- $\hat{f}_S$ = partial dependence function
- $\hat{f}$ = ML model
- In practice, we estimate the function by averaging over the $n$ training instances:

$$\hat{f}_S(x_S) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\!\left(x_S, x_C^{(i)}\right)$$
- Given values of the features in $S$, the partial function shows:
- The average marginal prediction effect, for regression.
- The average target class probability, for classification.
- The PDP assumes that the features in $S$ and $C$ are not correlated.
- A correlation between features can bias the estimated effect due to unlikely data points generated in the computation of the PDP.
- PDP also hides heterogeneous effects, since it reports the mean of the marginal effects.
On average, the probability of passenger satisfaction for business travel is 0.54. For inflight wifi service, both a rating of 0 (no service) and a rating of 5 reach a probability of 0.70 or higher. Loyal customers have a 0.48 probability of satisfaction (Fig. 2).
We can see some strong correlations between features in the training data (Figure 3). For example, the feature of interest, inflight wifi service, plotted above is strongly correlated with ease of online booking. In this case we should place more trust in the ALE (Accumulated Local Effects) plots, which are not affected by strong correlations.
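The Monte Carlo estimate of the PDP can be computed directly with `sklearn.inspection.partial_dependence`; as above, the data and model here are synthetic stand-ins:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the passenger data (not the article's dataset).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Average the model's positive-class probability over all instances while
# sweeping feature 0 across a 20-point grid (the estimator defined above).
pdp = partial_dependence(model, X, features=[0], kind="average",
                         grid_resolution=20, method="brute")
avg_curve = pdp["average"][0]  # one averaged probability per grid point
```

With `method="brute"` the averaged values are class probabilities, matching the classification interpretation of the PDP.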
Accumulated Local Effect (ALE) Plot
- ALE plots describe how the features influence the predictions, on average.
- ALE plots calculate differences in predictions in small windows around the feature value.
- Divide the feature range into intervals.
- Compute differences in predictions for each instance inside the intervals.
- Average the difference in predictions for each interval.
- Accumulate average across all intervals.
- The uncentered ALE estimate is:

$$\hat{f}_{j,\mathrm{ALE}}(x) = \sum_{k=1}^{k_j(x)} \frac{1}{n_j(k)} \sum_{i:\, x_j^{(i)} \in N_j(k)} \left[\hat{f}\big(z_{k,j},\, x_{\setminus j}^{(i)}\big) - \hat{f}\big(z_{k-1,j},\, x_{\setminus j}^{(i)}\big)\right]$$

where:
- $N_j(k)$: neighborhood defined by the $k$-th interval of feature $x_j$
- $n_j(k)$: size of the neighborhood (number of instances)
- $k_j(x)$: index of the interval of feature $x_j$ that contains the value $x$
- $x_j^{(i)}$: value of feature $j$ for the $i$-th instance
- $z_{k,j}$: grid value (upper limit of the $k$-th interval)
- Center the effect so the mean is zero.
- The value of the ALE can be interpreted as the main effect that a feature has at certain value compared to the average prediction of the data.
- Example: if $\hat{f}_{j,\mathrm{ALE}}(3) = -2$, then when the feature takes the value 3 the prediction is 2 lower than the average prediction.
- The grid intervals can be specified with the feature quantiles.
- Works when features are correlated.
- Easy interpretation.
We see that a passenger with no inflight wifi service has a 0.55 higher probability of satisfaction than the average passenger. The personal travel plot shows that a passenger traveling for personal reasons has a 0.21 lower probability of satisfaction than the average passenger, while passengers traveling for business have a 0.21 higher probability (Figure 4).
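The four ALE steps above can be sketched by hand. This is a simplified first-order ALE on synthetic stand-in data (the centering here is unweighted, a small approximation of the size-weighted centering in the literature):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data and model (not the article's XGBoost setup).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def ale_1d(model, X, j, n_intervals=10):
    """First-order ALE of feature j on the positive-class probability."""
    x = X[:, j]
    # 1. Divide the feature range into intervals using quantiles.
    grid = np.quantile(x, np.linspace(0.0, 1.0, n_intervals + 1))
    effects = []
    for k in range(n_intervals):
        lo, hi = grid[k], grid[k + 1]
        mask = (x >= lo) & (x <= hi) if k == 0 else (x > lo) & (x <= hi)
        if not mask.any():
            effects.append(0.0)
            continue
        X_lo, X_hi = X[mask].copy(), X[mask].copy()
        X_lo[:, j], X_hi[:, j] = lo, hi
        # 2.-3. Prediction differences at the interval edges, averaged
        # over the instances that fall inside the interval.
        diff = (model.predict_proba(X_hi)[:, 1]
                - model.predict_proba(X_lo)[:, 1])
        effects.append(diff.mean())
    # 4. Accumulate across intervals, then center so the mean effect is zero.
    ale = np.cumsum(effects)
    return grid[1:], ale - ale.mean()

edges, ale = ale_1d(model, X, j=0)
```

Because only instances already inside each interval are perturbed, and only within that interval, no unrealistic data points are generated even when features are correlated.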
Feature Interaction: Partial Dependence Variance
- When features interact with each other, the sum of the individual feature effects does not fully express the prediction, since each feature's effect depends on the values of the other features.
- One method to measure the interaction effect between features is the Partial Dependence Variance method.
- The intuition is that a weak interaction between two features $x_j$ and $x_k$ on the response Y implies that the importance of $x_j$ varies little as $x_k$ varies, and vice versa.
- Construct the PD (Partial Dependence) function of the pair $(x_j, x_k)$.
- Compute the feature importance of $x_j$ while $x_k$ is held constant, for all values of $x_k$.
- Take the standard deviation of the resulting importance scores across all values of $x_k$.
- Similarly, compute the same standard deviation across all values of $x_j$.
- Compute the feature interaction score by averaging the two results.
There are some interactions detected, such as disloyal customer and personal travel; personal travel and inflight wifi service; or disloyal customer and inflight wifi service (Figure 5).
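The steps above can be sketched from a 2-D partial dependence grid. This is a simplified illustration on synthetic stand-in data, using the standard deviation of the PD curve as the importance measure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in data and model (not the article's XGBoost setup).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def pd_variance_interaction(model, X, a, b, grid_resolution=10):
    """Interaction score between features a and b via PD variance."""
    # 2-D partial dependence grid over the (a, b) feature pair.
    pd2 = partial_dependence(model, X, features=[(a, b)], kind="average",
                             grid_resolution=grid_resolution, method="brute")
    grid = pd2["average"][0]  # shape: (n_grid_a, n_grid_b)
    # Importance of a at each fixed b = std of the PD along a's axis; the
    # interaction is how much that importance varies with b, and vice versa.
    i_a_given_b = np.std(grid, axis=0, ddof=1)  # one value per b grid point
    i_b_given_a = np.std(grid, axis=1, ddof=1)  # one value per a grid point
    return 0.5 * (np.std(i_a_given_b, ddof=1) + np.std(i_b_given_a, ddof=1))

score = pd_variance_interaction(model, X, a=0, b=1)
```

A score near zero suggests the two features contribute additively; larger scores indicate their effects depend on each other.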
Local Model-Agnostic Methods
Local model-agnostic methods aim to explain individual predictions.
Individual Conditional Expectation
- Individual conditional expectation (ICE) plots are the PDP equivalent for individual data instances.
- An ICE plot shows the prediction dependence of all instances, while PDP averages them.
- The average relationship between feature and the predicted value – PDP output – works when there is a weak interaction between set S and set C.
- ICE plots provide more insights when there are interactions.
We see (Figure 6) that the ICE plot for type of travel gives us additional information. ICE lines for disloyal customers are flat, while loyal customers show a decrease in the prediction when the travel is personal. We observe similar patterns for the interaction between inflight wifi service and personal travel or disloyal customers: personal travelers and disloyal customers have a low probability for ratings 1 to 4, whereas for business travelers or loyal customers the probability is higher and, in some cases, stays flat at 80%.
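ICE curves come from the same machinery as the PDP, computed per instance instead of averaged. A minimal sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in data and model (not the article's XGBoost setup).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# One curve per instance: sweep feature 0 over a grid while keeping that
# instance's other feature values fixed, recording predict_proba each time.
ice = partial_dependence(model, X, features=[0], kind="individual",
                         grid_resolution=15, method="brute")
curves = ice["individual"][0]    # shape: (n_instances, n_grid_points)
pdp_curve = curves.mean(axis=0)  # averaging the ICE curves yields the PDP
```

Heterogeneous effects show up as ICE lines with different shapes; they would be invisible in the single averaged PDP curve.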
Counterfactual Explanations
- Counterfactual explanations express a causal situation in the form: "if X (the cause) had not occurred, then Y (the event) would not have occurred."
- In the ML context, Y is the model prediction and X are the feature values.
- Counterfactual thinking requires imagining a hypothetical situation that contradicts the observed facts.
- The goal of counterfactuals is to provide actionable guidance, in the form of steps that a consumer might take to achieve a different outcome in the future.
Here we find three counterfactual explanations for a randomly selected dissatisfied passenger. The XGBoost model predicts satisfaction with a probability of only 18%, so the passenger is classified as dissatisfied. The first counterfactual explanation says that, by receiving a better inflight wifi service, the passenger would be predicted as satisfied with 71% probability. Similarly, by the second counterfactual, the passenger would have been satisfied if the cleanliness service had been somewhat better.
| Feature | Original instance | CF 1 | CF 2 | CF 3 |
|---|---|---|---|---|
| Type of travel | Business travel | – | – | – |
| Inflight wifi service | 2 | 5 | – | 3 |
| Ease of online booking | 5 | – | – | – |
| Food and drink | 1 | – | – | – |
| Leg room service | 2 | – | – | – |
| Arrival delay in minutes | 7 | – | – | – |

Table 1. Counterfactual explanations; only changed feature values are displayed.
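The counterfactuals above come from a DiCE-style method (Mothilal et al., 2020). As a self-contained illustration only, not the DiCE algorithm, here is a greedy search that perturbs one feature at a time on synthetic stand-in data until the predicted class flips:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data and model (not the article's XGBoost setup).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def find_counterfactual(model, x, target=1, step=0.25, max_iter=200):
    """Greedy one-feature-at-a-time search for a class-flipping instance."""
    x_cf = x.copy()
    for _ in range(max_iter):
        if model.predict(x_cf.reshape(1, -1))[0] == target:
            return x_cf  # found an instance the model assigns to `target`
        best = None
        best_p = model.predict_proba(x_cf.reshape(1, -1))[0, target]
        for j in range(x_cf.size):
            for delta in (-step, step):
                cand = x_cf.copy()
                cand[j] += delta
                p = model.predict_proba(cand.reshape(1, -1))[0, target]
                if p > best_p:
                    best, best_p = cand, p
        if best is None:
            return None  # stuck in a local optimum; no single move helps
        x_cf = best
    return None

# Take an instance currently predicted as class 0 and search for a flip.
x0 = X[model.predict(X) == 0][0]
cf = find_counterfactual(model, x0)
```

Real counterfactual generators additionally optimize for proximity, sparsity, and diversity of the returned explanations, which this sketch ignores.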
In this article, we present various agnostic methods, both global and local, to enhance our understanding of the XGBoost model used for binary classification in the context of airline passenger satisfaction. These XAI techniques provide a way to fulfill the right to explanation of machine learning models and provide insights into the key factors that influence passenger satisfaction. Through the application of these methods, we aim to provide a clear, human-understandable explanation of the XGBoost model’s predictions and contribute to the field of Explainable Artificial Intelligence.
If you found this article on using XAI methods to explain a machine learning model informative, it’s time to take the next step with HyperSense. As a leader in the AI and machine learning space, HyperSense AI provides a comprehensive platform for building, deploying, and explaining models. With HyperSense AI, you can leverage cutting-edge XAI techniques to gain a deeper understanding of your models and make data-driven decisions with confidence. So why wait?
Start unlocking the full potential of your data.
- Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.). christophm.github.io/interpretable-ml-book/
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
- Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1), 44-65.
- Mothilal, R. K., Sharma, A., & Tan, C. (2020). Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.