Understanding the Factors Behind Airline Passenger Satisfaction through XAI Approaches

Explaining a Machine Learning Model using XAI Methods

Introduction

Explainable Artificial Intelligence (XAI) aims to provide understandable explanations of AI models and their predictions to individuals without a strong background in AI. In recent years, XAI has become a highly sought-after area of research due to the growing demand for transparency in AI systems. The three key principles of XAI are: transparency, which refers to making the inner workings of a model easily accessible; interpretability, which involves the ability to comprehend the model’s decisions; and explainability, which pertains to the provision of clear, human-understandable explanations of the model’s outputs.

Interpretable machine learning can be achieved through two approaches. One approach involves designing a predictive model that is inherently interpretable, such as linear regression or a decision tree. The other involves using a black-box model and applying a post-training explanation method; methods of this kind that work with any model are referred to as model-agnostic methods. I will outline some of these methods and illustrate them using a binary classification model.

In this article, we explore the application of XAI methods to enhance the understanding of a machine learning model designed to predict airline passenger satisfaction. Through the use of XAI techniques, we aim to uncover the key factors that contribute to passenger satisfaction and provide a clear, human-understandable explanation of the model’s predictions.

Airline Passenger Satisfaction Dataset

This dataset contains the results of an airline passenger satisfaction survey. The goal is to predict passenger satisfaction. The data was made public by Klein TJ on Kaggle, and the columns are:

Gender (male or female).
Customer type (loyal or disloyal customer).
Age.
Type of travel (personal or business travel).
Flight class (business, eco, or eco plus).
Flight distance.
Arrival delay in minutes.
Airline satisfaction level (satisfied, or neutral or dissatisfied).
Satisfaction level with the following services, rated from 0 to 5 (0 = not applicable, 5 = most satisfied):

Inflight wifi service	Departure/arrival time	Ease online booking	Gate location
Food and drink	Online boarding	Seat comfort	Inflight entertainment
On-board service	Leg room service	Baggage handling	Check-in service
Inflight service	Cleanliness

Global Model-Agnostic Methods

In XAI, global methods are algorithms that explain the behavior of the model across the entire dataset rather than for a single prediction.

Permutation Feature Importance

Permutation feature importance measures the increase in the prediction error (or the decrease in the model score) after permuting the values of a feature.
The permutation breaks the relationship between the feature and the target, so an increase in prediction error indicates how much the model depends on that feature.
We obtain the importance of the features with the following expression:

$$i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}$$

i_j = importance of feature j
s = fitted model score on the training or validation dataset
K = number of different permutations
s_kj = model score on the dataset with feature j permuted (k-th permutation).
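The quantities i_j, s, K and s_kj defined above can be computed by hand. The following is a minimal, dependency-free sketch and not the code used for this article: the toy model, the toy data and every function name are hypothetical, and recall is used as the score only because the example in this section reports recall. In practice a library routine such as scikit-learn's `sklearn.inspection.permutation_importance` would normally be preferred.

```python
import random


def recall(y_true, y_pred):
    """Fraction of actual positives that the model labels as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos


def permutation_importance(predict, X, y, score=recall, n_repeats=10, seed=0):
    """Return i_j = s - mean_k(s_kj) for every feature column j."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])  # s
    importances = []
    for j in range(len(X[0])):
        permuted_scores = []
        for _ in range(n_repeats):  # k = 1..K
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted_scores.append(score(y, [predict(r) for r in X_perm]))  # s_kj
        importances.append(baseline - sum(permuted_scores) / n_repeats)  # i_j
    return importances


# Hypothetical toy data: the target depends on feature 0 only, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_model(row):
    """Stand-in for a fitted classifier; it ignores feature 1 entirely."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling the feature that the toy model ignores leaves the score unchanged, so its importance is zero, while shuffling the feature it relies on lowers the recall noticeably.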

Example

We observe in Figure 1 that after randomly shuffling the features Personal travel and Inflight wifi service, the recall decreases by 0.193 and 0.189, respectively. This decrease in the score means that the ML model depends heavily on these features to predict passenger satisfaction.

Figure 1. Permutation Feature Importance Plot.


Partial Dependence Plot

The partial dependence plot (PDP) shows the marginal effect of a set of features on the outcome.
This helps to discover the nature of the relationship between the features and the target (e.g., linear, non-linear).
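For regression, the partial dependence function is defined by:

$$\hat{f}_S(x_S) = E_{x_C}\left[\hat{f}(x_S, x_C)\right] = \int \hat{f}(x_S, x_C)\, dP(x_C)$$

where,
S = set of features of interest
C = set of other features
x_S = features of interest
x_C = other features
\hat{f}_S = partial function
\hat{f} = ML model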

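In practice, we estimate the function by averaging over the n instances of the training data, where x_C^{(i)} denotes the values of the other features for the i-th instance:

$$\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(x_S, x_C^{(i)})$$

Given values of the features in S, the partial function shows: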

- The average marginal prediction effect, for regression.
- The average target class probability, for classification.
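To make the averaging concrete, here is a small sketch of how the partial function can be estimated for a single feature. It is an illustration under stated assumptions rather than the code behind the figures: `toy_proba`, its coefficients and the four-row dataset are invented, with column 0 standing in for an inflight wifi rating and column 1 for a business travel flag.

```python
def partial_dependence(predict_proba, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the feature of interest for every instance and keep the
        # other features (x_C) exactly as they were observed.
        preds = [predict_proba(row[:feature] + [value] + row[feature + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_proba(row):
    """Hypothetical satisfaction probability; not a fitted model."""
    wifi, business = row
    return 0.2 + 0.1 * wifi + 0.2 * business


# Hypothetical instances: [wifi rating, business travel flag]
X = [[1, 0], [2, 1], [3, 1], [4, 0]]

pdp_wifi = partial_dependence(toy_proba, X, feature=0, grid=[0, 5])
print(pdp_wifi)
```

Half of the toy instances are business trips, so each point on the curve is the wifi effect plus the average contribution of the travel type, which is what "marginal effect" means here.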

Disadvantages

The PDP assumes that the features in S and C are not correlated.
A correlation between features can bias the estimated effect, because computing the PDP generates unlikely data points.
The PDP also hides heterogeneous effects, since it shows only the average of the marginal effects.

Example

On average, the predicted probability of satisfaction for business travel is 0.54. For inflight wifi service, both the "no service" rating and a rating of 5 reach a probability of 0.70 or more. Loyal customers have a 0.48 probability of satisfaction (Figure 2).

We can see some strong correlations between features in the training data (Figure 3). For example, the feature of interest plotted above, inflight wifi service, is strongly correlated with ease of online booking. In this case we should place more trust in ALE (Accumulated Local Effect) plots, which remain valid when features are strongly correlated.

Accumulated Local Effect (ALE) Plot

Intuition

ALE plots describe how the features influence the predictions, on average.
ALE plots calculate differences in predictions in small windows around the feature value.

Estimation

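Divide the feature into intervals.
Compute the differences in predictions for each instance inside the intervals.
Average the differences in predictions for each interval.
Accumulate the averages across all intervals:

$$\hat{\tilde{f}}_{j,ALE}(x) = \sum_{k=1}^{k_j(x)} \frac{1}{n_j(k)} \sum_{i:\, x_j^{(i)} \in N_j(k)} \left[\hat{f}(z_{k,j}, x_{\setminus j}^{(i)}) - \hat{f}(z_{k-1,j}, x_{\setminus j}^{(i)})\right]$$

N_j(k): neighborhood defined by the k-th interval of feature x_j
n_j(k): size of the neighborhood (number of instances)
k_j(x): index of the interval of feature x_j that contains x
x_j⁽ⁱ⁾: i-th instance of the j-th column
x_{\setminus j}⁽ⁱ⁾: values of all other features for the i-th instance
z_kj: grid value (upper limit of the k-th interval)
Center the effect so the mean is zero:

$$\hat{f}_{j,ALE}(x) = \hat{\tilde{f}}_{j,ALE}(x) - \frac{1}{n} \sum_{i=1}^{n} \hat{\tilde{f}}_{j,ALE}(x_j^{(i)})$$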
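The estimation steps above can be sketched in a few lines. This is a simplified first-order ALE for one numeric feature, written for illustration only: the interval edges are passed in by hand, and the linear toy model and its four instances are assumptions, chosen so that the two features are perfectly correlated.

```python
def ale_curve(predict, X, feature, edges):
    """First-order ALE evaluated at the upper edge of each interval.

    `edges` must be sorted; interval k covers (edges[k], edges[k + 1]],
    and the first interval also includes its lower edge.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    bin_of = []  # interval index of every instance, reused for centering
    for row in X:
        k = 0
        while k < n_bins - 1 and row[feature] > edges[k + 1]:
            k += 1
        upper = row[:feature] + [edges[k + 1]] + row[feature + 1:]
        lower = row[:feature] + [edges[k]] + row[feature + 1:]
        sums[k] += predict(upper) - predict(lower)  # local prediction difference
        counts[k] += 1
        bin_of.append(k)
    # Average the differences per interval, then accumulate them.
    uncentered, running = [], 0.0
    for total, count in zip(sums, counts):
        running += total / count if count else 0.0
        uncentered.append(running)
    # Center so that the mean effect over the instances is zero.
    mean_effect = sum(uncentered[k] for k in bin_of) / len(X)
    return [value - mean_effect for value in uncentered]


def toy_model(row):
    """Hypothetical model: the prediction rises by 2 per unit of feature 0."""
    return 2 * row[0] + row[1]


# Feature 1 always equals feature 0 plus 0.5, so the two are strongly correlated.
X = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0], [3.5, 4.0]]
ale = ale_curve(toy_model, X, feature=0, edges=[0, 1, 2, 3, 4])
print(ale)
```

Because only the feature of interest is moved, and only within the interval where each instance actually lies, the curve rises by 2 per interval, which is the effect of feature 0 alone despite the correlation with feature 1.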

Interpretation

The value of the ALE can be interpreted as the main effect that a feature has at a certain value, compared to the average prediction of the data.
Example: if the ALE estimate is -2 at x_j = 3, then the prediction is lower by 2 compared to the average prediction.
The grid intervals can be specified with the feature quantiles.

Advantages

Works when features are correlated.
Easy interpretation.

Example

We see that a passenger with no inflight wifi service has a 0.55 higher probability of satisfaction than the average passenger. The personal travel plot shows that a passenger on personal travel has a 0.21 lower probability of satisfaction than the average passenger, while a passenger on business travel has a 0.21 higher probability (Figure 4).

Feature Interaction

When features interact with each other, the sum of the independent feature effects does not fully express the prediction, since the effect of one feature depends on the values of other features.
One method to measure the interaction between features is the Partial Dependence Variance method.
The intuition is that a weak interaction effect between two features x_i and x_j on the response Y suggests that the importance of one feature has little variance when it is computed at different fixed values of the other.

Estimation

Construct the PD (Partial Dependence) function of the two features, x_i and x_j.
Compute the feature importance of x_i while x_j is held constant, for all values of x_j.
Take the standard deviation of the resulting importance scores across all values of x_j.
Similarly, compute the same standard deviation for the importance of x_j across all values of x_i.
Compute the feature interaction by averaging the two results.
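The steps above can be sketched as follows. This is a rough illustration, not the implementation used for Figure 5: it assumes that the importance of a feature is measured as the standard deviation of its partial dependence values, and the two toy models, the grid and the data are hypothetical.

```python
from statistics import stdev


def pd_table(predict, X, i, j, grid_i, grid_j):
    """Two-feature partial dependence: mean prediction for each (x_i, x_j) pair."""
    table = []
    for a in grid_i:
        row_out = []
        for b in grid_j:
            total = 0.0
            for row in X:
                modified = list(row)
                modified[i], modified[j] = a, b  # overwrite both features of interest
                total += predict(modified)
            row_out.append(total / len(X))
        table.append(row_out)
    return table


def interaction_strength(table):
    """Average of the two standard deviations of conditional importances."""
    # Importance of x_i at each fixed x_j: spread of the PD values down a column.
    imp_i_given_j = [stdev(column) for column in zip(*table)]
    # Importance of x_j at each fixed x_i: spread of the PD values along a row.
    imp_j_given_i = [stdev(row) for row in table]
    return (stdev(imp_i_given_j) + stdev(imp_j_given_i)) / 2


def additive(row):
    """Hypothetical model without interaction between features 0 and 1."""
    return row[0] + row[1] + row[2]


def multiplicative(row):
    """Hypothetical model in which features 0 and 1 interact."""
    return row[0] * row[1] + row[2]


X = [[0, 0, 1.0], [1, 1, 3.0]]  # hypothetical instances
grid = [0, 1, 2]

no_interaction = interaction_strength(pd_table(additive, X, 0, 1, grid, grid))
with_interaction = interaction_strength(pd_table(multiplicative, X, 0, 1, grid, grid))
print(no_interaction, with_interaction)
```

For the additive model the importance of one feature is the same at every value of the other, so the score is zero; for the multiplicative model it changes with the other feature, which yields a positive score.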

There are some interactions detected, such as disloyal customer and personal travel; personal travel and inflight wifi service; or disloyal customer and inflight wifi service (Figure 5).

Local Model-Agnostic Methods

Local model-agnostic methods aim to explain individual predictions.

Individual Conditional Expectation

Individual conditional expectation (ICE) plots are the PDP equivalent for individual data instances.
An ICE plot shows the prediction dependence for each instance as a separate line, while the PDP averages them.
The average relationship between a feature and the predicted value, which is what the PDP shows, is a good summary only when the interaction between the features in set S and set C is weak.
ICE plots provide more insights when there are interactions.
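A short sketch shows how ICE curves relate to the PDP and why they reveal interactions. The model below is a hypothetical stand-in in which the wifi rating matters only for loyal customers; its coefficients and the two instances are invented for illustration.

```python
def ice_curves(predict_proba, X, feature, grid):
    """One curve per instance: predictions as `feature` moves along the grid."""
    return [
        [predict_proba(row[:feature] + [value] + row[feature + 1:]) for value in grid]
        for row in X
    ]


def toy_proba(row):
    """Hypothetical probability: the wifi rating only helps loyal customers."""
    wifi, loyal = row
    return 0.3 + 0.1 * wifi * loyal


# Hypothetical instances: [wifi rating, loyal customer flag]
X = [[2, 1], [2, 0]]
grid = [0, 5]

ice = ice_curves(toy_proba, X, feature=0, grid=grid)
# The PDP is the pointwise average of the ICE curves.
pdp = [sum(curve[g] for curve in ice) / len(ice) for g in range(len(grid))]
print(ice, pdp)
```

The curve of the loyal customer rises while the curve of the disloyal customer stays flat; the averaged PDP shows a moderate rise and hides that difference.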

Example

We see (Figure 6) that the ICE plot for type of travel gives us additional information. The ICE lines for disloyal customers are flat, while those for loyal customers show a decrease in the predicted probability when the trip is personal travel. We observe similar patterns for the interaction between inflight wifi service and personal travel or disloyal customers: personal travel and disloyal customers have a low probability for ratings 1 to 4, whereas for business travel or loyal customers the probability is higher and, in some cases, remains flat at 80%.

Counterfactual Explanations

Counterfactual explanations express a causal situation in the form: “if X (cause) hadn’t occurred, then Y (event) wouldn’t have occurred.”
In the ML context, Y is the model prediction and X is the set of feature values.
Counterfactual thinking requires imagining a hypothetical situation that contradicts the observed facts.
The goal of counterfactuals is to provide actionable guidance, in the form of steps that a consumer might take to achieve a different output in the future.
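To illustrate the idea, the sketch below searches for the smallest change to a single feature that flips a prediction. It is a deliberately naive brute-force search and not the method used to produce the example that follows; the scoring function, its coefficients and the feature names are hypothetical.

```python
def single_feature_counterfactuals(predict_proba, instance, candidates, threshold=0.5):
    """For each feature, find the closest single-value change that flips the prediction."""
    found = []
    for name, values in candidates.items():
        # Try the values nearest to the current one first, so the change is minimal.
        for value in sorted(values, key=lambda v: abs(v - instance[name])):
            if value == instance[name]:
                continue
            changed = dict(instance, **{name: value})  # copy with one feature altered
            prob = predict_proba(changed)
            if prob >= threshold:
                found.append((name, value, round(prob, 2)))
                break
    return found


def toy_proba(passenger):
    """Hypothetical satisfaction probability; not a fitted model."""
    return (0.10 * passenger["wifi"]
            + 0.04 * passenger["cleanliness"]
            + 0.03 * passenger["online_boarding"])


passenger = {"wifi": 2, "cleanliness": 4, "online_boarding": 3}
ratings = range(0, 6)  # services are rated from 0 to 5
candidates = {name: ratings for name in passenger}

counterfactuals = single_feature_counterfactuals(toy_proba, passenger, candidates)
print(toy_proba(passenger), counterfactuals)
```

The toy passenger starts below the 0.5 threshold. Raising the wifi rating by one point or the online boarding rating to 5 is enough to cross it, whereas no change in cleanliness alone suffices, so that feature yields no counterfactual.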

Example

Here I found three counterfactual explanations for a randomly chosen dissatisfied passenger (Table 1). The XGB model predicts dissatisfaction, assigning a satisfaction probability of only 18%. The first counterfactual says that with a better inflight wifi service (a rating of 5 instead of 2), the passenger is predicted to be satisfied with 71% probability. Similarly, by the second counterfactual, the passenger would have been satisfied (53% probability) if cleanliness had been rated slightly higher, 5 instead of 4. The third counterfactual reaches 64% by raising inflight wifi service to 3 and online boarding to 5.

Feature	Original	Counterfactual 1	Counterfactual 2	Counterfactual 3
Gender	Female	–	–	–
Customer type	Loyal	–	–	–
Type of travel	Business travel	–	–	–
Class	Business	–	–	–
Age	33	–	–	–
Flight Distance	325	–	–	–
Inflight wifi service	2	5	–	3
Departure/Arrival time	5	–	–	–
Ease of Online booking	5	–	–	–
Gate location	5	–	–	–
Food and Drink	1	–	–	–
Online Boarding	3	–	–	5
Seat comfort	4	–	–	–
Inflight entertainment	2	–	–	–
On-board service	2	–	–	–
Leg room service	2	–	–	–
Baggage handling	2	–	–	–
Check-in service	3	–	–	–
Inflight service	2	–	–	–
Cleanliness	4	–	5	–
Arrival Delay in Minutes	7	–	–	–
Satisfied	0	1	1	1
Probability	0.18	0.71	0.53	0.64
Table 1. Counterfactual explanations; only the changed feature values are displayed.

Conclusion

In this article, we presented various model-agnostic methods, both global and local, to enhance our understanding of the XGBoost model used for binary classification in the context of airline passenger satisfaction. These XAI techniques provide a way to fulfill the right to explanation of machine learning models and provide insights into the key factors that influence passenger satisfaction. Through the application of these methods, we aim to provide a clear, human-understandable explanation of the XGBoost model’s predictions and contribute to the field of Explainable Artificial Intelligence.


References

Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.). christophm.github.io/interpretable-ml-book/
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1), 44-65.
Mothilal, R. K., Sharma, A., and Tan, C. (2020). Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.

Manuel Meza

Manuel is currently working as a Data Science Intern at Subex – AI Labs. He is a statistician from the Catholic University of Chile. His areas of interest are computational statistics and data visualization.