Open Access Senior Honors Thesis
Department or School: Mathematics and Statistics
Andrew Ross, Ph.D.
Debra Ingram, Ph.D.
Ann Eisenberg, Ph.D.
With the rapid growth and application of machine learning and artificial intelligence models that cannot be understood by humans, there is a growing movement calling for greater interpretability. Numerous methods attempt to explain these models, and they vary drastically in how they evaluate a model. In this paper, we investigate a local post-hoc method called SHAP. SHAP uses Shapley values from game theory to attribute an importance value to each input of a model at each data point. Shapley values can require significant computation time, especially as the number of inputs increases; to shorten this time, samples of the data are used as background datasets. We investigate the variation in the Shapley values calculated across numerous background samples, testing multiple SHAP explainers, or calculation methods, for tree and logistic models. In most of our datasets, explainers based on the same model tend to return very similar results. We find that KernelSHAP for the logistic model tends to perform best, yielding the smallest variance across background datasets of all the explainers for both models.
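As a minimal sketch (illustrative only, not code from the thesis), the dependence of SHAP values on the choice of background sample can be seen in a linear model, where Shapley values have a closed form: the attribution of feature i at point x is w_i * (x_i - E[x_i]), with the expectation taken over the background dataset. Two background samples drawn from the same data therefore yield slightly different attributions. The model, weights, and sample sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model f(x) = w.x + b (for illustration only).
w = np.array([2.0, -1.0, 0.5])
b = 0.1

def linear_shap(x, background):
    """Closed-form Shapley values of a linear model w.r.t. a background sample."""
    return w * (x - background.mean(axis=0))

X = rng.normal(size=(1000, 3))   # full dataset
x = X[0]                         # the data point to explain
fx = w @ x + b                   # model output at x

# Two different background samples drawn from the same data.
bg1 = X[rng.choice(len(X), size=100, replace=False)]
bg2 = X[rng.choice(len(X), size=100, replace=False)]

phi1 = linear_shap(x, bg1)
phi2 = linear_shap(x, bg2)

# Efficiency property: the attributions sum to f(x) minus the mean
# prediction over the background sample -- but the individual values
# shift whenever the background sample changes, which is the variation
# studied in this thesis.
print(phi1)
print(phi2)
```

With the `shap` library, the same dependence enters through the background data passed to an explainer such as `shap.KernelExplainer(model, background)`: re-sampling the background changes the returned values.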
Lawless, Jason, "How reliable are SHAP values when trying to explain machine learning models" (2023). Senior Honors Theses and Projects. 773.