Author

Jason Lawless

Date Approved

2023

Degree Type

Open Access Senior Honors Thesis

Department or School

Mathematics and Statistics

First Advisor

Andrew Ross, Ph.D.

Second Advisor

Debra Ingram, Ph.D.

Third Advisor

Ann Eisenberg, Ph.D.

Abstract

With the rapid growth and application of machine learning and artificial intelligence models that cannot be understood by humans, there is a growing movement calling for an increase in interpretability. Numerous methods attempt to explain these models, and they vary drastically in how they evaluate them. In this paper, we investigate a local post-hoc method called SHAP. SHAP uses Shapley values from game theory to attribute an importance value to each input of a model at each data point. Shapley values can require significant computation time, especially as the number of inputs increases. To shorten the computation time, samples of the data are used as background datasets. In this paper, we investigate the variation in the Shapley values calculated across numerous background samples. We test multiple SHAP explainers, or calculation methods, for tree and logistic models. In most of our datasets, explainers based on the same model tend to return very similar results. We find that KernelSHAP for the logistic model tends to perform best, producing the smallest variance across background datasets of all the explainers for both models.
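
As a rough illustration of the approach the abstract describes (not code from the thesis itself), the Python sketch below uses the shap library to compute KernelSHAP values for a logistic regression model, with a sampled background dataset standing in for the full data; the example dataset, model settings, and sample size here are placeholders chosen only for demonstration.

    # Illustrative sketch: KernelSHAP for a logistic model with a sampled background dataset.
    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression

    # Fit a simple logistic model on an example dataset (placeholder data).
    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=5000).fit(X, y)

    # Sample a background dataset so the Shapley value computation stays tractable.
    background = shap.sample(X, 100)

    # KernelSHAP approximates Shapley values for any model via its prediction function.
    explainer = shap.KernelExplainer(model.predict_proba, background)

    # Explain a few points; each row gives a per-feature importance value.
    shap_values = explainer.shap_values(X[:5])

Repeating this with different background samples and comparing the resulting shap_values is the kind of variance study the abstract refers to.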

Included in

Mathematics Commons
