Date Approved

2025

Degree Type

Open Access Senior Honors Thesis

Department or School

Mathematics and Statistics

First Advisor

Andrew Ross, Ph.D.

Second Advisor

Mary-Elizabeth Murphy, Ph.D.

Third Advisor

Ann R. Eisenberg, Ph.D.

Abstract

Early initiation of substance use during adolescence poses significant risks to long-term health, educational attainment, and social outcomes, making early identification a critical public health priority. Machine learning models have increasingly been used to predict substance use risk; however, many such models do not explicitly examine whether predictive performance differs across demographic groups. This senior project examines the fairness of logistic regression models used to predict first-time alcohol use among adolescents. Using nationally representative survey data from the Youth Risk Behavior Surveillance System (YRBSS), pooled across the 2017, 2019, 2021, and 2023 survey cycles, this study develops logistic regression–based predictive models to estimate the likelihood of first-time alcohol use among middle school students. Model performance is evaluated using standard classification metrics, and fairness is assessed through group-level comparisons of predicted outcomes and error rates across demographic groups. The results indicate that although the baseline models achieve acceptable overall predictive performance, their predictions and error rates differ across demographic groups. These findings underscore the importance of evaluating fairness alongside accuracy when applying machine learning models to sensitive public health contexts such as adolescent substance use prediction.

Included in

Mathematics Commons
