Data auditing for explainable and fair machine learning

Abstract: With the increasing use of machine learning in high-stakes domains, there is a growing need for explainability, responsible data governance, and preemptive fairness auditing. Most prior work audits pre-trained predictors for these purposes, overlooking a primary determinant of a model's performance and fairness characteristics: the training data. This talk first discusses existing feature-attribution approaches in the literature. We then highlight our recent work on data-auditing methods that do not require access to trained models, supporting scenarios where data collection and model development occur in silos. Our framework introduces information-theoretic proxy measures of feature utility and bias, derived axiomatically to accommodate diverse data correlation structures and group-fairness criteria. We use Shapley-based aggregation to derive marginal feature contributions and assess how effectively these measures capture the intended notions of bias. Empirical studies on numerous real and synthetic datasets support our framework's theoretical soundness, interpretability, and robustness across alternative dependency measures.
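
To make the Shapley-based aggregation concrete, below is a minimal sketch of how marginal feature contributions can be computed from a set-valued utility measure. The talk's actual proxy measures and fairness criteria are not specified in this abstract, so the sketch assumes a plain plug-in estimate of mutual information I(X_S; Y) over discrete features as the value function; the names empirical_mi and shapley_values are hypothetical, chosen only for illustration.

```python
import itertools
import math
from collections import Counter

import numpy as np

def empirical_mi(cols, y):
    """Plug-in estimate of I(X_S; Y) in nats for discrete features.

    `cols` is a list of 1-D arrays (the feature subset S); an empty
    subset carries no information, so its value is 0.
    """
    if not cols:
        return 0.0
    xs = list(zip(*cols))          # joint realization of the subset
    n = len(y)
    pxy = Counter(zip(xs, y))      # joint counts of (x_S, y)
    px, py = Counter(xs), Counter(y)
    return sum((c / n) * math.log(c * n / (px[x] * py[v]))
               for (x, v), c in pxy.items())

def shapley_values(X, y, value_fn):
    """Exact Shapley value of each feature under `value_fn`.

    Enumerates all subsets, so it is only practical for small d;
    a common remedy is Monte Carlo sampling over permutations.
    """
    d = X.shape[1]
    phi = np.zeros(d)
    for i in range(d):
        rest = [j for j in range(d) if j != i]
        for k in range(d):
            # Shapley weight for coalitions of size k
            w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for S in itertools.combinations(rest, k):
                with_i = value_fn([X[:, j] for j in S + (i,)], y)
                without = value_fn([X[:, j] for j in S], y)
                phi[i] += w * (with_i - without)
    return phi

# Toy check: y depends only on features 0 and 1 (via XOR), so
# feature 2 should receive a near-zero attribution.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 3))
y = X[:, 0] ^ X[:, 1]
print(shapley_values(X, y, empirical_mi))
```

By the efficiency axiom, the attributions sum to the value of the full feature set, here I(X; Y) ≈ log 2; the two interacting features split it roughly evenly, while the irrelevant feature gets an attribution near zero. Swapping in a measure of dependence on a sensitive attribute instead of Y would turn the same aggregation into a bias score.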

Acknowledgment: This work is supported by NSF award 2452330.

Speaker: Dr. Mohamed Nafea

Dr. Nafea is an assistant professor in computer engineering at Missouri S&T. Before joining S&T, he was an assistant professor at the University of Detroit Mercy, and before that he spent a year as a postdoctoral researcher at Georgia Tech. He received his Ph.D. in electrical engineering and his master's in mathematics from Penn State, University Park, in 2018 and 2017, respectively. His research lies at the intersection of statistical learning, information and data sciences, and causal reasoning, and aims to solve problems in responsible artificial intelligence, including issues of fairness, explainability, privacy, and safety of learning systems.