This is the talk given by Graham Ganssle and Steve Purves at Data Day Texas 2018, delivered in conjunction with Lynn Pausic and Chris LaCava's talk on how human bias is preserved in machine learning systems.
We show how biased training data biases model outputs by assessing the loan-worthiness of applicants using US Census data. We train our model on a dense, varied dataset and quantify the difference in apparent loan-worthiness with respect to applicant gender.
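As a minimal sketch of the kind of gender-gap comparison we mean (the file and column names here are hypothetical, not this repo's actual outputs):

```python
import pandas as pd

# Hypothetical illustration: compare mean predicted approval rate by gender.
# "predictions.csv" and its columns are assumed, not produced by this repo.
df = pd.read_csv("predictions.csv")  # one row per applicant
rates = df.groupby("sex")["predicted_creditworthy"].mean()
print(rates)
print("gap:", rates.max() - rates.min())
```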
Are female loan applicants automatically screened out of credit applications by biased computer models?
We use a graph convolutional network (GCN) to predict a node property (credit-worthiness) from the node's other properties and its edge connections to other credit applicants.
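The core operation is the layer-wise propagation rule from Kipf and Welling's paper (cited below): H' = σ(D̂^(-1/2) (A + I) D̂^(-1/2) H W). A minimal dense-numpy sketch of one such layer, for intuition only (the actual `gcn` package uses sparse TensorFlow ops):

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One graph-convolution layer per Kipf & Welling (2017):
    H' = sigma(D^-1/2 (A + I) D^-1/2 H W). Dense numpy for clarity only."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^-1/2 of A_hat
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return activation(A_norm @ H @ W)

# Toy usage: 4 applicants, 3 input features, 2 hidden units.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.rand(4, 3)
W = np.random.rand(3, 2)
print(gcn_layer(A, H, W).shape)  # (4, 2)
```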
The data used in this experiment is extracted from the 1994 US Census. It is the commonly referenced Census-Income dataset, also known as the "Adult" dataset, obtained from the UCI Machine Learning Repository (full citation below).
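For reference, a minimal sketch of loading the raw Adult data with pandas; the URL and column list follow the UCI documentation, and the preprocessing choices (e.g. treating "?" as missing) are our assumptions, not this repo's cleaning notebooks:

```python
import pandas as pd

# Column names per the UCI "Adult" dataset documentation.
cols = ["age", "workclass", "fnlwgt", "education", "education-num",
        "marital-status", "occupation", "relationship", "race", "sex",
        "capital-gain", "capital-loss", "hours-per-week",
        "native-country", "income"]

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
adult = pd.read_csv(url, names=cols, skipinitialspace=True, na_values="?")
print(adult.shape)                  # (32561, 15) for the training split
print(adult["sex"].value_counts())  # gender balance of the raw data
```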
You first have to condition the data by running the `data_cleaning` and `test_cleaning` notebooks. Then run the `graphicator` notebook to build the graph and associated files from the clean CSV files.
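The `graphicator` notebook defines its own node features and edge rules; purely as a hypothetical illustration of the CSV-to-graph step, one could link applicants who share an attribute:

```python
import networkx as nx
import pandas as pd

# Hypothetical sketch only: the real graphicator notebook uses its own
# edge rules and output format. Here we chain applicants who share an
# occupation, just to illustrate turning a cleaned CSV into a graph.
df = pd.read_csv("clean_train.csv")  # assumed filename
G = nx.Graph()
for idx, row in df.iterrows():
    G.add_node(idx, sex=row["sex"], income=row["income"])
for _, group in df.groupby("occupation"):
    ids = list(group.index)
    G.add_edges_from((ids[i], ids[i + 1]) for i in range(len(ids) - 1))
print(G.number_of_nodes(), G.number_of_edges())
```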
Before you train the GCN you have to build the GCN code: `cd gcn; python setup.py install`. Then to train, cd into the one-level-deeper `gcn/` directory and run the training script: `cd gcn; python train.py --dataset credit`.
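Put together, the pipeline looks roughly like this. The `.ipynb` filenames and the headless `jupyter nbconvert --execute` invocation are assumptions; running the notebooks interactively works just as well:

```bash
# Condition the data and build the graph (notebook filenames assumed)
jupyter nbconvert --to notebook --execute data_cleaning.ipynb
jupyter nbconvert --to notebook --execute test_cleaning.ipynb
jupyter nbconvert --to notebook --execute graphicator.ipynb

# Build and install the GCN package, then train on the credit graph
cd gcn
python setup.py install
cd gcn
python train.py --dataset credit
```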
data-day-TX-2018 by Lynn Pausic, Graham Ganssle, Steve Purves, Expero Inc is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
This work borrows heavily from Graph Convolutional Networks by Thomas Kipf and Max Welling, licensed MIT: © Thomas Kipf, 2016. You can find their excellent paper, "Semi-Supervised Classification with Graph Convolutional Networks" (ICLR 2017), at https://arxiv.org/abs/1609.02907.
The data used in this experiment was obtained from the UCI ML Repository: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.