Code used for my capstone project (thesis) for the MSc in Applied Social Data Science at the LSE.
The published article for this project can be read in the Journal of Computational Social Science.
DISCLAIMER: This work was produced using statistical data accessed via the ONS Secure Research Service. The use of this data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research datasets which may not exactly reproduce National Statistics aggregates.
The COVID-19 pandemic meant that in 2020 students in England were unable to sit their examinations and instead received predicted grades, or “centre assessment grades” (CAGs), from their teachers to allow them to progress. Using the Grading and Admissions Data for England dataset for students in 2020 and 2018-2019, this study treats the use of CAGs as a natural experiment for causally understanding how teacher judgements of academic ability may be biased according to the demographic and socio-economic characteristics of their students.
A variety of machine learning models (neural networks, support vector regression models and LightGBM models with hyperparameters optimised using Optuna) were trained on the 2018-19 data and then used to predict the grades the 2020 students would likely have received had their examinations taken place as usual. The differences between these predictions and the CAGs the students actually received were calculated and then averaged within groups of students sharing a given characteristic, giving estimates of the likely treatment effects of CAGs for different types of students.
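The treatment-effect step described above reduces to a per-student difference (CAG minus predicted grade) followed by a grouped mean. The sketch below illustrates only that arithmetic in plain Python; the record fields (`cag`, `predicted`, `fsm`), the numeric grade scale and the toy values are all hypothetical and are not taken from the GRADE data or from `cag_code.py`.

```python
from collections import defaultdict
from statistics import mean


def conditional_average_treatment_effects(students, characteristic):
    """Average (CAG - predicted grade) within each level of one characteristic.

    Hypothetical schema: each student is a dict with numeric "cag" and
    "predicted" grades (e.g. grade points) plus categorical characteristics.
    """
    effects_by_group = defaultdict(list)
    for student in students:
        # Individual effect: grade awarded as a CAG minus the grade the model
        # predicts the student would have achieved had the exam taken place.
        effect = student["cag"] - student["predicted"]
        effects_by_group[student[characteristic]].append(effect)
    # Conditional average treatment effect for each group.
    return {group: mean(effects) for group, effects in effects_by_group.items()}


# Toy data for illustration only; these are not real results.
students = [
    {"cag": 7, "predicted": 6.0, "fsm": "yes"},
    {"cag": 5, "predicted": 4.5, "fsm": "yes"},
    {"cag": 8, "predicted": 7.0, "fsm": "no"},
    {"cag": 6, "predicted": 4.0, "fsm": "no"},
]

print(conditional_average_treatment_effects(students, "fsm"))
# {'yes': 0.75, 'no': 1.5}
```

A positive value means a group's CAGs were, on average, higher than its predicted exam grades (no absolute negative bias); relative bias shows up as differences between the groups' values.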
No evidence of absolute negative bias against students of any demographic or socio-economic characteristic was found: every group of students received higher CAGs, on average, than the grades they were likely to have received had they sat their examinations. Some evidence of relative bias was found, with consistent but small differences between the treatment effects of certain groups. However, when higher-order interactions of student characteristics were considered, these differences became more substantial. Intersectional perspectives, which emphasise the importance of interactions and sub-group differences, should be used more widely within quantitative educational equalities research.
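To look at higher-order interactions, the same grouped mean can be taken over combinations of characteristics rather than one characteristic at a time. This is a minimal sketch, assuming hypothetical fields `sex` and `fsm` and toy grade values; it is not the project's own implementation.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def intersectional_effects(students, characteristics, order):
    """Mean (CAG - predicted) for every sub-group defined by `order` characteristics.

    Keys are tuples of (characteristic, value) pairs, e.g.
    (("sex", "M"), ("fsm", "no")). Field names are hypothetical.
    """
    results = {}
    for combo in combinations(characteristics, order):
        effects = defaultdict(list)
        for student in students:
            # Sub-group label built from this combination of characteristics.
            key = tuple((name, student[name]) for name in combo)
            effects[key].append(student["cag"] - student["predicted"])
        for key, values in effects.items():
            results[key] = mean(values)
    return results


def spread(effects):
    """Gap between the largest and smallest sub-group effect."""
    return max(effects.values()) - min(effects.values())


# Toy data for illustration only; these are not real results.
students = [
    {"cag": 7, "predicted": 6.0, "sex": "F", "fsm": "yes"},
    {"cag": 5, "predicted": 4.5, "sex": "M", "fsm": "yes"},
    {"cag": 8, "predicted": 7.0, "sex": "F", "fsm": "no"},
    {"cag": 6, "predicted": 4.0, "sex": "M", "fsm": "no"},
]

one_way = intersectional_effects(students, ["sex", "fsm"], order=1)
two_way = intersectional_effects(students, ["sex", "fsm"], order=2)
print(spread(one_way), spread(two_way))
# 0.75 1.5
```

In this constructed example the gap between sub-groups widens when two characteristics are combined. Intersectional cells contain fewer students, though, so their estimates are noisier and need larger samples to interpret.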
Check out the dashboard of results here.

- cag_code.py: Python file containing all the code used to analyse the data within the Secure Research Service (SRS). Because of the restrictions on working in the SRS (due to the sensitivity of the data), the code had to be exported as a single file, so it is less modular than it could have been. For the same reason, all output had to be stripped (the code was originally written as a Jupyter Notebook).
- results_analysis.Rmd: RMarkdown file used to produce the conditional average treatment effect graphs for the project.
- prep_for_visualisations.ipynb: Jupyter Notebook for reformatting and reshaping the CAG data released from the SRS into data that can be more easily uploaded to BigQuery and used in the Data Studio dashboard for the project.
The main libraries used were:

- Python: scikit-learn, statsmodels, SciPy, TensorFlow, Keras, LightGBM, Optuna, SHAP, Matplotlib and Seaborn.
- R: tidyverse, ggplot2 and rlang.
To work with the same data, you will need to apply to become an accredited researcher with the Office for National Statistics (ONS). You will then need to submit a project application to work with the GRADE dataset. This is a lengthy and involved process, so allow around six months before you expect to start working with the data.
If you do gain access to the dataset, you will also need to request that a custom virtual environment be created for you within the Secure Research Service. You will then need to request installation of any non-standard packages you want, as you cannot import code or install packages yourself. I requested Keras, TensorFlow, LightGBM, Optuna and SHAP.
Distributed under the MIT License. See LICENSE.txt for more information.
Louis Magowan - Medium Profile
Project Link: https://github.com/louismagowan/cag-equality