
Repo for code used during MSc thesis in Applied Social Data Science at LSE. Used to analyse Centre Assessment Grades in 2020, investigating bias in teacher grading judgements.






Centre Assessment Grades in 2020: A Natural Experiment for Investigating Bias in Teacher Judgements

Code used for my capstone project/thesis during the MSc in Applied Social Data Science at LSE.


The published article for this project can be read in the Journal of Computational Social Science.

About The Project

DISCLAIMER: This work was produced using statistical data accessed via the ONS Secure Research Service. The use of this data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research datasets which may not exactly reproduce National Statistics aggregates.

The COVID-19 pandemic meant that in 2020 students in England were unable to sit their examinations and instead received predicted grades, or “centre assessment grades” (CAGs), from their teachers to allow them to progress. Using the Grading and Admissions Data for England (GRADE) dataset for students in 2018-2019 and 2020, this study treats the use of CAGs as a natural experiment for understanding causally how teacher judgements of academic ability may be biased according to the demographic and socio-economic characteristics of their students.

A variety of machine learning models (neural networks, support vector regression models and LightGBM models with hyperparameters optimised using Optuna) were trained on the 2018-19 data and then used to predict the grades the 2020 students would likely have received had their examinations taken place as usual. The differences between these predictions and the CAGs that students actually received were calculated and then averaged across students' characteristics, giving estimates of the likely treatment effects of the use of CAGs for different types of students.
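
The predict-then-difference approach above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in cag_code.py: a trivial placeholder model (the mean 2018-19 exam grade for each prior-attainment band) stands in for the neural networks, SVRs and LightGBM models, and the field names (`prior_band`, `grade`, `cag`, `sex`), the numeric grade scale and the toy records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_placeholder_model(train_rows):
    """Hypothetical stand-in for the real ML models: learns the mean
    exam grade for each prior-attainment band in the 2018-19 data."""
    by_band = defaultdict(list)
    for row in train_rows:
        by_band[row["prior_band"]].append(row["grade"])
    return {band: mean(grades) for band, grades in by_band.items()}


def predict(model, row):
    # Counterfactual grade: what the student would likely have got in an exam
    return model[row["prior_band"]]


def conditional_effects(rows_2020, model, group_key):
    """Average (CAG - predicted grade) within each level of group_key."""
    diffs = defaultdict(list)
    for row in rows_2020:
        diffs[row[group_key]].append(row["cag"] - predict(model, row))
    return {group: mean(values) for group, values in diffs.items()}


# Toy records on a numeric grade scale (made-up values, not project data)
train = [
    {"prior_band": "high", "grade": 7},
    {"prior_band": "high", "grade": 8},
    {"prior_band": "low", "grade": 4},
    {"prior_band": "low", "grade": 5},
]
students_2020 = [
    {"prior_band": "high", "sex": "F", "cag": 8},
    {"prior_band": "low", "sex": "F", "cag": 5},
    {"prior_band": "high", "sex": "M", "cag": 8},
    {"prior_band": "low", "sex": "M", "cag": 6},
]

model = fit_placeholder_model(train)
print(conditional_effects(students_2020, model, "sex"))  # {'F': 0.5, 'M': 1.0}
```

A positive average means the group's CAGs were, on average, higher than the grades the model expected them to achieve in an exam; swapping in a different model only changes `fit_placeholder_model` and `predict`.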

No evidence of absolute negative bias against students of any demographic or socio-economic characteristic was found: all groups of students received higher CAGs than the grades they were likely to have received had they sat their examinations. Some evidence of relative bias was found, with consistent but small differences observed in the treatment effects of certain groups. However, these differences became more substantial when higher-order interactions of student characteristics were considered. Intersectional perspectives, which emphasise the importance of interactions and sub-group differences, should be used more widely within quantitative educational equalities research.
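
To look at the higher-order interactions mentioned above, the same averaging can be repeated over combinations of characteristics rather than one characteristic at a time. The sketch below is illustrative only: it assumes each 2020 student record already carries a per-student effect (CAG minus predicted grade) in a hypothetical `effect` field, uses hypothetical characteristic fields (`sex`, and `fsm` as an example socio-economic flag), and the `min_count` cut-off for dropping small subgroups is an assumed parameter rather than a value from the project.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def subgroup_effects(rows, characteristics, order, min_count=1):
    """Mean effect for every combination of `order` characteristics.

    Keys are tuples of (characteristic, value) pairs. Subgroups with fewer
    than `min_count` students are dropped (hypothetical small-cell rule).
    """
    results = {}
    for combo in combinations(characteristics, order):
        cells = defaultdict(list)
        for row in rows:
            key = tuple((c, row[c]) for c in combo)
            cells[key].append(row["effect"])
        for key, effects in cells.items():
            if len(effects) >= min_count:
                results[key] = mean(effects)
    return results


# Toy per-student effects (made-up values, not project results)
rows = [
    {"sex": "F", "fsm": True, "effect": 0.2},
    {"sex": "F", "fsm": False, "effect": 0.6},
    {"sex": "M", "fsm": True, "effect": 0.9},
    {"sex": "M", "fsm": False, "effect": 0.5},
]

# Two-way interactions, e.g. (sex = F) x (fsm = True)
for key, value in subgroup_effects(rows, ["sex", "fsm"], order=2).items():
    print(key, round(value, 2))
```

With `order=1` this reduces to one-characteristic-at-a-time averages; raising `order` splits the data into smaller, more specific subgroups, which is where differences hidden by the marginal averages can show up.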

Check out the dashboard of results here.

Project Contents

  • cag_code.py: Python file containing all the code used to analyse data within the Secure Research Service (SRS). Because of the limitations of working in the SRS (due to the sensitivity of the data), the code had to be exported as a single file, so it is less modular than it could have been. For the same reason, all output had to be removed (the code was originally in notebook form).
  • results_analysis.Rmd: RMarkdown file used to produce the conditional average treatment effect graphs for the project.
  • prep_for_visualisations.ipynb: Jupyter Notebook for reformatting and reshaping the CAG data released from the SRS so that it can be more easily uploaded to BigQuery and used in the project's Data Studio dashboard.
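
The kind of reshaping described for prep_for_visualisations.ipynb can be sketched with the standard library: a wide table of results (one column per group) is melted into long rows, which tend to be simpler to load into a single BigQuery table and filter in a dashboard. This is an assumption about the shape of the data, not the notebook's actual logic; the column names (`model`, `group`, `value`) and the placeholder values are hypothetical.

```python
import csv
import io


def wide_to_long(wide_csv, id_column):
    """Melt a wide CSV (one column per group) into long
    (id, group, value) rows, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    long_rows = []
    for row in reader:
        for column, value in row.items():
            if column != id_column and value not in ("", None):
                long_rows.append(
                    {id_column: row[id_column], "group": column, "value": float(value)}
                )
    return long_rows


def to_csv(rows, fieldnames):
    """Serialise long rows back to CSV text, ready for upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical wide export with placeholder values
wide = "model,group_1,group_2\nmodel_a,0.1,0.2\nmodel_b,0.3,0.4\n"
long_rows = wide_to_long(wide, "model")
print(to_csv(long_rows, ["model", "group", "value"]))
```

A long layout keeps the table schema fixed when new groups are added, since each new group becomes new rows rather than a new column.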

Built With

  • Python: scikit-learn, statsmodels, SciPy, TensorFlow, Keras, LightGBM, Optuna, SHAP, Matplotlib and Seaborn.
  • R: Tidyverse, ggplot2 and rlang


Prerequisites

To work with the same data I did, you will need to apply to become an accredited researcher with the ONS (Office for National Statistics). You will then need to submit a project application to work with the GRADE dataset. This is a lengthy and involved process, so it is best to allow around six months before you can start working with the data.

If you do gain access to the dataset, you will also need to request that a custom virtual environment be created for you within the Secure Research Service. You will then need to request installation of any non-standard packages you want, as you cannot ingest code or install packages yourself. I requested Keras, TensorFlow, LightGBM, Optuna and SHAP.

License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Louis Magowan - Medium Profile

Project Link: https://github.com/louismagowan/cag-equality


Acknowledgments

