This research project concerns the depiction of gender in historical English-language novels, exploring how authors of various backgrounds and experiences described gender in their works.
So far, we have analyzed a corpus of over 4,200 books from Project Gutenberg, an online book repository, using computational methods we developed. Among other things, we measured the ratio of male to female pronouns, identified the words that most commonly follow male and female pronouns, and calculated the distance between repeated occurrences of male and female pronouns.
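To make these three measurements concrete, here is a minimal Python sketch of how they could be computed for a single text. It is not the project's implementation, and every name in it is hypothetical. It rests on our own simplifying assumptions: a crude regex tokenizer, hand-picked pronoun sets (MALE_PRONOUNS, FEMALE_PRONOUNS), and "distance" defined as the number of tokens between consecutive occurrences.

```python
import re
from collections import Counter

# Assumption: illustrative pronoun sets; the project's own lists may differ.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_ratio(tokens):
    """Return male pronoun count / female pronoun count, or None if there are no female pronouns."""
    male = sum(1 for t in tokens if t in MALE_PRONOUNS)
    female = sum(1 for t in tokens if t in FEMALE_PRONOUNS)
    return male / female if female else None


def words_after(tokens, pronouns, n=10):
    """Return the n most common words that immediately follow any pronoun in `pronouns`."""
    following = Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur in pronouns)
    return following.most_common(n)


def repetition_distances(tokens, pronouns):
    """Return the gaps, in tokens, between consecutive occurrences of the given pronouns."""
    positions = [i for i, t in enumerate(tokens) if t in pronouns]
    return [b - a for a, b in zip(positions, positions[1:])]
```

For example, on "He said that she was late. She laughed, and he smiled at her." the two male pronouns are nine tokens apart, and "said" and "smiled" are the words that follow them. Note that "her" is both an object and a possessive pronoun in English, so which words go into each set is a modeling choice that affects all three measurements.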
As of Summer 2019, the work on this project has been forked into two repos:
- The website presenting our research: https://github.com/dhmit/gender_novels_site
- The Gender Analysis Toolkit: https://github.com/dhmit/gender_analysis
If you would like to contribute to this project, please check out one of those follow-on projects!
This MIT Digital Humanities Lab project is part of the MIT/SHASS Programs in Digital Humanities funded by the Mellon Foundation.
To use our tools or contribute to the project, please see our guide to contributing, CONTRIBUTING.md. It includes information on how to install the tools we used, as well as style guidelines for adding code. We are open to contributions and would love to see other people’s ideas, thoughts, and additions to this project, so feel free to leave comments or make a pull request!
For anybody who wants to use our code, here’s a little outline of where everything is.
In the gender_novels/gender_novels folder, there are six folders:
- analysis: programming files focused on textual analysis, and research write-ups, including data visualizations and conclusions
- corpora: metadata on each book (including author, title, publication year, etc.), including sample data sets and instructions for generating a Gutenberg mirror
- deployment: code for the original Gender/Novels website. This has now been forked and replaced with https://github.com/dhmit/gender_novels_site; we only maintain this code here for historical reasons.
- pickle_data: pickled data for various analyses, to avoid rerunning time-consuming computations
- testing: files for code tests
- tutorials: tutorials used by the lab to learn about various technical subjects needed to complete this project
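The pickle_data folder exists to avoid recomputing expensive results. The general caching pattern looks roughly like the following sketch. The helper load_or_compute is hypothetical and does not come from the repo; it only uses the standard library's pickle module.

```python
import os
import pickle


def load_or_compute(path, compute):
    """Return the object pickled at `path`; if the file is missing, call compute(),
    pickle its result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

The first call runs the expensive computation and writes the file. Later calls just load it, so delete the file whenever the underlying corpus or analysis changes. Because unpickling can execute arbitrary code, only load pickle files you trust.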
If you need readily available methods for analyzing documents, the files you’ll most likely want are corpus.py and novel.py. These include methods for loading and analyzing texts from the corpora. If you’d like to generate your own corpus rather than use the one provided in the repo, you’ll want to use corpus_gen.py. If you only want a specific part of our corpus, the method get_subcorpus() may be useful.
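Conceptually, taking a subcorpus means filtering the books by a metadata field. The sketch below shows that idea with hypothetical names (BookRecord, filter_corpus, filter_by_years) and the metadata fields mentioned above (author, title, publication year). It is not the signature of the repo's get_subcorpus(), so check corpus.py for the real method before relying on it.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    """Hypothetical minimal metadata record; the repo's own classes differ."""
    title: str
    author: str
    year: int


def filter_corpus(books, attribute, value):
    """Return the books whose metadata attribute `attribute` equals `value`."""
    return [b for b in books if getattr(b, attribute) == value]


def filter_by_years(books, start, end):
    """Return the books published between `start` and `end`, inclusive."""
    return [b for b in books if start <= b.year <= end]
```

Given records for Sense and Sensibility (Jane Austen, 1811), Pride and Prejudice (Jane Austen, 1813), and Great Expectations (Charles Dickens, 1861), filter_corpus(books, "author", "Jane Austen") keeps the two Austen novels, and filter_by_years(books, 1850, 1900) keeps only Great Expectations.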
Cite the project using either the short or long form:
- Michael Scott Asato Cuthbert, et al., Computational Reading of Gender in Novels, 1770–1922 (2019) http://gendernovels.digitalhumanitiesmit.org.
- Michael Scott Asato Cuthbert, Lisa Tagliaferri, Stephan Risi, Ife Ademolu-Odeneye, Dina Atia, Elena Boal, Emily Caragay, Susannah Chen, Alena Culbertson, Howard DaCosta, Mingfei Duan, Maritza Gallegos, Assel Ismoldayeva, Elsa Itambo, Michelle Li, Kelsey Merrill, Charlotte Minsky, Keith Murray, Carol Pan, Isaac Redlon, Shobita Sundaram, Felix Tran, Kate Xu, Derek Yen, Samantha York, Sophia Zhi, Computational Reading of Gender in Novels, 1770–1922: The Gender/Novels Project (2019) http://gendernovels.digitalhumanitiesmit.org and https://github.com/dhmit/gender_novels
This document was prepared by the MIT Digital Humanities Lab.
Copyright © 2018, MIT Programs in Digital Humanities. Released under the BSD license. Some included texts might not be out of copyright in all jurisdictions of the world.