ScienceEducationLDA

Public repository for the Science Education LDA Project

Description

This is the public repository for the Science Education LDA research project, which is maintained by Tor Ole Odden and Alessandro Marin.

This project is based on the method published in Physical Review Physics Education Research [1]. See also the CCSE/PERC_TopicModel repository.

Jupyter Notebook

See the Science Education LDA Notebook, which contains an extract of the methods described in [1].

Installation

To run the main notebook, PERC_TopicModeling.ipynb, first install the required packages:

pip install -r requirements.txt --user

The file scied_words_bigrams_V5.pkl, which contains the corpus obtained after processing the papers, must be downloaded separately. It is about 200 MB; the download link will be posted soon.
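Once the corpus file is available, it could be loaded along the following lines. This is a minimal sketch using only the standard library; the helper name `load_corpus` is hypothetical, and the internal structure of the pickled object is not documented here, so the function simply returns whatever was stored. Note that unpickling can execute arbitrary code, so only load files from a source you trust.

```python
import pickle
from pathlib import Path


def load_corpus(path):
    """Load a pickled corpus from disk and return the stored object.

    Hypothetical helper: the layout of the stored object is an unknown
    here, so nothing about its type is assumed or checked.
    """
    path = Path(path)
    if not path.exists():
        # Fail early with a clear message if the separate download is missing.
        raise FileNotFoundError(f"Corpus file not found: {path}")
    with path.open("rb") as handle:
        return pickle.load(handle)
```

With the file placed next to the notebook, a call such as `load_corpus("scied_words_bigrams_V5.pkl")` would return the stored corpus object.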

The required packages include Gensim (unsupervised semantic modelling on text), NLTK (Natural Language Toolkit), LDAVis (interactive topic model visualization), and scikit-learn, along with standard data analysis libraries such as pandas, numpy, and matplotlib.
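To check quickly whether these packages are importable before opening the notebook, something like the sketch below may help. The module names are assumptions, since requirements.txt is not reproduced here: scikit-learn is imported as `sklearn`, and LDAVis is assumed to refer to the `pyLDAvis` package. The helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above; adjust to match
# the actual contents of requirements.txt.
REQUIRED_MODULES = [
    "gensim",
    "nltk",
    "pyLDAvis",
    "sklearn",
    "pandas",
    "numpy",
    "matplotlib",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]
```

An empty list from `missing_modules()` suggests the environment is ready; any names returned still need to be installed.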

Preliminary Results

Graph of average topic prevalence over time: AvgPrev.html

Graph of cumulative topic prevalence over time: CumuPrev.html
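For readers who want to see how such curves might be derived from a fitted topic model, here is a minimal sketch. It assumes that each document has a publication year and a list of topic weights, that "cumulative" prevalence is the per-year sum of those weights, and that "average" prevalence is that sum divided by the number of documents in the year. These definitions are assumptions for illustration and may differ from the ones used to produce the graphs above; the function name `prevalence_by_year` is hypothetical.

```python
from collections import defaultdict


def prevalence_by_year(years, doc_topic_weights):
    """Aggregate document-topic weights per year.

    years: one publication year per document.
    doc_topic_weights: one list of topic weights per document.
    Returns (average, cumulative), each mapping year -> per-topic values.
    """
    sums = {}
    counts = defaultdict(int)
    for year, weights in zip(years, doc_topic_weights):
        if year not in sums:
            sums[year] = [0.0] * len(weights)
        # Add this document's weight for each topic to the year's total.
        for topic, weight in enumerate(weights):
            sums[year][topic] += weight
        counts[year] += 1

    # Per-year sum of weights (assumed meaning of "cumulative").
    cumulative = {year: list(total) for year, total in sorted(sums.items())}
    # Per-year mean of weights (assumed meaning of "average").
    average = {
        year: [value / counts[year] for value in total]
        for year, total in cumulative.items()
    }
    return average, cumulative
```

Plotting each topic's values against the sorted years would then give one line per topic.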

Contact

Questions can be directed to Tor Ole Odden.

Literature

[1] Tor Ole B. Odden, Alessandro Marin, and Marcos D. Caballero. Thematic Analysis of 18 Years of Physics Education Research Conference Proceedings using Natural Language Processing, Physical Review Physics Education Research, 2020. Link