The code in this directory scrapes the data source website, wheresthejump.com, and stores the extracted data in a format suitable for analysis.
Install the dependencies:
npm install
The following command extracts data from the online database and runs the data transformation scripts:
npm start
It performs the following steps:
- download the list of movies
- download extra metadata from each movie page
- download all jump-scare timing subtitle files
- process the subtitle files into a single JSON file containing the timestamps of all jump scares (both minor and major ones)
Results are saved in the data directory.
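The README does not show how `npm start` chains these steps. As a hypothetical sketch, assuming each step is an async function (`runPipeline` and the step names are illustrative, not taken from the project), the steps can run strictly in order, since later steps depend on the output of earlier ones:

```javascript
// Hypothetical sketch of a sequential pipeline runner: each step is an async
// function that must finish before the next one starts, because later steps
// consume what earlier steps produced. Step names are illustrative only.
async function runPipeline(steps, log = console.log) {
  for (const [name, step] of steps) {
    log(`Running: ${name}`);
    await step(); // a rejected step stops the pipeline here
  }
}

// Placeholder steps that only record the order in which they ran.
const executed = [];
const steps = [
  ["download the list of movies", async () => executed.push("movie list")],
  ["download movie metadata", async () => executed.push("metadata")],
  ["download subtitle files", async () => executed.push("subtitles")],
  ["build the jump-scare timeline", async () => executed.push("timeline")],
];

const pipelineDone = runPipeline(steps);
```

Running the steps one after another is slower than running them in parallel, but it guarantees that, for example, the timeline is only built once every subtitle file exists.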
The following data can be extracted from wheresthejump.com:
Download the .srt files announcing jump scares for all the movies referenced on the website:
npm run extract-subtitles
The subtitles are placed in the data/subtitles directory.
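How the subtitle downloads are scheduled is not documented here. One common way to scrape politely is to cap the number of concurrent requests. The sketch below assumes that approach; `mapWithLimit` and `fakeDownload` are hypothetical names, and `fakeDownload` only stands in for the real HTTP request:

```javascript
// Minimal sketch (not the project's actual code): run `task` over `items` with
// at most `limit` tasks in flight, e.g. to fetch one .srt file per movie while
// keeping the load on the website low. Results keep the input order.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  async function worker() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two workers share an index
      results[i] = await task(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical stand-in for the real download; it tracks how many calls
// overlap so the concurrency limit can be checked.
let inFlight = 0;
let maxInFlight = 0;
async function fakeDownload(title) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated network delay
  inFlight--;
  return `${title}.srt`;
}

const downloadsDone = mapWithLimit(["movie-a", "movie-b", "movie-c", "movie-d"], 2, fakeDownload);
```

A fixed pool of workers pulling from a shared index keeps at most `limit` requests open without needing a queue library.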
Download the list of movies listed on https://wheresthejump.com/full-movie-list/, together with the associated metadata:
- Director
- Year
- Jump count
- Jump Scare rating
- Netflix (US)
- IMDb rating
npm run extract-subtitles
The results are saved in a JSON file: data/moviesList.json.
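The structure of data/moviesList.json is not specified above. The following hypothetical sketch shows one way to turn a row of the movie list into a typed record with the metadata fields listed; the column order, the `title` field, the `parseMovieRow` name and the "Yes"/"No" Netflix values are all assumptions:

```javascript
// Hypothetical sketch: convert the text cells of one row of the full movie list
// into a typed record. Column order and field names are assumptions; the real
// scraper may read the page differently.
function parseMovieRow(cells) {
  const [title, director, year, jumpCount, jumpScareRating, netflixUS, imdbRating] =
    cells.map((cell) => cell.trim());

  // Empty or non-numeric cells become null instead of 0 or NaN.
  const toNumber = (text) => {
    const value = Number(text);
    return text === "" || Number.isNaN(value) ? null : value;
  };

  return {
    title,
    director,
    year: toNumber(year),
    jumpCount: toNumber(jumpCount),
    jumpScareRating: toNumber(jumpScareRating),
    netflixUS: /^yes$/i.test(netflixUS), // assumes the cell reads "Yes" or "No"
    imdbRating: toNumber(imdbRating),
  };
}

// Placeholder input, not real data from the website.
const movie = parseMovieRow(["Example Movie ", "Jane Doe", "2001", "", "3.5", "Yes", "7.1"]);
console.log(JSON.stringify(movie));
```

Converting numbers and booleans at scrape time means later analysis code does not have to re-parse strings.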
This transformation script processes all the downloaded subtitle files and outputs a single file containing the jump-scare timestamps of all movies.
npm run build-jumpscare-timeline
(!) Node.js v12+ is required for this transformation to work, since it uses String.prototype.matchAll(). To run it on older versions of Node.js, matchAll() can be replaced with RegExp.prototype.exec() (cf. MDN article).
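To illustrate that replacement, the sketch below collects SRT start timestamps in seconds twice, once with matchAll() and once with an exec() loop. The timestamp regex and the function names are assumptions for the example, not the project's code:

```javascript
// Sketch: collect every SRT start timestamp ("HH:MM:SS,mmm -->") in seconds.
// The "g" flag is required by matchAll() and makes exec() resume from lastIndex.
const TIMESTAMP = /(\d{2}):(\d{2}):(\d{2}),(\d{3}) -->/g;

function toSeconds(m) {
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Node.js v12+: matchAll() returns an iterator over all match arrays.
function startTimesMatchAll(srt) {
  return [...srt.matchAll(TIMESTAMP)].map(toSeconds);
}

// Older Node.js: call exec() in a loop until it returns null.
function startTimesExec(srt) {
  const re = new RegExp(TIMESTAMP.source, "g"); // fresh copy so lastIndex starts at 0
  const times = [];
  let m;
  while ((m = re.exec(srt)) !== null) {
    times.push(toSeconds(m));
  }
  return times;
}

const sample =
  "1\n00:01:02,500 --> 00:01:03,500\nMinor jump scare\n\n" +
  "2\n01:00:00,000 --> 01:00:01,000\nMajor jump scare\n";
console.log(startTimesMatchAll(sample), startTimesExec(sample));
```

Both functions return the same array; the exec() version uses a fresh regex per call so that a leftover `lastIndex` from a previous file cannot cause matches to be skipped.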
The results are saved in a JSON file: data/jumpScareTimeline.json.
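The schema of data/jumpScareTimeline.json is not described here. As a hypothetical sketch, assuming each subtitle cue's text contains the word "major" for major jump scares, the downloaded subtitles (keyed by movie title) could be flattened into one list of `{ movie, time, major }` entries. `buildTimeline`, `cuesToTimeline` and the entry shape are illustrative, and writing the result to the output file is left out:

```javascript
// Hypothetical sketch: flatten subtitle cues into timeline entries
// { movie, time, major }. Assumes major jump scares are marked by the word
// "major" in the cue text; the real wording and output schema may differ.
const CUE = /(\d{2}):(\d{2}):(\d{2}),(\d{3}) -->[^\n]*\n([^\n]*)/g;

function cuesToTimeline(movie, srt) {
  const entries = [];
  for (const m of srt.matchAll(CUE)) {
    // Start time of the cue, in seconds.
    const time = Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
    entries.push({ movie, time, major: /major/i.test(m[5]) });
  }
  return entries;
}

// subtitlesByMovie: { [movieTitle]: srtFileContents }
function buildTimeline(subtitlesByMovie) {
  return Object.entries(subtitlesByMovie).flatMap(([movie, srt]) => cuesToTimeline(movie, srt));
}

const timeline = buildTimeline({
  "Movie A": "1\n00:00:10,000 --> 00:00:11,000\nMinor jump scare\n\n2\n00:02:00,250 --> 00:02:01,000\nMajor jump scare\n",
  "Movie B": "1\r\n00:00:05,000 --> 00:00:06,000\r\nMajor jump scare\r\n",
});
console.log(JSON.stringify(timeline, null, 2));
```

The regex tolerates both `\n` and `\r\n` line endings, since any trailing `\r` is swallowed by `[^\n]*` and does not affect the "major" test.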