- Python 3.3+
- PRAW 4.1.0
- Reddit user account
- Reddit API keys
This is a work-in-progress script that lets you pull posts and their comments from subreddits and save them to JSON.
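As a rough illustration of the idea (not the script's actual code), the sketch below turns a submission and its comments into plain dicts that can be dumped to JSON. It assumes you pass in an already authenticated `praw.Reddit` instance; the function names and the choice of fields (`id`, `title`, `body`, `created_utc`) are hypothetical and picked only for the example.

```python
def comment_to_dict(comment):
    """Keep a few fields from a PRAW comment (field choice is an assumption)."""
    return {
        "id": comment.id,
        "body": comment.body,
        "created_utc": comment.created_utc,
    }


def submission_to_dict(submission):
    """Flatten a PRAW submission and its comment tree into a plain dict."""
    # Drop the "load more comments" placeholders so only real comments remain.
    submission.comments.replace_more(limit=0)
    return {
        "id": submission.id,
        "title": submission.title,
        "created_utc": submission.created_utc,
        "comments": [comment_to_dict(c) for c in submission.comments.list()],
    }


def fetch_submissions(reddit, subreddit_name, limit=10):
    """Fetch the newest submissions of a subreddit as JSON-ready dicts.

    `reddit` is expected to be a praw.Reddit instance created elsewhere
    with your own API keys.
    """
    newest = reddit.subreddit(subreddit_name).new(limit=limit)
    return [submission_to_dict(s) for s in newest]
```

The result of `fetch_submissions(reddit, "dataisbeautiful")` is a list of dicts, so it can be passed straight to `json.dump`.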
- make sure you have praw (Reddit API wrapper) and boto3 (to write to S3) installed.
- save `config.example.py` as `config.py` with your own API keys
- to get submissions for a subreddit, run `getSubmissions.py` and pass the subreddit name via `-i`, e.g. `python getSubmissions.py -i dataisbeautiful`
- this will save the submissions to a series of JSON files in the data directory, named `reddit-[subreddit_name]-submissions-[timestamp of first entry]-[timestamp of last entry].json`, e.g. `reddit-dataisbeautiful-submissions-1464039126-1460036691.json`
- if you don't want to set up or save to S3, modify the `save_to_s3` bit...
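The steps above can be sketched roughly as follows. This is not the code in `getSubmissions.py`, just a minimal illustration of reading the `-i` argument, building a file name in the documented pattern, and writing a batch to the local data directory instead of S3. The helper names (`parse_args`, `build_filename`, `save_locally`), the `subreddit` dest name, and the assumption that each submission is a dict with a `created_utc` field are all hypothetical.

```python
import argparse
import json
import os


def parse_args(argv=None):
    """Parse the -i flag; the dest name `subreddit` is an assumption."""
    parser = argparse.ArgumentParser(description="Fetch submissions for a subreddit")
    parser.add_argument("-i", dest="subreddit", required=True, help="subreddit name")
    return parser.parse_args(argv)


def build_filename(subreddit, first_ts, last_ts):
    """Build a file name following the documented naming pattern."""
    return "reddit-{}-submissions-{}-{}.json".format(subreddit, first_ts, last_ts)


def save_locally(submissions, subreddit, data_dir="data"):
    """Write one batch of submissions to a JSON file in data_dir, skipping S3.

    Assumes `submissions` is a non-empty list of dicts, each with a
    `created_utc` timestamp (hypothetical field layout).
    """
    os.makedirs(data_dir, exist_ok=True)
    first_ts = int(submissions[0]["created_utc"])
    last_ts = int(submissions[-1]["created_utc"])
    path = os.path.join(data_dir, build_filename(subreddit, first_ts, last_ts))
    with open(path, "w") as f:
        json.dump(submissions, f)
    return path
```

For example, `build_filename("dataisbeautiful", 1464039126, 1460036691)` gives `reddit-dataisbeautiful-submissions-1464039126-1460036691.json`, matching the example file name above.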