tokenize and analyze 2016 presidential candidate rhetoric for comparison with extremist communities #52

kshaffer · 2017-03-16T15:59:38Z

Anyone interested in doing some basic word/n-gram analysis, topic models, etc. on presidential candidate speeches and press releases? Would be really interesting to see which candidates were/weren't plugged in to the extremist communities and when/where certain extremist language creeps into more mainstream campaign discourse.

An R notebook with instructions and code for obtaining this data from The American Presidency Project will be in the exploratory_notebooks folder soon (just submitted a pull request).
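For anyone starting on the word/n-gram side, here is a minimal Python sketch using only the standard library. It lowercases and tokenizes text with a simple regex, counts n-grams, and measures overlap between a candidate text and a community text using shared counts and Jaccard similarity. The tokenizer pattern, the function names, and the choice of Jaccard as the overlap measure are my own assumptions for illustration, not part of the R notebook mentioned above.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Consecutive n-token tuples, e.g. bigrams for n=2."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n=2):
    """Frequency of each n-gram in a text."""
    return Counter(ngrams(tokenize(text), n))

def shared_ngrams(text_a, text_b, n=2):
    """N-grams present in both texts, each with the smaller of its two counts."""
    return ngram_counts(text_a, n) & ngram_counts(text_b, n)

def jaccard(text_a, text_b, n=2):
    """Share of distinct n-grams the two texts have in common (0.0 to 1.0)."""
    a, b = set(ngram_counts(text_a, n)), set(ngram_counts(text_b, n))
    return len(a & b) / len(a | b) if a | b else 0.0
```

A real comparison would also want stopword handling and some normalization for document length, since long speeches will share more n-grams with anything just by volume.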

kshaffer · 2017-03-16T16:01:35Z

Some examples of what's possible are in my personal GitHub repo.

FWIW, this should be a beginner-friendly project, but also open to more advanced algorithmic analysis.

justinstimatze · 2017-03-20T14:44:34Z

I'm interested in learning R and think this is an interesting project.

kshaffer · 2017-03-20T15:05:08Z

@justinstimatze Excellent! I was able to scrape all of the GOP speeches, press releases, and campaign statements from January 2015 on and assemble them into a single CSV, if that helps you explore: https://github.com/kshaffer/presidencyproject/blob/master/data/gop_2016_candidate_docs.csv

And if you're using this project to learn R, I highly recommend Tidy Text Mining. It's a free ebook explaining tools that might be helpful for this analysis.
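For Python users, a minimal sketch of grouping that CSV by candidate and pulling out each candidate's most frequent words. The column names `candidate` and `text` are hypothetical placeholders; check the actual header of gop_2016_candidate_docs.csv and pass the real names in.

```python
import csv
import re
from collections import Counter, defaultdict

def docs_by_candidate(lines, name_col="candidate", text_col="text"):
    """Group document texts by candidate name.

    `lines` is any iterable of CSV lines (e.g. an open file). The default
    column names are assumptions, not the confirmed header of the CSV.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        grouped[row[name_col]].append(row[text_col])
    return dict(grouped)

def top_words(texts, k=10, stopwords=frozenset()):
    """Most frequent lowercase word tokens across one candidate's documents."""
    counts = Counter(
        w for t in texts for w in re.findall(r"[a-z']+", t.lower()) if w not in stopwords
    )
    return counts.most_common(k)

# Usage (file name from the link above; column names are assumptions):
# with open("gop_2016_candidate_docs.csv", newline="", encoding="utf-8") as f:
#     grouped = docs_by_candidate(f)
# for name, texts in grouped.items():
#     print(name, top_words(texts, k=5))
```

Without a stopword list the top words will mostly be "the", "and", and so on, so passing a stopword set is worth doing early.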

ghost · 2017-03-21T14:00:12Z

This seems like an interesting project. Is it possible for me to join in on this project?

princeatul · 2017-03-28T15:02:27Z

Hi Kshaffer,

I would like to join this project. I will be working in Python. Is it possible for me to join this project?

bstarling · 2017-03-29T19:09:19Z

@princeatul Thanks for your interest. Everything @kshaffer mentioned should be doable in Python as well. If you're interested, I would suggest grabbing the data linked above and trying to tackle one task from the list in the original post (e.g., topic modeling). Once you have a preliminary Jupyter notebook, open a PR to add it to the exploratory_notebooks section of this repo. I'm not an expert in this area, but I do have this tutorial in my backlog that may help you get started.

If you need any help, just visit us in the #assemble channel on Slack or post back here with any questions.

mw0 · 2017-04-30T23:19:00Z

I don't see it discussed above, so I'll mention that FiveThirtyEight had a very interesting article on using latent semantic analysis for topic modeling Reddit communities. Certainly an interesting starting point for those interested in seeing what might be done here.
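Latent semantic analysis is, at its core, a truncated singular value decomposition of a term-document matrix. The sketch below is generic LSA on raw word counts, not a reproduction of the FiveThirtyEight analysis (I haven't verified their exact inputs or weighting). To stay dependency-free it finds the top-k singular vectors by power iteration with deflation on the document Gram matrix `A A^T`; the use of raw counts instead of the more common tf-idf weighting, and all function names, are assumptions for illustration.

```python
import math
import random
from collections import Counter

def term_doc_matrix(docs):
    """Rows = documents, columns = vocabulary terms, entries = raw counts."""
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]
    return [[c[w] for w in vocab] for c in counts], vocab

def lsa_doc_vectors(A, k, n_iter=300, seed=0):
    """Truncated SVD via power iteration with deflation on A A^T.

    Since A A^T = U S^2 U^T, each eigenvector is a left singular vector u_j
    and sqrt(eigenvalue) is the singular value s_j. Returns the document
    vectors U_k S_k, one row per document.
    """
    rng = random.Random(seed)
    n = len(A)
    # Gram matrix B = A A^T (documents x documents).
    B = [[sum(x * y for x, y in zip(A[i], A[j])) for j in range(n)] for i in range(n)]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        comps.append((math.sqrt(max(lam, 0.0)), v))
        # Deflate: subtract the found component so the next pass finds the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[s * u[i] for s, u in comps] for i in range(n)]

def cosine(a, b):
    """Cosine similarity, 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

Power iteration gets slow for large corpora; for the full speech collection a library SVD (`numpy.linalg.svd`, or scikit-learn's `TruncatedSVD` on a sparse matrix) is the practical route, and the resulting document vectors can be compared with `cosine` the same way.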

bstarling added the beginner-friendly and help wanted labels Mar 20, 2017

bstarling added the modeling label and removed the beginner-friendly label Mar 20, 2017
