Home

Intro

Welcome to the course wiki for Topics in Social Data Science, offered by the University of Copenhagen. Here you will find all the information you need about the course. The main thing to look for is the course plan (see below), which outlines the content of each session. Most sessions are already sketched out; as the course progresses, we will add information on what to expect from each session and what you must prepare.

The pages in the sidebar cover other important details of the course, such as general course preparation, the timeline, assignments, evaluation and more. Teaching consists of one weekly session with lectures and exercises, held on Thursdays 13-17 in CSS 35-0-12.

Finally, have a look at the <> Code tab (above, left). That is where we share all the files used in the course, mostly exercises and assignments.

Before the course starts, we expect you to have completed the general course preparation described in the sidebar.

Course plan

Week 1 (Feb. 8): Introduction and Python. [UA] This is the get-started week. The lecture will introduce the course: how it works, what you can expect to learn and what we expect you to deliver. We will introduce the idea of social data science and why it is important, motivate the use of Python and introduce some of the tools you will be learning. The exercises will be partly an install party, to make sure everyone is up to speed on the technical side, and partly Python exercises teaching you how to work with data in practice.
- Preparation: Get set up with Python, learn the basics if you haven't already, and read about social data science. Above all, complete the general course preparation.
  - Download and install Python 3.6 from the Anaconda distribution (not version 2.7).
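As a quick sanity check after installing, the minimal sketch below confirms that the interpreter you are running is Python 3.6 or newer rather than a leftover 2.7 installation. It only checks the version number; it does not verify that the interpreter actually comes from the Anaconda distribution.

```python
import sys

# Fail early if this interpreter is older than the Python 3.6 the course asks for
# (this also catches an accidental Python 2.7 installation).
if sys.version_info < (3, 6):
    raise RuntimeError("Python 3.6 or newer is required, found " + sys.version)

# Show which interpreter is running; with Anaconda this path points into the Anaconda folder.
print("Python", sys.version.split()[0], "at", sys.executable)
```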
Week 2 (Feb. 15): Scraping and structuring. [SR/ABN] We provide a brief recap of data scraping and data structuring.
- Readings:
  - Greg Reda's three-part blog series 'Intro to pandas data structures'
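Scraped data often arrives as a list of records, for example one dict per parsed web page. The sketch below, which assumes hypothetical records with made-up fields `city`, `year` and `price`, structures such records into a pandas `DataFrame`, drops rows where a value could not be scraped, and summarises the rest with `groupby`.

```python
import pandas as pd

# Hypothetical records standing in for rows parsed from scraped pages.
records = [
    {"city": "Copenhagen", "year": 2017, "price": 120.0},
    {"city": "Copenhagen", "year": 2018, "price": 130.0},
    {"city": "Aarhus", "year": 2017, "price": 90.0},
    {"city": "Aarhus", "year": 2018, "price": None},  # a field the scraper failed to extract
]

# One row per record, one column per field; None becomes NaN in the float column.
df = pd.DataFrame(records)

# Drop rows where the price is missing.
clean = df.dropna(subset=["price"])

# Average price per city: a Series indexed by city name.
mean_price = clean.groupby("city")["price"].mean()
print(mean_price)
```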
Week 3 (Feb. 22): Data modelling. [ABN] Short recap of basic machine learning concepts.
- Preparation: read the following subsections.
  - Raschka and Mirjalili (2017)
    - Chapter 3: A Tour of Machine Learning Classifiers Using scikit-learn
      - Choosing a classification algorithm
      - First steps with scikit-learn
      - Modeling class probabilities via logistic regression
    - Chapter 10: Predicting Continuous Target Variables with Regression Analysis
      - Introducing linear regression
      - Implementing an ordinary least squares linear regression model
      - Evaluating the performance of linear regression models
      - Using regularized methods for regression
      - Turning a linear regression model into a curve – polynomial regression
    - Chapter 11: Working with Unlabeled Data – Clustering Analysis
      - Grouping objects by similarity using k-means
  - Friedman, J., Hastie, T., and Tibshirani, R. (2017)
    - Regularization/shrinkage, 3.4.1-3.4.3
    - Clustering and k-means, 14.3, 14.3.1, 14.3.2, 14.3.6
    - Principal components, 14.5.1
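The readings cover logistic regression, regularized (ridge and lasso) regression, k-means and principal components. As a compact sketch using scikit-learn, the block below touches each of them. The choice of the built-in iris dataset and of predicting petal width from the other measurements is an illustrative assumption, not the setup used in the book chapters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LogisticRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Iris is used only as a convenient built-in dataset.
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # common scale matters for penalties and k-means
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Classification with logistic regression; predict_proba returns class probabilities.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
probabilities = clf.predict_proba(X_test[:3])

# Regularized regression: predict petal width (column 3) from the other three features.
# alpha sets the penalty strength; the L1 (lasso) penalty can shrink coefficients to exactly zero.
features, target = X[:, :3], X[:, 3]
ridge = Ridge(alpha=1.0).fit(features, target)
lasso = Lasso(alpha=0.1).fit(features, target)

# Clustering: k-means with k = 3 ignores the labels entirely.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Principal components: share of the variance captured by the first two directions.
pca = PCA(n_components=2).fit(X)

print(f"logistic regression test accuracy: {accuracy:.2f}")
print("ridge coefficients:", ridge.coef_)
print("lasso coefficients:", lasso.coef_)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Comparing the printed ridge and lasso coefficients is a quick way to see the difference between L2 shrinkage and L1 selection discussed in 3.4.1-3.4.3.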
Week 4 (Mar. 1): Networks 1. [UA] Introduction to networks: types, applications and limitations. Representing and describing networks.
- Preparation: Think of examples of what nodes and links in a network can represent. Make sure you understand the following terms: path, adjacency matrix, sparse, connectedness, centrality, local clustering, and communities.
  - Read chapters 1, 2 and 9 of Network Science, specifically sections 1.2-1.5, 2.1-2.5, 2.9-2.10 and 9.1.
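Several of the terms above (adjacency matrix, sparse representation, path, connectedness, centrality, local clustering) can be made concrete on a toy network. The dependency-free sketch below uses a hypothetical five-node network; in the exercises you would more likely use a library such as networkx, but writing the definitions out by hand shows what each measure counts.

```python
from collections import deque

# Hypothetical undirected toy network given as a list of links.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5

# Adjacency matrix: A[i][j] = 1 if i and j are linked (symmetric for undirected networks).
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# Sparse representation: store only each node's neighbours instead of all n * n entries.
neighbours = {v: set() for v in range(n)}
for i, j in edges:
    neighbours[i].add(j)
    neighbours[j].add(i)

# Density: fraction of all possible links that are present.
density = len(edges) / (n * (n - 1) / 2)

def shortest_path_length(source, target):
    """Breadth-first search; returns the number of links on a shortest path, or None."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

def is_connected():
    """The network is connected if every node can be reached from node 0."""
    return all(shortest_path_length(0, v) is not None for v in range(n))

# Degree centrality: degree divided by the largest possible degree, n - 1.
degree_centrality = {v: len(neighbours[v]) / (n - 1) for v in range(n)}

def local_clustering(v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(neighbours[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in neighbours[nbrs[a]])
    return links / (k * (k - 1) / 2)

print("density:", density)
print("shortest path 0 -> 4:", shortest_path_length(0, 4))
print("connected:", is_connected())
print("degree centrality:", degree_centrality)
print("local clustering of node 2:", local_clustering(2))
```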
Week 5 (Mar. 8): Networks 2. [UA] Processes on networks: clustering and spreading. Null models.
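To illustrate spreading and null models, the sketch below implements a discrete-time Susceptible-Infected (SI) process, one of the simplest spreading models, and a G(n, m) random network as a basic null model with the same number of nodes and links. The infection probability `beta`, the helper names and the example networks are hypothetical choices for illustration; the lecture may use other spreading models or null models (for example degree-preserving randomization).

```python
import random

def simulate_si(neighbours, seed_node, beta, steps, rng):
    """SI spreading: at each step, every infected node infects each susceptible
    neighbour independently with probability beta. Returns the final infected set
    and the number of infected nodes after each step."""
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly = {w for v in infected for w in neighbours[v]
                 if w not in infected and rng.random() < beta}
        infected |= newly
        history.append(len(infected))
    return infected, history

def gnm_random_graph(n, m, rng):
    """Null model: a random network with n nodes and m links chosen uniformly at random."""
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    neighbours = {v: set() for v in range(n)}
    for i, j in rng.sample(all_pairs, m):
        neighbours[i].add(j)
        neighbours[j].add(i)
    return neighbours

rng = random.Random(42)

# Hypothetical path network 0 - 1 - 2 - 3; with beta = 1 the infection advances one hop per step.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
infected, history = simulate_si(path, seed_node=0, beta=1.0, steps=3, rng=rng)
print("infected after each step:", history)

# A random network of the same size, e.g. to compare spreading speed or clustering against.
null = gnm_random_graph(10, 15, rng)
```

Comparing a measure on the observed network with its distribution over many null-model samples is what lets you say whether the observed value is surprising.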
Week 6 (Mar. 15): Spatial data. [ABN] Learn to use geolocation data efficiently. Topics include working with spatial shapes (polygons, lines, etc.) and methods for storing and manipulating them, changing coordinate systems, extracting features, and plotting data.
- Preparation:
  - Install geo packages with the following command:
    - conda install -c conda-forge geopandas folium -y
  - Gimond (2017), read the following subsections
    - Required reading:
      - Chapters 1, 2, 3.1.1, 5, 8.2, 8.3, 9, 14.1
    - Further topics:
      - Good maps: chapter 4, 6
  - Inspirational:
    - Alex Hern in The Guardian on Jan 28, 2018: Fitness tracking app Strava gives away location of secret US army bases
    - Ali Winston in The Verge on Feb 27, 2018: Palantir has secretly been using New Orleans to test its predictive policing technology
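Two basic operations behind "changing coordinate system" and "feature extraction" are projecting longitude/latitude to planar coordinates and computing distances between points. The sketch below does both by hand for WGS84 degrees: the haversine great-circle distance, assuming a spherical Earth with mean radius 6371 km, and the Web Mercator projection (EPSG:3857). The function names are hypothetical helpers, not part of geopandas.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def to_web_mercator(lat, lon):
    """Project WGS84 degrees (EPSG:4326) to Web Mercator metres (EPSG:3857)."""
    r = 6378137.0  # WGS84 semi-major axis, used as the sphere radius in EPSG:3857
    x = r * math.radians(lon)
    y = r * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# One degree of longitude along the equator is roughly 111 km.
print(haversine_km(0, 0, 0, 1))
print(to_web_mercator(55.68, 12.57))  # approximate location of Copenhagen
```

With geopandas installed as above, the same reprojection for a whole `GeoDataFrame` is done with `gdf.to_crs(epsg=3857)`; Web Mercator is convenient for web maps but distorts areas and distances away from the equator, so distances are better computed on the sphere or in a local projection.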
Week 7 (Mar. 22): Text as Data. [SR] Exploration and Basic Information Retrieval.
- Preparation:
  - Install text processing packages with the following command:
    - conda install -c anaconda nltk
    - conda install -c conda-forge spacy
    - conda install -c anaconda gensim
  - Required readings:
    - Chapters 2 and 4 of Dan Jurafsky and James H. Martin, [Speech and Language Processing (3rd ed. draft)](https://web.stanford.edu/~jurafsky/slp3/)
  - Extra:
    - Chapters 16-18 (starting with Flat Clustering) in Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval
    - David Blei (2012): Probabilistic Topic Models
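A core idea in basic information retrieval is to weight terms by TF-IDF and rank documents by cosine similarity to a query. The dependency-free sketch below shows the idea on a hypothetical three-document corpus; the regex tokenizer is a crude stand-in for the nltk/spaCy tokenizers installed above, and the plain `tf * log(N / df)` weighting is one of several common TF-IDF variants.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n_docs / df) for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, docs):
    """Return document indices ranked from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus carry no information and are dropped.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q_vec, d) for d in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical toy corpus.
docs = [
    "The cat sat on the mat",
    "Dogs and cats are pets",
    "Stock markets fell sharply today",
]
print(search("cat on a mat", docs))
```

Note that "cat" and "cats" are treated as different terms here; stemming or lemmatization (covered in the Jurafsky and Martin chapters) would merge them.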
Week 8 (Apr. 5): Machine learning 1. [UA] Tree-based methods, ensemble learning, cross-validation and more.
- Preparation:
  - Read chapters 11 (Machine Learning) and 17 (Decision Trees) in Data Science from Scratch.
  - Read this blog post on decision trees.
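To connect trees, ensembles and cross-validation, the sketch below uses scikit-learn to compare a single shallow decision tree with a random forest (an ensemble of trees) under 5-fold cross-validation. The dataset and hyperparameters (`max_depth=3`, `n_estimators=200`) are illustrative assumptions, not recommendations from the readings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A built-in binary classification dataset, used only for illustration.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "single decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation: fit on four folds, score accuracy on the held-out fold, repeat.
    cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv_scores.mean()
    print(f"{name}: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```

Reporting the spread across folds, not just the mean, gives a rough sense of how much the estimate would change with a different split of the data.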
Week 9 (Apr. 12): Text as Data 2. [SR] Computational content analysis. Scaling qualitative evaluations using supervised and ... unsupervised ... learning.
- Preparation:
  - Read Sida Wang and Christopher Manning's article on simple, strong baseline models using bigram features.
  - Read "Machine Translation: Mining Text for Social Theory" by James Evans and Pedro Aceves.
  - Finally, read David Blei's paper "Probabilistic Topic Models".
Week 10 (Apr. 19): Machine learning 2. [SR] This lecture shows how to improve your models using unsupervised, semi-supervised and transfer learning, with an emphasis on feature engineering, learning from unlabeled data and transferring knowledge from other learning tasks.
- Preparation:
  - Read the seminal papers on word embeddings by Mikolov et al. (2013): here and here.
- Inspirational:
  - Read Chapter 3, on semi-supervised learning, in Anders Søgaard's book on semi-supervised learning and domain adaptation (2013).
  - Read Sebastian Ruder's informative blog post on the benefits of auxiliary tasks and multi-task learning, and this brilliant example by Felbo et al. (2017).
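The skip-gram model from the Mikolov et al. papers learns word vectors by predicting the words around each centre word. The sketch below shows only the first step, turning a token sequence into (centre, context) training pairs for a fixed window size; the neural network itself and refinements from the papers such as negative sampling and subsampling of frequent words are left out. The helper name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Return (centre, context) training pairs for the skip-gram model:
    every token within `window` positions of the centre token is a context word."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split(), window=1)
print(pairs)
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

In practice you would not train the model yourself: with gensim (installed in Week 7), `Word2Vec(sentences, sg=1, window=5)` trains skip-gram embeddings directly; the vector-dimension parameter is called `vector_size` in gensim 4 and `size` in older releases.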
Week 11 (Apr. 26): Machine learning 3. [ABN] We will focus on applications of machine learning and its interplay with causality.
- Preparation, required reading:
- Optional reading:
  - Wager, S. and Athey, S., 2017. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association.
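Heterogeneous treatment effects mean that the effect of a treatment varies with the covariates. Wager and Athey estimate these with causal forests; as a much simpler stand-in (a so-called T-learner, not their method), the sketch below fits one random forest per treatment arm on simulated, randomized data with a known effect and takes the difference of their predictions. The data-generating process, sample size and forest settings are all hypothetical choices for illustration, and the sketch offers no confidence intervals, which is a central contribution of the causal forest paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000

# Simulated data: two covariates, randomized treatment, and a true effect of 2
# only for units whose first covariate is positive (hypothetical setup).
X = rng.uniform(-1, 1, size=(n, 2))
treated = rng.integers(0, 2, size=n)
true_effect = np.where(X[:, 0] > 0, 2.0, 0.0)
y = X[:, 1] + true_effect * treated + rng.normal(0, 0.5, size=n)

# T-learner: one outcome model for treated units, one for control units.
model_treated = RandomForestRegressor(n_estimators=200, min_samples_leaf=10, random_state=0)
model_control = RandomForestRegressor(n_estimators=200, min_samples_leaf=10, random_state=0)
model_treated.fit(X[treated == 1], y[treated == 1])
model_control.fit(X[treated == 0], y[treated == 0])

# Estimated conditional average treatment effect for every unit.
cate = model_treated.predict(X) - model_control.predict(X)
print("mean estimated effect where x0 > 0: ", cate[X[:, 0] > 0].mean())
print("mean estimated effect where x0 <= 0:", cate[X[:, 0] <= 0].mean())
```

Because the treatment is randomized here, the difference between the two arm-specific models can be read causally; with observational data, that interpretation additionally requires that treatment assignment is unconfounded given the covariates.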
Week 12 (May 3): Wrap-up/project work.
- Preparation: None
- You will have a brief opportunity to present your project and get feedback.
Week 13: Project work.
Week 14: Project work
