Andreas Bjerre-Nielsen edited this page May 1, 2018 · 43 revisions

Intro

Welcome to the course wiki for Topics in Social Data Science offered by the University of Copenhagen. This is where you will find all the info you need. The main thing to look for is the course plan (see below), which outlines the content of each coming session. Most sessions are already sketched out, and as we go, information will be added indicating what to expect from each session and what you must prepare.

In the side-bar, there are pages where you can read about other important details of the course, such as general course preparation, the timeline, assignments, evaluation and more. The teaching will consist of a weekly session with lectures and exercises. The sessions are held Thursday 13-17 in CSS 35-0-12.

Finally, have a look at the <> Code tab (above, left). That's where we will share all the files used in the course, mostly exercises and assignments.

Before the course starts we expect that you have prepared for the course.

Course plan

  • Week 1 (Feb. 8): Introduction and Python. [UA] This is the get-started week. The lecture will introduce the course, how it works, what you can expect to learn and what we expect you to deliver. We will introduce the idea of social data science and why it's important, motivate the use of Python and introduce some of the tools you will be learning. The exercises will be part install party to ensure everyone is up to speed on the technological side, and part Python exercises teaching you how to work with data practically.

    • Preparation: Get set up with Python, learn the basics if you haven't already and read about social data science. But above all, prepare for the course.

  • Week 2 (Feb. 15): Scraping and structuring. [SR/ABN] We provide a brief recap of data scraping and data structuring.
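
    In essence, scraping boils down to fetching HTML and structuring the result into tabular form. A minimal, standard-library-only sketch of the parsing half (the HTML snippet and tag contents are invented for illustration; in practice you would fetch a live page first, e.g. with requests):

```python
from html.parser import HTMLParser

# Toy HTML standing in for a fetched page (illustrative only).
PAGE = """
<ul>
  <li><a href="/a">Alpha</a></li>
  <li><a href="/b">Beta</a></li>
</ul>
"""

class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []       # structured output: list of (href, text)
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links.append((self._href, data.strip()))
            self._href = None

parser = LinkParser()
parser.feed(PAGE)
print(parser.links)  # [('/a', 'Alpha'), ('/b', 'Beta')]
```

    The resulting list of tuples is exactly the kind of record-oriented data that drops straight into a pandas DataFrame for the structuring step.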

  • Week 3 (Feb. 22): Data modelling. [ABN] Short recap of basic machine learning concepts.

    • Preparation: read the following subsections.
      • Raschka and Mirjalili (2017)
        • Chapter 3: A Tour of Machine Learning Classifiers Using scikit-learn
          • Choosing a classification algorithm
          • First steps with scikit-learn
          • Modeling class probabilities via logistic regression
        • Chapter 10: Predicting Continuous Target Variables with Regression Analysis
          • Introducing linear regression
          • Implementing an ordinary least squares linear regression model
          • Evaluating the performance of linear regression models
          • Using regularized methods for regression
          • Turning a linear regression model into a curve – polynomial regression
        • Chapter 11: Working with Unlabeled Data – Clustering Analysis
          • Grouping objects by similarity using k-means
      • Friedman, J., Hastie T., and R. Tibshirani (2017)
        • Regularization/shrinkage, 3.4.1-3.4.3
        • Clustering and k-means, 14.3, 14.3.1, 14.3.2, 14.3.6
        • Principal components, 14.5.1
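
    To preview the k-means idea from the clustering readings, here is a bare-bones version of Lloyd's algorithm on one-dimensional toy data (the data, cluster count and starting centroids are chosen by hand for illustration; in the course you would use scikit-learn's KMeans):

```python
def kmeans_1d(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, centroids)]
    return centroids

data = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers = kmeans_1d(data, centroids=[0.0, 10.0])
print(centers)  # ≈ [1.0, 8.0]
```

    The two steps are exactly the "grouping objects by similarity" loop described in Raschka and Mirjalili's chapter 11, just without the vectorization.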
  • Week 4 (Mar. 1): Networks 1. [UA] Introduction to networks: types, applications and limitations. Representing and describing networks.

    • Preparation: Compile examples in your mind of what nodes and links in a network can represent. Understand what the following terms mean: path, adjacency matrix, sparse, connectedness, centrality, local clustering, and communities.

      • Read Network Science chapters 1, 2, and 9. Specifically read the four blocks [1.2-5], [2.1-5], [2.9-10], [9.1].
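
    Several of those terms are easy to make concrete in code. A small hand-made example graph (node names invented for illustration) showing an adjacency matrix, degree centrality, and a shortest path found by breadth-first search:

```python
from collections import deque

# Undirected toy graph as an edge list, converted to an adjacency list.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
nodes = sorted({n for e in edges for n in e})
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Adjacency matrix: A[i][j] = 1 iff nodes i and j are linked.
A = [[1 if nodes[j] in adj[nodes[i]] else 0 for j in range(len(nodes))]
     for i in range(len(nodes))]

# Degree centrality: degree divided by the maximum possible degree.
centrality = {n: len(adj[n]) / (len(nodes) - 1) for n in nodes}

def shortest_path(start, goal):
    """Breadth-first search returns one shortest path."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])

print(centrality["c"])          # 1.0 (c is linked to every other node)
print(shortest_path("a", "d"))  # ['a', 'c', 'd']
```

    In the course itself a library such as networkx handles all of this, but the definitions underneath are no more than the few lines above.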
  • Week 5 (Mar. 8): Networks 2. [UA] Processes on networks: clustering and spreading. Null models.
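
    One common null model keeps every node's degree fixed while randomizing who links to whom. The double-edge-swap sketch below (the toy graph and swap count are invented for illustration) shows the idea:

```python
import random

def double_edge_swap(edges, n_swaps=100, seed=0):
    """Randomize an undirected edge list while preserving all degrees.

    Pick two edges (u, v) and (x, y) and rewire them to (u, x) and
    (v, y), skipping swaps that would create a self-loop or a
    duplicate edge.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if len({u, v, x, y}) < 4:
            continue  # would create a self-loop
        if frozenset((u, x)) in present or frozenset((v, y)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {frozenset((u, x)), frozenset((v, y))}
        edges[i], edges[j] = (u, x), (v, y)
    return edges

def degrees(edge_list):
    d = {}
    for u, v in edge_list:
        d[u] = d.get(u, 0) + 1
        d[v] = d.get(v, 0) + 1
    return d

original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
randomized = double_edge_swap(original)
print(degrees(original) == degrees(randomized))  # True: degrees preserved
```

    Comparing a statistic (e.g. clustering) between the observed network and many such randomized copies tells you whether the statistic is explained by the degree sequence alone.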

  • Week 6 (Mar. 15): Spatial data. [ABN] Learn to efficiently use geolocation data. Topics include working with spatial shapes (polygons, lines, etc.) and methods for storing, manipulating and plotting spatial data, changing coordinate systems, and extracting features.
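
    In Python, libraries such as shapely and geopandas do the heavy lifting for spatial shapes, but the underlying geometry is plain arithmetic. As a library-free illustration, the shoelace formula computes a polygon's area directly from its coordinates (the unit-square example is made up):

```python
def shoelace_area(coords):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the polygon
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Unit square, vertices listed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(shoelace_area(square))  # 1.0
```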

  • Week 7 (Mar. 22): Text as Data. [SR] Exploration and Basic Information Retrieval.
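
    Basic information retrieval often starts from TF-IDF weights: a term matters in a document if it is frequent there but rare across the corpus. A minimal sketch on an invented toy corpus (in practice scikit-learn's TfidfVectorizer does this):

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc_tokens, corpus):
    """Term frequency times (smoothed) inverse document frequency."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(term in toks for toks in corpus)       # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1      # smoothed IDF
    return tf * idf

# "the" appears everywhere, so it scores lower than the discriminative "dog".
print(tf_idf("dog", tokenized[1], tokenized)
      > tf_idf("the", tokenized[1], tokenized))  # True
```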

  • Week 8 (Apr. 5): Machine learning 1. [UA] Tree based methods, ensemble learning, cross validation and more.
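
    Cross validation simply partitions the data into folds and rotates which fold is held out. A library-free sketch of the splitting step (fold and sample counts chosen for illustration; scikit-learn's KFold is the usual tool):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs the remainder when n_samples % k != 0.
        stop = start + fold_size if i < k - 1 else n_samples
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

splits = list(k_fold_indices(6, 3))
print(splits[0])  # ([2, 3, 4, 5], [0, 1])
```

    Every sample lands in exactly one test fold, so averaging the model's score over the k held-out folds gives an honest estimate of out-of-sample performance.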

  • Week 9 (Apr. 12): Text as Data 2. [SR] Computational content analysis. Scaling qualitative evaluations using supervised and unsupervised learning.

  • Week 10 (Apr. 19): Machine learning 2. [SR] This lecture will focus on how to improve your models using unsupervised learning, semi-supervised learning, and transfer learning. The focus is on feature engineering, learning from unlabeled data and transferring knowledge from other learning tasks.

    • Preparation:
      • Read the seminal papers on Word Embeddings by Mikolov et al. (2013): here and here.
    • Inspirational:
      • Read Chapter 3 on Semi-Supervised learning in Anders Søgaard's book on Semi-supervised learning and domain adaptation (2013).
      • Read Sebastian Ruder's informative blog post on the benefits of auxiliary tasks and Multi-Task learning and this brilliant example by Felbo et al. (2017).
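
    A core operation with word embeddings is comparing word vectors by cosine similarity. A toy sketch (the three-dimensional vectors are made up; real embeddings like those in the Mikolov papers have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm

# Invented toy "embeddings": cat and dog point in similar directions.
emb = {"cat": [0.9, 0.8, 0.1],
       "dog": [0.8, 0.9, 0.2],
       "car": [0.1, 0.0, 0.9]}
print(cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["car"]))  # True
```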
  • Week 11 (Apr. 26): Machine learning 3. [ABN] We will focus on applications of machine learning and its interplay with causality.

  • Week 12 (May 3): Wrap-up/project work.

    • Preparation: None
    • You will have a brief opportunity to present your project and get feedback.
  • Week 13: Project work.

  • Week 14: Project work.
