A rapidly increasing number of applications in industry, academia, and everyday life are – or should be – based on careful analysis of data. With more and more datasets readily available, some industries have described themselves as “drowning in data”. This course aims to communicate that everyone needs to know how to be data-curious, how to access data, and how to analyze data. Students will learn that, with the right tools from statistics and computer science, we can take advantage of the growing amounts of data without drowning in it, and that almost any question about the world can be answered using data. They will also learn how to find relevant data sources on the web and how to critically evaluate these sources. Furthermore, the course will explore topics such as reproducibility of data analyses (with consistent use of literate programming and version control tools throughout the course) as well as data privacy, data sharing, and data science ethics, which are becoming increasingly important in today’s society.
The course highlights tools and techniques from statistics, mathematics, and computer science, as well as the social sciences and digital humanities, to introduce students to various facets of data analysis: data visualization, wrangling, and sampling to get a suitable dataset; data management to be able to access data quickly and reproducibly; exploratory data analysis to generate hypotheses and intuition; modeling to understand and quantify patterns and prediction; and effective communication of results using visualizations and interpretable summaries.
As part of each assignment, assessment, and case study, as well as the semester-long project, students will use data analysis skills to solve problems and will present their process and results as fully reproducible written reports and oral presentations.
Source code for the course components:
- course website: this repo.
- slides: https://github.com/ids-s1-20/slides
- homework assignments: https://github.com/ids-s1-20/homework
- labs: https://github.com/ids-s1-20/labs
- code-alongs: https://github.com/ids-s1-20/code-along
- application exercises: https://github.com/ids-s1-20/application-exercises
Outputs of all these components are linked from the course website.
The course website is built with blogdown, using the Hugo Academic theme. Course materials are organized by week, and the weekly pages are in content/post/.
The majority of the materials in this course are also available at Data Science in a Box. You are welcome to reuse materials directly from here, but you might prefer to start with the versions in Data Science in a Box, since course-specific content (e.g., due dates and submission instructions) has been removed from the materials there.
If you reuse materials, please attribute them to Data Science in a Box, created by Dr. Mine Çetinkaya-Rundel, and review the license before reusing them.