Exploratory Data Analysis ( EDA ) - DS201 Fall 2019

Exploratory Data Analysis (EDA) is the first step in your data analysis process. Here, you make sense of the data you have and then figure out what questions you want to ask and how to frame them, as well as how best to manipulate your available data sources to get the answers you need.

This is part 2 of a 3-part program designed to be a introduction to data science and applied machine learning. If you're a developer, analyst, manager, or aspiring data scientist looking learn more about data science, then you're in the right place.

In this section you will cover the techniques that real life data scientists use to gain insight into a dataset. From data engineering to more advanced manipulation and visualizations; this program will show you how to unlock the secrets of any dataset — with the expressed goal of building experiments that require the building of a hypothesis function or “Model”.

Day 1: Overview

Welcome Back! We'll spend the first part of class getting reacquainted and ensuring our environments are working properly. From there we'll briefly review what we went over in DS101 and run through some Pandas exercises to get back into the flow of things before jumping into EDA.

This is a great time to ask questions and go over any material from the previous section that you feel you would like some clarification on. Moving forward we'll be working a lot with Pandas dataframes, so having a solid foundation with respect to Pandas will be beneficial.

Day 2: Introduction to Statistics

Statistics is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning.

Although statistics is a large field with many esoteric theories and findings, the nuts and bolts tools and notations taken from the field are required for machine learning practitioners. With a solid foundation of what statistics is, it is possible to focus on just the good or relevant parts.

In this section of the course, you will discover how you can get started and confidently read and implement statistical methods used in machine learning with Python.

Day 3: Data Cleaning

Proper data cleaning is the “secret” sauce behind machine learning… Well, it’s not really a “secret”… It’s just a bit boring, so no one really talks about it. But the truth is:

Better data beats fancier algorithms
- Even if you forget everything else from this program, please remember this point
Garbage in = Garbage out
- Plain and Simple! If you have a clean dataset, even simple algorithms can learn impressive insights from it!

Now, as you might imagine, different problems will require different methods. For now though, let’s at least ensure we know how to fix the most common issues. This section will give you a reliable starting point, regardless of your dataset.

Day 4: Feature Engineering

To start, feature engineering is very open-ended. There are literally infinite options for new features to create. Plus, you’ll need domain knowledge to add informative features instead of more noise.

This is a skill that you’ll develop with time and practice, but heuristics will give you a head start. Heuristics help you know where to start looking, spark ideas, and get unstuck.

Day 5/6: Data Visualizations & Graphing Libraries

In the Python world, there are multiple options for visualizing your data. Because of this variety, it can be really challenging to figure out which one to use when. This section covers some of the more popular ones and illustrates how to use them to create a simple charts.

Our goal for this section is for you to be comfortable identifying which graph would best represent a given dataset as well as communicate insights and tell stories using data visualizations.

Day 7/8: Final Project

Final project where students are given a real world dataset. Students are expected to go through each of the steps covered on their own. On the final day, each student will present what they found in the data to the rest of the class.

This program was inspired by data science primers and blogs found on the internet.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
Day_01		Day_01
Day_02		Day_02
Day_03		Day_03
Day_04		Day_04
assets/images		assets/images
data		data
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Exploratory Data Analysis ( EDA ) - DS201 Fall 2019

Day 1: Overview

Day 2: Introduction to Statistics

Day 3: Data Cleaning

Day 4: Feature Engineering

Day 5/6: Data Visualizations & Graphing Libraries

Day 7/8: Final Project

About

Releases

Packages

Contributors 2

Languages

SoftStackFactory/data-science-fall2019

Folders and files

Latest commit

History

Repository files navigation

Exploratory Data Analysis ( EDA ) - DS201 Fall 2019

Day 1: Overview

Day 2: Introduction to Statistics

Day 3: Data Cleaning

Day 4: Feature Engineering

Day 5/6: Data Visualizations & Graphing Libraries

Day 7/8: Final Project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages