
This is a repository containing a wealth of Data Cleaning methodologies


EliasNo/Data-Cleaning-1

 
 


Data-Cleaning

This is a repository containing a wealth of Data Cleaning methodologies

Overview

Are you looking to improve your data cleaning skills? This is a project designed to help you master Data Cleaning.

According to a poll, data science professionals say they spend 80% of their time on data cleaning. There is no one-size-fits-all approach to cleaning; however, practicing with as many datasets as you can find sets you in the right direction.

Of course, data comes in different formats. If you practice with as many of them as possible, however, you expose yourself to a wide range of manipulation techniques. Learning as many of the available techniques as you can helps you build the ability to deal with any dataset.

Datasets and Scripts

Some of the datasets are Excel sheets that contain both the dirty version and the cleaned version. The scripts used to clean the data are available in Python and R. If you want to clean the datasets using other languages, feel free to do so.
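As a rough illustration of what a cleaning script can look like, here is a minimal sketch. It is not taken from the repository's own scripts: the sample data, the column names (`name`, `city`, `age`), the `MISSING_MARKERS` set, and the `clean_rows` helper are all assumptions made up for this example. It uses only Python's standard library on a small in-memory CSV; the actual datasets here are Excel sheets and would need a spreadsheet reader instead.

```python
import csv
import io

# Hypothetical "dirty" data; the columns and values are invented for illustration.
DIRTY_CSV = """name,city,age
  Alice ,new york,30
Bob,LAGOS,N/A
  Alice ,new york,30
carol,  Nairobi,
"""

# Assumed spellings of "no value"; real datasets may use others.
MISSING_MARKERS = {"", "n/a", "na", "null"}


def clean_rows(text):
    """Return a list of cleaned row dicts parsed from CSV text."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text)):
        # Trim stray whitespace around every value.
        row = {key: value.strip() for key, value in row.items()}
        # Normalize capitalization of the text columns.
        row["name"] = row["name"].title()
        row["city"] = row["city"].title()
        # Convert age to int, using None when the value is a missing marker.
        age = row["age"]
        row["age"] = None if age.lower() in MISSING_MARKERS else int(age)
        # Skip rows that duplicate one already kept (compared after cleaning).
        fingerprint = tuple(row.items())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(DIRTY_CSV):
        print(row)
```

The sketch covers four common steps: trimming whitespace, normalizing capitalization, marking missing values, and dropping duplicates. Duplicates are checked after cleaning, so rows that differ only in spacing or case are also treated as duplicates.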

Pull Request

We aim to populate this repository with as many cleaning projects as possible. If you have datasets you have previously cleaned, you're welcome to send a pull request (PR), but make sure the code works and is well documented. The PR should include both the dataset and the script, and it will be merged once it has been properly reviewed. You will be added to the contributors list once your PR has been merged.

Contributors List


Languages

  • Jupyter Notebook 94.8%
  • R 2.9%
  • Python 2.3%