Python Machine Learning for Data Analytics: An Independent Report on the 'Census Income' Dataset
As the data analyst on the project, my task was to explore the Census Income dataset and gain insight into its characteristics, using Python to cluster the data and identify patterns that distinguish high- and low-income records. A second goal was to predict, using classification, whether income exceeds $50,000 per annum (p/a). Because the dataset is quite large, it requires data mining, and for this I used the CRISP-DM methodology, which provides a structured approach to planning a data mining project (Simmons, 2014). I applied CRISP-DM by first clearly defining the data mining goal before proceeding to data understanding and the later stages of the methodology, each of which is addressed in further detail in the attached report.