DAND Project 3 - Investigate a Dataset

Project Purpose and Notes

This project uses NumPy, Pandas, Matplotlib, and Jupyter Notebook to analyze a dataset and communicate the findings. The dataset is the TMDb movie data curated by Udacity.

This project was created and tested on Windows 7 64-bit using Python 3.6.4 32-bit, NumPy 1.13.3, Pandas 0.22.0, Matplotlib 2.1.1, Jupyter 1.0.0, IPykernel 4.8.2, IPython 6.2.1, Jupyter-client 5.2.2, Jupyter-core 4.4.0, IPywidgets 7.1.2, nbformat 4.4.0, traitlets 4.3.2, widgetsnbextension 3.1.4, notebook 5.4.0, Jupyter-console 5.2.0, and nbconvert 5.3.1.

Installation and Requirements

  • Install Python
    • Note: Python 3.6 or later is required because of the language features used
  • Install NumPy, Pandas, Matplotlib, and Jupyter Notebook
  • Download the Udacity-curated TMDb movie data
  • Clone this repo
  • From the repo's Proj3 directory, run: jupyter notebook
  • From Jupyter, open Project 3 - Investigate TMDb.ipynb
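
Before launching the notebook, it can help to confirm that the interpreter and packages listed above are in place. The following is a minimal sketch, not part of the project itself. The helper names `python_ok` and `missing_packages` are hypothetical, and the import names in `REQUIRED` (in particular `notebook` for Jupyter Notebook) are assumptions about how the packages are installed.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the installation notes.
MIN_PYTHON = (3, 6)

# Assumed import names for the packages named in the installation steps.
REQUIRED = ("numpy", "pandas", "matplotlib", "notebook")


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.6 or later is required.")
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    elif python_ok():
        print("Environment looks ready.")
```

The check only confirms that each package can be located. It does not compare installed versions against the ones this project was tested with.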

Project Requirements

  • Using the selected dataset (TMDb movie data), perform an analysis using descriptive statistics
  • Chosen questions to explore:
    • Overall statistics from movie titles in dataset:
      • Most popular films?
      • Highest budget films?
      • Highest revenue films?
      • Highest margin films?
      • Most successful directors?
      • Most popular genres?
      • Popularity of genres over time?
      • Most successful production companies?
      • Most popular actors/actresses?
    • What kinds of properties are associated with high-revenue movies? (revenue is the dependent variable)
      • directors, genres, production companies, cast (features with multiple values - independent variables)
      • runtime, budget, release month (features with single value - independent variables)
  • Analysis will be performed and documented in a Jupyter Notebook
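
Features that hold several values per movie, such as genres, need to be split before they can be compared against revenue. The sketch below shows one possible approach with Pandas on a tiny made-up sample. The column names, the `|` separator, and the definition of margin as revenue minus budget are all assumptions for illustration and are not taken from the project notebook.

```python
import pandas as pd

# Made-up sample standing in for the TMDb data. Column names and the "|"
# separator are assumptions, not confirmed by this README.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Action|Adventure", "Action", "Drama|Adventure"],
    "budget": [100, 50, 20],
    "revenue": [300, 100, 10],
})

# Margin is taken here as revenue minus budget (one possible definition).
movies["margin"] = movies["revenue"] - movies["budget"]

# One indicator column per genre: 1 if the movie lists that genre, else 0.
genre_flags = movies["genres"].str.get_dummies(sep="|")

# Mean revenue of the movies that carry each genre, highest first.
mean_revenue_by_genre = pd.Series({
    genre: movies.loc[genre_flags[genre] == 1, "revenue"].mean()
    for genre in genre_flags.columns
}).sort_values(ascending=False)

print(mean_revenue_by_genre)
```

A movie with several genres contributes to the mean of each one, so the per-genre figures overlap and should not be summed. The same pattern would apply to other multi-valued features such as cast or production companies, assuming they use the same separator.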

Project Solution Documents

License

MIT License