The goal of our project is to automate and extend the age rating of movies based on their scripts. Our evaluation criteria are primarily based on the official rating rules used by the Motion Picture Association.
- Jakob Hennighausen
- Email: [email protected]
- Davit Melkonyan
- Email: [email protected]
- Leon Remke
- Email: [email protected]
- Elasticsearch
- React
- FastAPI
- spaCy
- scikit-learn

Check requirements.txt in the sub-modules of the project for detailed information.
See "Project log" section.
One of our high-level milestones for November has not been achieved yet. Below is a detailed list of the goals achieved and not achieved.
- Basic project setup
- Set up Elasticsearch hosting on Linux server
- Data crawling
- Data understanding
- Data analysis
- Start of data preprocessing
- Get baseline for PG-Ratings
- Data preprocessing pipeline
- Decision which statistical method to use for age ratings
- First PG ratings on films
Currently, the project is behind schedule with respect to the initial milestone plan. We plan to catch up during the lecture break from 22.12.22 to 07.01.23. One reason for the delay is a change in data sourcing: the operator of a platform for film scripts unexpectedly stopped responding.
todo: add descriptions to preprocessing steps and knowledge from data analysis
A first experiment to retrieve the official age ratings of movies through the TMDB API. For this, we wrote a Python script and used a selection of movie titles to get the age ratings. Results can be found here
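
The title-based lookup described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the TMDB v3 endpoints `search/movie` and `movie/{id}/release_dates`, takes the first search hit as the match, and the names `extract_certification`, `lookup_rating`, and `_get_json` are hypothetical helpers introduced only for this example.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the TMDB v3 API (assumed for this sketch).
TMDB_BASE = "https://api.themoviedb.org/3"


def extract_certification(release_dates_payload, country="US"):
    """Return the first non-empty certification for `country`, or None.

    Expects the JSON structure returned by movie/{id}/release_dates:
    {"results": [{"iso_3166_1": "US", "release_dates": [{"certification": ...}]}]}
    """
    for entry in release_dates_payload.get("results", []):
        if entry.get("iso_3166_1") != country:
            continue
        for release in entry.get("release_dates", []):
            cert = (release.get("certification") or "").strip()
            if cert:
                return cert
    return None


def _get_json(path, params):
    """Hypothetical helper: GET a TMDB path and decode the JSON response."""
    url = f"{TMDB_BASE}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def lookup_rating(title, api_key, country="US"):
    """Search a movie by title and return its certification, or None."""
    search = _get_json("/search/movie", {"api_key": api_key, "query": title})
    hits = search.get("results", [])
    if not hits:
        return None
    # Assumption: the first search hit is the intended movie.
    movie_id = hits[0]["id"]
    payload = _get_json(f"/movie/{movie_id}/release_dates", {"api_key": api_key})
    return extract_certification(payload, country)
```

Keeping the parsing step (`extract_certification`) separate from the network calls makes it possible to check the rating extraction against saved API responses without an API key.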
Data analysis, exploration, and description. Results can be found in the next section.
This section can be found in this Jupyter notebook:
- POC dataset crawling
- Linux server for Elasticsearch instance
- Data understanding, exploration and preprocessing
- Labels scraping
- Model fine tuning
- Random Forest Implementation
- Research papers and materials
- Research for baseline dataset
- Implementation of finding age ratings by movie titles with TMDB API
- Implementation of RNN based on paper
- Preprocessing
- Creation of training and test datasets
- Set up Git project (React, FastAPI, Elasticsearch, Dockerfiles)
- Set up Git hooks integration
- Run Elasticsearch instance on Linux server
- Frontend development
- FastAPI development
- SVM Classifier
- Elasticsearch setup and upload
A rough overview of how the project is structured:
Documentation files
Training data and model results
Data exploration notebook
- Notebooks for scraping and collecting relevant data (e.g. age ratings)
- Elasticsearch upload script
- Notebooks to preprocess data
- Nginx config
- Documentation
- Elasticsearch is set up on our server. If you want to access the Elasticsearch instance, please contact us.
- API project
- SVM and Random Forest model implementation
- Frontend project
- Model from referenced paper
The presentation can be found in the assets folder.