Skip to content

mdgjohnny/dio-etl-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Movie Synopsis Generator ETL Project

This project is a Python-based ETL (Extract, Transform, Load) application that retrieves data from an IMDb user list and generates new movie synopses using the OpenAI API. The generated synopses blend elements from existing synopses to create captivating and unique summaries for movies.

Special Thanks: This project is made possible thanks to the individual who created and shared the original IMDb user list titled "Kim Newman's Nightmare Movies," which is based on the book by Kim Newman. We also extend our gratitude to Kim Newman for his great work. You can find the original list here.

Directory Structure

The project directory is structured as follows:

├── data/
│   └── Kim Newman's Nightmare Movies.csv   # IMDb user list data
├── src/
│   ├── __init__.py
│   ├── gpt.py          # Main script to generate new synopses
│   └── utils.py        # Utility functions for data retrieval and generation
├── tests/
│   ├── __init__.py
│   ├── conftest.py     # Configuration for test fixtures
│   └── test_utils.py   # Test functions for utility functions
├── .gitignore          # Specifies files/directories to ignore in version control
├── example.env         # Example environment variables
└── README.md           # Project documentation

Running Tests To run the tests, navigate to the root directory of the project and execute the following command:

pytest

Loading OpenAI API Key Create a file named .env in the root directory.

Add your OpenAI API key to the .env file in the following format:

OPENAI_KEY=your_api_key_here

Replace your_api_key_here with your actual OpenAI API key.

Save the .env file and add it to the .gitignore before uploading it to the repository. In case you have doubts, just look at the example.env file.

Usage The primary script for generating new synopses is gpt.py. It reads IMDb user list data, extracts information, and generates new synopses for movies based on the provided inspiration synopses. You can run the script as follows:

python src/gpt.py --num num_films --verbose True

Replace num_synopses with the number of films you want to generate. Use the --verbose flag to display existing synopses when generating new ones. Each film is generated from a combination of two existing synopses.

Note

This project was developed as part of a course on DIO.ME. It aims to demonstrate the ETL process using real-world data and the OpenAI API for creative text generation.

Feel free to modify the README to provide more details or customize it according to your project's requirements.

License

MIT

About

Simple ETL Project for DIO.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages