I built this machine learning pipeline to show recruiters my data engineering skills and how I write and document code.
The pipeline does the following:
- Sends a request to the NewsAPI's /sources endpoint (https://newsapi.org/)
- Creates sources.csv from the API response
- Sends a request to the NewsAPI's /everything endpoint
- For every publisher in sources.csv, grabs every article written in the past 3 days
- Applies a sentiment analysis model to the article titles
- Performs some data analysis
- Generates a SQL statement that can be used to upload this data to BigQuery for further analysis
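
As a rough illustration of the fetch and upload steps above, here is a minimal sketch, not the notebook's actual code. It assumes sources.csv has an `id` column, that /everything is queried with NewsAPI's `sources`, `from`, `to` and `pageSize` parameters, and that the output table has `source_id`, `title` and `sentiment` columns. The function names and the table layout are hypothetical. No network calls are made; the sketch only builds the request parameters and the SQL text.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# NewsAPI v2 endpoint for article search (the API key is sent separately).
EVERYTHING_URL = "https://newsapi.org/v2/everything"


def read_source_ids(csv_text):
    """Return publisher ids from CSV text with an 'id' column (assumed layout)."""
    return [row["id"] for row in csv.DictReader(io.StringIO(csv_text))]


def build_everything_params(source_id, now, days=3):
    """Build query parameters for one publisher covering the past `days` days."""
    start = now - timedelta(days=days)
    return {
        "sources": source_id,
        "from": start.strftime("%Y-%m-%dT%H:%M:%S"),
        "to": now.strftime("%Y-%m-%dT%H:%M:%S"),
        "pageSize": 100,
    }


def sql_string(value):
    """Quote a Python string as a SQL literal using backslash escapes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_insert_sql(table, rows):
    """Build one INSERT statement from (source_id, title, sentiment) tuples.

    The column names are hypothetical; adjust them to the real table schema.
    """
    values = ",\n".join(
        f"  ({sql_string(source)}, {sql_string(title)}, {float(score)})"
        for source, title, score in rows
    )
    return f"INSERT INTO `{table}` (source_id, title, sentiment) VALUES\n{values};"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for source_id in read_source_ids("id,name\nbbc-news,BBC News\n"):
        print(EVERYTHING_URL, build_everything_params(source_id, now))
    print(build_insert_sql("news.articles", [("bbc-news", "Example title", 0.5)]))
```

Building the statement as a plain string matches the "generate a SQL statement" step, but for anything beyond a demo a parameterized query or a load job would likely be the safer way to get rows into BigQuery.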
Note: pipeline.ipynb is meant to be run in Google Colab.
Enjoy!