GitHub - espoirMur/congo-news-summarizer: A repository containing the code for the congo news summarizer

Congo News Summarizer

Over the past months, I have been collecting a lot of news articles from major Congolese news websites. I have those articles saved in a Postgres database. There is a lot of fun stuff I can do with them. Among them is a news summarizer. I want to analyze the daily news and find out what the websites are talking about.

In this project, I will try to build that news summarizer.

Architecture

The summarizer has four main components.

A new collector, cluster model, a Generative model and finally a front-end.

New Collector

This is build with scrappy scrappers, they scrape the news website and download the news data.

The cluster model

This is a machine learning model that pulls today news and runs a hierarchical clustering model on them. The output of this a a dataframe with news clusters. You can learn more on how I have implemented the clustering here.

Learn more on how to run Readme.md

A generative model.

This component, start from the output of the clustering model and build a summary for each new cluster.

It use a self hosted Language model to generate the summary of the news. I am working on a blog post that document all the process of building the generative model.

Technologies:

The Generative model is a Qwen1.5b model hosted using llama.cpp and run on ubuntu VPS for inference.

A front end

This is the final part, it will display the news summaries as a UI and user can interact with it.

You can read more about this project in this presentation I gave at PydataLondon Meetup.

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
.github/workflows		.github/workflows
.vscode		.vscode
data		data
docker		docker
docs		docs
images		images
notebooks		notebooks
scripts		scripts
src		src
.DS_Store		.DS_Store
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
download_model.py		download_model.py
models		models
notes.md		notes.md
ruff.toml		ruff.toml
todo.md		todo.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Congo News Summarizer

Architecture

New Collector

The cluster model

A generative model.

Technologies:

A front end

About

Releases

Packages

Contributors 2

Languages

espoirMur/congo-news-summarizer

Folders and files

Latest commit

History

Repository files navigation

Congo News Summarizer

Architecture

New Collector

The cluster model

A generative model.

Technologies:

A front end

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages