A topic modelling model is built using LDA from the gensim library. LDA itself is explained in this YouTube video.
Clone this repository on your laptop or download files by clicking here.
This topic modelling model can be used with any English dataset. For an example of how it can be used for movie recommendation, check out this repository.
The model is trained on a Wikipedia dump containing over 4 million English articles. The dataset can be found here; it is 16.2 GB in size.
Dependencies are listed in requirements.txt. Using a virtual environment is recommended.
pip install -r requirements.txt
The code for preprocessing the dataset is written in create_wiki_corpus.py.
Note: This process takes around 10 hours to complete. The output is a gensim corpus of 34.6 GB, so it is not included in the repository.
The code to train the model is written in the script train_lda_model.py.
The model has been trained via unsupervised learning on the complete dataset of English Wikipedia articles, and is configured to learn 130 topics.
Note: This process takes around 6 hours to complete. The trained model files are already saved here in the Models folder.
The code for checking the topics inside the model can be found in show_model_topics.py.
Run the code to see the topics. Each topic has a numeric id, and the words grouped under a topic are visibly similar to one another.
The model can be improved by tuning the number of topics; the best value depends entirely on the use case.
python load_model.py
This will return the list of topics the model has learned.
GNU GENERAL PUBLIC LICENSE Version 2
Arnav Deep © June 2020. All rights reserved.