Introducing EPFLearn, a chatbot designed to answer students’ questions on specialized course material. The chatbot uses Google’s pre-trained T5 Transformer as its foundation model, which we fine-tuned on the StackOverflow and NLP4Education datasets.
Team: Antoine Bonnet, Silvia Romanato and Alexander Sternfeld
To run the code, first install the packages listed in requirements.txt:
pip install -r requirements.txt
To use our trained chatbot, download the checkpoint from here and place it in the checkpoints/finalChatbot folder. You can then instantiate the chatbot and ask it questions as follows.
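# Assumption: chatbot.py (listed under gen_model) is importable from the working directory
from chatbot import Chatbot

checkpoint = 'path/to/finalChatbotDirectory'
chatbot = Chatbot(checkpoint)
chatbot.ask('What is the difference between genetics and epigenetics?')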
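Before instantiating the chatbot, it can help to confirm that the checkpoint was actually placed in the expected folder. The following is a minimal sketch, not part of the project code: `resolve_checkpoint` and `ask_all` are hypothetical helpers, and `ask_all` assumes that `chatbot.ask` returns the answer rather than only printing it.

```python
from pathlib import Path


def resolve_checkpoint(path="checkpoints/finalChatbot"):
    """Return the checkpoint directory as a Path, or raise if it is missing or empty."""
    checkpoint_dir = Path(path)
    if not checkpoint_dir.is_dir():
        raise FileNotFoundError(f"Checkpoint folder not found: {checkpoint_dir}")
    if not any(checkpoint_dir.iterdir()):
        raise FileNotFoundError(f"Checkpoint folder is empty: {checkpoint_dir}")
    return checkpoint_dir


def ask_all(chatbot, questions):
    """Ask each question in turn and collect whatever chatbot.ask returns.

    Assumption: the object passed in exposes an ask(question) method,
    as the Chatbot class in the usage snippet above does.
    """
    return {question: chatbot.ask(question) for question in questions}
```

With the checkpoint in place, the helpers would be combined as `chatbot = Chatbot(str(resolve_checkpoint()))` followed by `ask_all(chatbot, [...])`. If the folder is missing or empty, the sketch fails early with a clear message before any model loading starts.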
Here is an overview of the relevant files and folders in this project.
- final_report_syntax_sorcerers.pdf : Final report of the project.
- gen_model : Generative model
  - chatbot.py : Chatbot class for interaction
  - gen_script_syntax_sorcerers.py : Generating answers with fine-tuned chatbot
  - finetune.py : Fine-tuning a generative language model on specialized content
  - load_data.py : Data pre-processing
  - milestone3.ipynb : Overview
- reward_model : Reward model
  - model.py : Model class
  - evaluate.py : Evaluation script
  - milestone2.ipynb : Overview
  - m2_data_preparation.ipynb : Data pre-processing for reward model
- checkpoints : Model checkpoints
  - finalChatbot : Final chatbot (fine-tuned on StackOverflow and EPFL datasets)
  - midChatbot : Intermediary chatbot (fine-tuned on StackOverflow dataset)
  - rewardModel : Reward model (trained on StackOverflow and EPFL datasets)
- data : Data files.
  - gen_model : Fine-tuning data for the generative model
    - answers_syntax-sorcerers.json : Sample answers generated by chatbot
    - gen_dataset_syntax-sorcerers_EPFL.json : EPFL dataset
    - gen_dataset_syntax-sorcerers_StackOverflow.json : StackOverflow dataset
    - gen_dataset_syntax-sorcerers.json : Combined EPFL and StackOverflow datasets
  - reward_model : Training data for the reward model
    - reward_dataset_syntax-sorcerers_EPFL.json : EPFL dataset
    - reward_dataset_syntax-sorcerers_StackOverflow.json : StackOverflow dataset
    - reward_dataset_syntax-sorcerers.json : Combined EPFL and StackOverflow datasets
- requirements.txt : Packages required to run the code
- python.txt : Python version used