EPFLearn: AI Student Mentor 📚

Introducing EPFLearn, a chatbot designed to answer students’ questions on specialized course material. The chatbot leverages Google’s pre-trained T5 Transformer as its foundation model, which was further fine-tuned using the StackOverflow and NLP4Education datasets.

Modern NLP: Course project Milestone 3

Team: Antoine Bonnet, Silvia Romanato and Alexander Sternfeld

How to run the code

To run the code, first install the dependencies listed in requirements.txt.

pip install -r requirements.txt

To use our trained chatbot, download the checkpoint from here and place it in the checkpoints/finalChatbot folder. You can then instantiate the chatbot and ask it questions as follows.

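# Assumption: run from the repository root so that gen_model/chatbot.py is importable.
from gen_model.chatbot import Chatbot

checkpoint = 'path/to/finalChatbotDirectory'  # e.g. the checkpoints/finalChatbot folder
chatbot = Chatbot(checkpoint)

chatbot.ask('What is the difference between genetics and epigenetics?')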
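
Before instantiating the chatbot, it can help to confirm that the downloaded checkpoint actually landed in the expected folder. The helper below is a minimal sketch and is not part of the repository: the function name `checkpoint_ready` is hypothetical, and because this README does not list the checkpoint's file names, the check only verifies that the directory exists and contains at least one file.

```python
from pathlib import Path


def checkpoint_ready(checkpoint_dir: str) -> bool:
    """Return True if checkpoint_dir is an existing directory holding at least one file.

    Hypothetical helper for illustration; it does not validate the checkpoint contents.
    """
    path = Path(checkpoint_dir)
    if not path.is_dir():
        return False
    # Any regular file counts, since the expected file names are not specified here.
    return any(p.is_file() for p in path.rglob("*"))


if __name__ == "__main__":
    checkpoint = "checkpoints/finalChatbot"  # folder named in the instructions above
    if checkpoint_ready(checkpoint):
        print(f"Checkpoint files found in {checkpoint}")
    else:
        print(f"No checkpoint files in {checkpoint}; download the checkpoint first.")
```

Running this check first gives a clear message when the folder is missing or empty, rather than a less direct error raised while the model is being loaded.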

Repository structure

The main files and folders of the project are listed below.

final_report_syntax_sorcerers.pdf : Final report of the project
gen_model : Generative model
- chatbot.py : Chatbot class for interaction
- gen_script_syntax_sorcerers.py : Generating answers with fine-tuned chatbot
- finetune.py : Fine-tuning a generative language model on specialized content
- load_data.py : Data pre-processing
- milestone3.ipynb : Overview
reward_model: Reward model
- model.py : Model class
- evaluate.py : Evaluation script
- milestone2.ipynb : Overview
- m2_data_preparation.ipynb : Data pre-processing for reward model
checkpoints : Model checkpoints
- finalChatbot : Final chatbot (fine-tuned on StackOverflow and EPFL datasets)
- midChatbot : Intermediary chatbot (fine-tuned on StackOverflow dataset)
- rewardModel : Reward model (trained on StackOverflow and EPFL datasets)
data : Data files
- gen_model : Fine-tuning data for the generative model
  - answers_syntax-sorcerers.json : Sample answers generated by chatbot
  - gen_dataset_syntax-sorcerers_EPFL.json : EPFL dataset
  - gen_dataset_syntax-sorcerers_StackOverflow.json : StackOverflow dataset
  - gen_dataset_syntax-sorcerers.json : Combined EPFL and StackOverflow datasets
- reward_model : Training data for the reward model
  - reward_dataset_syntax-sorcerers_EPFL.json : EPFL dataset
  - reward_dataset_syntax-sorcerers_StackOverflow.json : StackOverflow dataset
  - reward_dataset_syntax-sorcerers.json : Combined EPFL and StackOverflow datasets
requirements.txt : Packages required to run the code
python.txt : Python version used
