5x Error Reduction in RAG with gpt-3.5-turbo-0613 Finetuning #678
Conversation
* feat(fine-tuned-RAG): add DatasetPrep.ipynb file for dataset preparation
* feat(DatasetPrep.ipynb): add code for downloading validation.json file
* docs(ModelFinetune.ipynb): add comments to code
* …Qdrant to improve RAG model
* feat(fine-tuned-RAG): use Qdrant for finetuning and inference changes
* fix(ModelFinetune.ipynb): update generated answer for few-shot question
* fix(ModelFinetune.ipynb): update count percentages for different answer types
* feat(ModelFinetune.ipynb): add new sections for Qdrant integration and Few-Shot Learning
* fix(ModelFinetune.ipynb): fix typo in section title
* … notebook
* fix(ModelFinetune.ipynb): fix heading for the setting up section
* fix(ModelFinetune.ipynb): fix heading for the data preparation section
* …to the blog post
* feat(ModelFinetune.ipynb): add section on why to read the blog post
* …n section
* feat(ModelFinetune.ipynb): add information about reducing hallucinations in the introduction section
* …f the notebook and target audience
* …ss for fine-tuning the OpenAI model
* …valuation section
* …ne-tuning chat models
* …nd add more details in the comment
Love this cookbook, it's going to be a great addition. However, the messages aren't entirely clear once you get to the evaluation, especially the bottom "Comparison & Results" section, and the method you're using in the Few-Shot Learning section is similarly unclear.
Can you clear these up and ask for another review? The other change is a housekeeping one: can you please move this notebook into the existing "fine-tuned-qa" directory in the parent "examples" folder?
* …ding for plotting the results
* …tent
* docs(ft_retrieval_augmented_generation.ipynb): add instructions and insights to the results breakdown
* … and update section numbering
* feat(ft_retrieval_augmented_generation.ipynb): add new section for evaluation
* fix(ft_retrieval_augmented_generation.ipynb): fix section numbering and update section
* …ft_retrieval_augmented_generation_qdrant.ipynb
* …all command
* feat(ft_retrieval_augmented_generation_qdrant.ipynb): add cell to set OpenAI and Qdrant keys
I like the updated version; it's clearer what the trade-offs are between each approach and how you can optimize them. I have some remaining non-blocking comments, which I'll raise via a separate PR.
Happy to merge this one.
Outline
Data Preparation: We use a subset of SQuADv2 and generate answers using OpenAI's GPT-3.5-Turbo model. This serves as our baseline for performance comparison.
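As a rough illustration of this baseline step, the snippet below assembles a RAG-style chat prompt from a SQuAD-style question/context pair; the function name, system prompt, and fields are hypothetical, not the notebook's exact code.

```python
# Hypothetical sketch: build chat messages that ground the model in retrieved
# context, so the baseline GPT-3.5-Turbo answer can be compared against SQuADv2.
def build_rag_prompt(question: str, context: str) -> list[dict]:
    """Assemble chat messages for a context-grounded question."""
    system = (
        "Answer the question using only the provided context. "
        "If the answer is not in the context, say \"I don't know\"."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_rag_prompt(
    "When was the Eiffel Tower completed?",
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
)
# These messages can then be sent to the chat completions endpoint, e.g.
# openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
```

The instruction to say "I don't know" matters for SQuADv2, which deliberately includes unanswerable questions.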
Evaluation Metrics: We write an Evaluation class to assess the performance of the initial RAG model on our dataset. This sets the stage for fine-tuning by providing a quantitative measure of the model's initial capabilities.
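A minimal sketch of what such an Evaluation class might look like, bucketing each generated answer as correct, a hallucination, or an honest "I don't know" (the bucket names and string-matching logic here are assumptions, not the notebook's exact implementation):

```python
from collections import Counter

class Evaluator:
    """Hypothetical evaluator: bucket generated answers against gold answers."""
    IDK_MARKERS = ("i don't know", "i do not know")

    def categorize(self, generated: str, gold_answers: list[str]) -> str:
        g = generated.strip().lower()
        if any(m in g for m in self.IDK_MARKERS):
            # SQuADv2 includes unanswerable questions, so "I don't know"
            # is the right answer when there are no gold answers.
            return "idk_correct" if not gold_answers else "idk_wrong"
        if any(ans.lower() in g for ans in gold_answers):
            return "correct"
        return "hallucination"

    def evaluate(self, pairs) -> dict:
        """pairs: iterable of (generated, gold_answers); returns % per bucket."""
        counts = Counter(self.categorize(gen, gold) for gen, gold in pairs)
        total = sum(counts.values())
        return {k: 100 * v / total for k, v in counts.items()}

ev = Evaluator()
print(ev.categorize("It was completed in 1889.", ["1889"]))  # → correct
```

Percentages per bucket are what make before/after fine-tuning runs directly comparable.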
Fine-Tuning Setup: We convert the dataset into a JSONL format that's compatible with OpenAI's fine-tuning process and create a fine-tuning job, targeting improvements in the model's answer-generating capabilities.
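The JSONL format for chat fine-tuning is one JSON object per line with a "messages" list; a hedged sketch of the conversion (record fields and prompts are illustrative):

```python
import json

def to_finetune_record(question: str, context: str, answer: str) -> dict:
    """Illustrative: one training example in OpenAI's chat fine-tuning format."""
    return {
        "messages": [
            {"role": "system",
             "content": "Answer from the context; otherwise say \"I don't know\"."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
            {"role": "assistant", "content": answer},
        ]
    }

rows = [
    to_finetune_record(
        "Who built the tower?",
        "Gustave Eiffel's company built the tower.",
        "Gustave Eiffel's company",
    )
]
jsonl = "\n".join(json.dumps(r) for r in rows)
# Write jsonl to e.g. train.jsonl, upload it, and create the job, roughly:
#   file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   openai.FineTuningJob.create(training_file=file.id, model="gpt-3.5-turbo")
```

Including the same system prompt used at inference time in the training records keeps the fine-tuned model's behavior consistent with how it will be called.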
Performance Comparison: After fine-tuning, we run the model on the same dataset and use the Evaluator again to quantify the improvements gained from fine-tuning. We see the error rate fall from ~50% of questions to ~10%.
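The "5x" headline follows directly from those two approximate figures:

```python
# Both rates are the approximate figures stated above, not newly measured.
baseline_error = 0.50   # share of questions answered incorrectly before fine-tuning
finetuned_error = 0.10  # share after fine-tuning

reduction_factor = baseline_error / finetuned_error
print(f"{reduction_factor:.0f}x error reduction")  # → 5x error reduction
```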
Documentation: The entire process is documented with comments, aimed at aiding anyone looking to fine-tune OpenAI models for RAG.