Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5x Error Reduction in RAG with gpt-3.5-turbo-0613 Finetuning #678

Merged
merged 38 commits into from
Sep 12, 2023

Conversation

NirantK
Copy link
Contributor

@NirantK NirantK commented Sep 4, 2023

Outline

  1. Data Preparation: We use a subset of the SQuADv2 and get answers using OpenAI's GPT3.5-Turbo model. This serves as our baseline for performance comparison.

  2. Evaluation Metrics: Wrote an Evaluation class to assess the performance of the initial RAG model on our dataset. This sets the stage for the fine-tuning process by providing a quantitative measure of the model's initial capabilities.

  3. Fine-Tuning Setup: We convert the dataset into a JSONL format that's compatible with OpenAI's fine-tuning process and create a fine-tuning job, targeting improvements in the model's answer-generating capabilities.

Performance Comparison: After fine-tuning, we run the model on the same dataset and use the Evaluator again to quantify the improvements gained from fine-tuning. We see an error reduction from ~50% questions to ~10% questions

Documentation: The entire process is commented, aimed at aiding anyone looking to fine-tune OpenAI's RAG models.

image

* feat(fine-tuned-RAG): add DatasetPrep.ipynb file for dataset preparation
* feat(DatasetPrep.ipynb): add code for downloading validation.json file
* docs(ModelFinetune.ipynb): add comments to code
…Qdrant to improve RAG model

* feat(fine-tuned-RAG): use Qdrant for finetuning and inference changes
* fix(ModelFinetune.ipynb): update generated answer for few-shot question
* fix(ModelFinetune.ipynb): update count percentages for different answer types
* feat(ModelFinetune.ipynb): add new sections for Qdrant integration and Few-Shot Learning
* fix(ModelFinetune.ipynb): fix typo in section title
… notebook

* fix(ModelFinetune.ipynb): fix heading for the setting up section
* fix(ModelFinetune.ipynb): fix heading for the data preparation section
@NirantK NirantK marked this pull request as ready for review September 7, 2023 13:11
Copy link
Collaborator

@colin-openai colin-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this cookbook, its going to be a great addition. However, the messages are not super clear when it gets to the evaluation, especially the bottom "Comparison & Results" section, and the method you're using in the Few-Shot Learning section is similarly unclear.

Can you clear these up and ask for another review? The other change is a housekeeping one, can you please move this notebook into the existing "fine-tuned-qa" directory in the parent "examples" folder?

examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
examples/fine-tuned-RAG/ModelFinetune.ipynb Outdated Show resolved Hide resolved
@NirantK
Copy link
Contributor Author

NirantK commented Sep 11, 2023

Revised image, simpler to understand!

image

…tent

* docs(ft_retrieval_augmented_generation.ipynb): add instructions and insights to the results breakdown
… and update section numbering

* feat(ft_retrieval_augmented_generation.ipynb): add new section for evaluation
* fix(ft_retrieval_augmented_generation.ipynb): fix section numbering and update section
…ft_retrieval_augmented_generation_qdrant.ipynb
…all command

* feat(ft_retrieval_augmented_generation_qdrant.ipynb): add cell to set OpenAI and Qdrant keys
Copy link
Collaborator

@colin-openai colin-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the updated version, clearer what the trade-offs are between each approach and how you can optimize them. I have some remaining non-blocking comments which I'll raise via a separate PR.

Happy to merge this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants