For a detailed overview of the RAG evaluation package, refer to the Medium blog.
The Eval_Package is a tool designed to evaluate the performance of an LLM (Large Language Model) on a dataset containing questions, context, and ideal answers. It lets you run evaluations on various datasets and assess how well the model generates answers.
- Evaluate an LLM on custom datasets: Use the Eval_Package to assess the performance of your model on datasets of your choice.
- Measure model accuracy: The package provides metrics to gauge the accuracy of the model-generated answers against the ideal answers (see the scoring sketch after this list).
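As an illustration of the kind of comparison involved, the snippet below is a generic sketch (not the Eval_Package API) that scores model-generated answers against ideal answers with a simple token-overlap F1; the dataset rows are made-up placeholders:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and the ideal answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Placeholder rows; a real dataset would also carry the retrieved context.
dataset = [
    {"question": "What is MLflow?",
     "ideal_answer": "MLflow is an open source platform for managing the ML lifecycle.",
     "model_answer": "MLflow is an open-source platform that manages the machine learning lifecycle."},
]

scores = [token_f1(row["model_answer"], row["ideal_answer"]) for row in dataset]
print(f"Mean answer F1: {sum(scores) / len(scores):.3f}")
```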
The MLflow_Package is a comprehensive toolkit designed to integrate the results from the Eval_Package and efficiently track and manage experiments. It also enables you to create a leaderboard for evaluation comparisons and visualize metrics through a dashboard.
- Experiment tracking: Use MLflow to keep a record of experiments, including parameters, metrics, and model artifacts generated during evaluations (see the logging sketch after this list).
- Leaderboard creation: The package allows you to create a leaderboard, making it easy to compare the performance of different models across multiple datasets.
- Metric visualization: Generate insightful charts and graphs through the dashboard, allowing you to visualize and analyze evaluation metrics easily.
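To make the integration concrete, here is a minimal sketch of logging one evaluation run with the MLflow Python API; the experiment name, metric values, and artifact path are placeholders rather than values produced by this repository:

```python
import os
import mlflow

# Placeholder experiment name; MLflow creates it on first use.
mlflow.set_experiment("rag-evaluation")

with mlflow.start_run(run_name="gpt-3.5-turbo-baseline"):
    # Parameters describing this evaluation run (placeholder values).
    mlflow.log_param("model_name", "gpt-3.5-turbo")
    mlflow.log_param("dataset", "qa_dataset_v1.csv")
    # Metrics produced by the evaluation step (placeholder values).
    mlflow.log_metric("mean_answer_f1", 0.78)
    mlflow.log_metric("mean_answer_similarity", 0.84)
    # Attach per-question results as a run artifact if the CSV exists.
    results_csv = "Output_CSV/evaluation_results.csv"
    if os.path.exists(results_csv):
        mlflow.log_artifact(results_csv)
```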
I. LLM Eval Toolkit: Toolkit for evaluating the RAG pipeline
- Eval_Package: A package to automatically evaluate the LLM-based RAG pipeline
- mlflow_package: A package to automatically add the evaluation results to the MLflow experiment tracker and leaderboard
II. MLflow Integration: MLflow evaluation dashboard scripts
- Notebook: Evaluation notebook
- Output_CSV: Output CSVs from the LLM model evaluations
- Result: Result files and PNG exports from MLflow
- mlruns: MLflow tracking metadata (a leaderboard query sketch follows below)
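Once runs have been logged, a simple leaderboard can be assembled by querying the tracking store; this sketch assumes a recent MLflow version and reuses the placeholder experiment and metric names from the logging example above:

```python
import mlflow

# Fetch all runs in the (placeholder) experiment as a pandas DataFrame.
runs = mlflow.search_runs(experiment_names=["rag-evaluation"])

# Rank runs by the assumed accuracy metric to form a leaderboard.
leaderboard = runs.sort_values("metrics.mean_answer_f1", ascending=False)
print(leaderboard[["tags.mlflow.runName", "metrics.mean_answer_f1"]])
```

The interactive dashboard can also be opened locally against the stored metadata with `mlflow ui --backend-store-uri ./mlruns`.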