
ValidationError #1545

Open
amin-kh96 opened this issue Oct 21, 2024 · 15 comments
Labels
bug (Something isn't working), question (Further information is requested)

Comments

@amin-kh96

amin-kh96 commented Oct 21, 2024

I saw this error:

value is not a valid list (type=type_error.list))
Evaluating:  33%|█████████████████████████████▎                                                          | 1/3 [01:37<03:14, 97.11s/it]Exception raised in Job[0]: ValidationError(1 validation error for LLMResult
generations -> 0
  value is not a valid list (type=type_error.list))
Evaluating:  67%|██████████████████████████████████████████████████████████▋                             | 2/3 [02:20<01:05, 65.71s/it]Exception raised in Job[1]: ValidationError(1 validation error for LLMResult
generations -> 0
  value is not a valid list (type=type_error.list))
Evaluating: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [02:37<00:00, 52.47s/it] 
RAGAS Evaluation Report:
{'context_utilization': nan}
PS C:\Users\Amin\git\ragas-prototype> 

I think the cause of the error might be somewhere around here:

# Define the custom LLM class
class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        if not self.api_key:
            return f"Processed: {prompt} (without API key)"
        else:
            # Handle LLM response if using an API
            return f"Processed: {prompt} (with API key: {self.api_key})"
       
    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = self._call(prompt)
        return LLMResult(generations=[Generation(text=text)])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call, here we directly call the sync method for now
        text = self._call(prompt)
        return LLMResult(generations=[Generation(text=text)])


custom_llm = CustomRagasLLM(api_key=None)

This is the code from the top of the file; maybe it helps for a better understanding:

# Creating a dataset of str type
new_data_set = []
question = []
context = []
answer = []

# Extracting str data of 'question' and 'answer'
for item in llm:
    if item['role'] == 'user':
        for c in item['content']:
            question.append(c['text'])
    else:
        for c in item['content']:
            answer.append(c['text'])

# Iterate through each dictionary in your data
for item in ground_truth_data:
    # Check if 'content' key exists in the dictionary
    if 'content' in item:
        # Access the value of the 'content' key and append it to the context list
        context.append(item['content'])
    else:
        print(f"'content' key not found in item with id: {item.get('id')}")

# Check the length of context to see if anything was appended
print(f"Number of context entries extracted: {len(context)}")

# Initialize empty lists for dataset
new_ragas_dataset = {
    "question": [],
    "contexts": [],
    "answer": []
}

# Assuming question, context, and answer lists are already available
for i in range(len(question)):
    new_ragas_dataset['question'].append(question[i])
    
    # For now, we assign all the chunks (contexts) to each question
    new_ragas_dataset['contexts'].append(context)  # context is a list of chunks
    
    # Assign the corresponding answer
    new_ragas_dataset['answer'].append(answer[i])
amin-kh96 added the question (Further information is requested) label on Oct 21, 2024

dosubot bot commented Oct 21, 2024

The error you're encountering, ValidationError: value is not a valid list (type=type_error.list), suggests that the generations field in the LLMResult object is not being structured as expected. In the Ragas framework, the generations attribute should be a list of lists, where each inner list contains Generation objects, and each Generation object has a text attribute [1].

In your CustomRagasLLM class, the generate_text and agenerate_text methods are returning LLMResult objects with generations structured as a single list of Generation objects. Instead, you should structure it as a list of lists. Here's how you can modify your methods:

def generate_text(
    self,
    prompt: PromptValue,
    n: int = 1,
    temperature: float = 1e-8,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.List = []
) -> LLMResult:
    # Synchronous generation logic
    text = self._call(prompt)
    return LLMResult(generations=[[Generation(text=text)]])

async def agenerate_text(
    self,
    prompt: PromptValue,
    n: int = 1,
    temperature: float = 1e-8,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.List = []
) -> LLMResult:
    # Simulate an asynchronous call, here we directly call the sync method for now
    text = self._call(prompt)
    return LLMResult(generations=[[Generation(text=text)]])

By ensuring that generations is a list of lists, you should resolve the validation error.

To continue talking to Dosu, mention @dosu.

dosubot bot added the bug (Something isn't working) label on Oct 21, 2024
@amin-kh96
Author

@dosu
What should I do with this one?
Evaluating: 0%| | 0/3 [00:00<?, ?it/s]Exception raised in Job[2]: ValidationError(1 validation error for LLMResult
generations -> 0
value is not a valid list (type=type_error.list))
Evaluating: 33%|█████████████████████████████▎ | 1/3 [01:37<03:14, 97.11s/it]Exception raised in Job[0]: ValidationError(1 validation error fo value is not a valid list (type=type_error.list))
Evaluating: 33%|█████████████████████████████▎ | 1/3 [01:37<03:14, 97.11s/it]Exception raised in Job[0]: ValidationError(1 validation error for LLMResult
generations -> 0
value is not a valid list (type=type_error.list))
Evaluating: 67%|██████████████████████████████████████████████████████████▋ | 2/3 [02:20<01:05, 65.71s/it]Exception raised in Job[1]: ValidationError(1 validation error for LLMResult
generations -> 0
value is not a valid list (type=type_error.list))
Evaluating: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [02:37<00:00, 52.47s/it]
RAGAS Evaluation Report:
{'context_utilization': nan}
PS C:\Users\Amin\git\ragas-prototype> ^C
PS C:\Users\Amin\git\ragas-prototype> c:; cd 'c:\Users\Amin\git\ragas-prototype'; & 'c:\Users\Amin\AppData\Local\Microsoft\WindowsApps\python3.12.exe' 'c:\Users\Amin.vscode\extensions\ms-python.debugpy-2024.12.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher' '62754' '--' 'c:\Users\Amin\git\ragas-prototype\src\rag_eva.py'
Number of context entries extracted: 665
C:\Users\Amin\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
C:\Users\Amin\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
Evaluating: 0%| | 0/3 [00:00<?, ?it/s]Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.


dosubot bot commented Oct 21, 2024

To resolve the ValidationError indicating that the generations field in LLMResult is not a valid list, ensure that the generations field is structured as a list of lists, where each inner list contains Generation objects. Update your generate_text and agenerate_text methods to return LLMResult with generations structured correctly:

def generate_text(
    self,
    prompt: PromptValue,
    n: int = 1,
    temperature: float = 1e-8,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.List = []
) -> LLMResult:
    text = self._call(prompt)
    return LLMResult(generations=[[Generation(text=text)]])

async def agenerate_text(
    self,
    prompt: PromptValue,
    n: int = 1,
    temperature: float = 1e-8,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.List = []
) -> LLMResult:
    text = self._call(prompt)
    return LLMResult(generations=[[Generation(text=text)]])

Regarding the FutureWarning related to the clean_up_tokenization_spaces parameter in the Hugging Face Transformers library, you should explicitly set this parameter in your code to avoid unexpected behavior in future versions. If you are not using this parameter, you can ignore the warning for now, but be prepared to update your code when the library changes [1].
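
The warning can be silenced by setting the parameter explicitly when the tokenizer is loaded. A minimal sketch, assuming the tokenizer comes from transformers' AutoTokenizer (the actual loading code is not shown in this thread):

from transformers import AutoTokenizer

# Pin the current default explicitly so the planned change in
# transformers v4.45 does not silently alter decoding behavior.
tokenizer = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased",
    clean_up_tokenization_spaces=True,
)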

To continue talking to Dosu, mention @dosu.

@amin-kh96
Author

@dosu.
The code that you gave me is the same as my code. Again I see this error:
warnings.warn(
Evaluating: 0%| | 0/3 [00:00<?, ?it/s]Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.


dosubot bot commented Oct 21, 2024

The "Failed to parse output" error suggests that the output parser is unable to process the generated text correctly. This can happen if the output does not meet the expected format or constraints defined in the prompt. Here are a few steps you can take to troubleshoot and resolve this issue:

  1. Check Output Format: Ensure that the text returned by your _call method matches the expected format. The Ragas framework might have specific requirements for the output structure, so verify that the generated text adheres to these constraints.

  2. Debugging: Enable debug logging to get more insights into what might be going wrong. You can use the patch_logger function to enable detailed logging for the module handling the output parsing. This might help you identify where the parsing is failing (a minimal logging sketch follows this list).

  3. Output Constraints: Review any constraints or requirements specified in the prompt or the framework's documentation. If the output does not satisfy these constraints, the framework might attempt to fix it using a prompt like FixOutputFormat. Ensure that your output is compatible with these constraints.

  4. Retries and Exceptions: The framework might be attempting to parse the output multiple times before raising an exception. Check if there are any retry mechanisms in place and if they are being triggered. This could provide clues about what is causing the parsing to fail.

By addressing these areas, you should be able to identify and fix the issue causing the "Failed to parse output" error [1][2].
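
Item 2 above mentions enabling debug logging. The patch_logger helper is internal to ragas, so a minimal sketch using only Python's standard logging module (assuming ragas emits its "Failed to parse output" messages through the standard logging framework) would be:

import logging

# Surface debug-level messages from the ragas package so the failing
# parse step and the raw model output become visible in the console.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("ragas").setLevel(logging.DEBUG)

Run the evaluation again with this in place to see which prompt output the parser rejects.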

To continue talking to Dosu, mention @dosu.

@jjmachan
Member

@amin-kh96 did you solve this?
Which models are you using?

@amin-kh96
Author

amin-kh96 commented Oct 22, 2024

@jjmachan
No, I still have a problem with it.
model_name = 'distilbert-base-uncased'
Maybe this will be helpful: I am using BaseRagasLLM, and my dataset does not contain the ground_truth.
So I do not think my subclass generates anything. I do not know if what I said is related or not.

@amin-kh96
Author

@dosu.
What should I do with this error?
evaluating: 0%| | 0/3 [00:00<?, ?it/s]Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
PS C:\Users\Amin\git\ragas-prototype>


dosubot bot commented Oct 23, 2024

The "Failed to parse output" error you're encountering might be related to the absence of ground truth in your dataset, which can affect the output generation and parsing process. Here are some steps to address this issue:

  1. Ensure Correct Output Structure: The generate_text and agenerate_text methods in your CustomRagasLLM class should return an LLMResult with generations structured as a list of lists, even if no ground truth is provided. This structure is crucial for the framework to parse the output correctly.

  2. Handle Missing Ground Truth: Since your dataset lacks ground truth, ensure that your model or logic can handle this scenario gracefully. You might need to adjust your prompt or logic to generate meaningful outputs even without ground truth.

  3. Debugging and Logging: Use the patch_logger function to enable detailed logging for the module handling the output parsing. This can provide insights into why the parsing is failing. Additionally, check if the RAGAS_DEBUG environment variable is set to enable debug mode, which can help diagnose the issue (a sketch of setting this variable follows this list).

  4. Review Model Integration: Since you're using the distilbert-base-uncased model with BaseRagasLLM, ensure that the integration is correct and that the model is capable of generating outputs in the expected format. Verify that the model's outputs are compatible with the Ragas framework's requirements.

By following these steps, you should be able to identify and resolve the issue causing the "Failed to parse output" error. If the problem persists, consider reviewing the specific requirements and constraints of the Ragas framework to ensure full compatibility.
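
Item 3 above also mentions the RAGAS_DEBUG environment variable. A minimal sketch of setting it, assuming it is read when ragas is imported (so it must be set before the import):

import os

# Enable ragas debug mode before the library is imported.
os.environ["RAGAS_DEBUG"] = "true"

import ragas  # imported after the flag is set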

To continue talking to Dosu, mention @dosu.

@amin-kh96
Author

@dosu.
But the metric that I am using is context_utilization. I do not think it needs the ground truth! Tell me if it needs it.


dosubot bot commented Oct 23, 2024

The context_utilization metric typically evaluates how well the context is used in generating responses, and it might not require explicit ground truth answers. However, the framework might still expect a certain structure or format in the dataset or outputs to compute this metric correctly.

Here are a few things to consider:

  1. Dataset Structure: Ensure that your dataset is structured correctly, even if it doesn't include ground truth answers. The context should be clearly defined and associated with the prompts (a minimal dataset sketch follows this list).

  2. Output Format: Verify that the outputs generated by your model are in the expected format for the context_utilization metric. This might involve ensuring that the context is being referenced or utilized in the generated text.

  3. Framework Requirements: Double-check the documentation or source code of the Ragas framework to understand any specific requirements or assumptions it makes about the data or outputs when calculating context_utilization.

  4. Debugging: Use logging to gain insights into how the metric is being calculated and where it might be failing. This can help identify if there's a mismatch between the expected and actual data formats.

If the issue persists, consider reaching out to the maintainers or community for more specific guidance on using the context_utilization metric without ground truth data.
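
For reference, a minimal sketch of evaluating context_utilization on a dataset without ground truth, using the same column names as the script earlier in this thread (question, contexts, answer). It assumes the context_utilization metric object is importable from ragas.metrics in the ragas version used here, and the placeholder strings stand in for real data; the custom LLM and embeddings would be passed to evaluate() the same way as in the original script:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_utilization

# One row per question; the retrieved chunks go in a list of strings.
data = {
    "question": ["<question text>"],
    "contexts": [["<retrieved chunk 1>", "<retrieved chunk 2>"]],
    "answer": ["<generated answer>"],
}
dataset = Dataset.from_dict(data)

result = evaluate(dataset, metrics=[context_utilization])
print(result)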

To continue talking to Dosu, mention @dosu.

@amin-kh96
Author

@dosu.
What is the ground truth?
How do I put it in my dataset?
Which metrics do not need ground truth?


dosubot bot commented Oct 23, 2024

In the Ragas framework, 'ground truth' refers to the expected or correct answers used to evaluate the correctness of generated responses. It is used to classify answer statements into true positives, false positives, and false negatives, which helps in assessing the accuracy of the model's outputs [1].

To include ground truth in your dataset, you should structure your data as follows:

  • For SingleTurnSample:

    • user_input: The user's input.
    • retrieved_contexts: Contexts retrieved for the input.
    • reference_contexts: Reference contexts for the input.
    • response: The system's generated response.
    • multi_responses: Multiple responses generated for the query.
    • reference: The ground truth or expected answer.
    • rubric: Evaluation rubric for the sample.
  • For MultiTurnSample:

    • user_input: A list of messages representing the conversation turns.
    • reference: The reference answer or expected outcome for the conversation.
    • reference_tool_calls: Expected tool calls for the conversation.
    • rubrics: Evaluation rubrics for the conversation.
    • reference_topics: Reference topics for the conversation.

Here's an example of creating a SingleTurnSample with ground truth:

sample1 = SingleTurnSample(
    user_input="What is the capital of Germany?",
    retrieved_contexts=["Berlin is the capital and largest city of Germany."],
    reference_contexts=["Berlin is the capital of Germany."],
    response="The capital of Germany is Berlin.",
    multi_responses=["Berlin", "The capital is Berlin."],
    reference="Berlin",
    rubric={"accuracy": "high"}
)

Regarding metrics that do not require ground truth, the default metrics provided by the Ragas framework include answer_relevancy, context_precision, faithfulness, and context_recall. Among these, context_precision and context_recall do not require ground truth data [2].

To continue talking to Dosu, mention @dosu.

@amin-kh96
Author

@dosu.
How can I get this ground truth when I am using it in a special use case, for example medical machines?
This information does not exist on the internet.


dosubot bot commented Oct 23, 2024

When dealing with specialized use cases like medical machines, where ground truth information might not be readily available on the internet, you can consider the following approaches to obtain or create ground truth data:

  1. Domain Expert Collaboration: Work with domain experts who have the necessary knowledge to provide accurate answers. They can help create a dataset of questions and corresponding correct answers based on their expertise.

  2. Internal Documentation: Utilize internal documentation, manuals, or proprietary databases that contain verified information about the medical machines. This can serve as a source for ground truth data.

  3. Simulated Data: In some cases, you can simulate scenarios or use synthetic data generation techniques to create plausible ground truth data. This is particularly useful when real-world data is scarce or sensitive.

  4. Crowdsourcing: If feasible, consider crowdsourcing the creation of ground truth data by engaging a group of individuals with relevant knowledge to validate or generate answers.

  5. Historical Data: Use historical data from past interactions or logs that have been verified for accuracy. This can be a valuable source of ground truth if such data is available.

  6. Research Publications: Refer to academic papers, research publications, or industry reports that provide validated information relevant to your use case.

By employing these strategies, you can build a reliable ground truth dataset tailored to your specialized domain, which will help in evaluating and improving your RAG pipeline effectively.

To continue talking to Dosu, mention @dosu.
