
[BUG] Evaluate on test dataset using evaluate() with SimilarityEvaluator returns NaN #3381

Closed
bhonris opened this issue Jun 6, 2024 · 5 comments

bhonris commented Jun 6, 2024

Describe the bug
When evaluating a dataset with evaluate() using the SimilarityEvaluator, I have come across some scenarios where the result is not a number (NaN).
How To Reproduce the bug
Model config
{azure_deployment="gpt4-turbo-preview", api_version="2024-02-01"}
jsonl file
{"Question":"How can you get the version of the Kubernetes cluster?","Answer":"{\"code\": \"kubectl version\" }","output":"{code: kubectl version --output=json}"}
Evaluate Config

result = evaluate(
    data="testdata2.jsonl",
    evaluators={
        "similarity": SimilarityEvaluator(model_config)
    },
    evaluator_config={
        "default": {
            "question": "${data.Question}",
            "answer": "${data.output}",
            "ground_truth": "${data.Answer}"
        }
    }
)
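For completeness, a self-contained version of this repro might look like the sketch below. The endpoint/key handling and the import paths are assumptions based on the promptflow-evals 0.3.x package layout, not taken from the report:

import os

from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import SimilarityEvaluator

# Assumed model configuration; endpoint and key are read from the environment.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_deployment="gpt4-turbo-preview",
    api_version="2024-02-01",
)

result = evaluate(
    data="testdata2.jsonl",
    evaluators={
        "similarity": SimilarityEvaluator(model_config)
    },
    evaluator_config={
        "default": {
            "question": "${data.Question}",
            "answer": "${data.output}",
            "ground_truth": "${data.Answer}"
        }
    }
)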

Expected behavior
The returned similarity score is a number.

Running Information (please complete the following information):

  • Promptflow Package Version using pf -v:
{
 "promptflow": "1.1.1",
 "promptflow-azure": "1.11.0",
 "promptflow-core": "1.11.0",
 "promptflow-devkit": "1.11.0",
 "promptflow-evals": "0.3.0",
 "promptflow-tracing": "1.11.0"
}
  • Operating System: Windows 11
  • Python Version using python --version: 3.10.11

Additional context

  • Checking the logged value in _similarity.py suggests that the value actually returned by the model is the string 'The'.
  • I have noticed that this issue usually occurs when the answer does not match what the LLM would respond to the question, for example {Question: What is the capital of France?, Answer: Washington DC}. A sketch of how such a completion collapses to NaN follows this list.
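The following is an illustrative sketch of the failure mode described above, not the actual code in _similarity.py: the evaluator expects a single digit, so a completion that begins with prose cannot be coerced to a float and falls back to NaN.

import math

def parse_score(llm_output: str) -> float:
    # The prompt asks the model for a single digit between 1 and 5.
    # A completion like "The answer is ..." fails the numeric parse,
    # so the score degrades to NaN.
    try:
        return float(llm_output.strip())
    except ValueError:
        return math.nan

print(parse_score("4"))    # 4.0
print(parse_score("The"))  # nan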
@bhonris bhonris added the bug Something isn't working label Jun 6, 2024

bhonris commented Jun 6, 2024

I added the following text to similarity.prompty: "You will respond with a single digit number between 1 and 5. You will include no other text or information", and this seems to fix the issue.
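Independent of the prompt change, a caller could also guard against a non-numeric score before relying on it. A minimal sketch, reusing model_config from the earlier snippet and assuming the evaluator exposes its score under a gpt_similarity key (the key name is an assumption):

import math

from promptflow.evals.evaluators import SimilarityEvaluator

# model_config as constructed in the repro sketch above.
evaluator = SimilarityEvaluator(model_config)
score = evaluator(
    question="How can you get the version of the Kubernetes cluster?",
    answer="{code: kubectl version --output=json}",
    ground_truth='{"code": "kubectl version" }',
)

# "gpt_similarity" is the assumed result key; adjust if the evaluator
# reports the score under a different name.
value = score.get("gpt_similarity")
if value is None or (isinstance(value, float) and math.isnan(value)):
    raise ValueError(f"Similarity evaluator returned a non-numeric score: {score}")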

@brynn-code

Hi @singankit and @luigiw , could you please help take a look at this issue?


luigiw commented Jun 14, 2024

@bhonris, thank you for reporting the issue and sharing a workaround. It is a known issue that some preview OpenAI models can cause NaN results. Please also try with a stable model version.

github-actions bot commented:

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

@github-actions github-actions bot added the no-recent-activity There has been no recent activity on this issue/pull request label Jul 14, 2024
@luigiw luigiw removed the no-recent-activity There has been no recent activity on this issue/pull request label Jul 16, 2024

luigiw commented Aug 15, 2024

Fixed in version 0.3.2.

@luigiw luigiw closed this as completed Aug 15, 2024