-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Obs AI Assistant] Improve LLM evaluation framework #204574
[Obs AI Assistant] Improve LLM evaluation framework #204574
Conversation
57b281e
to
e82e4e6
Compare
Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant) |
b60786e
to
5d7fe68
Compare
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @viduni94! just a few nits
.../solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md
Show resolved
Hide resolved
.../solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/README.md
Outdated
Show resolved
Hide resolved
.../observability/plugins/observability_ai_assistant_app/scripts/evaluation/get_service_urls.ts
Outdated
Show resolved
Hide resolved
.../observability/plugins/observability_ai_assistant_app/scripts/evaluation/get_service_urls.ts
Outdated
Show resolved
Hide resolved
@@ -124,13 +124,13 @@ export class KibanaClient { | |||
return this.axios<T>({ | |||
method, | |||
url, | |||
data: data || {}, | |||
...(method.toLowerCase() !== 'delete' ? { data: data || {} } : {}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this condition, deleting ruleIds
fails here - https://github.com/elastic/kibana/pull/204574/files#diff-23cc9139c91a064a3ca574552ad823023c579cc2c68ff7f277c392102a0d526aL139
Because the DELETE
method doesn't allow an undefined
or empty body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to checking whether data
is empty and if so, not setting the key? IIRC there actually are some routes in Kibana that allow for a request body with the DELETE method (whether that's a good idea or not 😄 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
...ons/observability/plugins/observability_ai_assistant_app/scripts/evaluation/kibana_client.ts
Outdated
Show resolved
Hide resolved
...gins/observability_ai_assistant_app/scripts/evaluation/scenarios/elasticsearch/index.spec.ts
Outdated
Show resolved
Hide resolved
Results after the updates:
|
40c6445
to
d889b3e
Compare
…rch credentials are provided
d889b3e
to
63e5be1
Compare
💚 Build Succeeded
Metrics [docs]
History
cc @viduni94 |
Closes elastic#203122 ## Summary ### Problem The Obs AI Assistant LLM evaluation framework cannot successfully run in the current state in the `main` branch and has missing scenarios. Problems identified: - Unable to run the evaluation with a local Elasticsearch instance - Alerts and APM results are skipped entirely when reporting the final result on the terminal (due to consistent failures in the tests) - State contaminations between runs makes the script throw errors when run multiple times. - Authentication issues when calling `/internal` APIs ### Solution As a part of spacetime, worked on fixing the current issues in the LLM evaluation framework and working on improving and enhancing the framework. #### Fixes | Problem | RC (Root Cause) | Fixed? | |------------------------|---------------------------------|--------| | Running with a local Elasticsearch instance | Service URLs were not picking up the correct auth because of the format specified in `kibana.dev.yml` | ✅ | | Alerts and APM results skipped in final result | Most (if not all) tests are failing in the alerts and APM suites, hence no final results are reported. | ✅ (all test scenarios fixed) | | State contaminations between runs | Some `after` hooks were not running successfully because of an error in the `callKibana` method | ✅ | | Authentication issues when calling `/internal` APIs | The required headers are not present in the request | ✅ | #### Enhancements / Improvements | What was added | How does it enhance the framework | |------------------------|---------------------------------| | Added new KB retrieval test to the KB scenario | More scenarios covered | | Added new scenario for the `retrieve_elastic_doc` function | Cover missing newly added functions | | Enhance how scope is used for each scenario and apply correct scope | The scope determines the wording of the system message. Certain scenarios need to be scoped to observability (e.g.: `alerts`) to produce the best result. At present all scenarios use the scope `all` which is not ideal and doesn't align with the actual functionality of the AI Assistant | | Avoid throwing unnecessary errors on the console (This was fixed by adding guard rails, e.g.: not creating a dataview if it exists) | Makes it easier to navigate through the results printed on the terminal | | Improved readme | Easier to configure and use the framework while identifying all possible options | | Improved logging | Easier to navigate through the terminal output | ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) --------- Co-authored-by: kibanamachine <[email protected]>
@@ -124,10 +124,10 @@ export class KibanaClient { | |||
return this.axios<T>({ | |||
method, | |||
url, | |||
data: data || {}, | |||
...(method.toLowerCase() === 'delete' && !data ? {} : { data: data || {} }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about simply:
...(data ? { data } : {}),
Closes elastic#203122 ## Summary ### Problem The Obs AI Assistant LLM evaluation framework cannot successfully run in the current state in the `main` branch and has missing scenarios. Problems identified: - Unable to run the evaluation with a local Elasticsearch instance - Alerts and APM results are skipped entirely when reporting the final result on the terminal (due to consistent failures in the tests) - State contaminations between runs makes the script throw errors when run multiple times. - Authentication issues when calling `/internal` APIs ### Solution As a part of spacetime, worked on fixing the current issues in the LLM evaluation framework and working on improving and enhancing the framework. #### Fixes | Problem | RC (Root Cause) | Fixed? | |------------------------|---------------------------------|--------| | Running with a local Elasticsearch instance | Service URLs were not picking up the correct auth because of the format specified in `kibana.dev.yml` | ✅ | | Alerts and APM results skipped in final result | Most (if not all) tests are failing in the alerts and APM suites, hence no final results are reported. | ✅ (all test scenarios fixed) | | State contaminations between runs | Some `after` hooks were not running successfully because of an error in the `callKibana` method | ✅ | | Authentication issues when calling `/internal` APIs | The required headers are not present in the request | ✅ | #### Enhancements / Improvements | What was added | How does it enhance the framework | |------------------------|---------------------------------| | Added new KB retrieval test to the KB scenario | More scenarios covered | | Added new scenario for the `retrieve_elastic_doc` function | Cover missing newly added functions | | Enhance how scope is used for each scenario and apply correct scope | The scope determines the wording of the system message. Certain scenarios need to be scoped to observability (e.g.: `alerts`) to produce the best result. At present all scenarios use the scope `all` which is not ideal and doesn't align with the actual functionality of the AI Assistant | | Avoid throwing unnecessary errors on the console (This was fixed by adding guard rails, e.g.: not creating a dataview if it exists) | Makes it easier to navigate through the results printed on the terminal | | Improved readme | Easier to configure and use the framework while identifying all possible options | | Improved logging | Easier to navigate through the terminal output | ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) --------- Co-authored-by: kibanamachine <[email protected]>
Closes elastic#203122 ## Summary ### Problem The Obs AI Assistant LLM evaluation framework cannot successfully run in the current state in the `main` branch and has missing scenarios. Problems identified: - Unable to run the evaluation with a local Elasticsearch instance - Alerts and APM results are skipped entirely when reporting the final result on the terminal (due to consistent failures in the tests) - State contaminations between runs makes the script throw errors when run multiple times. - Authentication issues when calling `/internal` APIs ### Solution As a part of spacetime, worked on fixing the current issues in the LLM evaluation framework and working on improving and enhancing the framework. #### Fixes | Problem | RC (Root Cause) | Fixed? | |------------------------|---------------------------------|--------| | Running with a local Elasticsearch instance | Service URLs were not picking up the correct auth because of the format specified in `kibana.dev.yml` | ✅ | | Alerts and APM results skipped in final result | Most (if not all) tests are failing in the alerts and APM suites, hence no final results are reported. | ✅ (all test scenarios fixed) | | State contaminations between runs | Some `after` hooks were not running successfully because of an error in the `callKibana` method | ✅ | | Authentication issues when calling `/internal` APIs | The required headers are not present in the request | ✅ | #### Enhancements / Improvements | What was added | How does it enhance the framework | |------------------------|---------------------------------| | Added new KB retrieval test to the KB scenario | More scenarios covered | | Added new scenario for the `retrieve_elastic_doc` function | Cover missing newly added functions | | Enhance how scope is used for each scenario and apply correct scope | The scope determines the wording of the system message. Certain scenarios need to be scoped to observability (e.g.: `alerts`) to produce the best result. At present all scenarios use the scope `all` which is not ideal and doesn't align with the actual functionality of the AI Assistant | | Avoid throwing unnecessary errors on the console (This was fixed by adding guard rails, e.g.: not creating a dataview if it exists) | Makes it easier to navigate through the results printed on the terminal | | Improved readme | Easier to configure and use the framework while identifying all possible options | | Improved logging | Easier to navigate through the terminal output | ### Checklist - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) --------- Co-authored-by: kibanamachine <[email protected]>
Closes #203122
Summary
Problem
The Obs AI Assistant LLM evaluation framework cannot successfully run in the current state in the
main
branch and has missing scenarios.Problems identified:
/internal
APIsSolution
As a part of spacetime, worked on fixing the current issues in the LLM evaluation framework and working on improving and enhancing the framework.
Fixes
kibana.dev.yml
after
hooks were not running successfully because of an error in thecallKibana
method/internal
APIsEnhancements / Improvements
retrieve_elastic_doc
functionalerts
) to produce the best result. At present all scenarios use the scopeall
which is not ideal and doesn't align with the actual functionality of the AI AssistantChecklist
release_note:*
label is applied per the guidelines