[Investigation app] add entities route and investigation Contextual Insight #194432

Merged

Contributor

@dominiqueclarke dominiqueclarke commented Sep 30, 2024

Summary

Adds a route that can be used to fetch entities related to an investigation.

The route fetches associated entities by service name, host name, or container ID. It then identifies the indices and data streams associated with those entities.

The discovered entities are passed to the contextual insight to inform the LLM.
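
As a rough illustration of the lookup described above, the sketch below builds an Elasticsearch bool query from whichever identifiers are supplied. It is not the PR's implementation: `EntityIdentifiers`, `buildEntityClauses`, and `buildEntityQuery` are hypothetical names, and matching on the fields `service.name`, `host.name`, and `container.id` is an assumption.

```typescript
// Hypothetical sketch; none of these names come from the PR's source.
interface EntityIdentifiers {
  serviceName?: string;
  hostName?: string;
  containerId?: string;
}

type TermClause = { term: Record<string, string> };

// Build one term clause per identifier that was actually supplied.
function buildEntityClauses(ids: EntityIdentifiers): TermClause[] {
  const pairs: Array<[string, string | undefined]> = [
    ['service.name', ids.serviceName],
    ['host.name', ids.hostName],
    ['container.id', ids.containerId],
  ];
  const clauses: TermClause[] = [];
  for (const [field, value] of pairs) {
    // Skip identifiers that are missing or empty.
    if (value !== undefined && value !== '') {
      clauses.push({ term: { [field]: value } });
    }
  }
  return clauses;
}

// Match an entity when any of the supplied identifiers matches.
function buildEntityQuery(ids: EntityIdentifiers) {
  const should = buildEntityClauses(ids);
  return {
    bool: { should, minimum_should_match: should.length > 0 ? 1 : 0 },
  };
}
```

For example, `buildEntityQuery({ serviceName: 'cart' })` yields a single `term` clause on `service.name`. A real route might instead issue one search per identifier, as the `msearch` fragment later in this conversation suggests.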

image

This PR represents the first step in developing an AI-informed hypothesis at the beginning of the investigation. Over time, further insights will be provided to the LLM to deepen its investigative analysis and help it propose a more helpful root cause hypothesis.

Testing

  1. Create some APM data. I'm using the otel demo and triggering a failure via the flagd service. Since this is in flux, you can reach out to me about this workflow. Alternatively, you can create APM data via synth-trace.
  2. Create a custom threshold rule that you expect to trigger an alert. I created mine using http.response.status_code: 500 / http.response.status_code : * and set a low threshold based on the number of failures in my current test data. Be sure to also group the alert by service.name.
  3. Wait for the alert to fire, then visit the alert details page and start an investigation.
  4. Notice the contextual insight. Expand it to see more information.

@dominiqueclarke dominiqueclarke added release_note:skip Skip the PR/issue when compiling release notes v9.0.0 Team:obs-ux-management Observability Management User Experience Team v8.16.0 labels Sep 30, 2024
@dominiqueclarke dominiqueclarke requested a review from a team as a code owner September 30, 2024 13:05
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Sep 30, 2024
@obltmachine

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

): Promise<{ responses: Array<InferSearchResponseOf<TDocument, TSearchRequest>> }>;
}

export function createEntitiesESClient({
Contributor Author

This is a client specifically for searching through entities indices, but I should be using the observability es client as a dependency. Will update when I can.

.map((params) => {
const searchParams: [MsearchMultisearchHeader, MsearchMultisearchBody] = [
{
index: [SERVICE_ENTITIES_LATEST_ALIAS],
Contributor Author

This is copypasta. I'd like to remove the reference to the service alias in particular.

@dominiqueclarke dominiqueclarke added the backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) label Sep 30, 2024
@mgiota mgiota self-requested a review October 1, 2024 09:57
@dominiqueclarke dominiqueclarke requested a review from a team as a code owner October 1, 2024 20:14
@dominiqueclarke dominiqueclarke changed the title [Investigation app] add entities route [Investigation app] add entities route and investigation Contextual Insight Oct 2, 2024
Contributor

@kdelemme kdelemme left a comment

I just did a quick first pass, will continue

@@ -28,7 +28,7 @@
"kibanaReact",
"kibanaUtils",
],
"optionalPlugins": [],
Contributor

It's already in the requiredPlugins

Comment on lines 88 to 90
const alertOriginInvestigation = alertOriginSchema.safeParse(investigation?.origin);
const alertId = alertOriginInvestigation.success ? alertOriginInvestigation.data.id : undefined;
const { data: alert } = useFetchAlert({ id: alertId });
Contributor

🍰 nit: this logic is required every time we use useFetchAlert(), maybe we can refactor the hook to encapsulate this logic: the hook itself could use useInvestigation to retrieve the investigation, and we won't need to expose the originated alert in this context. I'm already worried about this context becoming bloated with too many things.

const { data: alert } = useFetchAlertOrigin()
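
One way to read this suggestion: the "is this investigation's origin an alert, and if so what is its id" check moves into a single helper that a `useFetchAlertOrigin` hook could call internally. The sketch below shows only that helper as a plain function so it runs without React. `getAlertIdFromOrigin` is a hypothetical name, and the origin shape (`type: 'alert'` plus a string `id`) is an assumption standing in for `alertOriginSchema`.

```typescript
// Hypothetical helper; a stand-in for alertOriginSchema.safeParse(...).
// Returns the alert id when the origin describes an alert, otherwise undefined.
function getAlertIdFromOrigin(origin: unknown): string | undefined {
  if (typeof origin !== 'object' || origin === null) {
    return undefined;
  }
  const candidate = origin as { type?: unknown; id?: unknown };
  if (candidate.type === 'alert' && typeof candidate.id === 'string') {
    return candidate.id;
  }
  return undefined;
}
```

A hook built on this would read the investigation from context, derive the id with the helper, and pass it to the existing fetch, so callers no longer repeat the parsing.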

Contributor

@kdelemme kdelemme left a comment

Some questions and nits, but otherwise looks good to me.
I guess for testing this I need to set up a genAI connector. Do you have a guide for this?

Comment on lines +37 to +41
{investigation?.id && (
<EuiFlexItem grow={false}>
<AssistantHypothesis investigationId={investigation.id} />
</EuiFlexItem>
)}
Contributor

🍰 nit: use the context hook useInvestigation() directly from AssistantHypothesis:

Suggested change
- {investigation?.id && (
- <EuiFlexItem grow={false}>
- <AssistantHypothesis investigationId={investigation.id} />
- </EuiFlexItem>
- )}
+ <EuiFlexItem grow={false}>
+ <AssistantHypothesis />
+ </EuiFlexItem>

Contributor Author

I actually had this originally, but it meant the investigation was sometimes undefined, and I hated having to handle that everywhere. Would you prefer that trade-off?

});
export const SERVICE_ENTITIES_HISTORY_ALIAS = entitiesAliasPattern({
type: 'service',
dataset: ENTITY_HISTORY,
Contributor

I thought EEM had removed the history?

Contributor Author

They did. I'll remove this for now.

hostName,
entitiesEsClient,
}: {
context: InvestigateAppRequestHandlerContext;
Contributor

If possible let's try to not leak route/request details into the services. Here we can replace the whole request handler context with the esClient, and do the wiring in the route handler.
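
A minimal sketch of the separation described here, under stated assumptions: `SearchClientLike`, `RequestContextLike`, `getEntitiesByHost`, `handleGetEntities`, and the index name are all hypothetical and deliberately simpler than Kibana's real types. The point is only that the service receives a client, while the route handler is the one place that touches the request context.

```typescript
// Hypothetical stand-in for the small part of an Elasticsearch client this service needs.
interface SearchClientLike {
  search(params: { index: string; query: Record<string, unknown> }): Promise<{ ids: string[] }>;
}

// Service: depends only on the client and plain arguments, so it can be
// unit-tested without constructing a request handler context.
async function getEntitiesByHost({
  esClient,
  hostName,
}: {
  esClient: SearchClientLike;
  hostName: string;
}): Promise<string[]> {
  const response = await esClient.search({
    index: 'entities-host-latest', // hypothetical index name
    query: { term: { 'host.name': hostName } },
  });
  return response.ids;
}

// Hypothetical, simplified shape of a request handler context.
interface RequestContextLike {
  getEsClient(): Promise<SearchClientLike>;
}

// Route handler: the only place that knows about the request context.
async function handleGetEntities(
  context: RequestContextLike,
  hostName: string
): Promise<string[]> {
  const esClient = await context.getEsClient();
  return getEntitiesByHost({ esClient, hostName });
}
```

With this split, a test can pass a fake `SearchClientLike` straight to `getEntitiesByHost` and never needs to mock the route layer.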

);
}

const getEntitySource = async ({ index }: { index: IndicesIndexState }) => {
Contributor

Does it need to be async?

Comment on lines 82 to 86
return await Promise.all(
Object.values(indices).map(async (index) => {
return await getEntitySource({ index });
})
);
Contributor

Do we need the Promise.all and await here?
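
To make the two questions about `async` and `Promise.all` concrete: if `getEntitySource` only reads fields off the index state and performs no I/O, a synchronous function and a plain `map` give the same result. The sketch below assumes exactly that. `IndexStateLike` and `EntitySource` are hypothetical local types, and reading only `data_stream` is a guess at what the function needs.

```typescript
// Hypothetical, trimmed-down stand-in for IndicesIndexState.
interface IndexStateLike {
  data_stream?: string;
}

interface EntitySource {
  dataStream?: string;
}

// No awaited work happens here, so the function can be synchronous.
function getEntitySource({ index }: { index: IndexStateLike }): EntitySource {
  return { dataStream: index.data_stream };
}

// A plain map replaces Promise.all(...) when every mapped call is synchronous.
function getEntitySources(indices: Record<string, IndexStateLike>): EntitySource[] {
  return Object.values(indices).map((index) => getEntitySource({ index }));
}
```

If `getEntitySource` later needs to call Elasticsearch, the `async` and `Promise.all` form becomes necessary again.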

const sourceIndex = entity?.sourceIndex;
if (!sourceIndex) return null;

const indices = await esClient.indices.get({ index: sourceIndex });
Contributor

🍰 nit: it's probably too early to optimize, but this call is made inside a double for-loop. Is there a way to call esClient.indices.get once for all sourceIndex values?
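
A sketch of the batching idea, not the PR's code: collect the unique source indices across all entities first, then make one `get` call. `EntityLike`, `IndicesGetClientLike`, `collectSourceIndices`, and `getIndexStatesForEntities` are hypothetical, `sourceIndex` is assumed to be an array of index names, and the sketch assumes the client accepts an array of names in a single call.

```typescript
// Hypothetical entity shape; sourceIndex is assumed to be a list of index names.
interface EntityLike {
  sourceIndex?: string[];
}

// Hypothetical stand-in for the part of esClient.indices used here.
interface IndicesGetClientLike {
  get(params: { index: string[] }): Promise<Record<string, { data_stream?: string }>>;
}

// Deduplicate source indices across all entities, preserving first-seen order.
function collectSourceIndices(entities: EntityLike[]): string[] {
  const unique = new Set<string>();
  for (const entity of entities) {
    for (const name of entity.sourceIndex ?? []) {
      unique.add(name);
    }
  }
  return Array.from(unique);
}

// One request for every index instead of one request per entity.
async function getIndexStatesForEntities(
  client: IndicesGetClientLike,
  entities: EntityLike[]
): Promise<Record<string, { data_stream?: string }>> {
  const index = collectSourceIndices(entities);
  if (index.length === 0) {
    return {}; // nothing to look up, so skip the request entirely
  }
  return client.get({ index });
}
```

The response is keyed by index name, so each entity can still look up its own indices afterwards without further requests.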

@dominiqueclarke dominiqueclarke force-pushed the feature/investigation-entities branch from 34235a6 to 53655e5 Compare October 2, 2024 21:08
Contributor

@jloleysens jloleysens left a comment

Other than:

It's already in the requiredPlugins

kibana.jsonc lgtm

@dominiqueclarke
Contributor Author

Some questions and nits, but otherwise looks good to me. I guess for testing this I need to set up a genAI connector. Do you have a guide for this?

The guide for setting up the connector can be found here https://github.com/elastic/kibana/blob/main/x-pack/plugins/observability_solution/observability_ai_assistant/README.md

You'll also need to start your knowledge base. The easiest way to do that is, after setting up your connector, open the Assistant flyout via the Assistant button on the top right and click the start knowledge base button.

@dominiqueclarke dominiqueclarke force-pushed the feature/investigation-entities branch from b9de0ca to f95017d Compare October 4, 2024 14:27
@kibana-ci
Collaborator

kibana-ci commented Oct 4, 2024

💚 Build Succeeded

  • Buildkite Build
  • Commit: f95017d
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-194432-f95017d80e5d

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
investigateApp 567 572 +5

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/investigation-shared 73 81 +8

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
investigateApp 474.6KB 479.6KB +5.0KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
investigateApp 6.5KB 6.4KB -104.0B

Unknown metric groups

API count

id before after diff
@kbn/investigation-shared 73 81 +8

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dominiqueclarke dominiqueclarke merged commit e4bb435 into elastic:main Oct 4, 2024
23 checks passed
@dominiqueclarke dominiqueclarke deleted the feature/investigation-entities branch October 4, 2024 17:58
@kibanamachine
Contributor

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11184673144

@kibanamachine
Contributor

💔 All backports failed

Status Branch Result
8.x Backport failed because of merge conflicts

You might need to backport the following PRs to 8.x:
- feat(rca): add screen context into investigation details (#194753)

Manual backport

To create the backport manually run:

node scripts/backport --pr 194432

Questions ?

Please refer to the Backport tool documentation

@dominiqueclarke
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

dominiqueclarke added a commit to dominiqueclarke/kibana that referenced this pull request Oct 5, 2024
…nsight (elastic#194432)


Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit e4bb435)
tiansivive pushed a commit to tiansivive/kibana that referenced this pull request Oct 7, 2024
…nsight (elastic#194432)


Co-authored-by: kibanamachine <[email protected]>
dominiqueclarke added a commit that referenced this pull request Oct 7, 2024
…tual Insight (#194432) (#195158)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Investigation app] add entities route and investigation Contextual
Insight (#194432)](#194432)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Rickyanto Ang <[email protected]>
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-management Observability Management User Experience Team v8.16.0 v9.0.0
8 participants