-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Timeline] Rebuild nested fields structure from fields response #96187
Conversation
* Always requests TIMELINE_CTI_FIELDS as part of request This only works for one level of nesting; will be extending tests to allow for multiple levels momentarily.
This is a recursive implementation, but recursion depth is limited to the number of levels of nesting, with arguments reducing in size as we go (i.e. logarithmic)
* Order short-circuiting conditions by cost, ascending * Simplify object building for non-nested objects from fields * The non-nested case is the same as the base recursive case, so always call our recursive function if building from .fields * Simplify getNestedParentPath * We can do a few simple string comparison rather than building up multiple strings/arrays * Don't call getNestedParentPath unnecessarily, only if we have a field
By definition, nestedParentFieldName can never be equal to fieldName, which means there are only two branches here.
Each top-level field value can be either an array of leaf values (unknown[]), or an array of nested fields.
If fieldName is null or undefined, there is no reason to search for it in dataFields. Looking through the git history this looks to be dead code as a result of refactoring, as opposed to a legitimate bugfix, so I'm removing it.
}, | ||
{ | ||
field: 'threat.indicator.matched.field', | ||
value: ['matched_field', 'matched_field_2'], | ||
field: 'source.geo.location', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order here changed due to the addition of threat match fields to the ECS fields list, meaning it comes sooner than a requested field like this one (source.geo.location
)
...plugins/security_solution/server/search_strategy/timeline/factory/events/all/helpers.test.ts
Show resolved
Hide resolved
return set({}, fieldPath, toStringArray(value)); | ||
} | ||
|
||
return buildObjectRecursive(fieldPath, hit.fields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FrankHassanabad would you be able to pair on verifying the performance of this logic? I know this is a hot path and that we're doing setTimeout
s and Promise.resolve
s elsewhere; I would appreciate a crash course in analzying/benchmarking/profiling this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rylnd and myself discussed this and the conclusion is:
I suggested for testing we can create some large index mappings and testing these functions manually to see if they need to do partitioning or not with a setTimeout.
We have had weird things in the past where once you add a partition through a setTimeout
, team members from anywhere at any time can sometimes think it's a mistake (no matter how many docs and all CAPS! you put above it :-) ) and they will remove 'em and you have to watch them forever.
But also so we can see that everything functions as we expect today with a manual test as suggested. However right now we don't have an automated perf regression framework for injecting large index mappings, running a test against them and then seeing if we have de-graded performance of the regression of either NodeJS or the frontend UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's more context about this in the Performance Characteristics section in the PR description 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have PR who removed all the set
from lodash in this function, if I remember correctly that show slowness in the past.
looks like I reviewed this change as a part of #96275, I was wondering if you were considering keeping this alive or just continuing the discussion under the row renderer change |
* one was a test failure due to my modifying mock data * one may have been a legitimate bug where we don't handle a hit without a fields response; I need to follow up with Xavier to verify.
Pinging @elastic/security-solution (Team: SecuritySolution) |
Pinging @elastic/security-threat-hunting (Team:Threat Hunting) |
@@ -407,6 +407,14 @@ describe('#formatTimelineData', () => { | |||
}); | |||
}); | |||
|
|||
it('builds an object with no fields response', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XavierM I added this test and the accompanying null check because the parseEqlResponse
unit tests were feeding in events without a fields
property and failing. It certainly seems plausible for a hit not to have that property, hence the changes here, but I wanted to double-check it with you as well.
@elasticmachine merge upstream |
@elasticmachine merge upstream |
@@ -26,10 +26,12 @@ export interface EventsActionGroupData { | |||
doc_count: number; | |||
} | |||
|
|||
export type Fields = Record<string, unknown[] | Fields[]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this type is referring to itself is that kosher? does it recurse? broke my brain a little bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's recursive! This was the most accurate representation of nested fields (and thus "fields" in general) I could think of. It appears to work with all our existing uses of the property without any changes because unknown[]
is greedier than Fields[]
, but e.g. I needed to narrow in cases where we know it's gong to be the recursive form.
let nestedHit: EventHit; | ||
|
||
beforeEach(() => { | ||
// @ts-expect-error nestedHit is minimal | ||
nestedHit = { | ||
fields: { | ||
'nested_1.foo': [ | ||
{ | ||
'nested_2.bar': [ | ||
{ leaf: ['leaf_value'], leaf_2: ['leaf_2_value'] }, | ||
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] }, | ||
], | ||
}, | ||
{ | ||
'nested_2.bar': [ | ||
{ leaf: ['leaf_value_2'], leaf_2: ['leaf_2_value_4'] }, | ||
{ leaf: ['leaf_value_3'], leaf_2: ['leaf_2_value_5'] }, | ||
], | ||
}, | ||
], | ||
}, | ||
}; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not need the let
or beforeEach
let nestedHit: EventHit; | |
beforeEach(() => { | |
// @ts-expect-error nestedHit is minimal | |
nestedHit = { | |
fields: { | |
'nested_1.foo': [ | |
{ | |
'nested_2.bar': [ | |
{ leaf: ['leaf_value'], leaf_2: ['leaf_2_value'] }, | |
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] }, | |
], | |
}, | |
{ | |
'nested_2.bar': [ | |
{ leaf: ['leaf_value_2'], leaf_2: ['leaf_2_value_4'] }, | |
{ leaf: ['leaf_value_3'], leaf_2: ['leaf_2_value_5'] }, | |
], | |
}, | |
], | |
}, | |
}; | |
}); | |
// @ts-expect-error nestedHit is minimal | |
const nestedHit: EventHit = { | |
fields: { | |
'nested_1.foo': [ | |
{ | |
'nested_2.bar': [ | |
{ leaf: ['leaf_value'], leaf_2: ['leaf_2_value'] }, | |
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] }, | |
], | |
}, | |
{ | |
'nested_2.bar': [ | |
{ leaf: ['leaf_value_2'], leaf_2: ['leaf_2_value_4'] }, | |
{ leaf: ['leaf_value_3'], leaf_2: ['leaf_2_value_5'] }, | |
], | |
}, | |
], | |
}, | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this let/beforeEach
is my paranoid "reset test data" pattern, meant to address the case where some function under test permutes its arguments. Consider the following scenario:
describe('permuting test data', () => {
const a = ['foo'];
it('works once', () => {
expect(a).toEqual(['foo']);
a.push('bar');
});
it('then breaks', () => {
expect(a).toEqual(['foo']);
});
});
Contrived example? Yes, we're not usually writing code with side effects like this, and should strive not to. However, it inevitably happens, and occasionally the solution is to "fix" the tests such that they work against the permuted data. And then we have both order-dependent and incorrect tests 😦 .
Given the fact that this suite was using that shared eventHit
variable all over the place, and I had no idea how the code worked before writing these new tests, I opted to go this route and ensure that these new tests were "clean" and not subject to previous permutations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotchya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning this up, nested fields === 🤯 . LGTM 🚀
Conflicts: x-pack/plugins/security_solution/server/search_strategy/timeline/factory/events/all/helpers.test.ts
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @rylnd |
…elds response (elastic#96187) * First pass at rebuilding nested object structure from fields response * Always requests TIMELINE_CTI_FIELDS as part of request This only works for one level of nesting; will be extending tests to allow for multiple levels momentarily. * Build objects from arbitrary levels of nesting This is a recursive implementation, but recursion depth is limited to the number of levels of nesting, with arguments reducing in size as we go (i.e. logarithmic) * Simplify parsing logic, perf improvements * Order short-circuiting conditions by cost, ascending * Simplify object building for non-nested objects from fields * The non-nested case is the same as the base recursive case, so always call our recursive function if building from .fields * Simplify getNestedParentPath * We can do a few simple string comparison rather than building up multiple strings/arrays * Don't call getNestedParentPath unnecessarily, only if we have a field * Simplify if branching By definition, nestedParentFieldName can never be equal to fieldName, which means there are only two branches here. * Declare/export a more accurate fields type Each top-level field value can be either an array of leaf values (unknown[]), or an array of nested fields. * Remove unnecessary condition If fieldName is null or undefined, there is no reason to search for it in dataFields. Looking through the git history this looks to be dead code as a result of refactoring, as opposed to a legitimate bugfix, so I'm removing it. * Fix failing tests * one was a test failure due to my modifying mock data * one may have been a legitimate bug where we don't handle a hit without a fields response; I need to follow up with Xavier to verify. Co-authored-by: Kibana Machine <[email protected]>
…elds response (#96187) (#96915) * First pass at rebuilding nested object structure from fields response * Always requests TIMELINE_CTI_FIELDS as part of request This only works for one level of nesting; will be extending tests to allow for multiple levels momentarily. * Build objects from arbitrary levels of nesting This is a recursive implementation, but recursion depth is limited to the number of levels of nesting, with arguments reducing in size as we go (i.e. logarithmic) * Simplify parsing logic, perf improvements * Order short-circuiting conditions by cost, ascending * Simplify object building for non-nested objects from fields * The non-nested case is the same as the base recursive case, so always call our recursive function if building from .fields * Simplify getNestedParentPath * We can do a few simple string comparison rather than building up multiple strings/arrays * Don't call getNestedParentPath unnecessarily, only if we have a field * Simplify if branching By definition, nestedParentFieldName can never be equal to fieldName, which means there are only two branches here. * Declare/export a more accurate fields type Each top-level field value can be either an array of leaf values (unknown[]), or an array of nested fields. * Remove unnecessary condition If fieldName is null or undefined, there is no reason to search for it in dataFields. Looking through the git history this looks to be dead code as a result of refactoring, as opposed to a legitimate bugfix, so I'm removing it. * Fix failing tests * one was a test failure due to my modifying mock data * one may have been a legitimate bug where we don't handle a hit without a fields response; I need to follow up with Xavier to verify. Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
Summary
TIMELINE_EVENTS_FIELDS
so as to always be requested for events/all and eql queriesTODO
*Ecs
types for threat fields?Performance characteristics
TL;DR this path should not be an issue.
Rationale (please verify @XavierM @FrankHassanabad):
buildObjectForFieldPath
),getValuesFromFields
, was meant to mimic the performance improvements informatIndexFields
formatFirstFields
needs to be performant as it operates on a potentially huge list of index fields from all data sourcesformatTimelineData
(the caller of the code in this PR), however, operates only on requested fields; typically a subset of ECS and those we need for various SecSol pagesformatIndexFields
would receiveRegarding the specific performance of
buildObjectFromFields
, it's invoked once for each requested field; see cab30d4 for more details.Why?
Given alerts with nested fields like
threat.indicator
:we currently have two differing needs for nested fields in the UI:
threat.indicator.matched.field
: ["matched_field", "other_matched_field", "matched_field_2"]#92025 did the work in 7.12 to give us 1. above. While there are some minor issues (only supports one level of nested fields, object values are stringified), I believe this is sufficient for our immediate needs, i.e.: answering "Which values populated this field?" and the ability to investigate those values in timeline.
However, search strategies were previously missing the structure described in 2. Our initial CTI tabs work solved this problem by
JSON.parse
ing the stringifiedthreat.indicator
object returned as part of 1., but this is blocking work that should be done by the backend if possible, and is not feasible for a repeated task such as row rendering.Thus, this PR: the logic of 1. was left relatively untouched, so the table view should continue to function as before. The new object structure is included as part of the
.ecs
property of a given node in the response, and can thus be consumed by e.g. row renderers without further changes. Previously, requesting nested fields in this section would result in an incorrect structure, with single objects/keys and empty array values.Checklist
For maintainers