Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Timeline] Rebuild nested fields structure from fields response #96187

Merged
merged 11 commits into from
Apr 12, 2021
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,12 @@ export interface EventsActionGroupData {
doc_count: number;
}

export type Fields = Record<string, unknown[] | Fields[]>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this type is referring to itself is that kosher? does it recurse? broke my brain a little bit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's recursive! This was the most accurate representation of nested fields (and thus "fields" in general) I could think of. It appears to work with all our existing uses of the property without any changes because unknown[] is greedier than Fields[], but e.g. I needed to narrow in cases where we know it's gong to be the recursive form.


export interface EventHit extends SearchHit {
sort: string[];
_source: EventSource;
fields: Record<string, unknown[]>;
fields: Fields;
aggregations: {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
[agg: string]: any;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,15 @@
* 2.0.
*/

export const TIMELINE_CTI_FIELDS = [
'threat.indicator.event.dataset',
'threat.indicator.event.reference',
'threat.indicator.matched.atomic',
'threat.indicator.matched.field',
'threat.indicator.matched.type',
'threat.indicator.provider',
];

export const TIMELINE_EVENTS_FIELDS = [
'@timestamp',
'signal.status',
Expand Down Expand Up @@ -230,4 +239,5 @@ export const TIMELINE_EVENTS_FIELDS = [
'zeek.ssl.established',
'zeek.ssl.resumed',
'zeek.ssl.version',
...TIMELINE_CTI_FIELDS,
];
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

import { EventHit } from '../../../../../../common/search_strategy';
import { TIMELINE_EVENTS_FIELDS } from './constants';
import { formatTimelineData } from './helpers';
import { buildObjectForFieldPath, formatTimelineData } from './helpers';
import { eventHit } from '../mocks';

describe('#formatTimelineData', () => {
Expand Down Expand Up @@ -42,12 +42,12 @@ describe('#formatTimelineData', () => {
value: ['beats-ci-immutable-ubuntu-1804-1605624279743236239'],
},
{
field: 'source.geo.location',
value: [`{"lon":118.7778,"lat":32.0617}`],
field: 'threat.indicator.matched.field',
value: ['matched_field', 'other_matched_field', 'matched_field_2'],
},
{
field: 'threat.indicator.matched.field',
value: ['matched_field', 'matched_field_2'],
field: 'source.geo.location',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order here changed due to the addition of threat match fields to the ECS fields list, meaning it comes sooner than a requested field like this one (source.geo.location)

value: [`{"lon":118.7778,"lat":32.0617}`],
},
],
ecs: {
Expand Down Expand Up @@ -94,6 +94,34 @@ describe('#formatTimelineData', () => {
user: {
name: ['jenkins'],
},
threat: {
indicator: [
{
event: {
dataset: [],
reference: [],
},
matched: {
atomic: ['matched_atomic'],
field: ['matched_field', 'other_matched_field'],
type: [],
},
provider: ['yourself'],
},
{
event: {
dataset: [],
reference: [],
},
matched: {
atomic: ['matched_atomic_2'],
field: ['matched_field_2'],
type: [],
},
provider: ['other_you'],
},
],
},
},
},
});
Expand Down Expand Up @@ -371,4 +399,173 @@ describe('#formatTimelineData', () => {
},
});
});

describe('buildObjectForFieldPath', () => {
it('builds an object from a single non-nested field', () => {
expect(buildObjectForFieldPath('@timestamp', eventHit)).toEqual({
'@timestamp': ['2020-11-17T14:48:08.922Z'],
});
});

it('builds an object with no fields response', () => {
Copy link
Contributor Author

@rylnd rylnd Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XavierM I added this test and the accompanying null check because the parseEqlResponse unit tests were feeding in events without a fields property and failing. It certainly seems plausible for a hit not to have that property, hence the changes here, but I wanted to double-check it with you as well.

const { fields, ...fieldLessHit } = eventHit;
// @ts-expect-error fieldLessHit is intentionally missing fields
expect(buildObjectForFieldPath('@timestamp', fieldLessHit)).toEqual({
'@timestamp': [],
});
});

it('does not misinterpret non-nested fields with a common prefix', () => {
// @ts-expect-error hit is minimal
const hit: EventHit = {
fields: {
'foo.bar': ['baz'],
'foo.barBaz': ['foo'],
},
};

expect(buildObjectForFieldPath('foo.barBaz', hit)).toEqual({
foo: { barBaz: ['foo'] },
});
});

it('builds an array of objects from a nested field', () => {
// @ts-expect-error hit is minimal
const hit: EventHit = {
fields: {
foo: [{ bar: ['baz'] }],
},
};
expect(buildObjectForFieldPath('foo.bar', hit)).toEqual({
foo: [{ bar: ['baz'] }],
});
});

it('builds intermediate objects for nested fields', () => {
// @ts-expect-error nestedHit is minimal
const nestedHit: EventHit = {
fields: {
'foo.bar': [
{
baz: ['host.name'],
},
],
},
};
expect(buildObjectForFieldPath('foo.bar.baz', nestedHit)).toEqual({
foo: {
bar: [
{
baz: ['host.name'],
},
],
},
});
});

it('builds intermediate objects at multiple levels', () => {
expect(buildObjectForFieldPath('threat.indicator.matched.atomic', eventHit)).toEqual({
threat: {
indicator: [
{
matched: {
atomic: ['matched_atomic'],
},
},
{
matched: {
atomic: ['matched_atomic_2'],
},
},
],
},
});
});

it('preserves multiple values for a single leaf', () => {
expect(buildObjectForFieldPath('threat.indicator.matched.field', eventHit)).toEqual({
threat: {
indicator: [
{
matched: {
field: ['matched_field', 'other_matched_field'],
},
},
{
matched: {
field: ['matched_field_2'],
},
},
],
},
});
});

describe('multiple levels of nested fields', () => {
let nestedHit: EventHit;

beforeEach(() => {
// @ts-expect-error nestedHit is minimal
nestedHit = {
fields: {
'nested_1.foo': [
{
'nested_2.bar': [
{ leaf: ['leaf_value'], leaf_2: ['leaf_2_value'] },
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] },
],
},
{
'nested_2.bar': [
{ leaf: ['leaf_value_2'], leaf_2: ['leaf_2_value_4'] },
{ leaf: ['leaf_value_3'], leaf_2: ['leaf_2_value_5'] },
],
},
],
},
};
});
Comment on lines +505 to +527
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need the let or beforeEach

Suggested change
let nestedHit: EventHit;
beforeEach(() => {
// @ts-expect-error nestedHit is minimal
nestedHit = {
fields: {
'nested_1.foo': [
{
'nested_2.bar': [
{ leaf: ['leaf_value'], leaf_2: ['leaf_2_value'] },
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] },
],
},
{
'nested_2.bar': [
{ leaf: ['leaf_value_2'], leaf_2: ['leaf_2_value_4'] },
{ leaf: ['leaf_value_3'], leaf_2: ['leaf_2_value_5'] },
],
},
],
},
};
});
// @ts-expect-error nestedHit is minimal
const nestedHit: EventHit = {
fields: {
'nested_1.foo': [
{
'nested_2.bar': [
{ leaf: ['leaf_value'], leaf_2: ['leaf_2_value'] },
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] },
],
},
{
'nested_2.bar': [
{ leaf: ['leaf_value_2'], leaf_2: ['leaf_2_value_4'] },
{ leaf: ['leaf_value_3'], leaf_2: ['leaf_2_value_5'] },
],
},
],
},
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this let/beforeEach is my paranoid "reset test data" pattern, meant to address the case where some function under test permutes its arguments. Consider the following scenario:

describe('permuting test data', () => {
  const a = ['foo'];

  it('works once', () => {
    expect(a).toEqual(['foo']);
    a.push('bar');
  });

  it('then breaks', () => {
    expect(a).toEqual(['foo']);
  });
});

Contrived example? Yes, we're not usually writing code with side effects like this, and should strive not to. However, it inevitably happens, and occasionally the solution is to "fix" the tests such that they work against the permuted data. And then we have both order-dependent and incorrect tests 😦 .

Given the fact that this suite was using that shared eventHit variable all over the place, and I had no idea how the code worked before writing these new tests, I opted to go this route and ensure that these new tests were "clean" and not subject to previous permutations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotchya


it('includes objects without the field', () => {
expect(buildObjectForFieldPath('nested_1.foo.nested_2.bar.leaf', nestedHit)).toEqual({
nested_1: {
foo: [
{
nested_2: {
bar: [{ leaf: ['leaf_value'] }, { leaf: [] }],
},
},
{
nested_2: {
bar: [{ leaf: ['leaf_value_2'] }, { leaf: ['leaf_value_3'] }],
},
},
],
},
});
});

it('groups multiple leaf values', () => {
expect(buildObjectForFieldPath('nested_1.foo.nested_2.bar.leaf_2', nestedHit)).toEqual({
nested_1: {
foo: [
{
nested_2: {
bar: [
{ leaf_2: ['leaf_2_value'] },
{ leaf_2: ['leaf_2_value_2', 'leaf_2_value_3'] },
],
},
},
{
nested_2: {
bar: [{ leaf_2: ['leaf_2_value_4'] }, { leaf_2: ['leaf_2_value_5'] }],
},
},
],
},
});
});
});
});
});
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,12 @@
* 2.0.
*/

import { set } from '@elastic/safer-lodash-set';
import { get, has, merge, uniq } from 'lodash/fp';
import { Ecs } from '../../../../../../common/ecs';
import {
EventHit,
Fields,
TimelineEdges,
TimelineNonEcsData,
} from '../../../../../../common/search_strategy';
Expand Down Expand Up @@ -75,18 +78,13 @@ const getValuesFromFields = async (
[fieldName]: get(fieldName, hit._source),
};
} else {
if (nestedParentFieldName == null || nestedParentFieldName === fieldName) {
if (nestedParentFieldName == null) {
fieldToEval = {
[fieldName]: hit.fields[fieldName],
};
} else if (nestedParentFieldName != null) {
fieldToEval = {
[nestedParentFieldName]: hit.fields[nestedParentFieldName],
};
} else {
// fallback, should never hit
fieldToEval = {
[fieldName]: [],
[nestedParentFieldName]: hit.fields[nestedParentFieldName],
};
}
}
Expand All @@ -99,22 +97,50 @@ const getValuesFromFields = async (
);
};

const buildObjectRecursive = (fieldPath: string, fields: Fields): Partial<Ecs> => {
const nestedParentPath = getNestedParentPath(fieldPath, fields);
if (!nestedParentPath) {
return set({}, fieldPath, toStringArray(get(fieldPath, fields)));
}

const subPath = fieldPath.replace(`${nestedParentPath}.`, '');
const subFields = (get(nestedParentPath, fields) ?? []) as Fields[];
return set(
{},
nestedParentPath,
subFields.map((subField) => buildObjectRecursive(subPath, subField))
);
};

export const buildObjectForFieldPath = (fieldPath: string, hit: EventHit): Partial<Ecs> => {
if (has(fieldPath, hit._source)) {
const value = get(fieldPath, hit._source);
return set({}, fieldPath, toStringArray(value));
}

return buildObjectRecursive(fieldPath, hit.fields);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrankHassanabad would you be able to pair on verifying the performance of this logic? I know this is a hot path and that we're doing setTimeouts and Promise.resolves elsewhere; I would appreciate a crash course in analzying/benchmarking/profiling this!

Copy link
Contributor

@FrankHassanabad FrankHassanabad Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rylnd and myself discussed this and the conclusion is:

I suggested for testing we can create some large index mappings and testing these functions manually to see if they need to do partitioning or not with a setTimeout.

We have had weird things in the past where once you add a partition through a setTimeout, team members from anywhere at any time can sometimes think it's a mistake (no matter how many docs and all CAPS! you put above it :-) ) and they will remove 'em and you have to watch them forever.

But also so we can see that everything functions as we expect today with a manual test as suggested. However right now we don't have an automated perf regression framework for injecting large index mappings, running a test against them and then seeing if we have de-graded performance of the regression of either NodeJS or the frontend UI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more context about this in the Performance Characteristics section in the PR description 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have PR who removed all the set from lodash in this function, if I remember correctly that show slowness in the past.

};

/**
* If a prefix of our full field path is present as a field, we know that our field is nested
*/
const getNestedParentPath = (fieldPath: string, fields: Fields | undefined): string | undefined =>
fields &&
Object.keys(fields).find((field) => field !== fieldPath && fieldPath.startsWith(`${field}.`));

const mergeTimelineFieldsWithHit = async <T>(
fieldName: string,
flattenedFields: T,
hit: EventHit,
dataFields: readonly string[],
ecsFields: readonly string[]
) => {
if (fieldName != null || dataFields.includes(fieldName)) {
const fieldNameAsArray = fieldName.split('.');
const nestedParentFieldName = Object.keys(hit.fields ?? []).find((f) => {
return f === fieldNameAsArray.slice(0, f.split('.').length).join('.');
});
if (fieldName != null) {
const nestedParentPath = getNestedParentPath(fieldName, hit.fields);
if (
nestedParentPath != null ||
has(fieldName, hit._source) ||
has(fieldName, hit.fields) ||
nestedParentFieldName != null ||
specialFields.includes(fieldName)
) {
const objectWithProperty = {
Expand All @@ -123,22 +149,13 @@ const mergeTimelineFieldsWithHit = async <T>(
data: dataFields.includes(fieldName)
? [
...get('node.data', flattenedFields),
...(await getValuesFromFields(fieldName, hit, nestedParentFieldName)),
...(await getValuesFromFields(fieldName, hit, nestedParentPath)),
]
: get('node.data', flattenedFields),
ecs: ecsFields.includes(fieldName)
? {
...get('node.ecs', flattenedFields),
// @ts-expect-error
...fieldName.split('.').reduceRight(
// @ts-expect-error
(obj, next) => ({ [next]: obj }),
toStringArray(
has(fieldName, hit._source)
? get(fieldName, hit._source)
: hit.fields[fieldName]
)
),
...buildObjectForFieldPath(fieldName, hit),
}
: get('node.ecs', flattenedFields),
},
Expand Down
Loading