[Lens] Create mathColumn function to improve performance #101908

Merged: 6 commits merged on Jun 16, 2021
57 changes: 49 additions & 8 deletions docs/canvas/canvas-function-reference.asciidoc
@@ -71,7 +71,7 @@ Alias: `condition`
[[alterColumn_fn]]
=== `alterColumn`

-Converts between core types, including `string`, `number`, `null`, `boolean`, and `date`, and renames columns. See also <<mapColumn_fn>> and <<staticColumn_fn>>.
+Converts between core types, including `string`, `number`, `null`, `boolean`, and `date`, and renames columns. See also <<mapColumn_fn>>, <<mathColumn_fn>>, and <<staticColumn_fn>>.

*Expression syntax*
[source,js]
@@ -1717,23 +1717,23 @@ Adds a column calculated as the result of other columns. Changes are made only w
|===
|Argument |Type |Description

+|`id`

+|`string`, `null`
+|An optional id of the resulting column. When no id is provided, the id will be looked up from the existing column by the provided name argument. If no column with this name exists yet, a new column with this name and an identical id will be added to the table.

|_Unnamed_ ***

Aliases: `column`, `name`
|`string`
-|The name of the resulting column.
+|The name of the resulting column. Names are not required to be unique.

|`expression` ***

Aliases: `exp`, `fn`, `function`
|`boolean`, `number`, `string`, `null`
|A Canvas expression that is passed to each row as a single row `datatable`.

-|`id`

-|`string`, `null`
-|An optional id of the resulting column. When not specified or `null` the name argument is used as id.

|`copyMetaFrom`

|`string`, `null`
@@ -1808,6 +1808,47 @@ Default: `"throw"`
*Returns:* `number` | `boolean` | `null`


[float]
[[mathColumn_fn]]
=== `mathColumn`

Adds a column by evaluating `TinyMath` on each row. This function is optimized for math, so it performs better than using <<mapColumn_fn>> with <<math_fn>>.

*Accepts:* `datatable`

[cols="3*^<"]
|===
|Argument |Type |Description

|`id` ***
|`string`
|The id of the resulting column. Must be unique.

|`name` ***
|`string`
|The name of the resulting column. Names are not required to be unique.

|_Unnamed_

Alias: `expression`
|`string`
|A `TinyMath` expression evaluated on each row. See https://www.elastic.co/guide/en/kibana/current/canvas-tinymath-functions.html.

|`onError`

|`string`
|If the `TinyMath` evaluation fails or returns `NaN`, `onError` specifies the return value: `"null"`, `"zero"`, `"false"`, or `"throw"`. With `"throw"`, an exception is thrown and expression execution terminates.

Default: `"throw"`

|`copyMetaFrom`

|`string`, `null`
|If set, the meta object from the specified column id is copied over to the resulting column. If the column doesn't exist, this argument is silently ignored.
|===

*Returns:* `datatable`
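A minimal TypeScript sketch of the row-wise behavior described in the `mathColumn` table: the unique `id` check, per-row evaluation, and the `onError` fallbacks. It assumes simplified `Table`/`Row` types and an injected `evaluate` callback in place of TinyMath; `mathColumnSketch` is a hypothetical name, not Kibana's implementation.

```typescript
// Hypothetical minimal types standing in for Kibana's Datatable types.
type Row = Record<string, unknown>;
interface Column {
  id: string;
  name: string;
  meta: { type: string };
}
interface Table {
  type: 'datatable';
  columns: Column[];
  rows: Row[];
}
type OnError = 'null' | 'zero' | 'false' | 'throw';

// Value substituted for a failed or NaN result in each non-throwing mode.
const fallback: Record<Exclude<OnError, 'throw'>, null | number | boolean> = {
  null: null,
  zero: 0,
  false: false,
};

// `evaluate` is an assumed stand-in for evaluating a TinyMath expression on one row.
function mathColumnSketch(
  input: Table,
  args: { id: string; name: string; onError?: OnError },
  evaluate: (row: Row) => number
): Table {
  if (input.columns.some((column) => column.id === args.id)) {
    throw new Error('ID must be unique');
  }
  const onError: OnError = args.onError ?? 'throw';

  const computeValue = (row: Row): null | number | boolean => {
    try {
      const result = evaluate(row);
      if (!Number.isNaN(result)) return result;
      // Route a NaN result through the same handling as an evaluation error.
      throw new Error('Result is NaN');
    } catch (e) {
      // "throw" propagates the error; the other modes substitute a fallback value.
      if (onError === 'throw') throw e;
      return fallback[onError];
    }
  };

  const rows = input.rows.map((row) => ({ ...row, [args.id]: computeValue(row) }));

  // Infer the new column's type from the first row, similar to getType in the PR.
  const first = rows.length ? rows[0][args.id] : null;
  const type = first === null ? 'null' : typeof first;

  return {
    type: 'datatable',
    columns: [...input.columns, { id: args.id, name: args.name, meta: { type } }],
    rows,
  };
}
```

Passing `onError: 'null'` with an `evaluate` that throws on division by zero yields `null` for those rows instead of aborting the whole table.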


[float]
[[metric_fn]]
=== `metric`
@@ -2581,7 +2622,7 @@ Default: `false`
[[staticColumn_fn]]
=== `staticColumn`

-Adds a column with the same static value in every row. See also <<alterColumn_fn>> and <<mapColumn_fn>>.
+Adds a column with the same static value in every row. See also <<alterColumn_fn>>, <<mapColumn_fn>>, and <<mathColumn_fn>>.

*Accepts:* `datatable`

@@ -17,3 +17,4 @@ export * from './moving_average';
export * from './ui_setting';
export { mapColumn, MapColumnArguments } from './map_column';
export { math, MathArguments, MathInput } from './math';
export { mathColumn, MathColumnArguments } from './math_column';
@@ -0,0 +1,111 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

import { i18n } from '@kbn/i18n';
import { ExpressionFunctionDefinition } from '../types';
import { math, MathArguments } from './math';
import { Datatable, DatatableColumn, getType } from '../../expression_types';

export type MathColumnArguments = MathArguments & {
  id: string;
  name?: string;
  copyMetaFrom?: string | null;
};

export const mathColumn: ExpressionFunctionDefinition<
  'mathColumn',
  Datatable,
  MathColumnArguments,
  Datatable
> = {
  name: 'mathColumn',
  type: 'datatable',
  inputTypes: ['datatable'],
  help: i18n.translate('expressions.functions.mathColumnHelpText', {
    defaultMessage:
      'Adds a column calculated as the result of other columns. ' +
      'Changes are made only when you provide arguments. ' +
      'See also {alterColumnFn} and {staticColumnFn}.',
    values: {
      alterColumnFn: '`alterColumn`',
      staticColumnFn: '`staticColumn`',
    },
  }),
  args: {
    ...math.args,
    id: {
      types: ['string'],
      help: i18n.translate('expressions.functions.mathColumn.args.idHelpText', {
        defaultMessage: 'id of the resulting column. Must be unique.',
      }),
      required: true,
    },
    name: {
      types: ['string'],
      aliases: ['_', 'column'],
      help: i18n.translate('expressions.functions.mathColumn.args.nameHelpText', {
        defaultMessage: 'The name of the resulting column. Names are not required to be unique.',
      }),
      required: true,
    },
    copyMetaFrom: {
      types: ['string', 'null'],
      help: i18n.translate('expressions.functions.mathColumn.args.copyMetaFromHelpText', {
        defaultMessage:
          "If set, the meta object from the specified column id is copied over to the specified target column. If the column doesn't exist, it silently fails.",
      }),
      required: false,
      default: null,
    },
  },
  fn: (input, args, context) => {
    const columns = [...input.columns];
    const existingColumnIndex = columns.findIndex(({ id }) => {
      return id === args.id;
    });
    if (existingColumnIndex > -1) {
      throw new Error('ID must be unique');
    }

    const newRows = input.rows.map((row) => {

Review comment (Contributor):

It would be nice to process this in async chunks, to give the thread some time to run other small tasks when very big tables are passed. Lodash exposes a `chunk` utility for this. What do you think?

      return {
        ...row,
        [args.id]: math.fn(

Review comment (Contributor):

This still calls `math` separately for each row, which makes the TinyMath parser run once per row. With many rows this becomes relevant to performance (~4k rows with a very simple formula; it can get worse when multiple math contexts are used for column-wise calculations):

[Screenshot 2021-06-14 at 11 00 15]

I propose we cache the AST: instead of calling `evaluate`, call `parse` and then `interpret`. This can be done either by pulling the math logic into this expression function, so `parse` is called once and `interpret` runs for every row, or by wrapping the `parse` call in the math function with `memoize-one`.

This can be done in a separate PR.

Reply (Contributor Author):

I have added the memoization to TinyMath in this PR, as it definitely improves the overall speed.

          {
            type: 'datatable',
            columns: input.columns,
            rows: [row],
          },
          {
            expression: args.expression,
            onError: args.onError,
          },

Review comment on lines +84 to +87 (Contributor):

This object could be declared once above the loop and reused for every row, which saves some memory.

I also experimented with reusing the same table "template" above, but for a medium-size table the performance gain was negligible, so it isn't worth the hack.

          context
        ),
      };
    });
    const type = newRows.length ? getType(newRows[0][args.id]) : 'null';
    const newColumn: DatatableColumn = {
      id: args.id,
      name: args.name ?? args.id,
      meta: { type, params: { id: type } },
    };
    if (args.copyMetaFrom) {
      const metaSourceFrom = columns.find(({ id }) => id === args.copyMetaFrom);
      newColumn.meta = { ...newColumn.meta, ...(metaSourceFrom?.meta || {}) };
    }

    columns.push(newColumn);

    return {
      type: 'datatable',
      columns,
      rows: newRows,
    } as Datatable;
  },
};
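The review thread on this file raises two optimizations: memoizing the TinyMath parse step so repeated per-row `math` calls reuse the parsed expression (which the author added to TinyMath in this PR), and processing very large tables in async chunks. The sketch below illustrates both ideas in isolation. `parse` is a toy stand-in for TinyMath's parser that only handles `<operand> <op> <operand>`, `memoizeOne` is a hand-rolled version of the `memoize-one` idea, and `mapRowsInChunks`/`mathColumnChunked` are hypothetical helpers; none of these are Kibana or TinyMath APIs.

```typescript
type Row = Record<string, number>;
type Compiled = (row: Row) => number;

// Counts how often the (expensive) parse step runs.
let parseCount = 0;

// Toy stand-in for a parse step: "<operand> <op> <operand>", operands are columns or numbers.
function parse(expression: string): Compiled {
  parseCount++;
  const [left, op, right] = expression.trim().split(/\s+/);
  const operand = (token: string): Compiled => {
    const literal = Number(token);
    return Number.isNaN(literal) ? (row) => row[token] : () => literal;
  };
  const l = operand(left);
  const r = operand(right);
  switch (op) {
    case '+':
      return (row) => l(row) + r(row);
    case '-':
      return (row) => l(row) - r(row);
    case '*':
      return (row) => l(row) * r(row);
    case '/':
      return (row) => l(row) / r(row);
    default:
      throw new Error(`Unsupported operator: ${op}`);
  }
}

// Hand-rolled version of the memoize-one idea: remember only the most recent argument/result.
function memoizeOne<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let cached: { arg: A; result: R } | undefined;
  return (arg: A) => {
    if (!cached || cached.arg !== arg) {
      cached = { arg, result: fn(arg) };
    }
    return cached.result;
  };
}

// Map rows in chunks, yielding to the event loop between chunks so other tasks can run.
async function mapRowsInChunks<T, U>(
  rows: T[],
  chunkSize: number,
  fn: (row: T) => U
): Promise<U[]> {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  const out: U[] = [];
  for (let start = 0; start < rows.length; start += chunkSize) {
    for (const row of rows.slice(start, start + chunkSize)) {
      out.push(fn(row));
    }
    // setTimeout(0) schedules a macrotask, so timers and I/O queued meanwhile get a turn.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return out;
}

const parseOnce = memoizeOne(parse);

// parseOnce is called per row, but parse only runs again when the expression changes.
function mathColumnChunked(
  rows: Row[],
  id: string,
  expression: string,
  chunkSize = 1000
): Promise<Row[]> {
  return mapRowsInChunks(rows, chunkSize, (row) => ({ ...row, [id]: parseOnce(expression)(row) }));
}
```

Within a single call, hoisting the `parse` call out of the loop would have the same effect; memoization additionally helps when, as in this PR, `math.fn` is invoked separately for each row. Chunking makes the function asynchronous and trades some throughput for responsiveness; the chunk size of 1000 is an arbitrary assumption.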
@@ -0,0 +1,74 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

import { mathColumn } from '../math_column';
import { functionWrapper, testTable } from './utils';

describe('mathColumn', () => {
  const fn = functionWrapper(mathColumn);

  it('throws if the id is used', () => {
    expect(() => fn(testTable, { id: 'price', name: 'price', expression: 'price * 2' })).toThrow(
      `ID must be unique`
    );
  });

  it('applies math to each row by id', () => {
    const result = fn(testTable, { id: 'output', name: 'output', expression: 'quantity * price' });
    expect(result.columns).toEqual([
      ...testTable.columns,
      { id: 'output', name: 'output', meta: { params: { id: 'number' }, type: 'number' } },
    ]);
    expect(result.rows[0]).toEqual({
      in_stock: true,
      name: 'product1',
      output: 60500,
      price: 605,
      quantity: 100,
      time: 1517842800950,
    });
  });

  it('handles onError', () => {
    const args = {
      id: 'output',
      name: 'output',
      expression: 'quantity / 0',
    };
    expect(() => fn(testTable, args)).toThrowError(`Cannot divide by 0`);
    expect(() => fn(testTable, { ...args, onError: 'throw' })).toThrow();
    expect(fn(testTable, { ...args, onError: 'zero' }).rows[0].output).toEqual(0);
    expect(fn(testTable, { ...args, onError: 'false' }).rows[0].output).toEqual(false);
    expect(fn(testTable, { ...args, onError: 'null' }).rows[0].output).toEqual(null);
  });

  it('should copy over the meta information from the specified column', async () => {
    const result = await fn(
      {
        ...testTable,
        columns: [
          ...testTable.columns,
          {
            id: 'myId',
            name: 'myName',
            meta: { type: 'date', params: { id: 'number', params: { digits: 2 } } },
          },
        ],
        rows: testTable.rows.map((row) => ({ ...row, myId: Date.now() })),
      },
      { id: 'output', name: 'name', copyMetaFrom: 'myId', expression: 'price + 2' }
    );

    expect(result.type).toBe('datatable');
    expect(result.columns[result.columns.length - 1]).toEqual({
      id: 'output',
      name: 'name',
      meta: { type: 'date', params: { id: 'number', params: { digits: 2 } } },
    });
  });
});
@@ -30,6 +30,7 @@ import {
  movingAverage,
  mapColumn,
  math,
  mathColumn,
} from '../expression_functions';

/**
@@ -342,6 +343,7 @@ export class ExpressionsService implements PersistableStateService<ExpressionAst
  movingAverage,
  mapColumn,
  math,
  mathColumn,
]) {
  this.registerFunction(fn);
}
@@ -100,28 +100,11 @@ export const formulaOperation: OperationDefinition<
return [
  {
    type: 'function',
-   function: 'mapColumn',
+   function: 'mathColumn',
    arguments: {
      id: [columnId],
      name: [label || defaultLabel],
-     exp: [
-       {
-         type: 'expression',
-         chain: currentColumn.references.length
-           ? [
-               {
-                 type: 'function',
-                 function: 'math',
-                 arguments: {
-                   expression: [
-                     currentColumn.references.length ? `"${currentColumn.references[0]}"` : ``,
-                   ],
-                 },
-               },
-             ]
-           : [],
-       },
-     ],
+     expression: [currentColumn.references.length ? `"${currentColumn.references[0]}"` : ``],

Review comment (Contributor):

This seems to cause the failing test with an empty formula. I tried to fix it using `staticColumn`, but it lacks separate `id` and `name` params. We could add those; I'm not sure whether there's a more elegant solution.

Reply (Contributor Author):

The best solution I've found is to use `mapColumn` with an empty expression.

    },
  },
];
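The thread on this hunk settles on falling back to `mapColumn` with an empty expression when the formula has no references. A minimal sketch of that branching, assuming a simplified AST type: `buildFormulaAst` and `AstFunction` are hypothetical names, and this is not the merged Kibana code. The empty expression mirrors the `chain: []` case of the removed `mapColumn` block.

```typescript
// Minimal stand-in for the expression AST shape used in the diff.
interface AstFunction {
  type: 'function';
  function: string;
  arguments: Record<string, unknown[]>;
}

// Hypothetical helper: mathColumn when there is a reference to evaluate,
// otherwise mapColumn whose expression has an empty chain.
function buildFormulaAst(columnId: string, label: string, references: string[]): AstFunction[] {
  if (references.length === 0) {
    return [
      {
        type: 'function',
        function: 'mapColumn',
        arguments: {
          id: [columnId],
          name: [label],
          exp: [{ type: 'expression', chain: [] }],
        },
      },
    ];
  }
  return [
    {
      type: 'function',
      function: 'mathColumn',
      arguments: {
        id: [columnId],
        name: [label],
        // Quote the referenced column id, as the diff does.
        expression: [`"${references[0]}"`],
      },
    },
  ];
}
```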
@@ -45,25 +45,12 @@ export const mathOperation: OperationDefinition<MathIndexPatternColumn, 'managed
return [
  {
    type: 'function',
-   function: 'mapColumn',
+   function: 'mathColumn',
    arguments: {
      id: [columnId],
      name: [columnId],
-     exp: [
-       {
-         type: 'expression',
-         chain: [
-           {
-             type: 'function',
-             function: 'math',
-             arguments: {
-               expression: [astToString(column.params.tinymathAst)],
-               onError: ['null'],
-             },
-           },
-         ],
-       },
-     ],
+     expression: [astToString(column.params.tinymathAst)],
+     onError: ['null'],
    },
  },
];
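Both operation diffs replace a `mapColumn` whose `exp` argument wraps a single `math` call with a flat `mathColumn` call that takes `expression` and `onError` directly. The hypothetical helper below (`flattenMapColumnMath`, not part of the PR) expresses that rewrite on a simplified AST type, to make the shape change explicit.

```typescript
// Simplified stand-ins for the expression AST nodes used in the diff.
interface AstFunction {
  type: 'function';
  function: string;
  arguments: Record<string, unknown[]>;
}
interface AstExpression {
  type: 'expression';
  chain: AstFunction[];
}

// Rewrites mapColumn(exp = math(...)) into mathColumn(...); anything else is returned unchanged.
function flattenMapColumnMath(fn: AstFunction): AstFunction {
  if (fn.function !== 'mapColumn') return fn;
  const exp = fn.arguments.exp?.[0] as AstExpression | undefined;
  if (!exp || exp.chain.length !== 1 || exp.chain[0].function !== 'math') return fn;
  // Keep id/name, drop the nested expression, and hoist the math arguments.
  const rest = { ...fn.arguments };
  delete rest.exp;
  return {
    type: 'function',
    function: 'mathColumn',
    arguments: { ...rest, ...exp.chain[0].arguments },
  };
}
```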