Add cumulative sum expression function #80129

Merged
merged 11 commits into from
Oct 20, 2020
@@ -0,0 +1,113 @@
/*
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

import { i18n } from '@kbn/i18n';
import { ExpressionFunctionDefinition } from '../types';
import { Datatable, DatatableRow } from '../../expression_types';

export type ExpressionFunctionCumulativeSum = ExpressionFunctionDefinition<
  'cumulative_sum',
  Datatable,
  { by?: string[]; column: string },
  Datatable
>;

/**
* Returns a string identifying the group of a row by a list of columns to group by
*/
function getBucketIdentifier(row: DatatableRow, groupColumns?: string[]) {
  return (groupColumns || [])
    .map((groupColumnId) => (row[groupColumnId] == null ? '' : String(row[groupColumnId])))
    .join('|');
}

/**
* Calculates the cumulative sum of a specified column in the data table.
*
* Also supports multiple series in a single data table - use the `by` argument
* to specify the columns to split the calculation by.
* For each unique combination of all `by` columns a separate cumulative sum will be calculated.
* The order of rows won't be changed - this function is not adding or removing rows and columns,
* it only changes the values of the column specified by the `column` argument.
*
* Behavior:
* * Will overwrite the specified column with the cumulative sum.
* * Cumulative sums always start with 0, a cell will contain its own value plus the values of
* all cells of the same series further up in the table.
*
* Edge cases:
* * If `column` contains `null` or `undefined`, it will be ignored and overwritten with the cumulative sum of
* all cells of the same series further up in the table.
* * For all values besides `null` and `undefined`, the value will be cast to a number before it's added to the
* cumulative sum of the current series - if this results in `NaN` (like in case of objects), that cell and all
* following cells of the current series will be set to `NaN`.
* * To determine separate series defined by the `by` columns, the values of these columns will be cast to strings
* before comparison. If the values are objects, the return value of their `toString` method will be used for comparison.
* Missing values (`null` and `undefined`) will be treated as empty strings.
*/
export const cumulativeSum: ExpressionFunctionCumulativeSum = {
  name: 'cumulative_sum',
  type: 'datatable',

  inputTypes: ['datatable'],

  help: i18n.translate('expressions.functions.cumulativeSum.help', {
    defaultMessage: 'Calculates the cumulative sum of a column in a data table',
  }),

  args: {
    by: {
      help: i18n.translate('expressions.functions.cumulativeSum.args.byHelpText', {
        defaultMessage: 'Column to split the cumulative sum calculation by',
      }),
      multi: true,
      types: ['string'],
      required: false,
    },
    column: {
      help: i18n.translate('expressions.functions.cumulativeSum.args.columnHelpText', {
        defaultMessage: 'Column to calculate the cumulative sum of',
      }),
      types: ['string'],
      required: true,
    },
  },

  fn(input, { by, column }) {
    const accumulators: Partial<Record<string, number>> = {};
    return {
      ...input,
      rows: input.rows.map((row) => {
        const newRow = { ...row };

        const bucketIdentifier = getBucketIdentifier(row, by);
        const accumulatorValue = accumulators[bucketIdentifier] ?? 0;
        const currentValue = newRow[column];
        if (currentValue != null) {
          newRow[column] = Number(newRow[column]) + accumulatorValue;
          accumulators[bucketIdentifier] = newRow[column];
        } else {
          newRow[column] = accumulatorValue;
Member

thinking about this again, it does seem a bit weird that the cumulative sum would overwrite the current value, or at least that this is the only option.

could we add another argument, name, which would be the name of the new column? (what map does)

Member

and i guess (what do you think @lukeelmers ) we can still replace the column if no name is provided ?

Contributor Author

I thought about it but didn't implement eventually because you can put that together quite easily yourself using mapColumn like in the example in the description (trying to keep the function as single-purpose as possible). I'm fine with adding it if you think it's the right thing though, in that case what about input and output arguments (name/column isn't really descriptive anymore)

Contributor Author (@flash1293, Oct 12, 2020)

Actually, thinking about it again, what about the following API?

  • name or unnamed arg are the target column and a required argument.
  • from is an optional argument and references the column to calculate the cumulative sum for - if it's not set, it defaults to name

Then you can do these:

demodata | cumulative_sum "price"
demodata | cumulative_sum "price" by="state"
demodata | cumulative_sum "cumulative_price" from="price" by="state"
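
A minimal TypeScript sketch of how the argument resolution proposed above could behave. It assumes plain row objects instead of Kibana's `Datatable`, and `SketchRow`, `ProposedArgs` and `cumulativeSumInto` are hypothetical names used only for illustration; none of this is code from the PR.

```typescript
// Hypothetical, simplified row type; the real Datatable also carries column metadata.
type SketchRow = Record<string, unknown>;

interface ProposedArgs {
  name: string; // target column (required; also the unnamed argument)
  from?: string; // source column; defaults to `name`
  by?: string[]; // columns that split the rows into separate series
}

function cumulativeSumInto(rows: SketchRow[], args: ProposedArgs): SketchRow[] {
  const { name, from = name, by = [] } = args;
  const totals = new Map<string, number>();
  return rows.map((row) => {
    // Same series key as in the PR: stringified `by` values joined with '|'.
    const key = by.map((id) => (row[id] == null ? '' : String(row[id]))).join('|');
    const previous = totals.get(key) ?? 0;
    const value = row[from];
    // Missing values are skipped and simply carry the running total forward.
    const next = value == null ? previous : Number(value) + previous;
    totals.set(key, next);
    // Writes to `name`; this only overwrites the source when `from` equals `name`.
    return { ...row, [name]: next };
  });
}

const demodata: SketchRow[] = [
  { state: 'CA', price: 10 },
  { state: 'NY', price: 4 },
  { state: 'CA', price: 5 },
];

// Mirrors: cumulative_sum "price" by="state" (overwrites `price`)
const overwritten = cumulativeSumInto(demodata, { name: 'price', by: ['state'] });

// Mirrors: cumulative_sum "cumulative_price" from="price" by="state" (keeps `price`)
const added = cumulativeSumInto(demodata, {
  name: 'cumulative_price',
  from: 'price',
  by: ['state'],
});
```

With this shape, leaving out `from` reproduces the overwrite behavior of the current implementation, while setting it keeps the source column intact.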

Contributor

  1. We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured Count and Cumulative sum of count: we need to use the original column and create a copy.

  2. There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column

For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)

Member

++ to not overwriting the column (which it sounds like we can't do anyway)

> There are no existing expression functions which are able to create a column with the fieldFormatter params

As an aside: I didn't realize mapColumn doesn't copy the full meta, but perhaps it should be provided as an option there as well.

Member

why would we ever want to change formatter (looking at the issue linked above)? i think that's not a good idea. I think the resulting column should keep the same format information as original column had (why would doing a sum/derivative/ some other calculation affect the format ?)

also thinking about it again, seems overriding existing column:

  • can't be done for various reasons where we need source column later
  • could be an edge case, but we can also achieve this by just dropping original column later, so i think we should leave the option to replace the column out, which makes target column id/name required parameters (if name is not provided we can use name=id)

Member

> why would we ever want to change formatter (looking at the issue linked above)?

I'm not sure about the use case for changing the formatter either, but it does feel like if we are making any expression functions that copy columns (currently just mapColumn) we should at least be preserving the full meta, or as much of it as makes sense.
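
To make the "copy the column and keep its meta" idea concrete, here is a rough TypeScript sketch. `SketchColumn` is a simplified, assumed column shape (id, name, and a meta object with optional formatter params), and `withCopiedColumn` is a hypothetical helper, not an existing expression function.

```typescript
// Assumed, simplified column shape for illustration only.
interface SketchColumn {
  id: string;
  name: string;
  meta: { type: string; params?: Record<string, unknown> };
}

// Returns a new column list with a copy of `sourceId` inserted right after it.
// The copy gets its own id and name but keeps the source's meta (including
// formatter params). If no name is given, the target id doubles as the name.
function withCopiedColumn(
  columns: SketchColumn[],
  sourceId: string,
  targetId: string,
  targetName: string = targetId
): SketchColumn[] {
  const index = columns.findIndex((column) => column.id === sourceId);
  if (index === -1) {
    // Unknown source column: leave the list untouched.
    return columns;
  }
  const source = columns[index];
  const copy: SketchColumn = {
    ...source,
    id: targetId,
    name: targetName,
    meta: { ...source.meta }, // shallow clone so later edits don't leak into the source
  };
  return [...columns.slice(0, index + 1), copy, ...columns.slice(index + 1)];
}

const sketchColumns: SketchColumn[] = [
  { id: 'state', name: 'State', meta: { type: 'string' } },
  { id: 'count', name: 'Count', meta: { type: 'number', params: { id: 'bytes' } } },
];

const withCumulative = withCopiedColumn(sketchColumns, 'count', 'count_cumulative');
```

In this sketch the original `count` column stays available for other metrics, which is the Lens requirement raised earlier in the thread.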

Contributor Author

I definitely see your point @lukeelmers, but mapColumn is not necessarily mapping just a single column, it's acting on the whole row. Maybe a separate function would be justified. I will create a separate issue for discussing this (IMHO it doesn't block moving forward with this PR).

Member

++ Yeah it shouldn't block moving forward with this, but point taken on it acting on a whole row.

        }

        return newRow;
      }),
    };
  },
};
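
The edge cases listed in the doc comment can be checked with a small standalone sketch. It re-implements the same accumulation on plain row objects so it runs without Kibana; `Row`, `bucketId` and `cumulativeSumRows` are stand-in names assumed for this example, not exports of the PR.

```typescript
// Stand-in for DatatableRow (assumption for this sketch).
type Row = Record<string, unknown>;

// Same idea as getBucketIdentifier: stringify the `by` values and join them.
function bucketId(row: Row, by: string[] = []): string {
  return by.map((id) => (row[id] == null ? '' : String(row[id]))).join('|');
}

function cumulativeSumRows(rows: Row[], column: string, by?: string[]): Row[] {
  const accumulators: Partial<Record<string, number>> = {};
  return rows.map((row) => {
    const newRow = { ...row };
    const id = bucketId(row, by);
    const accumulated = accumulators[id] ?? 0;
    if (newRow[column] != null) {
      // Values are cast to numbers, so the string '2' counts as 2.
      const next = Number(newRow[column]) + accumulated;
      newRow[column] = next;
      accumulators[id] = next;
    } else {
      // null/undefined is ignored and replaced by the running total.
      newRow[column] = accumulated;
    }
    return newRow;
  });
}

const sample: Row[] = [
  { state: 'CA', price: 5 },
  { state: 'NY', price: 1 },
  { state: 'CA', price: null },
  { state: 'NY', price: '2' },
  { state: 'CA', price: 3 },
];

const summed = cumulativeSumRows(sample, 'price', ['state']);
console.log(summed.map((row) => row.price)); // values: 5, 1, 5, 3, 8
```

Row order is preserved and each `state` keeps its own running total; the `null` cell in the CA series takes the value accumulated so far (5) without changing it.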
@@ -25,6 +25,7 @@ import { variableSet } from './var_set';
import { variable } from './var';
import { AnyExpressionFunctionDefinition } from '../types';
import { theme } from './theme';
import { cumulativeSum } from './cumulative_sum';

export const functionSpecs: AnyExpressionFunctionDefinition[] = [
clog,
@@ -34,6 +35,7 @@ export const functionSpecs: AnyExpressionFunctionDefinition[] = [
variableSet,
variable,
theme,
cumulativeSum,
];

export * from './clog';
@@ -43,3 +45,4 @@ export * from './kibana_context';
export * from './var_set';
export * from './var';
export * from './theme';
export * from './cumulative_sum';