[Research] Document Timeline visualization expression params, function params and their usage #3408

Closed
ananzh opened this issue Feb 9, 2023 · 5 comments
Assignees
Labels
de-angular de-angularize work docs Improvements or additions to documentation timeline visualizations Issues and PRs related to visualizations

Comments

@ananzh
Member

ananzh commented Feb 9, 2023

What is timeline visualization?

OpenSearch Dashboards Timeline is a visualization tool used to create time-series charts and analyze time-series data in OpenSearch. Timeline uses a unique syntax to create time-series visualizations. The syntax includes a chain of commands that are executed in a particular order.

When could we use timeline visualization?

Timeline suits many use cases that call for flexible, intuitive visualization of time-series data:

  • Analyzing website traffic: You can easily create visualizations that show the number of users to your website over time, along with other metrics such as bounce rate and time on site. This can help you identify trends and patterns in user behavior, as well as track the effectiveness of marketing campaigns.

  • Monitoring system performance: Timeline can be used to monitor system performance metrics such as CPU usage, memory usage, and disk space. By creating real-time visualizations of these metrics, you can quickly identify potential bottlenecks or performance issues and take action to resolve them.

  • Analyzing social media trends: Timeline can be used to track social media mentions of your brand or product over time, allowing you to see how sentiment is changing and identify potential issues or opportunities.

  • Monitoring financial metrics: Timeline can be used to create visualizations of financial metrics such as stock prices, exchange rates, and commodity prices. This can help you identify trends and make informed decisions about investments.

These are only a few examples. Timeline is useful wherever time-series data needs to be visualized.

About timeline data

The data must have a @timestamp field, and that field must be of type date.

@ananzh ananzh added the enhancement New feature or request label Feb 9, 2023
@ananzh ananzh self-assigned this Feb 9, 2023
@ananzh ananzh added timeline visualizations Issues and PRs related to visualizations docs Improvements or additions to documentation and removed enhancement New feature or request untriaged labels Feb 9, 2023
@ananzh ananzh changed the title [Research] Document Timeline visualization expression params, chain params and their usage [Research] Document Timeline visualization expression params, function params and their usage Feb 9, 2023
@ananzh ananzh added the de-angular de-angularize work label Feb 9, 2023
@ananzh
Member Author

ananzh commented Mar 7, 2023

Timeline expression functions and params

.abs()

Get the absolute value of each time series data point. No param required. Example:

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=max:bytes, split = machine.os.keyword:5).multiply(-1).abs()

.add()/.plus()/.sum()

This function is used to add the values of two or more series together.

example

  • .es(index=opensearch_dashboards_sample_data_logs).add(100)

Number of documents plus 100 in opensearch_dashboards_sample_data_logs in the time range.

  • .es(index=opensearch_dashboards_sample_data_logs).add(.es(index=opensearch_dashboards_sample_data_logs))

Double the number of documents in opensearch_dashboards_sample_data_logs in the time range.

.aggregate()

This function creates a static line from the result of processing all the points in a series. It accepts a param called function.

params

function

avg: Calculates the average value of all the data points in the series.
cardinality: Calculates the number of unique values in the series.
min: Calculates the minimum value of the data points in the series.
max: Calculates the maximum value of the data points in the series.
last: Retrieves the last data point in the series.
first: Retrieves the first data point in the series.
sum: Calculates the sum of the data points in the series.

example

  • .es(index=opensearch_dashboards_sample_data_logs, timefield=@timestamp, metric=count:request).aggregate(function=avg)

.bars()

This function is used to create bar charts that visualize the data over time. The function takes a series of data points and groups them into discrete intervals, typically representing time intervals such as hours, days, or months. The height of each bar represents the sum of the values within that interval.

params

width

This parameter determines the width of each bar in the chart. The default value is 400, but you can adjust this value to make the bars wider or narrower.

stack

This parameter determines whether the bars should be stacked on top of each other or displayed side by side. If set to true, the bars will be stacked on top of each other, and if set to false, they will be displayed side by side. The default value is true.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q=response:200).bars(width=20, stack=true)

The .bars() function is used to create a bar chart with a width of 20, stacked bars.

.color()

The color function allows you to change the color of the lines or bars in your visualizations. It takes a color string as an argument, either a hex color such as #c6c6c6 or a regular color name such as red. If you pass several colors separated by colons and have multiple series, the series are colored along a gradient between those colors, which is useful for telling the series apart.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=avg:bytes, split = machine.os.keyword:5).color("purple:red:green:yellow:red")

Calculate the average value of bytes field from opensearch_dashboards_sample_data_logs index in the time range and split the data into 5 series based on machine.os.keyword. Then color machine.os series with different colors.

.cusum()

This function is used to calculate the cumulative sum of a time series data. It can be used to identify the trend or change points in the data over time.

params

base

This parameter is used to specify the starting value for the cumulative sum calculation. By default, it is set to 0.

example

  • .es(index=opensearch_dashboards_sample_data_logs, timefield=@timestamp, metric=count:request).cusum(base=500)

The .cusum() function is used to calculate the cumulative sum of the requests, starting from 500.
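
To make the running-total behavior concrete, here is a minimal Python sketch of what a cumulative sum with a base does to a list of bucket values. It is an illustration only, not the Timeline implementation; the function name and the sample counts are made up.

```python
from itertools import accumulate

def cusum(values, base=0):
    """Return the running total of values, starting from base."""
    # accumulate() yields the initial value first, so drop that leading element.
    return list(accumulate(values, initial=base))[1:]

request_counts = [3, 5, 2, 4]  # made-up bucket counts
print(cusum(request_counts, base=500))  # [503, 508, 510, 514]
```

With base=500 the first plotted point is 500 plus the first bucket, and every later point adds its own bucket on top.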

.derivative()

This function is used to calculate the rate of change of a time series data. It can be used to identify the trends or changes in the data over time.

example

  • .es(index=opensearch_dashboards_sample_data_logs, timefield=@timestamp, metric=count:request).derivative()

The .derivative() function is used to calculate the rate of change of requests over time.
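
As a rough illustration, the sketch below treats "rate of change" as the difference between each bucket and the one before it. That reading is an assumption, and the function name and sample values are hypothetical.

```python
def derivative(values):
    """Difference between each bucket and the previous one."""
    if not values:
        return []
    out = [None]  # the first bucket has no predecessor to compare with
    for prev, cur in zip(values, values[1:]):
        # An empty bucket on either side leaves the difference undefined.
        out.append(None if prev is None or cur is None else cur - prev)
    return out

print(derivative([10, 15, 12, 20]))  # [None, 5, -3, 8]
```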

.divide()

This function is used to calculate the quotient of two time series data. It can be used to analyze the relationship between two data sets or to create ratios.

params

divisor

This parameter specifies the number or time series to divide the first time series by. The parameter is mandatory.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q=response:200).divide(.es(index=opensearch_dashboards_sample_data_logs, q=response:404)).label("200/404 Ratio")

This example retrieves the count of status code 200 and 404 responses over time. The .divide() function is used to calculate the quotient of status code 200 divided by status code 404. The label "200/404 Ratio" is applied to the resulting chart.

Note that the .divide() function works on two inputs. In the example above, we used the .es() function to retrieve both data sets from the index. The divisor parameter specifies the data set that the first data set is divided by. The resulting time series holds the quotient of the two inputs, point by point.
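
The point-by-point division can be sketched in Python as follows. This is a simplified model under stated assumptions: series are plain lists of bucket values, the function name is hypothetical, and returning None for an empty or zero divisor is a choice made for this sketch rather than documented Timeline behavior.

```python
def divide(series, divisor):
    """Divide each point by a number or by the matching point of another series."""
    if isinstance(divisor, (int, float)):
        divisor = [divisor] * len(series)
    # Assumption for this sketch: missing or zero divisors yield None.
    return [None if a is None or not b else a / b for a, b in zip(series, divisor)]

ok_counts = [200, 90, 50]        # made-up counts of 200 responses
not_found_counts = [4, 3, 0]     # made-up counts of 404 responses
print(divide(ok_counts, not_found_counts))  # [50.0, 30.0, None]
```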

.es() /.elasticsearch() / .opensearch(): expression function

Pull data from OpenSearch. It often works with the params query (q), metric, index, timefield, openSearchDashboards, split, and offset.

params

fit (ToDo: how does this differ from the .fit() function?)

It is used to fit series to target interval. It has the following options:

  • average
  • carry
  • nearest
  • none
  • scale

index

Specify the OpenSearch index to retrieve data from.

interval

Defines the interval for the time buckets in the chart. It can be a fixed interval or an auto interval based on the selected time range.

kibana/opensearchDashboards

If set to true, the query respects the filters applied on the dashboard. This only has an effect when the visualization is placed on a dashboard.

metric

This param specifies the metric (aggregation) to calculate for each time bucket. Examples include "count", "avg", "sum", "min", and "max".

offset

This param is used to shift the time range of the query by a specified amount.

q

Query function allows you to filter data using a query string syntax similar to what is used in the Dashboards or Discover search bar.

split

Divide the data into multiple series based on a particular field

timefield

This is a parameter that specifies the name of the time-based field in the OpenSearch index being queried. This field is used as the x-axis for the visualization, and it represents the time range for the data being plotted.

example

  • .es(*, timefield=timestamp, metric=avg:bytes, split=response.keyword:5, kibana=true)

Calculate the average value of the bytes field across all indices in the time range, with timestamp as the time field, and split the data into at most 5 series based on response.keyword. With kibana=true, the query also respects the filters of the dashboard that the visualization is placed on.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=count, q='(response:503 OR response:404) AND (machine.os:ios)')

Calculate count of documents matching response either 503 or 404 and machine.os is ios in the time range.

  • .es(index=opensearch_dashboards_sample_data_logs, q=machine.os:ios, split=response:10)

.fit()

The .fit() function fills empty (null) buckets in a time series using the chosen fit mode. It has five modes.

params

mode = average

Fill empty buckets by averaging between the surrounding values, which also smooths over gaps in sparse data.

mode = carry

Carry forward the previous value of the time series.

mode = nearest

Use the nearest value in the time series.

mode = none

Do not adjust the time series. Empty buckets stay empty.

mode = scale

Spread each value evenly across the empty buckets next to it, so that sums and counts still add up.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=max:memory).fit(average)
    See the difference from the below two graphs.

Screenshot 2023-03-02 at 19 14 55

Screenshot 2023-03-02 at 19 14 43
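
To show how two of the modes differ, here is a small Python sketch of carry and nearest. It assumes the modes act on buckets that have no data (None); the function names are hypothetical and this is not the Timeline code.

```python
def fit_carry(values):
    """Fill each empty bucket with the last value seen before it."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)  # stays None until a first value has been seen
    return out

def fit_nearest(values):
    """Fill each empty bucket with the closest non-empty value (ties go to the earlier one)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    return [values[min(known, key=lambda k: abs(k - i))] for i in range(len(values))]

sparse = [None, 4, None, None, 7]
print(fit_carry(sparse))    # [None, 4, 4, 4, 7]
print(fit_nearest(sparse))  # [4, 4, 4, 7, 7]
```

Carry never invents a value before the first real point, while nearest also fills the leading gap.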

.hide()

This function is used to hide a specific series in a chart. It allows you to exclude certain series from the visualization without deleting the entire graph.

example

  • .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).hide(hide=true), .es(...)
    There are two expressions here and the first one will not show in the visualization.

.holt()

This function is used to forecast future data points in a time series using the Holt-Winters method, which is a popular algorithm for time series forecasting. (Reference: https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/)

params

alpha

The smoothing factor for the level component of the Holt-Winters method. This factor considers the prediction ability of using average value. The default value is 0.1. Increasing alpha will make the new series more closely follow the original. Lowering it will make the series smoother.

beta

The smoothing factor for the trend component of the Holt-Winters method. This factor considers the prediction ability of using slope. The default value is 0.1. Increasing beta will make rising/falling lines continue to rise/fall longer. Lowering it will make the function learn the new trend faster.

gamma

The smoothing factor for the seasonal component of the Holt-Winters method. This factor considers the prediction ability of using a cyclical repeating pattern (seasonality). The default value is 0.1. Increasing this will give recent seasons more importance, thus changing the wave form faster. Lowering it will reduce the importance of new seasons, making history more important.

season

The length of the seasonal period, given as a date math interval. For example, if your data has a weekly seasonal pattern, set season to 1w. The example below uses 2h. (Only useful with gamma.)

sample

The number of seasons to sample before starting to predict in a seasonal series. (Only useful with gamma, Default: all)

example

  • .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).holt(0.8, 0.2, 0.8, 2h).color(#c6c6c6).lines(10).label('Prediction Value'), .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).color(red).lines(1).label('Actual Value')

Screenshot 2023-03-02 at 21 04 15

Compare the difference of predicted max bytes value and actual value.

  • .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).holt(0.8, 0.2, 0.8, 2h).color(#c6c6c6).lines(10).label('Prediction Value'), .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).color(red).lines(1).label('Actual Value'), .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).lines(1).holt(0.8, 0.2, 0.8, 2h).subtract(.es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes)).abs().if(lt, 500, null, .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes)).points(5,2,1.5).color(green).label('Anomaly Value').title('max types anomaly')

This example is to get anomaly values of max bytes.

Screenshot 2023-03-02 at 21 00 09
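
For readers unfamiliar with the method, the sketch below shows the textbook form of Holt's linear smoothing, which uses only the level (alpha) and trend (beta) components and leaves out the seasonal gamma term. It is a generic illustration, not Timeline's implementation; the function name, the initialization from the first two points, and the sample data are assumptions.

```python
def holt_linear(values, alpha=0.1, beta=0.1):
    """Smooth a series with a level and a trend component (no seasonality)."""
    if len(values) < 2:
        return list(values)
    # Assumption: start the level at the first point and the trend at the first step.
    level, trend = values[0], values[1] - values[0]
    out = [values[0]]
    for x in values[1:]:
        prev_level = level
        # New level: blend the observation with the previous forecast.
        level = alpha * x + (1 - alpha) * (level + trend)
        # New trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        out.append(level)
    return out

max_bytes = [100.0, 120.0, 90.0, 140.0, 130.0]  # made-up values
print(holt_linear(max_bytes, alpha=0.8, beta=0.2))
```

On a perfectly linear input the smoothed series reproduces the input, which is a quick sanity check for the update equations.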

.if( )

If function is a conditional statement that allows you to apply different calculations or aggregations to your data based on a certain condition.

params

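operator

The comparison operator to use: eq (equal), ne (not equal), lt (less than), lte (less than or equal), gt (greater than), or gte (greater than or equal).

if

The value that each point is compared against. It can be a number or another series.

then:

The value to use when the comparison is true. It can be a number, a series, or null.

else:

The value to use when the comparison is false. This param is not required.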

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=avg:bytes).if(operator=lt, if=7500, then=.es(index=opensearch_dashboards_sample_data_logs, metric=min:bytes), else=.es(index=opensearch_dashboards_sample_data_logs, metric=max:bytes).min(value=7500))

If the average bytes is less than 7500, show the min value of the bytes field. Otherwise show the lower of max bytes and 7500. Since all the max bytes values are greater than 7500, this effectively adds a cap to the graph so that it focuses on the low values.
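
The per-point logic of the conditional can be sketched in Python like this. It is a simplified model: series are plain lists, the helper names and the KEEP sentinel are hypothetical, and keeping the original point when else is omitted is an assumption about the behavior.

```python
import operator

OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
       "lte": operator.le, "gt": operator.gt, "gte": operator.ge}

KEEP = object()  # hypothetical sentinel: "else" omitted, keep the original point

def at(source, i):
    """Read point i from a series, or return a plain number (or None) unchanged."""
    return source[i] if isinstance(source, list) else source

def if_(series, op, compare_to, then, otherwise=KEEP):
    out = []
    for i, value in enumerate(series):
        if value is not None and OPS[op](value, at(compare_to, i)):
            out.append(at(then, i))
        else:
            out.append(value if otherwise is KEEP else at(otherwise, i))
    return out

avg_bytes = [7000, 8000, 7400]   # made-up averages
min_bytes = [120, 90, 150]       # made-up minimums
capped_max = [7500, 7500, 7500]  # max bytes after .min(value=7500)
print(if_(avg_bytes, "lt", 7500, min_bytes, capped_max))  # [120, 7500, 150]
```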

.label( )

This function is used to assign a label to a data series in a Timeline chart. This label can be used to identify the data series in the legend of the chart, as well as in other parts of the Timeline UI.

params

regex

A regular expression string that can be used to match a portion of the data series name, and replace it with the label. This parameter is optional.

label

A string value that specifies the label to assign to the data series. This parameter is required.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=avg:bytes, split=geo.dest:10).label($1, "^.* > geo.dest:(\S+) > .*")
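
The regex form of .label() can be tried out in plain Python. The sketch below assumes that a split series is named something like "q:* > geo.dest:CN > avg(bytes)"; that sample name and the helper function are illustrative guesses, not taken from Timeline.

```python
import re

def relabel(series_name, label, regex=None):
    """Return the new label, optionally built from capture groups of regex."""
    if regex is None:
        return label
    # The expression above writes capture groups as $1, $2, ...; Python's re uses \1, \2, ...
    template = re.sub(r"\$(\d+)", r"\\\1", label)
    return re.sub(regex, template, series_name)

name = "q:* > geo.dest:CN > avg(bytes)"  # hypothetical series name
print(relabel(name, "$1", r"^.* > geo.dest:(\S+) > .*"))  # CN
```

Because the pattern matches the whole series name, only the captured destination code remains in the legend.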

.legend()

This function is used to add a legend to a Timeline chart. The legend provides information about the different series displayed in the chart. Here are the available parameters of the .legend() function:

params

position

Specifies the corner in which to place the legend. The default value is nw (northwest). Other possible values are ne (northeast), se (southeast), and sw (southwest). Pass false to disable the legend.

columns

Specifies the number of columns in which the legend should be displayed. The default value is 1.

showTime

Specifies whether to display the time range of the chart in the legend. The default value is true.

timeFormat

Specifies the date format to use for the time range displayed in the legend. The default value is YYYY-MM-DD HH:mm:ss.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q='response.keyword:200').label('200 count').color('red'), .es(index=opensearch_dashboards_sample_data_logs).label('total count').color('#2E7D32').legend(position='ne', columns=2)

Add a legend to the chart, positioned in the northeast corner with two columns.

.lines()

This function is used to plot a line chart of the selected data. It takes in several parameters to customize the appearance and behavior of the chart.

params

fill

A number from 0 to 10 that sets how strongly the area under the line is filled with color. Use it to make area charts. Default is none (no fill).

width

The width of the line in pixels. Default is 1.

show

Whether to show the line chart or not. Default is true.

stack

Whether to stack the line chart on top of previous lines. Default is false.

steps

Whether to connect data points with straight lines or with steps. Default is false.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").color('green').lines(fill=5, width=5, stack=true, steps=true)

This will plot a stacked line chart, with the area under the lines filled with color, a line width of 5 pixels, and connected by steps.

.log()

This function is used to calculate the logarithm of the input values.

params

base

Base is an optional parameter that specifies the base of the logarithm to be calculated. If no base is specified, the function defaults to base 10.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").log(2)

In this example, the .log(2) function is used to calculate the base-2 logarithm of the result.

.max()

Sets the point to whichever is higher, the existing value, or the one passed. If passing a seriesList it must contain exactly 1 series.

example

  • .es(*).max(value=20)

Document count across all indices in the time range, with every point below 20 raised to 20.

.min()

Sets the point to whichever is lower, the existing value, or the one passed. If passing a seriesList it must contain exactly 1 series.

example

  • .es(*).min(value=20)

Document count across all indices in the time range, with every point above 20 lowered to 20.
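
Both .max() and .min() compare each point with the passed value and keep one of the two. A minimal Python sketch, with hypothetical function names and made-up counts:

```python
def _other(other, i):
    """Read point i from a series, or return a plain number unchanged."""
    return other[i] if isinstance(other, list) else other

def series_max(values, other):
    """Keep whichever is higher: the existing point or the passed value."""
    return [None if v is None else max(v, _other(other, i)) for i, v in enumerate(values)]

def series_min(values, other):
    """Keep whichever is lower: the existing point or the passed value."""
    return [None if v is None else min(v, _other(other, i)) for i, v in enumerate(values)]

counts = [5, 30, 12]
print(series_max(counts, 20))  # [20, 30, 20]
print(series_min(counts, 20))  # [5, 20, 12]
```

In other words, .max(value=20) acts as a floor at 20 and .min(value=20) acts as a cap at 20.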

.multiply()

This function is used to multiply the values of two or more series. It takes the values from each of the specified series and multiplies them together to create a new series.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=count).multiply(-1)

Multiply all the data points by -1.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=max:hour_of_day, split = machine.os.keyword:5).multiply(.max(.es(index=opensearch_dashboards_sample_data_logs, metric=max:bytes, q='response.keyword:200')))

Multiply one time series by another, point by point.

.mvavg()

This function is used to calculate the moving average of a time series data.

params

window

The size of the moving window (number of data points) or a date math expression (e.g. 1d, 1m) to average over for the moving average calculation. This parameter is required.

position

The position of the output point within the window. The default value is center, which means the output point is centered within the window.

example

  • .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).mvavg(window=10, position=center)

This expression calculates the moving average of the max bytes over a window of 10 data points, with the output point centered within the window.

  • .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).mvavg(window=10h, position=center)

This expression calculates the moving average of the max bytes over a window of 10 hours, with the output point centered within the window.
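
To illustrate the windowing idea, here is a Python sketch of a moving average over a fixed number of points. It uses one possible alignment only, a trailing window that ends at the current point, whereas the position param chooses the alignment in Timeline. The function name is hypothetical, and a date math window such as 10h is assumed to be converted to a number of buckets first.

```python
def moving_average(values, window):
    """Average of the current point and the window - 1 points before it."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough points yet to fill the window
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

print(moving_average([1, 2, 3, 4, 5], window=3))  # [None, None, 2.0, 3.0, 4.0]
```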

.mvstd()

This function is used to calculate the moving standard deviation of time-series data. It shows how much the values fluctuate within each window, which helps to spot periods of high volatility.

params

window

Number of points to compute the standard deviation over.

position

Position of the window slice relative to the result time. Options are left, right, center. Default is left.

.points()

This function is used to display data points on a chart. It creates a point series chart with data points represented as circles, squares, or other shapes. The function takes several parameters, including:

params

fill

A number value that indicates whether to fill the data points with color.

fillColor

A string value that specifies the fill color of the data points. This value can be a hex color code or a named color.

radius

A number value that specifies the size of the data points.

show

A boolean value that indicates whether to display the data points. By default, this value is set to true.

symbol

A string value that specifies the shape of the data points. The available values are triangle, cross, square, diamond and circle.

weight

A number value that specifies the line weight of the data points.

example

  • .es(index=opensearch_dashboards_sample_data_logs,metric=max:bytes).points(fill=2,fillColor=green,radius=4,show=true,symbol=square,weight=1), .es(index=opensearch_dashboards_sample_data_logs,metric=min:bytes).points(2, 4, 1)

You can specify each value by param name. You could also input three numbers which represent fill, radius and weight in turn. If you don’t specify a shape, then the default circle shape will be chosen.

.precision()

This function is used to set the number of decimal places displayed for a metric. It takes a single numeric argument that specifies the number of decimal places to display.

params

precision

The number of digits to keep after the decimal point.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=sum:bytes).divide(100000).precision(4)

The .precision(4) function is then used to limit the number of decimal places displayed to 4.

.range()

This function changes the minimum and maximum of a series while keeping its shape.

params

min / max

The new minimum and maximum values of the series.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=avg:bytes).range(5000, 25000)

Rescale the average bytes series so that its values run from 5000 to 25000.

.scale_interval()

This function rescales the values of a series (usually a sum or a count) to a new interval, for example to show a per-second or per-day rate. It takes a single parameter which specifies the new interval.

params

interval

The new interval in date math notation, such as 1s or 1d. The buckets themselves do not change. Each value is multiplied by the ratio of the new interval to the bucket interval of the chart. For example, with 1-hour buckets, .scale_interval(interval=1d) multiplies each value by 24.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").scale_interval(interval=1d)

Express the count of 200 responses as a per-day rate.

.static() / .value()

This function is used to create a static line that doesn't change based on the data. It is useful for adding a reference line to a Timeline chart.

params

value

This parameter specifies the value at which to draw the line. Could be a single value. Could also pass several values and it will interpolate them evenly across your time range.

label

This parameter specifies the label to use for the line.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").scale_interval(interval=1d), .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").scale_interval(interval=1d).static(value="100:200:300:400", label="bottom").color('red')

.subtract()

This function is used to subtract the values of one or more time series from another time series. This function takes one or more series as input and subtracts them from the first series. The resulting time series represents the difference between the values of the first series and the other series.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").subtract(.es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:403"))

.title()

This function is used to assign a custom title or label to a time series visualization. This function takes a string argument and sets it as the title or label for the corresponding time series.

example

  • .es(*).title('count of all indices')

.trend()

This function draws a trend line using a specified regression algorithm.

params

mode

The algorithm to use for generating the trend line. One of linear and log.

start / end

Where to start and end the calculation, counted in points from the beginning or the end of the series. A positive number counts from the beginning and a negative number counts from the end: 10 means 10 points from the beginning, and -10 means 10 points from the end. The default value is 0.

example

  • .es(index=opensearch_dashboards_sample_data_logs, q="response.keyword:200").trend(mode=log, start=10, end=-10)
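
For mode=linear, a trend line is commonly obtained with an ordinary least-squares fit. The Python sketch below shows that generic technique under the assumption that buckets are evenly spaced. It is not taken from the Timeline source, it ignores the start and end params, and the function name is hypothetical.

```python
def linear_trend(values):
    """Least-squares straight line through evenly spaced points."""
    n = len(values)
    if n == 0:
        return []
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx if sxx else 0.0  # a single point has no slope
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in xs]

print(linear_trend([2, 4, 3, 7, 6]))
```

The returned list has one value per bucket, so it can be drawn as a straight line over the original series.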

.trim()

This function sets a number of buckets at the start or end of a series to null. It is mainly used to hide partial buckets at the edges of the time range, which would otherwise show misleadingly low values.

It is often combined with the .fit() function, which fills empty buckets inside the series before the edges are trimmed.

example

  • .es(index=opensearch_dashboards_sample_data_logs, metric=count, q="response.keyword:200").fit(mode=average).trim(start=50, end=10)

Here we use the .trim() function to blank out the first 50 and the last 10 buckets, so the visualization shows only the 51st through the 11th-to-last buckets.
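
A small Python sketch of the trimming step follows. It assumes that trimmed buckets are blanked (set to None) rather than removed, so the series keeps its length; the function name and the sample values are hypothetical.

```python
def trim(values, start, end):
    """Blank the first `start` and the last `end` buckets of a series."""
    n = len(values)
    return [None if i < start or i >= n - end else v for i, v in enumerate(values)]

print(trim([1, 2, 3, 4, 5, 6], start=2, end=1))  # [None, None, 3, 4, 5, None]
```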

.yaxis()

The function is used to customize the y-axis of a Timeline chart. This function can be used to set various properties of the y-axis, such as its label, range, and tick format.

params

color

This param specifies the color of the y-axis label.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=avg:bytes).yaxis(label="average bytes", color="red")

label

This param specifies a label to the y-axis.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=sum:bytes).yaxis(label="sum of bytes")

max and min

Set the minimum and maximum values of the y-axis.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=sum:bytes).yaxis(min=200000, max=1000000)

This will create a Timeline chart that displays the sum of bytes in the opensearch_dashboards_sample_data_logs index, with the y-axis ranging from 200000 to 1000000.

position

This param sets the position of the y-axis.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=count).yaxis(position="right", label="number of data")

This will create a chart that displays the count of documents in the opensearch_dashboards_sample_data_logs index, with the y-axis positioned on the right side of the chart.

tickDecimals

This parameter is used to set the number of decimal places displayed in the y-axis tick marks.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=max:bytes).divide(100000).yaxis(tickDecimals=2)

This will create a chart that displays the maximum value of the bytes field divided by 100000 in the opensearch_dashboards_sample_data_logs index, with the y-axis tick marks displaying two decimal places.

units

This parameter allows you to specify the units of the y-axis values.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=sum:bytes).yaxis(units=bytes)

yaxis

This param specifies which y-axis on the chart a particular expression or metric should use.

  • .es(index=opensearch_dashboards_sample_data_logs, metric=avg:bytes).yaxis(yaxis=1), .es(index=opensearch_dashboards_sample_data_logs, metric=sum:bytes).color('red').yaxis(yaxis=3)

In this example, there are two expressions being plotted on the chart. The first expression calculates the average value of bytes and specifies that the results should be plotted on the first y-axis, which is the normal left one. The second expression calculates the sum of bytes and specifies that the results should be plotted on the third y-axis, which appears further to the left. (If yaxis=2, the second y-axis is shown on the right of the plot.)

@ananzh
Member Author

ananzh commented Mar 7, 2023

ToDo:

  1. How to deal with wb(), wbi(), and worldbank_indicators()? (Open an issue to propose dropping them.)
    They use a customized World Bank database and do not work with 2023 data. They have the same explanations but different params, which is very confusing.
      • wb() params: code, offset, fit
      • wbi()/worldbank_indicators() params: country, indicator, offset, fit
    Customers might frequently see the following error due to no recent data.
  2. Understand and document .condition(), first(), graphite() (todo: propose to drop), .props(), .quandl().
  3. Understand the difference between the fit param in the .es() function and the .fit() function.
    https://discuss.elastic.co/t/difference-between-fit-as-argument-and-fit-as-function/58329/2
  4. Add all the reference links and videos.

@ananzh
Member Author

ananzh commented May 4, 2023

Close this research issue since we have done major params usage research.

@ghost

ghost commented Dec 19, 2023

This was closed but none of this documentation has been posted on the site. When is it planned to actually document this outside of a GitHub issue?

@csalt-liatrio

This was closed but none of this documentation has been posted on the site. When is it planned to actually document this outside of a GitHub issue?

I am also actively looking for this documentation. What is here is close, but an official documentation is sorely needed.


2 participants