Dynamic time warping objects' reference documentation #190

Open
wants to merge 19 commits into
base: main
61 changes: 61 additions & 0 deletions doc/DTW.rst
@@ -0,0 +1,61 @@
:digest: Series Distance Calculation using Dynamic Time Warping
:species: data
:sc-categories: Classification, DTW
:sc-related:
:see-also: DataSeries
:description: Calculate the distance between two series using the dynamic time warping algorithm

:discussion:

To keep with the interface of the :fluid-obj:`DTWClassifier`, the DTWClassifier must first be ``fit`` with a :fluid-obj:`DataSeries` of data points and a target :fluid-obj:`LabelSet` with a label for each point in the DataSeries (by means of a shared identifier).

To classify a point, the ``numNeighbours`` nearest neighbours of the incoming point are determined, and each neighbour's label is given a score based on its distance to the target. Neighbours that share a label all add to that label's score, which makes that label more likely to be chosen. The label with the highest score is considered the closest and is returned.

Keep in mind that this is a brute-force measure, so evaluation will become very slow for large numbers of points or long series.

:control numNeighbours:

The number of neighbours to consider

:control constraint:

The constraint to use in the `DTW` algorithm when calculating the distance between two time series. 'Warping' in this context means how much the general shape of a series is distorted in time.
For example, without a constraint, a pulse with a fast attack and slow decay will register as identical to one with a slow attack and fast decay, since stretching the time series can make them match in shape. If a constraint is applied, however, the amount of warping is restricted, so that the general shape of the series is kept.

See https://rtavenar.github.io/blog/dtw.html#setting-additional-constraints for a beautiful visual explanation of the constraints

:enum:

:0:
**unconstrained** (any point can warp to any other)

:1:
**itakura** (the start and end can only warp a little, whereas the middle can warp more)

:2:
**sakoe-chiba** (each point can only warp within a certain radius)


:control constraintParam:

When using the ``sakoe-chiba`` constraint, this is the maximum radius a frame can warp away from its initial location. When using the ``itakura`` constraint, it is the parameter of that constraint. A higher value allows more warping. See https://rtavenar.github.io/blog/dtw.html#setting-additional-constraints for an explanation of its significance.
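
As a rough illustration of how a ``sakoe-chiba`` radius restricts warping, the following Python sketch computes an accumulated DTW cost for two one-dimensional series. It is not the implementation used by this object: the function name ``dtw_cost``, the ``radius`` argument and the use of the absolute difference as the per-frame distance are all assumptions made for the example.

```python
import math


def dtw_cost(a, b, radius=None):
    """Accumulated DTW cost between two 1-D sequences (illustrative only).

    radius=None means unconstrained. Otherwise frame i of `a` may only be
    matched with frames j of `b` where |i - j| <= radius (a Sakoe-Chiba band).
    """
    n, m = len(a), len(b)
    # acc[i][j] holds the cheapest cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if radius is not None and abs(i - j) > radius:
                continue  # outside the band: the cell stays unreachable
            d = abs(a[i - 1] - b[j - 1])  # assumed per-frame distance
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


early = [0, 1, 0, 0, 0, 0]  # pulse near the start
late = [0, 0, 0, 0, 1, 0]   # same pulse, three frames later

unconstrained = dtw_cost(late, early)
narrow_band = dtw_cost(late, early, radius=1)
wide_band = dtw_cost(late, early, radius=3)
```

In this sketch the unconstrained cost is zero because the pulse can be shifted freely, a radius of 1 is too small to line the two pulses up, and a radius of 3 is just wide enough to do so.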


:message cost:

:arg dataSeries: Source :fluid-obj:`DataSeries`

:arg id1: Identifier of the first series in the :fluid-obj:`DataSeries`

:arg id2: Identifier of the second series in the :fluid-obj:`DataSeries`

Return the cost, i.e. distance, between the series ``id1`` and ``id2`` in the ``dataSeries``


:message bufCost:

:arg buffer1: |Buffer| with data for first series

:arg buffer2: |Buffer| with data for second series

Return the cost, i.e. distance, between the buffers ``buffer1`` and ``buffer2``
72 changes: 72 additions & 0 deletions doc/DTWClassifier.rst
@@ -0,0 +1,72 @@
:digest: Series Classification with K-Nearest Neighbours using Dynamic Time Warping
:species: data
:sc-categories: Classification, DTW
:sc-related:
:see-also: DTW, DataSeries, LabelSet
:description: A nearest neighbour classifier using a :fluid-obj:`DTW`
:discussion:

To keep with the interface of the :fluid-obj:`KNNClassifier`, the DTWClassifier must first be ``fit`` with a :fluid-obj:`DataSeries` of data points and a target :fluid-obj:`LabelSet` with a label for each entry in the DataSeries (by means of a shared identifier).

To classify a series, the ``numNeighbours`` nearest neighbours of the incoming series are determined, and each neighbour's label is given a score based on its distance to the target. Neighbours that share a label all add to that label's score, which makes that label more likely to be chosen. The label with the highest score is considered the closest and is returned.

Keep in mind that this is a brute-force measure, so evaluation will become very slow for large numbers of series, or long series.
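
The scoring described above can be sketched in Python as follows. This is only an illustration: the function name ``weighted_vote`` and the inverse-distance weighting are assumptions, and the actual scoring used by the object may differ.

```python
from collections import defaultdict


def weighted_vote(neighbours):
    """Pick a label from (label, distance) pairs of the k nearest series."""
    scores = defaultdict(float)
    for label, distance in neighbours:
        # Assumed weighting: closer neighbours contribute a larger score.
        # The small constant avoids division by zero for exact matches.
        scores[label] += 1.0 / (distance + 1e-9)
    # The label with the highest accumulated score wins.
    return max(scores, key=scores.get)


# Two moderately close "kick" neighbours outweigh one slightly closer "snare".
winner = weighted_vote([("kick", 1.0), ("snare", 0.8), ("kick", 1.2)])
```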

:control numNeighbours:

The number of neighbours to consider

:control constraint:

The constraint to use in the `DTW` algorithm when calculating the distance between two time series. 'Warping' in this context means how much the general shape of a series is distorted in time.
For example, without a constraint, a pulse with a fast attack and slow decay will register as identical to one with a slow attack and fast decay, since stretching the time series can make them match in shape. If a constraint is applied, however, the amount of warping is restricted, so that the general shape of the series is kept.

See https://rtavenar.github.io/blog/dtw.html#setting-additional-constraints for a beautiful visual explanation of the constraints

:enum:

:0:
**unconstrained** (any point can warp to any other)

:1:
**itakura** (the start and end can only warp a little, whereas the middle can warp more)

:2:
**sakoe-chiba** (each point can only warp within a certain radius)


:control constraintParam:

When using the ``sakoe-chiba`` constraint, this is the maximum radius a frame can warp away from its initial location. When using the ``itakura`` constraint, it is the parameter of that constraint. A higher value allows more warping. See https://rtavenar.github.io/blog/dtw.html#setting-additional-constraints for an explanation of its significance.


:message fit:

:arg dataSeries: Source :fluid-obj:`DataSeries`

:arg labelSet: A :fluid-obj:`LabelSet` of labels for the source :fluid-obj:`DataSeries`

Fit the model to a source :fluid-obj:`DataSeries` and a target :fluid-obj:`LabelSet`. The labels in the :fluid-obj:`LabelSet` correspond to the data points in the :fluid-obj:`DataSeries` by means of a shared identifier.


:message predict:

:arg dataSeries: A :fluid-obj:`DataSeries` of data series to predict labels for

:arg labelSet: A :fluid-obj:`LabelSet` to write the predicted labels into

Given the fitted model, predict labels for a :fluid-obj:`DataSeries` and write these to a :fluid-obj:`LabelSet`


:message predictSeries:

:arg buffer: A data series stored in a |buffer|

Given a fitted model, predict a label for a data series in a |buffer| and return it to the caller


:message clear:

Clears the :fluid-obj:`DataSeries` and :fluid-obj:`LabelSet`


72 changes: 72 additions & 0 deletions doc/DTWRegressor.rst
@@ -0,0 +1,72 @@
:digest: Series Regression with K-Nearest Neighbours using Dynamic Time Warping
:species: data
:sc-categories: Regression, DTW
:sc-related:
:see-also: DTW, DataSeries, DataSet
:description: A nearest neighbour interpolator/regressor using a :fluid-obj:`DTW`
:discussion:

To keep with the interface of the :fluid-obj:`KNNRegressor`, the DTWRegressor must first be ``fit`` with a :fluid-obj:`DataSeries` of data points and a target :fluid-obj:`DataSet` with a mapping for each series in the DataSeries (by means of a shared identifier).

To calculate a point, ``numNeighbours`` neighbours are determined for the incoming series, and a distance-weighted sum of those neighbours' corresponding outputs is returned.

Keep in mind that this is a brute-force measure, so evaluation will become very slow for large numbers of series or long series.

See https://rtavenar.github.io/blog/dtw.html for an explanation of the DTW algorithm. Roughly, it is a metric of the similarity between two time series that accounts for the fact that features aren't necessarily the same length.
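
The distance-weighted combination of neighbour outputs can be sketched in Python as below. The function name ``weighted_output`` and the normalised inverse-distance weights are assumptions made for illustration; the object's actual weighting may differ.

```python
def weighted_output(neighbours):
    """Combine (output_vector, distance) pairs from the k nearest series."""
    # Assumed weighting: inverse distance, so closer neighbours count more.
    weights = [1.0 / (distance + 1e-9) for _, distance in neighbours]
    total = sum(weights)
    dims = len(neighbours[0][0])
    # Weighted sum per output dimension, normalised by the total weight.
    return [
        sum(w * out[k] for w, (out, _) in zip(weights, neighbours)) / total
        for k in range(dims)
    ]


# Equidistant neighbours give the midpoint of their outputs.
midpoint = weighted_output([([0.0], 1.0), ([10.0], 1.0)])
# A much closer neighbour pulls the result towards its own output.
pulled = weighted_output([([0.0], 0.1), ([10.0], 10.0)])
```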

:control numNeighbours:

The number of neighbours to consider

:control constraint:

The constraint to use in the `DTW` algorithm when calculating the distance between two time series. 'Warping' in this context means how much the general shape of a series is distorted in time.
For example, without a constraint, a pulse with a fast attack and slow decay will register as identical to one with a slow attack and fast decay, since stretching the time series can make them match in shape. If a constraint is applied, however, the amount of warping is restricted, so that the general shape of the series is kept.

See https://rtavenar.github.io/blog/dtw.html#setting-additional-constraints for a beautiful visual explanation of the constraints

:enum:

:0:
**unconstrained** (any point can warp to any other)

:1:
**itakura** (the start and end can only warp a little, whereas the middle can warp more)

:2:
**sakoe-chiba** (each point can only warp within a certain radius)


:control constraintParam:

When using the ``sakoe-chiba`` constraint, this is the maximum radius a frame can warp away from its initial location. When using the ``itakura`` constraint, it is the parameter of that constraint. A higher value allows more warping. See https://rtavenar.github.io/blog/dtw.html#setting-additional-constraints for an explanation of its significance.

:message fit:

:arg dataSeries: Source :fluid-obj:`DataSeries`

:arg dataSet: A :fluid-obj:`DataSet` of outputs for the source :fluid-obj:`DataSeries`

Fit the model to a source :fluid-obj:`DataSeries` and a target :fluid-obj:`DataSet`. The outputs in the :fluid-obj:`DataSet` correspond to the data series in the :fluid-obj:`DataSeries` by means of a shared identifier.

:message predict:

:arg dataSeries: A :fluid-obj:`DataSeries` to predict regressions for

:arg dataSet: A :fluid-obj:`DataSet` to write the predicted outputs into

Given the fitted model, predict the output for a :fluid-obj:`DataSeries` and write these to a :fluid-obj:`DataSet`

:message predictSeries:

:arg sourceBuffer: The input series stored in a |buffer|

:arg targetBuffer: A buffer to write the prediction to

Given a fitted model, predict the output for a single series in ``sourceBuffer`` and write it to ``targetBuffer``

:message clear:

Clears the :fluid-obj:`DataSeries` and :fluid-obj:`DataSet`


160 changes: 160 additions & 0 deletions doc/DataSeries.rst
@@ -0,0 +1,160 @@
:digest: A set of data series associated with identifiers.
:species: data
:sc-categories: UGens>FluidManipulation
:sc-related: Classes/Dictionary
:see-also: LabelSet, DataSet, DTW
:max-seealso: dict
:description: FluidDataSeries is a container associating series of data points with identifiers
:control name:

The name of the FluidDataSeries. This is unique between all FluidDataSeries.

:message addFrame:

:arg identifier: The identifier for the series to add to.

:arg buffer: A |buffer| containing the data for the frame (only the first channel is used).

Add a new frame to the end of a series, creating the series if it does not exist. If this is the first frame added, it sets the dimensionality of the DataSeries; otherwise, an error is reported if the buffer is too short.

:message addSeries:

:arg identifier: The identifier for the series to add.

:arg buffer: A |buffer| containing the data for the series (each channel is a distinct time frame).

Add a new series from a buffer. If this is the first series added, it sets the dimensionality of the DataSeries; otherwise, an error is reported if the buffer is too short. An error is also reported if the identifier already exists.

:message getFrame:

:arg identifier: The identifier for the series to get from.

:arg time: Which time frame to get.

:arg buffer: A |buffer| to write the frame to (only the first channel is used, will be resized).

Get a frame from a series. Negative indexing starts from the last frame. If the identifier doesn't exist, or if that series doesn't have a frame for that time point, an error will be reported.

:message getSeries:

:arg identifier: The identifier for the series to get.

:arg buffer: A |buffer| to write the series to (each channel is a distinct time frame, will be resized).

Get a series. If the identifier doesn't exist an error will be reported.

:message setFrame:

:arg identifier: The identifier for the series to set a frame in.

:arg time: Which time frame to set.

:arg buffer: A |buffer| containing the data for the frame (only the first channel is used).

Updates a time frame in a series, or adds it to the end if there is no frame at that time point. Negative indexing starts from the last frame. If this is the first frame added, it sets the dimensionality of the DataSeries; otherwise, an error is reported if the buffer is too short.

:message setSeries:

:arg identifier: The identifier for the series to set.

:arg buffer: A |buffer| containing the data for the series (each channel is a distinct time frame).

Updates a time series, or adds it if it doesn't exist. If this is the first series added, it sets the dimensionality of the DataSeries; otherwise, an error is reported if the buffer is too short.

:message updateFrame:

:arg identifier: The identifier for the series to update a frame in.

:arg time: Which time frame to update.

:arg buffer: A |buffer| containing the data for the frame (only the first channel is used).

Updates an existing frame. Negative indexing starts from the last frame. If the buffer is too short, an error will be reported. If the identifier doesn't exist, or if that series doesn't have a frame for that time point, an error will be reported.

:message updateSeries:

:arg identifier: The identifier for the series to update.

:arg buffer: A |buffer| containing the data for the series (each channel is a distinct time frame).

Updates an existing series. If the buffer is too short, or if the identifier doesn't exist, an error will be reported.

:message deleteFrame:

:arg identifier: The identifier for the series to delete a frame from.

:arg time: Which time frame to remove.

Delete a frame from a series, deleting the series as well if that was its only frame. Negative indexing starts from the last frame. If the identifier doesn't exist, or if that series doesn't have a frame for that time point, an error will be reported.

:message deleteSeries:

:arg identifier: The identifier for the series to delete.

Delete a series. If the identifier doesn't exist an error will be reported.

:message getDataSet:

:arg time: Which time frame to extract.

:arg dataSet: The :fluid-obj:`DataSet` to write the slice to. Will overwrite and resize.

Get a :fluid-obj:`DataSet` containing the frame at index ``time`` of every series, i.e. a slice across all series at the same time point. Negative indexing starts from the last frame. If a series doesn't have enough frames, it is simply left out of the output DataSet.
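
A minimal Python sketch of this slicing behaviour is given below. It models a DataSeries as a plain dictionary of frame lists, which is an assumption for illustration only, as are the function name ``slice_at`` and the convention that index ``-1`` refers to the last frame.

```python
def slice_at(series, time):
    """Collect the frame at index `time` from every series that has one.

    series: dict mapping identifier -> list of frames (each a list of floats).
    """
    out = {}
    for identifier, frames in series.items():
        # Assumed convention: -1 is the last frame, -2 the one before, etc.
        index = time if time >= 0 else len(frames) + time
        if 0 <= index < len(frames):
            out[identifier] = frames[index]
        # Series without a frame at that index are skipped, not an error.
    return out


series = {"long": [[1.0], [2.0], [3.0]], "short": [[9.0]]}
second_frames = slice_at(series, 1)   # "short" has no second frame
last_frames = slice_at(series, -1)    # both series have a last frame
```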

:message clear:

Empty the data series of all series and frames.

:message getIds:

:arg labelSet: The FluidLabelSet to export to. Its content will be replaced.

Export the dataseries identifiers to a :fluid-obj:`LabelSet`.

:message merge:

:arg sourceDataSeries: The source DataSeries to be merged.

:arg overwrite: A flag to allow overwriting points with the same identifier.

Merge sourceDataSeries into the current DataSeries. It will replace the value of points with the same identifier if overwrite is set to 1.

:message kNearest:

:arg buffer: A |buffer| containing a data point to match against.

:arg k: The number of nearest neighbours to return.

Returns the identifiers of the ``k`` points nearest to the one passed, in distance order (closest first). Note that this is a brute force distance measure, and inefficient for repeated queries against large dataseries.

:message kNearestDist:

:arg buffer: A |buffer| containing a data point to match against. The number of frames in the buffer must match the dimensionality of the DataSet.

:arg k: The number of nearest neighbours to return. The identifiers will be sorted, beginning with the nearest.

Returns the distances to the ``k`` points nearest to the one passed in descending order. Note that this is a brute force distance measure, and inefficient for repeated queries against large dataseries.

:message print:

Post an abbreviated version of the contents of the DataSeries to the window by default; a custom action can be supplied instead.

:message read:

:arg filename: optional, filename to read from

Read a saved object in JSON format from disk. Will prompt for a file location if no filename is provided.

:message write:

:arg filename: optional, filename to save to

Save the contents of the object to a JSON file on disk at the location specified. Will prompt for a file location if no filename is provided.

:message load:

Load the state of this object from a Dictionary.

:message dump:

Dump the state of this object as a Dictionary.
5 changes: 5 additions & 0 deletions example-code/sc/DTW.scd
@@ -0,0 +1,5 @@
code::

//soon

::
5 changes: 5 additions & 0 deletions example-code/sc/DTWClassifier.scd
@@ -0,0 +1,5 @@
code::

//soon

::
5 changes: 5 additions & 0 deletions example-code/sc/DTWRegressor.scd
@@ -0,0 +1,5 @@
code::

//soon

::
5 changes: 5 additions & 0 deletions example-code/sc/DataSeries.scd
@@ -0,0 +1,5 @@
code::

//soon

::