`DataSeries` object for time-based objects #247

lewardo · 2023-08-22T11:31:12Z

@weefuzzy For many potential new objects, implementing a DataSeries object for datasets that encompass time would be incredibly useful.

In terms of implementation, I had a few ideas, but I do not know which would fit best with the rest of the codebase:

– A client wrapper around a DataSetClient, with each point being N * frameLen long, and with the views and access handled in the wrapper/client
– A client based on a DataSet of rank 2, with time captured in the second point dimension
– A whole new DataSeries algorithm, similar to DataSet, that separates the subtleties out into a new alg for cleanliness

The issue I'm seeing with all of them is the memory allocation when points are updated in real time. I wanted to confirm that the way FluidTensors grow and shrink is memory-efficient, so that updating the length of a point in the middle of a dataset won't have to shift the rest of it down in memory.
Currently, you can update a point but not add to it, so pushing a time frame onto a point would involve copying that point in its entirety (not as a FluidTensorView), concatenating externally, then replacing that point in the DataSet, unless FluidTensor were given an equivalent of push_back that allowed in-place expansion of central elements.
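To make the concern above concrete, here is a minimal sketch of what appending a frame to a central point costs when every point lives in one contiguous block. `FlatStore` and its members are hypothetical stand-ins written for illustration; they are not the FluidTensor or DataSet API, and the real containers may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat store (NOT the FluidTensor API): all points sit back to
// back in one std::vector, each point being a run of frames of `dims` values.
struct FlatStore
{
  std::size_t              dims;
  std::vector<double>      data;    // every point, stored contiguously
  std::vector<std::size_t> offsets; // start of each point within `data`
  std::vector<std::size_t> frames;  // number of frames held by each point

  explicit FlatStore(std::size_t d) : dims(d) {}

  // Add a new, empty point at the end and return its index.
  std::size_t addPoint()
  {
    offsets.push_back(data.size());
    frames.push_back(0);
    return offsets.size() - 1;
  }

  // Append one frame to point `p`. Returns how many existing values had to
  // be shifted along to make room (0 when nothing is stored after `p`).
  std::size_t appendFrame(std::size_t p, const std::vector<double>& frame)
  {
    assert(p < offsets.size() && frame.size() == dims);
    const std::size_t end     = offsets[p] + frames[p] * dims;
    const std::size_t shifted = data.size() - end;
    data.insert(data.begin() + static_cast<std::ptrdiff_t>(end),
                frame.begin(), frame.end());
    frames[p] += 1;
    // Every later point now starts `dims` values further along.
    for (std::size_t i = p + 1; i < offsets.size(); ++i) offsets[i] += dims;
    return shifted;
  }
};
```

With two points holding one frame each, pushing a second frame onto the first point moves the whole of the second point; the cost grows with everything stored after the insertion site.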

For the interface, @tremblap liked the idea of keeping one similar to DataSet's, namely that addPoint be the message that pushes another frame from a buffer onto that ID in the dataset. Additional messages could be implemented to load a whole series from a buffer (the issue here is that the DataSeries time dimension would not be the buffer time dimension; time in the buffer would have to be captured from channel 1 to T), but all of these ideas are still up for discussion.

This object is the gateway to implementing the more useful algs like DTW and various flavours of RNN.

Apologies for the barrage of UI and implementation questions. I am aware you haven't much time, but any guidance, however brief, would be appreciated: I don't know which style of implementation would be best, or which nuances of my implementation may or may not lead to terrible memory performance.

weefuzzy · 2023-08-22T14:25:48Z

Thanks @lewardo

Yes, we need something like this container for time-series algorithms, but it's a can of worms. I'd started to think about it for my Echo State Network work (a kind of RNN), but ran out of time.

So, aspects of the can-of-worms-ness:

– The basic interface I had in mind was that each point would indeed be m frames of n-dim vectors, with the imposed constraint that within a DataSeries (or whatever) n is invariant across the whole container.
– That's immediately a pain in the arse on PD, which doesn't have multichannel arrays, and therefore no convenient mechanism for adding points (though the existing hackyness with clone could presumably be made to work)
– Meanwhile, it wasn't at all clear that the underlying data structure of DataSet would do for this. DataSet is neither especially cache-friendly, nor at all thread-safe, so I wasn't wild about making any more code depend on it.
– Updates into the middle of FluidTensor aren't especially efficient, because the underlying container is a std::vector. There are other types of semi-contiguous container that try and strike different balances between insertion cost and cache friendliness, but not yet in the standard lib. Moreover, we have to deal with a map structure as well, and the looming problems of thread safety.

FWIW, DataSet should never be mutated in the audio thread, until we have more sophisticated stuff in place both for concurrency and allocation.

lewardo · 2023-08-22T14:51:35Z

As a first concept to allow parallel development of time-based algorithms, do you think a std::vector of DataSets would be a wise, hacky, temporary solution? It would circumvent the middle-access issue, since each DataSet (representing a single time slice/set of frames) isn't ordered, and as long as I keep the same ID for each slice it could feasibly be more efficient than the other options. It does, however, introduce the overhead of many identical ID sets etc., so I could implement a lightweight version of a DataSet, many of which would make up the DataSeries object: each time slice would be one of those, unordered, with ID-index mappings to make middle insertion more efficient?

weefuzzy · 2023-08-22T14:59:12Z

I think an adapted container that used a std::vector<std::vector<T>> internally, and a single map, would cause you less pain. Longer-term, I don't think middle insertion is so common that its efficiency should necessarily be of prime importance, compared to thinking about the cache friendliness of typical access patterns.

lewardo · 2023-08-23T08:22:34Z

In terms of integrating that into the FluidTensor ecosystem, do you think the best route would be a raw std::vector of FluidTensors, or a FluidTensor of rank one higher? The latter would bring back the 'middle insertion' issues when appending a new frame to the end of a central element (iirc under the hood it's all one contiguous std::vector).

weefuzzy · 2023-08-23T08:30:45Z

Hmm. Probably a std::vector<FluidTensor<T,2>> makes life easiest – then indexing map just needs to have an integral type for the index of a given ID.
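One possible reading of that suggestion, sketched under assumptions: a single ID-to-index map alongside a vector holding one 2D block per series. `Series` and `SeriesContainer` are hypothetical names; `Series` stands in for `FluidTensor<T,2>` so that the example is self-contained, and nothing here is taken from the actual implementation in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for FluidTensor<T,2>: frames x dims, row-major.
struct Series
{
  std::size_t         dims;
  std::vector<double> data;
  std::size_t frames() const { return dims == 0 ? 0 : data.size() / dims; }
};

class SeriesContainer
{
public:
  explicit SeriesContainer(std::size_t dims) : mDims(dims) {}

  // Push one frame onto the series for `id`, creating the series if needed.
  // Only that series' own storage grows; other series are left untouched.
  bool addFrame(const std::string& id, const std::vector<double>& frame)
  {
    if (frame.size() != mDims) return false; // dims is invariant per container
    auto it = mIndex.find(id);
    if (it == mIndex.end())
    {
      it = mIndex.emplace(id, mSeries.size()).first;
      mSeries.push_back(Series{mDims, {}});
    }
    Series& s = mSeries[it->second];
    s.data.insert(s.data.end(), frame.begin(), frame.end());
    return true;
  }

  // Returns nullptr for an unknown ID. The pointer is only valid until the
  // next call that adds a new series.
  const Series* getSeries(const std::string& id) const
  {
    auto it = mIndex.find(id);
    return it == mIndex.end() ? nullptr : &mSeries[it->second];
  }

  std::size_t size() const { return mSeries.size(); }

private:
  std::size_t                                  mDims;
  std::unordered_map<std::string, std::size_t> mIndex;  // ID -> index
  std::vector<Series>                          mSeries; // one block per ID
};
```

In this layout the map only needs an integral index per ID, and pushing frames onto `id-2` twice after `id-1` once simply leaves two series of different lengths.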

lewardo · 2023-08-23T08:32:44Z

Also, from what I understand, with a single ID-index map, if a user were to add multiple frames to one point and only later add to another point, T=n for point X would not be at the same index for every X.
e.g. addPoint id-1 inBuf; addPoint id-2 inBuf; addPoint id-2 inBuf would, as far as I can see, lead to a data structure like

T=0 {id-1}{id-2}
T=1 {id-2}

So the std::unordered_map would have to map to index 1 for T=0 and index 0 for T=1?
Or should pushing a point result in the same index in each vector of frames? But then the order in which the user pushes points could lead to massive overhead from blank vector indices...

lewardo · 2023-08-23T08:36:25Z

or perhaps I'm thinking of it the wrong way around, would time be down the std::vector along the FluidTensor?

weefuzzy · 2023-08-23T08:41:07Z

Ah, not quite how I was imagining the representation. In my mind, each ID maps to a time sequence, e.g:

| ID | index | content |
| --- | --- | --- |
| point1 | 0 | T0..Tx |
| point2 | 1 | T0..Ty |
| point3 | 2 | T0..Tz |

So, for an RNN of any variety, each training point is itself a time sequence (and remember, that for training an RNN, we'll want to shuffle the order of the sequences without breaking their internal order because the component frames aren't independent)
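The shuffling requirement can be sketched as follows: permute the order in which whole sequences are visited, and always walk the frames inside each sequence from first to last. The function names and the `(series, frame)` schedule are assumptions made for illustration, not part of the codebase.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns a shuffled visiting order over `numSeries` sequences.
std::vector<std::size_t> shuffledOrder(std::size_t numSeries, unsigned seed)
{
  std::vector<std::size_t> order(numSeries);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);
  return order;
}

// Flattens one training pass into (series, frame) pairs. Series order is
// shuffled; frame order within each series is left untouched.
std::vector<std::pair<std::size_t, std::size_t>>
epochSchedule(const std::vector<std::size_t>& frameCounts, unsigned seed)
{
  std::vector<std::pair<std::size_t, std::size_t>> schedule;
  for (std::size_t s : shuffledOrder(frameCounts.size(), seed))
    for (std::size_t t = 0; t < frameCounts[s]; ++t)
      schedule.emplace_back(s, t);
  return schedule;
}
```

Because only the outer index vector is shuffled, no frame data moves and the dependence between consecutive frames of a sequence is preserved.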

weefuzzy · 2023-08-23T08:42:03Z

Oh, our replies crossed

lewardo · 2023-08-24T10:54:46Z

After a brief meeting with @tremblap going through example patches for use cases, we settled on an interface for reading and writing time series from a DataSeries object, which is now functional.
The getseries, setseries, updateseries and removeseries messages all accept an ID and (with the exception of the last) a buffer to read from or write to.
The buffer will have as many channels as the extent of each datum/frame, and as many frames as there are for that ID in the DataSeries.
We are aware this diverges from the interface for getting/setting single frames, where the buffer time dimension is the frame extent, whereas for getting/setting series, time is time and the channels are the extent. Any ideas for UI refactoring for consistency would be welcome, although in the many example patches we went through in our meeting it seemed intuitive enough.
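The two buffer conventions described above can be contrasted with a small sketch. `Buffer` is a hypothetical stand-in for a host buffer, indexed as `buffer[channel][frame]`; the single-channel layout for lone frames is an assumption based on the description, not a statement about the actual client code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a host buffer: buffer[channel][frame].
using Buffer = std::vector<std::vector<double>>;

// Series convention: `series` is row-major (time steps x dims). The result
// has `dims` channels and one buffer frame per time step.
Buffer seriesToBuffer(const std::vector<double>& series, std::size_t dims)
{
  assert(dims > 0 && series.size() % dims == 0);
  const std::size_t steps = series.size() / dims;
  Buffer buf(dims, std::vector<double>(steps));
  for (std::size_t t = 0; t < steps; ++t)
    for (std::size_t d = 0; d < dims; ++d)
      buf[d][t] = series[t * dims + d];
  return buf;
}

// Single-frame convention, for contrast: the frame's values are laid out
// along buffer time (assumed here to be one channel).
Buffer frameToBuffer(const std::vector<double>& frame)
{
  return Buffer(1, frame);
}
```

A three-step series of 2-dimensional frames therefore becomes a 2-channel, 3-frame buffer, whereas a single 2-dimensional frame becomes a 1-channel, 2-frame buffer.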

lewardo · 2023-08-24T11:54:09Z

There are various slicing operations that could be useful and that would use DataSet objects. For example, the getDataSet message now gets a time slice of the DataSeries and returns it as a DataSet, ignoring any IDs that don't have that time frame.
Other similar messages could include getting a DataSet view into a DataSeries time slice, so that writing to it changes the DataSeries from another object? Of course, this would be in addition to the usual {replace,remove,add,get}DataSet messages to DataSeries.
In essence, these would be much like the frame-based and series-based operations, but on a different slice of the underlying tensor.
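As a rough illustration of the time-slice behaviour described for getDataSet, the sketch below collects frame `t` from every series that is long enough and skips the rest. The type aliases and `timeSlice` are hypothetical stand-ins using standard containers, not the DataSeries or DataSet classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: a series is a list of frames, and a "DataSet" is
// a map from ID to a single point.
using Frame     = std::vector<double>;
using SeriesMap = std::map<std::string, std::vector<Frame>>;
using PointMap  = std::map<std::string, Frame>;

// Collect frame `t` of every series that has one; IDs whose series is
// shorter than t + 1 frames are left out of the result.
PointMap timeSlice(const SeriesMap& series, std::size_t t)
{
  PointMap slice;
  for (const auto& entry : series)
    if (t < entry.second.size()) slice.emplace(entry.first, entry.second[t]);
  return slice;
}
```

This version copies the frames out; a view that writes back into the series, as floated above, would need a different return type and is not attempted here.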

tremblap · 2023-09-25T09:23:09Z

Hey @lewardo do you want assert report (on print function) here or elsewhere?


lewardo added 3 commits August 22, 2023 11:02

plagiarise datasetclient for now

6903f19

rename object to dataseries

46ecaf3

functional clone of dataset

9993a1e

lewardo changed the title ~~Data series object for time-based~~ Data series object for time-based objects Aug 22, 2023

lewardo changed the title ~~Data series object for time-based objects~~ DataSeries object for time-based objects Aug 22, 2023

lewardo added 8 commits August 23, 2023 10:35

dataseries CTORs

0aa7d8b

addSeries member

1bd2e93

getSeries member

12e52c4

addFrame/getFrame members

f54ce93

get member function

725342b

modify member getters

8ffcfd3

change container type to vector of tensors

8a9a92d

modify initFromData for vector change

c8aa44c

lewardo force-pushed the data-series branch from 7c6575a to 40c95dd Compare August 23, 2023 09:50

removeSeries/Frame and updateSeries/Frame member functions

a1b89a8

lewardo force-pushed the data-series branch from 40c95dd to a1b89a8 Compare August 23, 2023 09:50

lewardo added 5 commits August 23, 2023 11:10

remove superfluous dataset members and convert names

af564f9

remove kFrameLen paramter

db247f6

add const qualifiers to input views

2774b29

add DataSeries to libmanipulation

d875bd4

fix constness point setting issues

9e116fa

lewardo added 5 commits August 24, 2023 10:51

register new series-level messages

77251f5

regroup member functions

2845d37

getSeries message

784fff6

setSeries message

2c46d51

updateSeries message

6128d91

getDataSet message

54d48c6

lewardo mentioned this pull request Aug 24, 2023

dynamic time warping algorithm with optional constraints #250

Open

lewardo added 7 commits August 24, 2023 15:01

actually pointing to the right member might be helpful

3ab06c2

fix deleteframe case with single frame in series

7d15aaf

add toBuffer and fromBuffer message aliases

ce0efe8

use custom allocator on std containers

892b3a6

formatting

1e82ed9

convert dataseries template operations for consistency

dde8232

fix templature correction

f1ede98

lewardo mentioned this pull request Sep 5, 2023

fluid.dataseries reference documentation flucoma/flucoma-docs#191

Open

lewardo added 6 commits September 5, 2023 15:46

consistent printing terminology

b847494

check if id exists before resizing buffer

c372c22

remove toBuffer and fromBuffer aliases to unconfuse

6173b9c

added proper dataseries error messages

c800a96

negative frame indexing

69265ac

getdataset argument order

035638a

tremblap and others added 6 commits January 8, 2024 13:06

temporary fix to reload error - json is alphabetical - TODO: potentia…

890985f

…lly check the max length dynamically zero pad ceil(log10(maxlen)) zero padding would do but maybe we should store a new "maxlength" key

added multi-description example with dataset and json output

bbbb50c

Merge branch 'main' into data-series

46509a4

printseries fix print 2nd boundary counter

110aecb

remove the T in dataset printing and json dumping

da06258

correct padding per series instead

d38b81b
