-
Notifications
You must be signed in to change notification settings - Fork 3
Multidimensional Data
The MultidimensionalData
protocol provides powerful tools for handling multidimensional data of any kind. An entity following the MultidimensionalData
protocol is essentially a multidimensional array. To prevent misunderstanding, we generally try to avoid the term "dimension" (except in the name of the package and protocol). What we can think of as one layer in a nested multidimensional array, we call "mode", the number of elements in this layer is called "modeSize".
All elements of the MultidimensionalData
object are stored in a big flat values
array. Analogously (or as an extension) to row-major matrices, like they are used in C, the first mode is the one highest up in the hierarchy. This means, consecutive elements in the first mode are spread out with equal distance over the whole values
array, whereas consecutive elements in the last mode are consecutive elements in values
.
In a C-style nested array, the inner arrays may have various sizes in the same layer. This is not possible for this protocol. The modes can all have different sizes, but one specific mode must have the same size all the time, independent of the position in the data. We can think of a MultidimensionalData
object as a "hypercuboid" that may not be ragged.
Getting and setting single elements is done by using a subscript with multiple Integers defining the index of each mode as arguments. Therefore the number of Integers has to be the same as the number of modes. The Integers can be passed in either as a single array (data[[3, 4, 1]]
) or as variadic arguments (data[3, 4, 1]
), both ways are equivalent.
It is also possible to get and set a slice. The slice in itself is a multidimensional data object of the same type with either fewer modes, or smaller modes, or both. Arrays and ranges of Integers both can be used to define the indices in each mode contained in the slice. If a mode is completely retained in a slice, it can be indexed by the global variable all
. If only one index of a mode is retained in the slice, this mode will vanish in the slice (it is not possible in this case to only write a single Integer, instead a range or array with count one should be used). Again, variadic arguments (data[2...5, [3], all]
) or an argument array can be used. The array however must be preceded by the slice:
keyword (data[slice: [2...5, [3], all]]
). If there are less arguments than modes, the remaining last modes are retained completely.
This kind of syntax turns out to be a bit of a struggle with Swift generics. To allow heterogenous arrays as arguments for slicing, the DataSliceSubscript
protocol is defined, and Array
and Range
are extended to follow this protocol. Under the hood, they have to be dynamically type checked. The slicing indices for a data object can be stored anyway as an DataSliceSubscript
array.
There are two functions to help implementing operations on MultidimensionalData
objects. They reduce boilerplate code and provide automatic parallel computation. In many cases it is straightforward to extend operations on vectors and matrices to operations on arbitrary tensors by using these functions in combination with a default implementation of the initial operation.
func performForOuterModes(outerModes: [Int],
inout outputData: [Self],
calculate: (currentIndex: [DataSliceSubscript], outerIndex: [DataSliceSubscript], sourceData: Self) -> [Self],
writeOutput: (currentIndex: [DataSliceSubscript], outerIndex: [DataSliceSubscript], inputData: [Self], inout outputData: [Self]) -> ())
We often want to operate on a subset of modes of a single data object. These modes will either be contracted and vanish, or be extracted into a new object, or in any other way treated differently than the other modes. We could call these the "inner modes". The outer modes, which we give the performForOuterModes
function as the first argument, are all modes in the data object, that are not inner modes. performForOuterModes
recursively constructs nested for-loops, and for every index combination in the outer modes the calculate
and writeOutput
closures will be called.
Before we call performForOuterModes
, the output data objects (or probably only one object in many cases) have to be created (e.g. with zeros()
) and then given in an array as the outputData
argument to the function.
The calculate
closure will be called asynchronously by the performForOuterModes
function and provided with three arguments:
-
currentIndex
: The indices inself
that are currently affected. If we sliceself
with thecurrentIndex
, we will get a data object with only the inner modes, but every timecalculate
gets called it will contain different values. -
outerIndex
: The current index in only the outer modes. Although it is anDataSliceSubscript
array, it defines the position of exactly one element of the outer modes. -
sourceData
: Simplyself
, has to passed in as an argument because of the parallel computation.
One or several results have to be returned as an array. These results will be written synchronously to the outputData
object by the writeOutput
closure (technically this could also be done asynchronously, but then the dispatch engine gets confused). Here we have four arguments:
-
currentIndex
andouterIndex
: Same as forcalculate
. -
inputData
: Exactly the array of data objects returned bycalculate
. This has to be written to theoutputData
. -
inout outputData
: Reference to the array of data objects given to theperformForOuterModes
function.
One of the simplest examples of performForOuterModes
is the sum( overModes: )
function for tensors. Here the modes that will be summed are the inner modes and will be contracted, i.e. they vanish. The outputData
tensor will have only the remaining, the outer modes. In calculate
, the values from currentIndex
slice are summed and a single scalar tensor (wrapped in an array) is returned. In writeOutput
, this scalar is then written to the outerIndex
in the outputData
.
Slightly more complex is the normalize( overModes: )
function for tensors. We can define an arbitrary set of modes that will be normalized. Along these modes, the normalized tensor will have mean zero and a standard deviation of one. All modes that will not be normalized are the outer modes. This function has three outputData
objects: the normalized tensor, the mean tensor, and the deviation tensor. The normalized tensor will obviously have the same modes as the tensor that gets normalized, only with scaled and shifted elements. Mean and deviation tensor will only have the outer modes, containing mean and deviation of the corresponding slice of normalized inner modes. In the calculate
closure, the values of the currentIndex
slice get normalized by a standard array normalization function. The normalized values get returned as a tensor with all inner modes, mean and deviation as scalar tensors. In writeOutput
, the normalized slice is written to the currentIndex
of the normalized tensor, mean and deviation are written to the outerIndex
of the mean or deviation tensor respectively.
func combine<T: MultidimensionalData>(a: T, forOuterModes outerModesA: [Int],
with b: T, forOuterModes outerModesB: [Int],
inout outputData: [T],
calculate: (indexA: [DataSliceSubscript], indexB: [DataSliceSubscript], outerIndex: [DataSliceSubscript], sourceA: T, sourceB: T) -> [T],
writeOutput: (indexA: [DataSliceSubscript], indexB: [DataSliceSubscript], outerIndex: [DataSliceSubscript], inputData: [T], inout outputData: [T]) -> ())
The combine
function is used for operations that combine in some way two multidimensional data objects of the same type (but not necessarily with the same modes) and return one or several of these objects. It works very similarly to performForOuterModes
, only with two input objects (a
and b
). The constellation of inner modes depends on the calculation that will be performed, as we will see in the examples. The outer modes of the objects are independent. We can think of the outer modes from a
and the outer modes from b
as getting concatenated (a before b). For every index combination of this outer-modes-compound, calculate
and writeOutput
are called by the combine
function.
The calculate
closure now has five arguments:
-
indexA
andindexB
: The currently affected inner modes ofa
andb
. If theouterIndex
cycles through the outer modes ofa
,indexB
does not change, and vice versa. -
outerIndex
: The current index in the combined outer modes. -
sourceA
andsourceB
: Again simplya
andb
, for parallel computational reasons.
Again, the resulting data objects are returned. The workings of writeOutput
should be clear.
Two different kinds of tensor multiplications will serve as examples for the application of the combine
function. First we look at multiplyElementwise
. Here the inner modes simply get multiplied element wise, hence they must be equivalent, i.e. same number of modes, same order and same mode sizes. In the calculate
step, a conventional function for element wise vector multiplication is used to multiply the value arrays of the a
and b
slices. The returned tensor consists of the inner modes and has the product array as values. This tensor will be written to the outerIndex
slice of the outputData
.
The multiply
function for tensors is a lot more sophisticated, the use of the combine
function however is quite similar. In this kind of tensor multiplication, some modes that exist in both tensors may be summed over. To achieve this, we want to make as much use as possible of accelerated matrix multiplication algorithms. For this reason, the modes of both tensors are reordered if necessary, and the combine
function is applied. Note, that the inner modes in this case are not the summation modes. Instead, they are the modes of which the slices, or rather the values
arrays of the slices, can be used as matrices in a matrix multiplication. This mean, that the indexA
slice, the indexB
slice and the returned slice might all be different. The exact properties of these slices have been calculated before. The outer modes are those modes that cannot be integrated in a matrix multiplication.
-
Basic Documentation
-
Applications