
[Blocked by upstream bug] Change Tensor backend to pointer+length #420

Closed
wants to merge 25 commits

Conversation


@mratsim mratsim commented Mar 4, 2020

This PR changes the CPU tensor backend to Laser pointer+length.

This would help with the issues tagged Laser: https://github.com/mratsim/Arraymancer/issues?q=is%3Aissue+label%3ALaser+is%3Aopen

Changing the backend

The current backend uses Nim sequences, which limits interoperability with other libraries and frameworks. In particular, making an Arraymancer tensor just a view over a memory buffer described by pointer + length would allow zero-copy operations with NumPy, PyTorch and TensorFlow.

An additional benefit is enabling custom allocators (#419), which would be useful for distributed computing, NUMA-aware allocation, and situations where it's desirable to use the stack, for example.

To provide deep immutability checks even with pointer-backed objects, a scheme using distinct types has been implemented:

type
  RawImmutableView*[T] = distinct ptr UncheckedArray[T]
  RawMutableView*[T] = distinct ptr UncheckedArray[T]

func unsafe_raw_data*[T](t: Tensor[T], aligned: static bool = true): RawImmutableView[T] {.inline.} =
  ## Unsafe: the pointer can outlive the input tensor.
  ## For optimization purposes, Laser will hint the compiler that,
  ## while the pointer is valid, all data accesses will be through it (no aliasing)
  ## and that the data is aligned by LASER_MEM_ALIGN (default 64).
  unsafe_raw_data_impl()

func unsafe_raw_data*[T](t: var Tensor[T], aligned: static bool = true): RawMutableView[T] {.inline.} =
  ## Unsafe: the pointer can outlive the input tensor.
  ## For optimization purposes, Laser will hint the compiler that,
  ## while the pointer is valid, all data accesses will be through it (no aliasing)
  ## and that the data is aligned by LASER_MEM_ALIGN (default 64).
  unsafe_raw_data_impl()

template `[]`*[T](v: RawImmutableView[T], idx: int): T =
  distinctBase(type v)(v)[idx]

template `[]`*[T](v: RawMutableView[T], idx: int): var T =
  distinctBase(type v)(v)[idx]

template `[]=`*[T](v: RawMutableView[T], idx: int, val: T) =
  distinctBase(type v)(v)[idx] = val

Iterators

The current map_inline/map2_inline/map3_inline and apply_inline/apply2_inline/apply3_inline have several limitations:

  1. We need a variadic version that can handle an arbitrary number of tensors to iterate over. This limitation caused inefficient iteration in GRU, for example:
    # Step 2 - Computing reset (r) and update (z) gate
    var W2ru = W3x[_, srz] # shape [batch_size, 2*H] - we reuse the previous buffer
    apply2_inline(W2ru, U3h[_, srz]):
      sigmoid(x + y)

    # Step 3 - Computing candidate hidden state ñ
    var n = W3x[_, s] # shape [batch_size, H] - we reuse the previous buffer
    apply3_inline(n, W2ru[_, sr], U3h[_, s]):
      tanh(x + y * z)

    # Step 4 - Update the hidden state
    apply3_inline(hidden, W3x[_, sz], n):
      (1 - y) * z + y * x

Instead we could fuse all of those iterations and avoid intermediate tensors, similar to Intel Nervana Neon: https://github.com/NervanaSystems/neon/blob/8c3fb8a9/neon/layers/recurrent.py#L710-L723
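As a rough illustration of what such fusion could look like with a variadic forEach (a sketch under assumptions, not code from this PR: Wr/Wz/Wn and Ur/Uz/Un are hypothetical names for the corresponding column slices of W3x and U3h above):

```nim
# Hypothetical fused GRU update: steps 2-4 above collapse into a single
# elementwise pass over all slices, with no intermediate tensors.
forEach h in hidden,
        wr in Wr, wz in Wz, wn in Wn,
        ur in Ur, uz in Uz, un in Un:
  let r = sigmoid(wr + ur)   # reset gate
  let z = sigmoid(wz + uz)   # update gate
  let n = tanh(wn + r * un)  # candidate hidden state
  h = (1 - z) * n + z * h    # hidden state update
```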

  2. The x, y, z injected variables seem to come out of nowhere. Laser's forEach leads to more readable code, as the variables can use any name: https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/benchmarks/loop_iteration/iter_bench_prod.nim#L87-L90
    proc mainBench_libImpl(a, b, c: Tensor, nb_samples: int) =
      bench("Production implementation for tensor iteration"):
        forEach o in output, x in a, y in b, z in c:
          o = x + y - sin z
  3. The map_inline/map2_inline/map3_inline and apply_inline/apply2_inline/apply3_inline lead to extreme code duplication to efficiently handle the contiguous and non-contiguous cases:
  • template tripleStridedIteration*(strider: IterKind, t1, t2, t3, iter_offset, iter_size: typed): untyped =
      ## Iterate over three Tensors, displaying data as in C order, whatever the strides.
      let t1_contiguous = t1.is_C_Contiguous()
      let t2_contiguous = t2.is_C_Contiguous()
      let t3_contiguous = t3.is_C_Contiguous()

      # Get tensor data address with offset builtin
      withMemoryOptimHints()
      let t1data{.restrict.} = t1.dataArray # Warning ⚠: data pointed to may be mutated
      let t2data{.restrict.} = t2.dataArray
      let t3data{.restrict.} = t3.dataArray

      # Optimize for loops in contiguous cases
      # Note that not all cases are handled here, just some probable ones
      if t1_contiguous and t2_contiguous and t3_contiguous:
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, i, i, i)
      elif t1_contiguous and t2_contiguous:
        initStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, i, i, t3_iter_pos)
          advanceStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
      elif t1_contiguous:
        initStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
        initStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, i, t2_iter_pos, t3_iter_pos)
          advanceStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
          advanceStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
      else:
        initStridedIteration(t1_coord, t1_backstrides, t1_iter_pos, t1, iter_offset, iter_size)
        initStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
        initStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, t1_iter_pos, t2_iter_pos, t3_iter_pos)
          advanceStridedIteration(t1_coord, t1_backstrides, t1_iter_pos, t1, iter_offset, iter_size)
          advanceStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
          advanceStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
  • template initStridedIteration*(coord, backstrides, iter_pos: untyped, t, iter_offset, iter_size: typed): untyped =
      ## Iterator init
      var iter_pos = 0
      withMemoryOptimHints() # MAXRANK = 8, 8 ints = 64 Bytes, cache line = 64 Bytes --> profit!
      var coord {.align64, noInit.}: array[MAXRANK, int]
      var backstrides {.align64, noInit.}: array[MAXRANK, int]
      for i in 0..<t.rank:
        backstrides[i] = t.strides[i]*(t.shape[i]-1)
        coord[i] = 0

      # Calculate initial coords and iter_pos from iteration offset
      if iter_offset != 0:
        var z = 1
        for i in countdown(t.rank - 1, 0):
          coord[i] = (iter_offset div z) mod t.shape[i]
          iter_pos += coord[i]*t.strides[i]
          z *= t.shape[i]

    template advanceStridedIteration*(coord, backstrides, iter_pos, t, iter_offset, iter_size: typed): untyped =
      ## Computing the next position
      for k in countdown(t.rank - 1, 0):
        if coord[k] < t.shape[k]-1:
          coord[k] += 1
          iter_pos += t.strides[k]
          break
        else:
          coord[k] = 0
          iter_pos -= backstrides[k]

    template stridedIterationYield*(strider: IterKind, data, i, iter_pos: typed) =
      ## Yield the iterator's return value
      when strider == IterKind.Values: yield data[iter_pos]
      elif strider == IterKind.Iter_Values: yield (i, data[iter_pos])
      elif strider == IterKind.Offset_Values: yield (iter_pos, data[iter_pos]) ## TODO: remove workaround for C++ backend

    The new code duplicates the iteration body only twice, whatever the number of input tensors, instead of 4 times for tripleStridedIteration.
  4. The new code is also about 15% faster when iterating on strided tensors (which often result from slicing/reshaping), according to this benchmark: https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/benchmarks/loop_iteration/iter_bench.nim#L140-L178. In the benchmark, the old scheme is "Per tensor reference iteration" while the new one is "Fused per tensor reference iteration".
    The main difference is that instead of having one such loop per tensor:
    template stridedBodyTemplate*(): untyped {.dirty.} =
      quote do:
        # Initialisation
        `init_strided_iteration`
        # Iterator loop
        for _ in 0 ..< `chunk_size`:
          # Apply computation
          `body`
          # Next position
          for `k` in countdown(`alias0`.rank - 1, 0):
            if `coord`[`k`] < `alias0`.shape[`k`] - 1:
              `coord`[`k`] += 1
              `increment_iter_pos`
              break
            else:
              `coord`[`k`] = 0
              `apply_backstrides`

    a single loop updates the iteration state of all tensors.


mratsim commented Mar 4, 2020

Blocking:

Currently the compiler crashes without any error message when compiling:

nim c -o:build/tests_tensor_part01 -r tests/_split_tests/tests_tensor_part01.nim

The stacktrace with koch temp c on the latest commit nim-lang/Nim@357edd8 is:

[...]
Hint: io_hdf5 [Processing]
/home/beta/Programming/Nim/Arraymancer/src/io/io_hdf5.nim(8, 24) Warning: import os.nim instead; ospaths is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/io/io.nim(13, 16) Warning: imported and not used: 'nimhdf5' [UnusedImport]
Hint: stats [Processing]
Hint: nlp [Processing]
Hint: tokenizers [Processing]
/home/beta/Programming/Nim/Arraymancer/src/nlp/tokenizers.nim(10, 20) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
Hint: einsum [Processing]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(239, 22) Warning: Deprecated since version 0.18.1; All functionality is defined on 'NimNode'.; ident is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(239, 31) Warning: Deprecated since version 0.18.0: Use 'ident' or 'newIdentNode' instead.; toNimIdent is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(239, 28) Warning: Deprecated since version 0.18.1; Use '==' on 'NimNode' instead.; == is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(265, 25) Warning: Deprecated since version 0.18.1; All functionality is defined on 'NimNode'.; ident is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(265, 34) Warning: Deprecated since version 0.18.0: Use 'ident' or 'newIdentNode' instead.; toNimIdent is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(265, 31) Warning: Deprecated since version 0.18.1; Use '==' on 'NimNode' instead.; == is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(377, 10) Hint: 'enumerateIdx' is declared but not used [XDeclaredButNotUsed]
Hint: unittest [Processing]
Hint: terminal [Processing]
Hint: colors [Processing]
Hint: termios [Processing]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(91, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(94, 17) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(91, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(98, 17) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(119, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(122, 17) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(133, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(137, 7) template/generic instantiation of `check` from here
/home/beta/Programming/Nim/Nim/lib/pure/unittest.nim(670, 14) template/generic instantiation of `==` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/operators_comparison.nim(24, 18) template/generic instantiation of `zip` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(139, 23) template/generic instantiation of `dualStridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(176, 31) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(133, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(137, 7) template/generic instantiation of `check` from here
/home/beta/Programming/Nim/Nim/lib/pure/unittest.nim(670, 14) template/generic instantiation of `==` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/operators_comparison.nim(24, 18) template/generic instantiation of `zip` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(139, 23) template/generic instantiation of `dualStridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(177, 31) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
Traceback (most recent call last)
/home/beta/Programming/Nim/Nim/compiler/nim.nim(118) nim
/home/beta/Programming/Nim/Nim/compiler/nim.nim(95) handleCmdLine
/home/beta/Programming/Nim/Nim/compiler/cmdlinehelper.nim(77) loadConfigsAndRunMainCommand
/home/beta/Programming/Nim/Nim/compiler/main.nim(190) mainCommand
/home/beta/Programming/Nim/Nim/compiler/main.nim(92) commandCompileToC
/home/beta/Programming/Nim/Nim/compiler/modules.nim(143) compileProject
/home/beta/Programming/Nim/Nim/compiler/modules.nim(84) compileModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(216) processModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(86) processTopLevelStmt
/home/beta/Programming/Nim/Nim/compiler/sem.nim(600) myProcess
/home/beta/Programming/Nim/Nim/compiler/sem.nim(568) semStmtAndGenerateGenerics
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2239) semStmt
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(995) semExprNoType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2782) semExpr
/home/beta/Programming/Nim/Nim/compiler/importer.nim(218) evalImport
/home/beta/Programming/Nim/Nim/compiler/importer.nim(188) impMod
/home/beta/Programming/Nim/Nim/compiler/importer.nim(158) myImportModule
/home/beta/Programming/Nim/Nim/compiler/modules.nim(98) importModule
/home/beta/Programming/Nim/Nim/compiler/modules.nim(84) compileModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(210) processModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(86) processTopLevelStmt
/home/beta/Programming/Nim/Nim/compiler/sem.nim(600) myProcess
/home/beta/Programming/Nim/Nim/compiler/sem.nim(568) semStmtAndGenerateGenerics
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2239) semStmt
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(995) semExprNoType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2749) semExpr
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2179) semStmtList
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2646) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(977) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(870) afterCallActions
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(34) semTemplateExpr
(134 calls omitted) ...
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2458) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2260) prepareOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(49) semOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2673) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(892) semIndirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1413) semFieldAccess
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1278) builtinFieldAccess
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(64) semExprWithType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2673) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(899) semIndirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(825) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(528) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(331) resolveOverloads
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(93) pickBestCandidate
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2458) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2260) prepareOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(49) semOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2716) semExpr
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(1495) semLambda
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(1365) semParamList
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1189) semProcTypeNode
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1145) semParamType
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1688) semTypeNode
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1590) semTypeof
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(64) semExprWithType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(977) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(869) afterCallActions
/home/beta/Programming/Nim/Nim/compiler/sem.nim(471) semMacroExpr
/home/beta/Programming/Nim/Nim/compiler/sem.nim(414) semAfterMacroCall
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2646) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(977) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(869) afterCallActions
/home/beta/Programming/Nim/Nim/compiler/sem.nim(471) semMacroExpr
/home/beta/Programming/Nim/Nim/compiler/sem.nim(414) semAfterMacroCall
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(822) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(536) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(497) semResolvedCall
/home/beta/Programming/Nim/Nim/compiler/seminst.nim(388) generateInstance
/home/beta/Programming/Nim/Nim/compiler/seminst.nim(148) instantiateBody
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1753) semProcBody
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2749) semExpr
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2179) semStmtList
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2747) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1705) semAsgn
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(64) semExprWithType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(825) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(528) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(331) resolveOverloads
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(93) pickBestCandidate
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2458) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2260) prepareOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(49) semOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2673) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(899) semIndirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(825) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(528) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(331) resolveOverloads
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(93) pickBestCandidate
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2459) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2170) paramTypesMatch
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2014) paramTypesMatchAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(1488) typeRel
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(715) prepareMetatypeForSigmatch
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(514) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(391) handleGenericInvocation
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(608) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(242) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(199) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(514) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(391) handleGenericInvocation
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(598) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(608) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(242) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(221) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(558) internalError
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(529) liMessage
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(356) handleError
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(346) quit
FAILURE

@mratsim mratsim changed the title [WIP] Change Tensor backend to pointer+length and map/apply iterators to Laser forEach [Blocked by upstream] Change Tensor backend to pointer+length and map/apply iterators to Laser forEach Mar 5, 2020

mratsim commented Mar 5, 2020

After fumbling left and right with a buggy supportsCopyMem(T) in type sections and with var seq as a return value, the PR is now done.

It works with --gc:markandsweep; however, it triggers a GC bug, see https://travis-ci.org/mratsim/Arraymancer/jobs/658908198#L1105-L1111

Reproduction: either one of these (they give different errors):

nim c --verbosity:0 --hints:off --warnings:off -o:build/tests_cpu -r tests/tests_cpu.nim
nim c --verbosity:0 --hints:off --warnings:off -d:useSysAssert -d:useGcAssert -o:build/tests_cpu -r tests/tests_cpu.nim
nim c --debuginfo -d:noSignalHandler -d:useMalloc --verbosity:0 --hints:off --warnings:off -o:build/tests_cpu -r tests/tests_cpu.nim

Changing map_inline/apply_inline to Laser forEach is left for a separate PR. This one is already huge enough.

@mratsim mratsim changed the title [Blocked by upstream] Change Tensor backend to pointer+length and map/apply iterators to Laser forEach [Blocked by upstream] Change Tensor backend to pointer+length Mar 5, 2020

mratsim commented Apr 20, 2020

Some benches to investigate:

A ~20% regression on the simple XOR benchmark https://github.com/mratsim/Arraymancer/blob/v0.6.0/benchmarks/ex01_xor.nim
Note: there is limited impact from threads:on, which is known to make allocShared an allocation bottleneck due to a lock (the proper solution is not to avoid allocShared but to avoid allocating, or to implement a custom allocator); see the Weave benches mratsim/weave@5d1bfb4

On a larger-scale benchmark there is no performance difference.

Note that the scaling from my old dual-core to my 18-core machine is quite bad, due to MNIST images being only about 28 by 28; see the old benches: #303

Investigation to be done after the iteration part is also merged in. Tracked in #438

@mratsim mratsim changed the title [Blocked by upstream] Change Tensor backend to pointer+length Change Tensor backend to pointer+length Apr 20, 2020

mratsim commented Apr 20, 2020

@brentp @HugoGranstrom @Vindaar if you can confirm that this PR doesn't completely break your code, that would be very helpful.

Otherwise I can do a 0.7 release with the NumPy indexing and then work on this change for 0.8.

@mratsim mratsim changed the title Change Tensor backend to pointer+length [Ready] Change Tensor backend to pointer+length Apr 20, 2020
@HugoGranstrom

Neat! Great work! 🤩 All of NumericalNim's tests pass, so this shouldn't be a problem there.


Vindaar commented Apr 21, 2020

Awesome work! I'll let you know later today!


Vindaar commented Apr 21, 2020

Ok, I can already report that in ggplotnim I stumble on problems when a tensor of my variant kind Value (for object columns with mixed types) is created.

Happens here:
https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/dataframe/column.nim#L67

and doesn't compile because of here:
https://github.com/mratsim/Arraymancer/blob/laser-tensor-iterator/src/laser/tensor/allocator.nim#L30

with:

/home/basti/CastData/ExternCode/ggplotnim/src/ggplotnim/dataframe/column.nim(67, 44) template/generic instantiation of `newTensor` from here
/home/basti/src/nim/arraymancer/src/laser/tensor/initialization.nim(178, 18) template/generic instantiation of `allocCpuStorage` from here
/home/basti/src/nim/arraymancer/src/laser/tensor/allocator.nim(30, 23) Error: type mismatch: got <ptr UncheckedArray[Value], int>
but expected one of: 
proc newSeq[T](len = 0.Natural): seq[T]
  first type mismatch at position: 1
  required type for len: Natural
  but expression 'storage.raw_buffer' is of type: ptr UncheckedArray[Value]
proc newSeq[T](s: var seq[T]; len: Natural)
  first type mismatch at position: 1
  required type for s: var seq[T]
  but expression 'storage.raw_buffer' is of type: ptr UncheckedArray[Value]

expression: newSeq(storage.raw_buffer, size)

I'll be on lunch break now. I'll see if this is something I can help with afterwards, if you haven't taken a look at it by then.

edit1:
Yes, I think I found the problem. The issue seems to be that the CpuStorage object uses a different condition to decide which branch to use than allocCpuStorage does:

  CpuStorage*[T] {.shallow.} = ref object # Total heap: 25 bytes = 1 cache-line
    # Workaround supportsCopyMem in type section - https://github.com/nim-lang/Nim/issues/13193
    when not(T is string or T is ref):
      raw_buffer*: ptr UncheckedArray[T] # 8 bytes
      memalloc*: pointer                 # 8 bytes
      isMemOwner*: bool                  # 1 byte
    else: # Tensors of strings, other ref types or non-trivial destructors
      raw_buffer*: seq[T]                # 8 bytes (16 for seq v2 backed by destructors?)

whereas:

  when T.supportsCopyMem:
    new(storage, finalizer[T])
    {.noSideEffect.}:
      storage.memalloc = allocShared(sizeof(T) * size + LASER_MEM_ALIGN - 1)
    storage.isMemOwner = true
    storage.raw_buffer = align_raw_data(T, storage.memalloc)
  else: # Always 0-initialize Tensors of seq, strings, ref types and types with non-trivial destructors
    new(storage)
    storage.raw_buffer.newSeq(size)

supportsCopyMem is a compiler magic, and I assume variant objects (or maybe objects containing seq / ref / string) are considered not to support copyMem.
Value is here:
https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/dataframe/value.nim#L12-L25

So Value takes the when not(T is string or T is ref) branch, but supportsCopyMem returns false. Changing the CpuStorage definition to use supportsCopyMem is probably enough to fix it. I'll check it locally.
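The mismatch is easy to reproduce in isolation (a minimal sketch; MiniValue is a toy stand-in for ggplotnim's Value variant):

```nim
import typetraits

type
  ValueKind = enum VInt, VString
  MiniValue = object          # toy stand-in for ggplotnim's Value variant
    case kind: ValueKind
    of VInt: num: int
    of VString: str: string

# The allocator branches on supportsCopyMem ...
doAssert not supportsCopyMem(MiniValue)  # false: one branch holds a GC'd string
# ... but the CpuStorage type section branches on `T is string or T is ref`,
# which puts MiniValue on the raw-pointer side:
doAssert not (MiniValue is string or MiniValue is ref)
```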

edit2: And of course, supportsCopyMem is not valid there...

edit3: just found your issue for it: nim-lang/Nim#13193


brentp commented Apr 21, 2020

my stuff seems to be working. thank you.


mratsim commented Apr 21, 2020

@Vindaar I'll park this PR and wait until nim-lang/Nim#13193 is solved then.

@mratsim mratsim changed the title [Ready] Change Tensor backend to pointer+length [Blocked by upstream bug] Change Tensor backend to pointer+length Apr 21, 2020

Vindaar commented Apr 21, 2020

Ok sorry, kinda unfortunate!

Let's hope the upstream issue is fixed quickly!


mratsim commented Apr 22, 2020

From the discussion, @Araq already tried and it's non-trivial as it creates a cascade of regressions https://irclogs.nim-lang.org/21-04-2020.html#14:43:57


Clonkk commented Sep 25, 2020

With the Nim 1.4 release coming and many bugs fixed in gc:arc and gc:orc, is there any chance of an update / progress on this issue?

I've seen in several threads on the forum that many people (including me) are excited to use Arraymancer with ARC/ORC :)


mratsim commented Sep 27, 2020

I can't merge this PR until this is solved nim-lang/Nim#13193

Araq commented Sep 27, 2020

What does it mean to "solve" nim-lang/Nim#13193? Clyybber said you need to use CpuStorage[T: type]


mratsim commented Sep 29, 2020


Vindaar commented Oct 27, 2020

As an FYI, I've finished a rebase of this on current master. I'm currently fixing up some things which were rebased incorrectly (either by me choosing the wrong branch, maybe, or by git itself?).

Just felt like letting people know that this is finally going forward.

The idea is to replace the current logic using when T.supportsCopyMem (in code) and when not (T is string or T is ref) (in the Tensor definition) with a custom type class of supported types that we know can be mem-copied. That way pointer + length should work for at least the majority of use cases, with seq as a fallback option.
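A minimal sketch of that idea (hypothetical names; the real type class would need to cover every trivially copyable type Arraymancer supports):

```nim
type
  # Explicit whitelist replacing the supportsCopyMem magic, usable in a
  # `when` branch of a type section (the real list would be broader).
  KnownSupportsCopyMem = SomeNumber | bool | char

  CpuStorage[T] = ref object
    when T is KnownSupportsCopyMem:
      raw_buffer: ptr UncheckedArray[T]
      memalloc: pointer
      isMemOwner: bool
    else:
      raw_buffer: seq[T]  # fallback for strings, refs, variant objects, ...
```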


Clonkk commented Oct 28, 2020

Will this change the API (such as accessors)?

Will it change how low-level Tensor transformations are done (copyMem of a tensor into a buffer, etc.)?


Vindaar commented Oct 29, 2020

Will this change the API (such as accessors)?

The API will remain unchanged, with a few small exceptions: e.g. toRawSeq won't be possible for ptr + length, of course; adding a normal toSeq proc, which copies, might be a good idea though.

Will it change how low-level Tensor transformations are done (copyMem of a tensor into a buffer, etc.)?

Yes, it will, but most of this will happen behind the scenes. If you have a raw ptr array, you will be able to wrap it in a Tensor and use all Arraymancer functionality without copying.
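For instance, zero-copy wrapping could look roughly like this (a sketch; fromBuffer is a hypothetical constructor name, not a committed API):

```nim
# Sketch: viewing an existing buffer as a 2x3 Tensor without copying.
var raw: array[6, float32] = [1'f32, 2, 3, 4, 5, 6]
let t = fromBuffer(raw[0].addr, [2, 3])
# t would read/write `raw` directly: mutations through t are visible in raw.
```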

@Vindaar Vindaar mentioned this pull request Nov 27, 2020

mratsim commented Dec 10, 2020

merged #477

@mratsim mratsim closed this Dec 10, 2020
@mratsim mratsim deleted the laser-tensor-iterator branch December 12, 2020 10:44