
[Blocked by upstream bug] Change Tensor backend to pointer+length #420

Closed
wants to merge 25 commits

Conversation


@mratsim mratsim commented Mar 4, 2020

This PR changes the CPU tensor backend to Laser pointer+length.

This would help with the issues tagged Laser: https://github.com/mratsim/Arraymancer/issues?q=is%3Aissue+label%3ALaser+is%3Aopen

Changing the backend

The current backend uses Nim sequences, which limits interoperability with other libraries and frameworks. In particular, making an Arraymancer tensor just a view over a memory buffer described by pointer + length would allow zero-copy operations with NumPy, PyTorch and TensorFlow.

An additional benefit is enabling custom allocators (#419), which would be useful for distributed computing, NUMA-aware allocation, and situations where it's desirable to use the stack, for example.

To provide deep immutability checks even with pointer-backed objects, a scheme using distinct types has been implemented:

type
  RawImmutableView*[T] = distinct ptr UncheckedArray[T]
  RawMutableView*[T] = distinct ptr UncheckedArray[T]

func unsafe_raw_data*[T](t: Tensor[T], aligned: static bool = true): RawImmutableView[T] {.inline.} =
  ## Unsafe: the pointer can outlive the input tensor.
  ## For optimization purposes, Laser will hint the compiler that,
  ## while the pointer is valid, all data accesses will be through it (no aliasing)
  ## and that the data is aligned by LASER_MEM_ALIGN (default 64).
  unsafe_raw_data_impl()

func unsafe_raw_data*[T](t: var Tensor[T], aligned: static bool = true): RawMutableView[T] {.inline.} =
  ## Unsafe: the pointer can outlive the input tensor.
  ## For optimization purposes, Laser will hint the compiler that,
  ## while the pointer is valid, all data accesses will be through it (no aliasing)
  ## and that the data is aligned by LASER_MEM_ALIGN (default 64).
  unsafe_raw_data_impl()

template `[]`*[T](v: RawImmutableView[T], idx: int): T =
  distinctBase(type v)(v)[idx]

template `[]`*[T](v: RawMutableView[T], idx: int): var T =
  distinctBase(type v)(v)[idx]

template `[]=`*[T](v: RawMutableView[T], idx: int, val: T) =
  distinctBase(type v)(v)[idx] = val

Iterators

The current map_inline/map2_inline/map3_inline and apply_inline/apply2_inline/apply3_inline have several limitations:

  1. We need a variadic version that can handle an arbitrary number of tensors to iterate over. This limitation caused inefficient iteration in GRU, for example:
    # Step 2 - Computing reset (r) and update (z) gate
    var W2ru = W3x[_, srz] # shape [batch_size, 2*H] - we reuse the previous buffer
    apply2_inline(W2ru, U3h[_, srz]):
      sigmoid(x + y)

    # Step 3 - Computing candidate hidden state ñ
    var n = W3x[_, s] # shape [batch_size, H] - we reuse the previous buffer
    apply3_inline(n, W2ru[_, sr], U3h[_, s]):
      tanh(x + y * z)

    # Step 4 - Update the hidden state
    apply3_inline(hidden, W3x[_, sz], n):
      (1 - y) * z + y * x

Instead we could fuse all of those iterations and avoid intermediate tensors, similar to Intel Nervana Neon: https://github.com/NervanaSystems/neon/blob/8c3fb8a9/neon/layers/recurrent.py#L710-L723
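As a rough illustration of what such fusion could look like with a variadic forEach (a sketch under assumptions, not code from this PR: Wr/Wz/Wn and Ur/Uz/Un are hypothetical names for the corresponding column slices of W3x and U3h above):

```nim
# Hypothetical fused GRU update: steps 2-4 above collapse into a single
# elementwise pass over all slices, with no intermediate tensors.
forEach h in hidden,
        wr in Wr, wz in Wz, wn in Wn,
        ur in Ur, uz in Uz, un in Un:
  let r = sigmoid(wr + ur)   # reset gate
  let z = sigmoid(wz + uz)   # update gate
  let n = tanh(wn + r * un)  # candidate hidden state
  h = (1 - z) * n + z * h    # hidden state update
```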

  2. The x, y, z injected variables seem to come out of nowhere. Laser's forEach leads to more readable code, as the variables can use any name: https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/benchmarks/loop_iteration/iter_bench_prod.nim#L87-L90
    proc mainBench_libImpl(a, b, c: Tensor, nb_samples: int) =
      bench("Production implementation for tensor iteration"):
        forEach o in output, x in a, y in b, z in c:
          o = x + y - sin z
  3. The map_inline/map2_inline/map3_inline and apply_inline/apply2_inline/apply3_inline lead to extreme code duplication to efficiently handle the contiguous and non-contiguous cases:
  • template tripleStridedIteration*(strider: IterKind, t1, t2, t3, iter_offset, iter_size: typed): untyped =
      ## Iterate over three Tensors, displaying data as in C order, whatever the strides.
      let t1_contiguous = t1.is_C_Contiguous()
      let t2_contiguous = t2.is_C_Contiguous()
      let t3_contiguous = t3.is_C_Contiguous()

      # Get tensor data address with offset builtin
      withMemoryOptimHints()
      let t1data{.restrict.} = t1.dataArray # Warning ⚠: data pointed to may be mutated
      let t2data{.restrict.} = t2.dataArray
      let t3data{.restrict.} = t3.dataArray

      # Optimize for loops in contiguous cases
      # Note that not all cases are handled here, just some probable ones
      if t1_contiguous and t2_contiguous and t3_contiguous:
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, i, i, i)
      elif t1_contiguous and t2_contiguous:
        initStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, i, i, t3_iter_pos)
          advanceStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
      elif t1_contiguous:
        initStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
        initStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, i, t2_iter_pos, t3_iter_pos)
          advanceStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
          advanceStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
      else:
        initStridedIteration(t1_coord, t1_backstrides, t1_iter_pos, t1, iter_offset, iter_size)
        initStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
        initStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
        for i in iter_offset..<(iter_offset+iter_size):
          tripleStridedIterationYield(strider, t1data, t2data, t3data, i, t1_iter_pos, t2_iter_pos, t3_iter_pos)
          advanceStridedIteration(t1_coord, t1_backstrides, t1_iter_pos, t1, iter_offset, iter_size)
          advanceStridedIteration(t2_coord, t2_backstrides, t2_iter_pos, t2, iter_offset, iter_size)
          advanceStridedIteration(t3_coord, t3_backstrides, t3_iter_pos, t3, iter_offset, iter_size)
  • template initStridedIteration*(coord, backstrides, iter_pos: untyped, t, iter_offset, iter_size: typed): untyped =
      ## Iterator init
      var iter_pos = 0
      withMemoryOptimHints() # MAXRANK = 8, 8 ints = 64 Bytes, cache line = 64 Bytes --> profit!
      var coord {.align64, noInit.}: array[MAXRANK, int]
      var backstrides {.align64, noInit.}: array[MAXRANK, int]
      for i in 0..<t.rank:
        backstrides[i] = t.strides[i]*(t.shape[i]-1)
        coord[i] = 0

      # Calculate initial coords and iter_pos from iteration offset
      if iter_offset != 0:
        var z = 1
        for i in countdown(t.rank - 1, 0):
          coord[i] = (iter_offset div z) mod t.shape[i]
          iter_pos += coord[i]*t.strides[i]
          z *= t.shape[i]

    template advanceStridedIteration*(coord, backstrides, iter_pos, t, iter_offset, iter_size: typed): untyped =
      ## Computing the next position
      for k in countdown(t.rank - 1, 0):
        if coord[k] < t.shape[k]-1:
          coord[k] += 1
          iter_pos += t.strides[k]
          break
        else:
          coord[k] = 0
          iter_pos -= backstrides[k]

    template stridedIterationYield*(strider: IterKind, data, i, iter_pos: typed) =
      ## Yield the iterator's return value
      when strider == IterKind.Values: yield data[iter_pos]
      elif strider == IterKind.Iter_Values: yield (i, data[iter_pos])
      elif strider == IterKind.Offset_Values: yield (iter_pos, data[iter_pos]) ## TODO: remove workaround for C++ backend

    The new code duplicates the iteration body only twice, whatever the number of input tensors, instead of 4 times for tripleStridedIteration.
  4. The new code is also about 15% faster when iterating on strided tensors (which often result from slicing/reshaping), according to this benchmark: https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/benchmarks/loop_iteration/iter_bench.nim#L140-L178. In the benchmark, the old scheme is "Per tensor reference iteration" while the new one is "Fused per tensor reference iteration".
    The main difference is that instead of having one such loop per tensor:
    template stridedBodyTemplate*(): untyped {.dirty.} =
      quote do:
        # Initialisation
        `init_strided_iteration`
        # Iterator loop
        for _ in 0 ..< `chunk_size`:
          # Apply computation
          `body`
          # Next position
          for `k` in countdown(`alias0`.rank - 1, 0):
            if `coord`[`k`] < `alias0`.shape[`k`] - 1:
              `coord`[`k`] += 1
              `increment_iter_pos`
              break
            else:
              `coord`[`k`] = 0
              `apply_backstrides`

    a single loop updates the iteration state of all tensors.


mratsim commented Mar 4, 2020

Blocking:

Currently the compiler crashes without any error message when compiling:

nim c -o:build/tests_tensor_part01 -r tests/_split_tests/tests_tensor_part01.nim

The stacktrace with koch temp c on the latest commit nim-lang/Nim@357edd8 is:

[...]
Hint: io_hdf5 [Processing]
/home/beta/Programming/Nim/Arraymancer/src/io/io_hdf5.nim(8, 24) Warning: import os.nim instead; ospaths is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/io/io.nim(13, 16) Warning: imported and not used: 'nimhdf5' [UnusedImport]
Hint: stats [Processing]
Hint: nlp [Processing]
Hint: tokenizers [Processing]
/home/beta/Programming/Nim/Arraymancer/src/nlp/tokenizers.nim(10, 20) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
Hint: einsum [Processing]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(239, 22) Warning: Deprecated since version 0.18.1; All functionality is defined on 'NimNode'.; ident is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(239, 31) Warning: Deprecated since version 0.18.0: Use 'ident' or 'newIdentNode' instead.; toNimIdent is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(239, 28) Warning: Deprecated since version 0.18.1; Use '==' on 'NimNode' instead.; == is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(265, 25) Warning: Deprecated since version 0.18.1; All functionality is defined on 'NimNode'.; ident is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(265, 34) Warning: Deprecated since version 0.18.0: Use 'ident' or 'newIdentNode' instead.; toNimIdent is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(265, 31) Warning: Deprecated since version 0.18.1; Use '==' on 'NimNode' instead.; == is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/src/tensor/einsum.nim(377, 10) Hint: 'enumerateIdx' is declared but not used [XDeclaredButNotUsed]
Hint: unittest [Processing]
Hint: terminal [Processing]
Hint: colors [Processing]
Hint: termios [Processing]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(91, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(94, 17) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(91, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(98, 17) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(119, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(122, 17) template/generic instantiation of `items` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(49, 19) template/generic instantiation of `stridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(137, 28) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(133, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(137, 7) template/generic instantiation of `check` from here
/home/beta/Programming/Nim/Nim/lib/pure/unittest.nim(670, 14) template/generic instantiation of `==` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/operators_comparison.nim(24, 18) template/generic instantiation of `zip` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(139, 23) template/generic instantiation of `dualStridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(176, 31) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(19, 7) template/generic instantiation of `suite` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(133, 8) template/generic instantiation of `test` from here
/home/beta/Programming/Nim/Arraymancer/tests/tensor/test_init.nim(137, 7) template/generic instantiation of `check` from here
/home/beta/Programming/Nim/Nim/lib/pure/unittest.nim(670, 14) template/generic instantiation of `==` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/operators_comparison.nim(24, 18) template/generic instantiation of `zip` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/accessors.nim(139, 23) template/generic instantiation of `dualStridedIteration` from here
/home/beta/Programming/Nim/Arraymancer/src/tensor/private/p_accessors.nim(177, 31) Warning: Use unsafe_raw_data instead; dataArray is deprecated [Deprecated]
Traceback (most recent call last)
/home/beta/Programming/Nim/Nim/compiler/nim.nim(118) nim
/home/beta/Programming/Nim/Nim/compiler/nim.nim(95) handleCmdLine
/home/beta/Programming/Nim/Nim/compiler/cmdlinehelper.nim(77) loadConfigsAndRunMainCommand
/home/beta/Programming/Nim/Nim/compiler/main.nim(190) mainCommand
/home/beta/Programming/Nim/Nim/compiler/main.nim(92) commandCompileToC
/home/beta/Programming/Nim/Nim/compiler/modules.nim(143) compileProject
/home/beta/Programming/Nim/Nim/compiler/modules.nim(84) compileModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(216) processModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(86) processTopLevelStmt
/home/beta/Programming/Nim/Nim/compiler/sem.nim(600) myProcess
/home/beta/Programming/Nim/Nim/compiler/sem.nim(568) semStmtAndGenerateGenerics
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2239) semStmt
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(995) semExprNoType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2782) semExpr
/home/beta/Programming/Nim/Nim/compiler/importer.nim(218) evalImport
/home/beta/Programming/Nim/Nim/compiler/importer.nim(188) impMod
/home/beta/Programming/Nim/Nim/compiler/importer.nim(158) myImportModule
/home/beta/Programming/Nim/Nim/compiler/modules.nim(98) importModule
/home/beta/Programming/Nim/Nim/compiler/modules.nim(84) compileModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(210) processModule
/home/beta/Programming/Nim/Nim/compiler/passes.nim(86) processTopLevelStmt
/home/beta/Programming/Nim/Nim/compiler/sem.nim(600) myProcess
/home/beta/Programming/Nim/Nim/compiler/sem.nim(568) semStmtAndGenerateGenerics
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2239) semStmt
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(995) semExprNoType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2749) semExpr
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2179) semStmtList
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2646) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(977) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(870) afterCallActions
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(34) semTemplateExpr
(134 calls omitted) ...
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2458) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2260) prepareOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(49) semOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2673) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(892) semIndirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1413) semFieldAccess
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1278) builtinFieldAccess
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(64) semExprWithType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2673) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(899) semIndirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(825) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(528) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(331) resolveOverloads
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(93) pickBestCandidate
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2458) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2260) prepareOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(49) semOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2716) semExpr
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(1495) semLambda
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(1365) semParamList
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1189) semProcTypeNode
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1145) semParamType
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1688) semTypeNode
/home/beta/Programming/Nim/Nim/compiler/semtypes.nim(1590) semTypeof
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(64) semExprWithType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(977) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(869) afterCallActions
/home/beta/Programming/Nim/Nim/compiler/sem.nim(471) semMacroExpr
/home/beta/Programming/Nim/Nim/compiler/sem.nim(414) semAfterMacroCall
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2646) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(977) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(869) afterCallActions
/home/beta/Programming/Nim/Nim/compiler/sem.nim(471) semMacroExpr
/home/beta/Programming/Nim/Nim/compiler/sem.nim(414) semAfterMacroCall
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(822) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(536) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(497) semResolvedCall
/home/beta/Programming/Nim/Nim/compiler/seminst.nim(388) generateInstance
/home/beta/Programming/Nim/Nim/compiler/seminst.nim(148) instantiateBody
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1753) semProcBody
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2749) semExpr
/home/beta/Programming/Nim/Nim/compiler/semstmts.nim(2179) semStmtList
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2747) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(1705) semAsgn
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(64) semExprWithType
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(825) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(528) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(331) resolveOverloads
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(93) pickBestCandidate
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2458) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2260) prepareOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(49) semOperand
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2673) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(899) semIndirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(2671) semExpr
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(976) semDirectOp
/home/beta/Programming/Nim/Nim/compiler/semexprs.nim(825) semOverloadedCallAnalyseEffects
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(528) semOverloadedCall
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(331) resolveOverloads
/home/beta/Programming/Nim/Nim/compiler/semcall.nim(93) pickBestCandidate
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2521) matches
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2459) matchesAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2170) paramTypesMatch
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(2014) paramTypesMatchAux
/home/beta/Programming/Nim/Nim/compiler/sigmatch.nim(1488) typeRel
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(715) prepareMetatypeForSigmatch
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(514) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(391) handleGenericInvocation
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(608) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(242) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(199) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(514) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(391) handleGenericInvocation
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(598) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(120) replaceTypeVarsT
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(608) replaceTypeVarsTAux
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(242) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/semtypinst.nim(221) replaceTypeVarsN
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(558) internalError
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(529) liMessage
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(356) handleError
/home/beta/Programming/Nim/Nim/compiler/msgs.nim(346) quit
FAILURE

@mratsim mratsim changed the title [WIP] Change Tensor backend to pointer+length and map/apply iterators to Laser forEach [Blocked by upstream] Change Tensor backend to pointer+length and map/apply iterators to Laser forEach Mar 5, 2020

mratsim commented Mar 5, 2020

After fumbling left and right with a buggy supportsCopyMem(T) in type sections and with var seq as a return value, the PR is now done.

It works with --gc:markandsweep; however, it triggers a GC bug, see https://travis-ci.org/mratsim/Arraymancer/jobs/658908198#L1105-L1111

Reproduction: either one of these (they give different errors):

nim c --verbosity:0 --hints:off --warnings:off -o:build/tests_cpu -r tests/tests_cpu.nim
nim c --verbosity:0 --hints:off --warnings:off -d:useSysAssert -d:useGcAssert -o:build/tests_cpu -r tests/tests_cpu.nim
nim c --debuginfo -d:noSignalHandler -d:useMalloc --verbosity:0 --hints:off --warnings:off -o:build/tests_cpu -r tests/tests_cpu.nim

Changing map_inline/apply_inline to Laser forEach is left for a separate PR. This one is already huge enough.

@mratsim mratsim changed the title [Blocked by upstream] Change Tensor backend to pointer+length and map/apply iterators to Laser forEach [Blocked by upstream] Change Tensor backend to pointer+length Mar 5, 2020

mratsim commented Apr 20, 2020

Some benches to investigate:

A ~20% regression on the simple XOR benchmark https://github.com/mratsim/Arraymancer/blob/v0.6.0/benchmarks/ex01_xor.nim
Note: there is limited impact from threads:on, which is known to make allocShared an allocation bottleneck due to a lock (the proper solution is not to avoid allocShared but to avoid allocating, or to implement a custom allocator); see the Weave benches mratsim/weave@5d1bfb4

On a larger-scale benchmark there is no performance difference.

Note that the scaling from my old dual-core to my 18-core machine is quite bad, due to MNIST images being only about 28 by 28; see the old benches: #303

Investigation to be done after the iteration part is also merged in. Tracked in #438

@mratsim mratsim changed the title [Blocked by upstream] Change Tensor backend to pointer+length Change Tensor backend to pointer+length Apr 20, 2020

mratsim commented Apr 20, 2020

@brentp @HugoGranstrom @Vindaar if you can confirm that this PR doesn't completely break your code, that would be very helpful.

Otherwise I can do a 0.7 release with the NumPy indexing and then work on this change for 0.8.

@mratsim mratsim changed the title Change Tensor backend to pointer+length [Ready] Change Tensor backend to pointer+length Apr 20, 2020
@HugoGranstrom

Neat! Great work! 🤩 All of NumericalNim's tests pass, so this shouldn't be a problem there.


Vindaar commented Apr 21, 2020

Awesome work! I'll let you know later today!


Vindaar commented Apr 21, 2020

Ok, I can already report that in ggplotnim I stumble on problems when a tensor of my variant kind Value (for object columns with mixed types) is created.

Happens here:
https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/dataframe/column.nim#L67

and doesn't compile because of here:
https://github.com/mratsim/Arraymancer/blob/laser-tensor-iterator/src/laser/tensor/allocator.nim#L30

with:

/home/basti/CastData/ExternCode/ggplotnim/src/ggplotnim/dataframe/column.nim(67, 44) template/generic instantiation of `newTensor` from here
/home/basti/src/nim/arraymancer/src/laser/tensor/initialization.nim(178, 18) template/generic instantiation of `allocCpuStorage` from here
/home/basti/src/nim/arraymancer/src/laser/tensor/allocator.nim(30, 23) Error: type mismatch: got <ptr UncheckedArray[Value], int>
but expected one of: 
proc newSeq[T](len = 0.Natural): seq[T]
  first type mismatch at position: 1
  required type for len: Natural
  but expression 'storage.raw_buffer' is of type: ptr UncheckedArray[Value]
proc newSeq[T](s: var seq[T]; len: Natural)
  first type mismatch at position: 1
  required type for s: var seq[T]
  but expression 'storage.raw_buffer' is of type: ptr UncheckedArray[Value]

expression: newSeq(storage.raw_buffer, size)

I'll be on lunch break now. I'll see if this is something I can help with afterwards, if you haven't taken a look at it by then.

edit1:
Yes, I think I found the problem. The issue seems to be that the CpuStorage object uses a different condition to decide which branch to use than allocCpuStorage does:

  CpuStorage*[T] {.shallow.} = ref object # Total heap: 25 bytes = 1 cache-line
    # Workaround supportsCopyMem in type section - https://github.com/nim-lang/Nim/issues/13193
    when not(T is string or T is ref):
      raw_buffer*: ptr UncheckedArray[T] # 8 bytes
      memalloc*: pointer                 # 8 bytes
      isMemOwner*: bool                  # 1 byte
    else: # Tensors of strings, other ref types or non-trivial destructors
      raw_buffer*: seq[T]                # 8 bytes (16 for seq v2 backed by destructors?)

whereas:

  when T.supportsCopyMem:
    new(storage, finalizer[T])
    {.noSideEffect.}:
      storage.memalloc = allocShared(sizeof(T) * size + LASER_MEM_ALIGN - 1)
    storage.isMemOwner = true
    storage.raw_buffer = align_raw_data(T, storage.memalloc)
  else: # Always 0-initialize Tensors of seq, strings, ref types and types with non-trivial destructors
    new(storage)
    storage.raw_buffer.newSeq(size)

supportsCopyMem is a compiler magic, and I assume variant objects (or maybe objects containing seq / ref / string) are considered not to support copyMem.
Value is here:
https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/dataframe/value.nim#L12-L25

So Value takes the when not(T is string or T is ref) branch, but supportsCopyMem returns false. Changing the CpuStorage definition to use supportsCopyMem is probably enough to fix it. I'll check it locally.
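The mismatch is easy to reproduce in isolation (a minimal sketch; MiniValue is a toy stand-in for ggplotnim's Value variant):

```nim
import typetraits

type
  ValueKind = enum VInt, VString
  MiniValue = object          # toy stand-in for ggplotnim's Value variant
    case kind: ValueKind
    of VInt: num: int
    of VString: str: string

# The allocator branches on supportsCopyMem ...
doAssert not supportsCopyMem(MiniValue)  # false: one branch holds a GC'd string
# ... but the CpuStorage type section branches on `T is string or T is ref`,
# which puts MiniValue on the raw-pointer side:
doAssert not (MiniValue is string or MiniValue is ref)
```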

edit2: And of course, supportsCopyMem is not valid there...

edit3: just found your issue for it: nim-lang/Nim#13193


brentp commented Apr 21, 2020

my stuff seems to be working. thank you.


mratsim commented Apr 21, 2020

@Vindaar I'll park this PR and wait until nim-lang/Nim#13193 is solved then.

@mratsim mratsim changed the title [Ready] Change Tensor backend to pointer+length [Blocked by upstream bug] Change Tensor backend to pointer+length Apr 21, 2020

Vindaar commented Apr 21, 2020

Ok sorry, kinda unfortunate!

Let's hope the upstream issue is fixed quickly!


mratsim commented Apr 22, 2020

From the discussion, @Araq already tried and it's non-trivial as it creates a cascade of regressions https://irclogs.nim-lang.org/21-04-2020.html#14:43:57


Clonkk commented Sep 25, 2020

With the Nim 1.4 release coming and many bugs fixed in gc:arc and gc:orc, is there any chance of an update / progress on this issue?

I've seen in several threads on the forum that many people (including me) are excited to use Arraymancer with ARC/ORC :)


mratsim commented Sep 27, 2020

I can't merge this PR until this is solved nim-lang/Nim#13193

Araq commented Sep 27, 2020

What does it mean to "solve" nim-lang/Nim#13193? Clyybber said you need to use CpuStorage[T: type]


mratsim commented Sep 29, 2020


Vindaar commented Oct 27, 2020

As an FYI, I've finished a rebase of this on current master. I'm currently fixing up some things which were rebased incorrectly (either by me choosing the wrong branch, maybe, or by git itself?).

Just felt like letting people know that this is finally going forward.

The idea is to replace the current logic using when T.supportsCopyMem (in code) and when not (T is string or T is ref) (in the Tensor definition) with a custom type class of supported types that we know can be mem-copied. That way pointer + length should work for at least the majority of use cases, with seq as a fallback option.
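A minimal sketch of that idea (hypothetical names; the real type class would need to cover every trivially copyable type Arraymancer supports):

```nim
type
  # Explicit whitelist replacing the supportsCopyMem magic, usable in a
  # `when` branch of a type section (the real list would be broader).
  KnownSupportsCopyMem = SomeNumber | bool | char

  CpuStorage[T] = ref object
    when T is KnownSupportsCopyMem:
      raw_buffer: ptr UncheckedArray[T]
      memalloc: pointer
      isMemOwner: bool
    else:
      raw_buffer: seq[T]  # fallback for strings, refs, variant objects, ...
```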


Clonkk commented Oct 28, 2020

Will this change the API (such as accessors)?

Will it change how low-level Tensor transformations are done (copyMem of a tensor into a buffer, etc.)?


Vindaar commented Oct 29, 2020

Will this change the API (such as accessors)?

The API will remain unchanged, with a few small exceptions: e.g. toRawSeq won't be possible for ptr + length, of course; adding a normal toSeq proc, which copies, might be a good idea though.

Will it change how low-level Tensor transformations are done (copyMem of a tensor into a buffer, etc.)?

Yes, it will, but most of this will happen behind the scenes. If you have a raw ptr array, you will be able to wrap it in a Tensor and use all Arraymancer functionality without copying.
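For instance, zero-copy wrapping could look roughly like this (a sketch; fromBuffer is a hypothetical constructor name, not a committed API):

```nim
# Sketch: viewing an existing buffer as a 2x3 Tensor without copying.
var raw: array[6, float32] = [1'f32, 2, 3, 4, 5, 6]
let t = fromBuffer(raw[0].addr, [2, 3])
# t would read/write `raw` directly: mutations through t are visible in raw.
```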

@Vindaar Vindaar mentioned this pull request Nov 27, 2020

mratsim commented Dec 10, 2020

merged #477

@mratsim mratsim closed this Dec 10, 2020
@mratsim mratsim deleted the laser-tensor-iterator branch December 12, 2020 10:44