[Blocked by upstream bug] Change Tensor backend to pointer+length #420
Conversation
Blocking: currently the compiler crashes without any error when compiling: …

The stacktrace with …
a6a4241 to 7d5f32d (Compare)
After fumbling left and right and buggy …, it works with ….

Reproduction, either one of those (they give different errors):

nim c --verbosity:0 --hints:off --warnings:off -o:build/tests_cpu -r tests/tests_cpu.nim
nim c --verbosity:0 --hints:off --warnings:off -d:useSysAssert -d:useGcAssert -o:build/tests_cpu -r tests/tests_cpu.nim
nim c --debuginfo -d:noSignalHandler -d:useMalloc --verbosity:0 --hints:off --warnings:off -o:build/tests_cpu -r tests/tests_cpu.nim

Changing …
…build/tests_tensor_part01 -r tests/_split_tests/tests_tensor_part01.nim
…assing all tests with gc:markandsweep
1742e9b to a2508b2 (Compare)
Some benches to investigate: a 20% regression on the simple XOR benchmark https://github.com/mratsim/Arraymancer/blob/v0.6.0/benchmarks/ex01_xor.nim. On a larger-scale benchmark there is no perf difference. Note that the scaling is quite bad from my old dual core to my 18-core, due to MNIST being about 28x28 images; see the old benches: #303. Investigation to be done after the iteration part is also merged in. Tracked in #438.
@brentp @HugoGranstrom @Vindaar, if you can confirm that this PR doesn't completely break your code, that would be very helpful. Otherwise I can do a 0.7 release with the NumPy indexing and then work on this change for 0.8.
Neat! Great work! 🤩 All NumericalNim's tests pass, so this shouldn't be a problem there.
Awesome work! I'll let you know later today!
Ok, I can already report that in ggplotnim I stumble on problems when a tensor of my variant kind is involved. Happens here: … and doesn't compile because of here: … with: …

I'll be on lunch break now. I'll see if this is something I can help with afterwards, if you haven't taken a look at it by then.

edit1:

CpuStorage*[T] {.shallow.} = ref object # Total heap: 25 bytes = 1 cache-line
  # Workaround supportsCopyMem in type section - https://github.com/nim-lang/Nim/issues/13193
  when not(T is string or T is ref):
    raw_buffer*: ptr UncheckedArray[T] # 8 bytes
    memalloc*: pointer                 # 8 bytes
    isMemOwner*: bool                  # 1 byte
  else: # Tensors of strings, other ref types or non-trivial destructors
    raw_buffer*: seq[T]                # 8 bytes (16 for seq v2 backed by destructors?)

whereas:

when T.supportsCopyMem:
  new(storage, finalizer[T])
  {.noSideEffect.}:
    storage.memalloc = allocShared(sizeof(T) * size + LASER_MEM_ALIGN - 1)
    storage.isMemOwner = true
    storage.raw_buffer = align_raw_data(T, storage.memalloc)
else: # Always 0-initialize Tensors of seq, strings, ref types and types with non-trivial destructors
  new(storage)
  storage.raw_buffer.newSeq(size)

So …

edit2: And of course, …

edit3: just found your issue for it: nim-lang/Nim#13193
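To make the kind of mismatch above concrete, here is a minimal hypothetical sketch (a stand-in type, not ggplotnim's actual variant kind, and only a guess at the failure mode): an object variant containing a string is neither `string` nor `ref`, so the type-section workaround picks the pointer layout, while `supportsCopyMem` is false, so the allocation proc takes the seq branch.

```nim
# Hypothetical stand-in (not ggplotnim's actual type): an object variant with
# a string field. It is neither `string` nor `ref`, so the type-section
# workaround selects the ptr UncheckedArray layout, yet `supportsCopyMem` is
# false because of the GC'd string field, so the allocation proc takes the
# seq branch -- the two `when` conditions disagree.
import std/typetraits

type
  ValueKind = enum vkInt, vkStr
  Value = object
    case kind: ValueKind
    of vkInt: num: int
    of vkStr: str: string

static:
  echo Value is string or Value is ref  # false -> pointer-based field layout
  echo Value.supportsCopyMem            # false -> seq-based init path
```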
my stuff seems to be working. thank you.
@Vindaar I'll park this PR and wait until nim-lang/Nim#13193 is solved then.
Ok sorry, kinda unfortunate! Let's hope the upstream issue is fixed quickly!
From the discussion, @Araq already tried and it's non-trivial, as it creates a cascade of regressions: https://irclogs.nim-lang.org/21-04-2020.html#14:43:57
With release 1.4 of Nim coming and many bugs fixed in gc:arc and gc:orc, is there any chance to have an update / progress on this issue? I've seen in several threads on the forum that many people (including me) are excited to use Arraymancer with ARC/ORC :)
I can't merge this PR until nim-lang/Nim#13193 is solved.
What does it mean to "solve" nim-lang/Nim#13193? Clyybber said you need to use …
As an FYI, I've finished a rebase of this on current master. I'm currently fixing up some things which were rebased incorrectly (either by me choosing the wrong branch, maybe, or by git itself?). Just felt like letting people know that this is finally going forward. The idea is to replace the current logic using …
Will this change the API (such as accessors)? Will it change how low-level Tensor transformation is done (copyMem tensor into buffer etc.)?
The API will remain unchanged (with a few small exceptions, e.g. …).
Yes, it will. But most of this will happen behind the scenes. In case you have a raw ptr array, you will then be able to wrap it in a Tensor and use all Arraymancer functionality on it without copying.
merged #477
This PR changes the CPU tensor backend to Laser pointer+length.
This would help with the issues tagged Laser: https://github.com/mratsim/Arraymancer/issues?q=is%3Aissue+label%3ALaser+is%3Aopen
Changing the backend
The current backend uses Nim sequences. This limits interoperability with other libraries and frameworks; in particular, having Arraymancer be just a view over a memory buffer described by pointer + length would allow zero-copy operations with NumPy, PyTorch and TensorFlow.
An additional benefit is enabling custom allocators (#419), which would be useful for distributed computing, NUMA-aware allocation, and situations where it's desirable to use the stack, for example.
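As a rough illustration of what a pointer + length view enables (the type and helper below are a hypothetical sketch, not Arraymancer's actual API): memory owned by another library can be wrapped and read or written without copying.

```nim
# Hypothetical sketch (not Arraymancer's actual API): a pointer + length view
# over memory owned by someone else, e.g. a NumPy or PyTorch buffer.
type
  RawTensorView[T] = object
    data: ptr UncheckedArray[T]  # borrowed pointer, not owned
    len: int                     # number of elements

proc viewBuffer[T](p: pointer, len: int): RawTensorView[T] =
  ## Wraps an externally-owned buffer; the caller guarantees its lifetime.
  RawTensorView[T](data: cast[ptr UncheckedArray[T]](p), len: len)

when isMainModule:
  var backing = newSeq[float32](10)                   # stand-in for an external buffer
  let view = viewBuffer[float32](backing[0].addr, backing.len)
  view.data[3] = 1.5'f32                              # writes through to `backing`, no copy
  echo backing[3]                                     # 1.5
```

A real backend would additionally track ownership (as the `isMemOwner` field later in this thread suggests), so buffers allocated by Arraymancer itself are freed while borrowed ones are not.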
To provide deep immutability checks even with pointer-backed objects, a scheme using distinct types has been implemented:
Arraymancer/src/laser/tensor/datatypes.nim
Lines 12 to 14 in 1a68830
Arraymancer/src/laser/tensor/datatypes.nim
Lines 77 to 89 in 1a68830
Arraymancer/src/laser/tensor/datatypes.nim
Lines 105 to 112 in 1a68830
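The linked definitions are not reproduced here, but the gist of such a scheme can be sketched as follows (illustrative names only, not the actual datatypes.nim code): the same raw pointer is wrapped in two distinct types, and only the mutable one gets a write accessor.

```nim
# Illustrative sketch of the distinct-type idea (names are not the actual
# Arraymancer/Laser definitions): the same raw pointer is wrapped in two
# distinct types, and only the mutable one defines `[]=`.
type
  RawImmutableView[T] = distinct ptr UncheckedArray[T]
  RawMutableView[T]   = distinct ptr UncheckedArray[T]

proc `[]`[T](v: RawImmutableView[T], idx: int): T =
  cast[ptr UncheckedArray[T]](v)[idx]          # read-only access

proc `[]=`[T](v: RawMutableView[T], idx: int, val: T) =
  cast[ptr UncheckedArray[T]](v)[idx] = val    # writes require the mutable view

when isMainModule:
  var buf = [10, 20, 30]
  let ro = RawImmutableView[int](cast[ptr UncheckedArray[int]](buf[0].addr))
  echo ro[1]   # 20 -- but `ro[1] = 5` would not compile: no `[]=` for this view
```

Because the two views are distinct types, the compiler rejects passing one where the other is expected, which is what gives a deep immutability check despite the underlying raw pointer.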
Iterators
The current map_inline/map2_inline/map3_inline and apply_inline/apply2_inline/apply3_inline have several limitations:
Arraymancer/src/nn_primitives/nnp_gru.nim
Lines 76 to 88 in 8acfd68
Instead we could fuse all of those iterations and avoid intermediate tensors, similar to Intel Nervana Neon: https://github.com/NervanaSystems/neon/blob/8c3fb8a9/neon/layers/recurrent.py#L710-L723
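For instance, a gate-style computation written with today's templates allocates an intermediate tensor per call and re-walks memory each time (a simplified sketch, not the actual nnp_gru.nim code):

```nim
# Simplified sketch (not the actual nnp_gru.nim code) of the current pattern:
# every *_inline call allocates a fresh intermediate tensor and re-walks all
# elements, so this gate-like expression makes three passes over memory.
import math
import arraymancer

proc gateSketch(a, b, c: Tensor[float32]): Tensor[float32] =
  let t1 = map2_inline(a, b, x * y)                    # pass 1, intermediate tensor
  let t2 = map2_inline(t1, c, x + y)                   # pass 2, another intermediate
  result = map_inline(t2, 1'f32 / (1'f32 + exp(-x)))   # pass 3, sigmoid
```

Fusing these into a single loop removes both the intermediate allocations and the extra passes over memory, which is what the Laser-style iteration is meant to provide.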
The x, y, z arbitrary injected variables seem to come out of nowhere. Laser forEach will lead to more readable code as the variables can use any name: https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/benchmarks/loop_iteration/iter_bench_prod.nim#L87-L90
Arraymancer/src/tensor/private/p_accessors.nim
Lines 208 to 245 in 8acfd68
Arraymancer/src/tensor/private/p_accessors.nim
Lines 98 to 131 in 8acfd68
The new code would only duplicate an iteration body twice, whatever the number of input tensors, instead of 4 times for the tripleStridedIteration.
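By contrast, a Laser-style forEach lets the caller pick the variable names and fuses everything into one pass. The sketch below is based on the linked Laser benchmark; how the vendored forEach is exposed from Arraymancer after this PR is an assumption here.

```nim
# Sketch based on the linked Laser benchmark (the import below assumes the
# vendored Laser forEach is re-exported by arraymancer, which may differ):
# variable names are chosen at the call site instead of injected x/y/z, and
# all tensors are walked in a single fused pass.
import math
import arraymancer

proc fusedGateSketch(dst: var Tensor[float32], a, b, c: Tensor[float32]) =
  forEach d in dst, x in a, y in b, z in c:
    d = 1'f32 / (1'f32 + exp(-(x * y + z)))   # one pass, no intermediate tensors
```

Usage-wise this reads like a plain for loop, while the macro generates the contiguous/strided handling behind the scenes.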
The main difference is that instead of having one such loop per tensor:
Arraymancer/src/laser/strided_iteration/foreach_common.nim
Lines 102 to 120 in 1a68830