Support for ArrayOps functionality? #39

flock0 · 2015-06-01T09:43:28Z

Is there the possibility / an easy way of supporting the various methods from ArrayOps?

densh · 2015-06-01T10:43:05Z

Unfortunately that would trigger boxing twice on every operation and can cause up to an order of magnitude slow-down. Our current strategy is to re-implement all of the methods people care about ourselves. What methods are you missing?

flock0 · 2015-06-01T11:18:49Z

The most important ones would be (in order of importance):

filter
map
foldLeft
sameElements
endsWith

densh · 2015-06-01T11:34:34Z

map is already implemented. The rest are now on my to-do list.

flock0 · 2015-06-01T11:37:06Z

Thank you!

ghost · 2015-06-13T17:21:27Z

Hi Denys,

I can take this into a branch and work on it in parallel while we flush out the remaining jemalloc implementation.

densh · 2015-06-13T19:04:41Z

@arosenberger Sounds good. Implementation of the current array macros lies in macros/Array.scala. There isn't much documentation on internals yet so let me know if something is not clear.

ghost · 2015-07-18T16:57:58Z

@densh I'm unsure if there is a way to do the filter efficiently from avoiding either

double pass through the source array (to determine the size of the filtered array)
multiple potential resizes of a new memory area if we size the filtered array memory up front

Could be I'm missing something - what do you think?

densh · 2015-07-19T11:41:02Z

@arosenberger

filter on array is indeed hard (impossible?) to implement efficiently if you have to return array as the result of the filter. Something like vectors would probably fare much better as you could re-use the same memory and just change the structure of the tree without copying things excessively.

I think the rule of thumb for scala-offheap should be if we can't implement something efficiently we should rather not have it at all rather than providing slow default implementation like standard Scala collections do. filter seems to fall in this area at the moment but it might change if we add support for vectors in the future (see #9).

ghost · 2015-07-20T13:23:22Z

I've got a close to working implementation of filter that does a malloc with the length of the original array, adds elements as they pass the predicate, then a final realloc to the actual filtered element count. We can run some jmh benchmarks on different array sizes vs some baselines like on-heap array filter or e.g. offheap map. We can throw it away if it looks to be too slow.

densh · 2015-07-20T14:17:12Z

The problem is that it needs to work for any allocator (that is taken as an implicit parameter to all allocating methods), not just malloc. Region allocators don't support free and realloc at all for example. You can always just allocate new copy on every next iteration but that might massively over-allocate memory for big arrays.

Looking forward to jmh performance comparison anyway, maybe it's not that bad in practice and the cost can be tolerated.

DarkDimius · 2015-07-20T14:27:56Z

that does a malloc with the length of the original array, adds elements as they pass the predicate, then a final realloc to the actual filtered element count

A trick can be done to speed this up: allocate an array of longs, that has size of original array divided by 64, and set bit if predicate passes. Why longs? That's because you can collect a batch of 64 tests of predicate and make a single write. Additionally, you would maintain a counter of how many tests have succeeded. Than have a second pass that actually copies elements that have the bit set.

You could have used array of Booleans and relied on write combining inside of the CPU but as, you execute a custom user-supplied predicate, it is very easy for writes to stop being combined.

densh · 2015-07-27T12:47:27Z

@DarkDimius That's an excellent idea, we need to try that.
@arosenberger Any progress in this area yet? I was planning to split this large issue into smaller one-method-per-issue ones.

ghost · 2015-07-27T13:15:52Z

@DarkDimius Thanks for the suggestion, I will try that as a variant
@densh I've got this close but haven't had a chance to look in a few days. Right now I get errors when I try to run the test I put in place with the following code. Any chance you could spot check and see if something is obviously incorrect? https://gist.github.com/arosenberger/9b00492e94abf6910a38

densh · 2015-07-27T13:37:21Z

@arosenberger It looks like you might be forgetting to call .symbol in a number of places. freshVal and freshVar return val definition, not an identifier.

ghost · 2015-07-30T00:20:44Z

@densh I cleaned this up some and got a test mostly passing. But I'm unsure as to what I'm doing wrong for the final array size/length. It is possible that the realloc is not actually shrinking the allocated memory, or more likely I've not set things correctly in the macro body. I'll add some debug code to the jni binding on my other machine and test.

Gist is updated, and this test produces this output

test("filter") {
val arr = Array(1, 2, 3, 4)
val narr = arr.filter(v => v % 2 == 0)
assert(narr.nonEmpty)
// assert(narr.size == 2)
assert(narr(0) == 2)
assert(narr(1) == 4)
println(s"New array has length ${narr.length}")
println(s"New array has size ${narr.size}")
}

Element 2 matches predicate
Element 4 matches predicate
Final size is 2
New array has length 4
New array has size 4

densh · 2015-07-30T08:59:50Z

@arosenberger Reallocate takes number of bytes, not elements. You need to multiply computed size on sizeOf of element type.

ghost · 2015-07-30T11:22:46Z

@densh I believe I'm doing so. This is the size argument I'm passing to reallocate:

${newArrayFinalSize.symbol} = $sizeOfHeader + ${strideOf(A)} * ${newArrayIndex.symbol}

ghost · 2015-08-01T18:03:05Z

@densh First pull request opened for filter. Going to switch to other requested array methods before trying to optimize filter implementation

densh · 2015-08-10T12:16:48Z

I'm splitting this ticket into a number of smaller actionable tickets:

densh added Feature Help Needed labels Jun 5, 2015

densh mentioned this issue Jun 16, 2015

Add support for fixed sized arrays #47

Closed

densh mentioned this issue Jul 19, 2015

Consider adding support for off-heap vectors #9

Open

densh closed this as completed Aug 10, 2015

densh mentioned this issue Aug 10, 2015

Add support for Array.filter #78

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for ArrayOps functionality? #39

Support for ArrayOps functionality? #39

flock0 commented Jun 1, 2015

densh commented Jun 1, 2015

flock0 commented Jun 1, 2015

densh commented Jun 1, 2015

flock0 commented Jun 1, 2015

ghost commented Jun 13, 2015

densh commented Jun 13, 2015

ghost commented Jul 18, 2015

densh commented Jul 19, 2015

ghost commented Jul 20, 2015

densh commented Jul 20, 2015

DarkDimius commented Jul 20, 2015

densh commented Jul 27, 2015

ghost commented Jul 27, 2015

densh commented Jul 27, 2015

ghost commented Jul 30, 2015

densh commented Jul 30, 2015

ghost commented Jul 30, 2015

ghost commented Aug 1, 2015

densh commented Aug 10, 2015

Support for ArrayOps functionality? #39

Support for ArrayOps functionality? #39

Comments

flock0 commented Jun 1, 2015

densh commented Jun 1, 2015

flock0 commented Jun 1, 2015

densh commented Jun 1, 2015

flock0 commented Jun 1, 2015

ghost commented Jun 13, 2015

densh commented Jun 13, 2015

ghost commented Jul 18, 2015

densh commented Jul 19, 2015

ghost commented Jul 20, 2015

densh commented Jul 20, 2015

DarkDimius commented Jul 20, 2015

densh commented Jul 27, 2015

ghost commented Jul 27, 2015

densh commented Jul 27, 2015

ghost commented Jul 30, 2015

densh commented Jul 30, 2015

ghost commented Jul 30, 2015

ghost commented Aug 1, 2015

densh commented Aug 10, 2015