Skip to content
This repository has been archived by the owner on Apr 19, 2020. It is now read-only.

Support for ArrayOps functionality? #39

Closed
flock0 opened this issue Jun 1, 2015 · 19 comments
Closed

Support for ArrayOps functionality? #39

flock0 opened this issue Jun 1, 2015 · 19 comments

Comments

@flock0
Copy link
Contributor

flock0 commented Jun 1, 2015

Is there the possibility / an easy way of supporting the various methods from ArrayOps?

@densh
Copy link
Owner

densh commented Jun 1, 2015

Unfortunately that would trigger boxing twice on every operation and can cause up to an order of magnitude slow-down. Our current strategy is to re-implement all of the methods people care about ourselves. What methods are you missing?

@flock0
Copy link
Contributor Author

flock0 commented Jun 1, 2015

The most important ones would be (in order of importance):

  • filter
  • map
  • foldLeft
  • sameElements
  • endsWith

@densh
Copy link
Owner

densh commented Jun 1, 2015

map is already implemented. The rest are now on my to-do list.

@flock0
Copy link
Contributor Author

flock0 commented Jun 1, 2015

Thank you!

@ghost
Copy link

ghost commented Jun 13, 2015

Hi Denys,

I can take this into a branch and work on it in parallel while we flush out the remaining jemalloc implementation.

@densh
Copy link
Owner

densh commented Jun 13, 2015

@arosenberger Sounds good. Implementation of the current array macros lies in macros/Array.scala. There isn't much documentation on internals yet so let me know if something is not clear.

@ghost
Copy link

ghost commented Jul 18, 2015

@densh I'm unsure if there is a way to do the filter efficiently from avoiding either

  • double pass through the source array (to determine the size of the filtered array)
  • multiple potential resizes of a new memory area if we size the filtered array memory up front

Could be I'm missing something - what do you think?

@densh
Copy link
Owner

densh commented Jul 19, 2015

@arosenberger

filter on array is indeed hard (impossible?) to implement efficiently if you have to return array as the result of the filter. Something like vectors would probably fare much better as you could re-use the same memory and just change the structure of the tree without copying things excessively.

I think the rule of thumb for scala-offheap should be if we can't implement something efficiently we should rather not have it at all rather than providing slow default implementation like standard Scala collections do. filter seems to fall in this area at the moment but it might change if we add support for vectors in the future (see #9).

@ghost
Copy link

ghost commented Jul 20, 2015

I've got a close to working implementation of filter that does a malloc with the length of the original array, adds elements as they pass the predicate, then a final realloc to the actual filtered element count. We can run some jmh benchmarks on different array sizes vs some baselines like on-heap array filter or e.g. offheap map. We can throw it away if it looks to be too slow.

@densh
Copy link
Owner

densh commented Jul 20, 2015

The problem is that it needs to work for any allocator (that is taken as an implicit parameter to all allocating methods), not just malloc. Region allocators don't support free and realloc at all for example. You can always just allocate new copy on every next iteration but that might massively over-allocate memory for big arrays.

Looking forward to jmh performance comparison anyway, maybe it's not that bad in practice and the cost can be tolerated.

@DarkDimius
Copy link
Collaborator

that does a malloc with the length of the original array, adds elements as they pass the predicate, then a final realloc to the actual filtered element count

A trick can be done to speed this up: allocate an array of longs, that has size of original array divided by 64, and set bit if predicate passes. Why longs? That's because you can collect a batch of 64 tests of predicate and make a single write. Additionally, you would maintain a counter of how many tests have succeeded. Than have a second pass that actually copies elements that have the bit set.

You could have used array of Booleans and relied on write combining inside of the CPU but as, you execute a custom user-supplied predicate, it is very easy for writes to stop being combined.

@densh
Copy link
Owner

densh commented Jul 27, 2015

@DarkDimius That's an excellent idea, we need to try that.
@arosenberger Any progress in this area yet? I was planning to split this large issue into smaller one-method-per-issue ones.

@ghost
Copy link

ghost commented Jul 27, 2015

@DarkDimius Thanks for the suggestion, I will try that as a variant
@densh I've got this close but haven't had a chance to look in a few days. Right now I get errors when I try to run the test I put in place with the following code. Any chance you could spot check and see if something is obviously incorrect? https://gist.github.com/arosenberger/9b00492e94abf6910a38

@densh
Copy link
Owner

densh commented Jul 27, 2015

@arosenberger It looks like you might be forgetting to call .symbol in a number of places. freshVal and freshVar return val definition, not an identifier.

@ghost
Copy link

ghost commented Jul 30, 2015

@densh I cleaned this up some and got a test mostly passing. But I'm unsure as to what I'm doing wrong for the final array size/length. It is possible that the realloc is not actually shrinking the allocated memory, or more likely I've not set things correctly in the macro body. I'll add some debug code to the jni binding on my other machine and test.

Gist is updated, and this test produces this output

test("filter") {
val arr = Array(1, 2, 3, 4)
val narr = arr.filter(v => v % 2 == 0)
assert(narr.nonEmpty)
// assert(narr.size == 2)
assert(narr(0) == 2)
assert(narr(1) == 4)
println(s"New array has length ${narr.length}")
println(s"New array has size ${narr.size}")
}

Element 2 matches predicate
Element 4 matches predicate
Final size is 2
New array has length 4
New array has size 4

@densh
Copy link
Owner

densh commented Jul 30, 2015

@arosenberger Reallocate takes number of bytes, not elements. You need to multiply computed size on sizeOf of element type.

@ghost
Copy link

ghost commented Jul 30, 2015

@densh I believe I'm doing so. This is the size argument I'm passing to reallocate:

${newArrayFinalSize.symbol} = $sizeOfHeader + ${strideOf(A)} * ${newArrayIndex.symbol}

@ghost
Copy link

ghost commented Aug 1, 2015

@densh First pull request opened for filter. Going to switch to other requested array methods before trying to optimize filter implementation

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants