Scala 2.13.0 support #42
Conversation
Hi Julien,
Interesting that this approach was ultimately easier than keeping
CanBuildFrom! But, as Stefan says, that's even better; we should align
parallel and sequential where possible.
I also think that we should keep the API as close as possible to what we
have, including sequential operations.
Cheers
- Martin
…On Tue, Aug 14, 2018 at 6:17 PM, Stefan Zeiger ***@***.***> wrote:
not use CanBuildFrom (or CanCombineFrom) for transformation operations
(e.g. map), similarly to what we have in the new collections, but that
requires introducing a new type parameter (CC[_]) to the template traits (
ParIterableLike). I’d like to discuss this change with you. I made that
change for symmetry with the new collections but it is not required
Aligning this more with the new collections design is generally desirable.
After all, we did make the changes there for a reason. My plan for the
parallel collections was to keep CanBuildFrom (at least initially) in
order to make it simpler to port. If that's not actually true we may as
well change it.
flatMap takes a function f: A => IterableOnce[B] (instead of f: A =>
GenTraversableOnce[B], previously), although the parallel collections
hierarchy does *not* extend IterableOnce (yet?)
It should extend IterableOnce. An Iterator is a natural way of traversing
a parallel collection sequentially. This was not the case with
TraversableOnce vs GenTraversableOnce because their internal iteration
allowed parallel collections to perform it concurrently.
They could, they already have an iterator member (which is typed as
Splitter, a specialization of Iterator). Or, should we completely
decouple Splitter from Iterator?
I'd try to avoid unnecessary changes for now. Unless there's a compelling
reason for Splitter not to extend Iterator anymore, keep it that way.
Depending on the first point, what operations should we have on parallel
collections? I’m tempted to drop sequential operations (such as foldLeft)
in favor of parallel-only operations (such as fold). But that will cause
migration pain to users, and this is more work for us.
Sequential operation can still be useful for the same reason that
sequential collections are allowed to override them instead of always using
Iterator implementations: Better performance.
--
Martin Odersky
EPFL and Lightbend
Great, thanks for your feedback! So, let's make parallel collections extend `IterableOnce`.
I've made some progress, but I've been unable to make the mutable and immutable hashmap combiners work. The model of the underlying data structure, based on an `UnrolledBuffer`, is still unclear to me. Here is an example of the problem:

```scala
scala> import scala.collection.parallel.mutable._
import scala.collection.parallel.mutable._

scala> val pmap = ParHashMap(1 -> 0, 2 -> 0, 3 -> 0)
pmap: scala.collection.parallel.mutable.ParHashMap[Int,Int] = ParHashMap(1 -> 0, 2 -> 0, 3 -> 0)

scala> pmap ++ List(1 -> 1)
res0: scala.collection.parallel.mutable.ParHashMap[Int,Int] = ParHashMap(1 -> 0, 2 -> 0, 3 -> 0)

scala> pmap += (1 -> 1)
res1: pmap.type = ParHashMap(1 -> 1, 2 -> 0, 3 -> 0)
```

As you can see, the result of the concatenation (`res0`) still shows the old binding `1 -> 0` instead of `1 -> 1`.
```scala
    b ++= elems
    b.result()
  }
}
```
These methods (`empty` and `apply`) were previously inherited from `GenericCompanion`, which has been removed from the standard library.
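For context, here is the assumed overall shape of these two methods (the diff above only shows the tail of `apply`; the trait and member names below are stand-ins, not the actual code):

```scala
import scala.collection.parallel.{Combiner, ParIterable}

// Hypothetical stand-in for the parallel companion trait; only the two methods
// mentioned in the comment above are shown.
trait MyParCompanion[+CC[X] <: ParIterable[X]] {
  def newBuilder[A]: Combiner[A, CC[A]]

  // Build an empty collection from a fresh builder.
  def empty[A]: CC[A] = newBuilder[A].result()

  // Build a collection from the given elements, as in ParArray(1, 2, 3).
  def apply[A](elems: A*): CC[A] = {
    val b = newBuilder[A]
    b ++= elems
    b.result()
  }
}
```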
```scala
  extends Factory[A, CC[A]] with Serializable {
  def fromSpecific(it: IterableOnce[A]): CC[A] = (parFactory.newBuilder[A] ++= it).result()
  def newBuilder: mutable.Builder[A, CC[A]] = parFactory.newBuilder
}
```
These conversions towards `scala.collection.Factory` provide interoperability with the standard collections. They make it possible to write `List(1, 2, 3).to(ParArray)`.
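A usage sketch of what this enables (assuming the implicit conversion defined in this diff is in implicit scope, and that the `to` method on the parallel collection accepts a `Factory`, as added elsewhere in this PR):

```scala
import scala.collection.parallel.mutable.ParArray

val par = List(1, 2, 3).to(ParArray)   // sequential -> parallel
val seq = par.to(List)                 // parallel -> sequential
```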
```scala
) extends Factory[(K, V), CC[K, V]] with Serializable {
  def fromSpecific(it: IterableOnce[(K, V)]): CC[K, V] = (parFactory.newCombiner[K, V] ++= it).result()
  def newBuilder: mutable.Builder[(K, V), CC[K, V]] = parFactory.newCombiner
}
```
Same with maps, to support `Map(1 -> 0, 2 -> 0).to(ParMap)`.
```scala
    }
  }
  b.result()
}
```
All the above factory methods were previously inherited from `GenTraversableFactory`.
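As an illustration of the kind of factory method being re-implemented here, a standalone sketch written against the public API (the real versions live in `ParFactory` and are generic in the collection type):

```scala
import scala.collection.parallel.mutable.ParArray

// Hypothetical free-standing version of a `fill`-like factory method:
// evaluate `elem` n times and collect the results through the companion's builder.
def fillPar[A](n: Int)(elem: => A): ParArray[A] = {
  val b = ParArray.newBuilder[A]
  var i = 0
  while (i < n) { b += elem; i += 1 }
  b.result()
}

// fillPar(3)("x") builds a ParArray containing three "x" elements.
```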
```scala
extends GenericParMapCompanion[CC] {

  type Coll = MapColl
```
The methods below were previously inherited from `GenMapFactory`.
junit/src/test/scala/MiscTest.scala (outdated)

```diff
@@ -77,7 +71,7 @@ class MiscTest {
     assert(ex.getSuppressed.size > 0)
     assert(ex.getSuppressed.forall(_.isInstanceOf[MultipleOf37Exception]))
     assert(ex.i == 37)
-    assert(ex.getSuppressed.map(_.asInstanceOf[MultipleOf37Exception].i).toList == List(74, 148, 259, 518))
+    assert(ex.getSuppressed.map(_.asInstanceOf[MultipleOf37Exception].i).toList == List(/*74,*/ 148, 259, 518))
```
Oddly, the value `74` is not part of the suppressed elements. I didn't understand why. Maybe it's a bug somewhere in `ParSeq`?
I think this should depend on how exactly the scheduler divided the tasks. It's possible that the task corresponding to `74` never got scheduled (e.g. because `tryCancel` was called on the `ForkJoinTask` in an attempt to piggy-back it on the same worker). I'd say that this assertion should check that all `i` are a multiple of `37`, but not make any specific assumptions about the exact exceptions.
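A sketch of the weaker assertion suggested above (`ex` and `MultipleOf37Exception` come from the surrounding test; they are not defined here):

```scala
// Only require that every suppressed exception carries a multiple of 37,
// without assuming which multiples were actually scheduled.
assert(ex.getSuppressed.forall {
  case e: MultipleOf37Exception => e.i % 37 == 0
  case _                        => false
})
```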
```diff
@@ -92,7 +86,8 @@ class MiscTest {
   def check[T](i: Int, f: Int => T): Unit = {
     val gseq = seqarr(i).toSeq.groupBy(f)
     val gpar = pararr(i).groupBy(f)
-    assertEquals(gseq, gpar)
+    assertTrue(gseq.forall { case (k, vs) => gpar.get(k).exists(_.sameElements(vs)) })
+    assertTrue(gpar.forall { case (k, vs) => gseq.get(k).exists(_.sameElements(vs)) })
```
Since sequential and parallel collections are no longer comparable with `==`/`equals`, I'm using these expressions instead.
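An illustration of the point (a sketch based on the behavior described in this PR, not on the old library):

```scala
import scala.collection.parallel.immutable.ParVector

val par = ParVector(1, 2, 3)
val seq = Vector(1, 2, 3)

assert(par != seq)              // == no longer holds across parallel/sequential
assert(par.sameElements(seq))   // element-wise comparison is used instead
```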
```diff
@@ -1,44 +0,0 @@
-package scala.collection.immutable
```
This test does not make sense anymore since we don’t support equality comparisons between sequential and parallel collections.
```scala
val parIterable = ParIterable("a", "", " c", null)
assertEquals(s"""${parIterable.stringPrefix}(a, "", " c", null)""", stringOf(parIterable))
assertEquals(s"""${parIterable.stringPrefix}(a, "")""", stringOf(parIterable, 2))
```
This test doesn't pass anymore. I guess we don't want to keep the special treatment that parallel collections used to get from `ScalaRunTime`?
```scala
      case e: java.lang.Exception =>
        throw e
    }
  }
```
The above properties test the overloads that return maps.
scala/scala#7636 needs to be merged for these changes to compile.
The idea is that the combiner data structure separates the keys into a fixed number of buckets. There are some additional details about this here: http://aleksandar-prokopec.com/resources/docs/techrep.pdf and http://aleksandar-prokopec.com/resources/docs/my_thesis.pdf (page 49 of the PDF). If you want, we can have a call, and I can explain in more detail.
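To make the bucket idea concrete, here is a minimal, made-up sketch (this is not the actual implementation, and all names are invented):

```scala
import scala.collection.mutable

// Entries are partitioned into a fixed number of buckets by their key's hash code,
// so two combiners can be merged cheaply by concatenating the per-bucket buffers,
// and (in the real implementation) the final hash table can be populated in
// parallel, one bucket per task.
final class BucketCombiner[K, V](numBuckets: Int = 32) {
  private val buckets = Array.fill(numBuckets)(mutable.ArrayBuffer.empty[(K, V)])

  def +=(kv: (K, V)): this.type = {
    val b = (kv._1.## & Int.MaxValue) % numBuckets  // bucket chosen from the key's hash code
    buckets(b) += kv
    this
  }

  // Merging two combiners only concatenates the per-bucket buffers.
  def combine(that: BucketCombiner[K, V]): this.type = {
    for (i <- 0 until numBuckets) buckets(i) ++= that.buckets(i)
    this
  }

  // Sequential stand-in for the "populate the table bucket by bucket" step.
  def result(): mutable.HashMap[K, V] =
    mutable.HashMap.from(buckets.iterator.flatten)
}
```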
I don't 100% remember if what I will say is correct, but I think what happens is the following:
Maybe this semantics should change, and right-hand-side-wins is correct.
The reason why it stores all the updates is because it's more efficient to:
You could alternatively have a |
Yes, this is what I observe. But why were those “left-hand-side-wins” semantics correct?
I'm not sure they were correct, in retrospect. Maybe they were more appropriate from the viewpoint of some other transformation operations, but having just taken a look at the supported methods, I think right-hand-wins semantics are actually the correct ones.
It had to be correct because tests were passing :)
Scala 2.12.4:
Ok, so you're saying that it used to work correctly, but it's not working correctly in the new implementation?
Yes.
Actually, the tests that used to pass still pass. I realized that there were no tests for the case where the combiner is a ParMap combiner (all tests use a ParIterable combiner). I've introduced new methods (overloads of the operations that return maps).
Which methods need opposite reduction rules?
Consider the following code:

```scala
val mConcat = parallel.mutable.ParMap(1 -> 0) ++ List(1 -> 1)
assertEquals(parallel.mutable.ParMap(1 -> 1), mConcat)

val mMap = parallel.mutable.ParMap(1 -> 0, 3 -> 1).map { case (k, v) => (k % 2, v) }
assertEquals(parallel.mutable.ParMap(1 -> 1), mMap)
```

I can either get the first assertion to pass or the second, but not both.
How is the ordering determined? Here is the content of the combiner's bucket array:

```scala
Array(UnrolledBuffer(DefaultEntry(1 -> 1), DefaultEntry(1 -> 0)), null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)
```

It shows that the result of the second entry of the map comes before the result of the first entry.
Actually, I think the behavior I'm seeing was never specified. Here are some examples on Scala 2.12:

```scala
import collection.parallel._
mutable.ParMap(1 -> 0) ++ List(1 -> 1)   // ParHashMap(1 -> 0)
immutable.ParMap(1 -> 0) ++ List(1 -> 1) // ParMap(1 -> 1)
```

It seems that when there is a clash in keys, which binding wins is unspecified. Therefore, I believe this PR introduces no regression.
Before rebasing, I think I should also squash the commits into a single one, to avoid polluting the git history. Indeed, at the beginning of the work I commented out most of the files and then uncommented them one by one until everything compiled.
sounds good -- I'd still keep the WIP branch available too, as it can help with reviewing
- Old code from the scala.collection.generic package has been imported from the 2.12.x branch.
- Make ParIterableLike extend IterableOnce.
- Operations that used to take GenIterable collections as parameter now have two overloads, one taking a ParIterable and another taking an Iterable.
- VectorIterator has been deprecated, which is unfortunate because it is used for implementing ParVector. As a consequence, we now get a deprecation warning when we compile the project.
- Inline GenTraversableFactory and GenericCompanion from scala/scala.
- Move the contents that were previously in GenericCompanion into the GenericParCompanion trait.
- Remove ParallelConsistencyTest, which does not make sense anymore since sequential collections don't take “generic” collections as parameter.
- Move the ExposedArraySeq logic to ParArray.
- Parallel collections are no longer comparable with sequential collections. Tests systematically replace `==` with `sameElements`.
- Add support for the `to` conversion method in both directions (sequential to parallel and parallel to sequential). `toSeq`, `toIterable`, `toSet` and `toMap` used to be overridden to return parallel collections. I've kept that choice (although that makes it impossible to inherit from `IterableOnceOps`, then).
- Restore usage of DoublingUnrolledBuffer.
- Restore FlatHashTable from scala/scala@9008c2f. Systematically use `areEqual` in tests with no strict order.
- The implementation of the sequential Set returned by the `seq` member has been copied from scala/scala@056e1e9.
I've squashed the commits. The full history is still available in the 2.13.x-unsquashed branch.
I think that, in order to provide different ordering semantics for different types of operations, the builder factories (e.g. `CanCombineFrom`) could be made aware of the semantics each operation needs. The method implementation could then instantiate the correct combiner, depending on its semantics. As far as I remember, this was not done in the past, partly because the operation semantics were not explicitly specified.
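A hypothetical sketch of that idea (the names below are invented for illustration and do not exist in the code base):

```scala
import scala.collection.parallel.Combiner

sealed trait KeyCollision
case object LeftWins  extends KeyCollision
case object RightWins extends KeyCollision

// The factory an operation uses could pick a combiner whose key-collision
// semantics match that operation, e.g. `++` would request RightWins.
trait MapCombinerFactory[K, V, Repr] {
  def newCombiner(onCollision: KeyCollision): Combiner[(K, V), Repr]
}
```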
I don't think it is a regression either, mainly because the behavior was not documented, as far as I remember.
Now that the required changes have been merged to scala/scala, the next thing that prevents this PR from being validated by the CI is that there is no (public) ScalaCheck release which is binary compatible with the latest Scala nightly.
we can use dbuild here. trying it: https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/1725/
fwiw, another possibility would have been to modify .travis.yml to check out the scalacheck repo and do a local publish first
https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/1729/
👍
@julienrf you added some comments here on GitHub that I think would be even better as code comments. And I see in the diffs that you commented out some things without explanation; could you either include a brief explanation for each of these, or just delete the code, as appropriate?
Other than that, this looks good to go.
I'd like to get this merged and roll a release. @julienrf up to you if you think it's worth doing my comment suggestion in a separate PR.
hmm, I guess there's no point in rolling a release until 2.13.0-RC1 is out. we can't roll it for M5
we need to re-STARR anyway, so I was thinking we could consider that STARR as RC0 and use it for trial-running releases like this one
after scala/scala#7758, the new STARR is 2.13.0-M5-1775dba, which is on Maven Central. but I don't see a real need to do a trial scala-parallel-collections release: I don't think it would be useful in isolation, and I don't anticipate it making sense to publish a bunch of other libraries for 1775dba either. let's just continue using dbuild to test things. RC1 will be along soon enough
scala/community-build@96fd70a moves the 2.13 community build to 1775dba; as usual, I'll track down resulting regressions, if any
I've taken the existing code base, commented out everything but `ParVector`, `ParIterable` and `ParIterableLike`, and fixed the compilation errors. In some places, I've left the previous code commented out, for reference, because the changes I've made are temporary.

I've made a few decisions:

- Not use `CanBuildFrom` (or `CanCombineFrom`) for transformation operations (e.g. `map`), similarly to what we have in the new collections, but that requires introducing a new type parameter (`CC[_]`) to the template traits (`ParIterableLike`). I'd like to discuss this change with you. I made that change for symmetry with the new collections but it is not required: we could still use `CanCombineFrom` to drive the return type of transformation operations (we would drop the `CanBuildFrom` part, though). (A rough sketch of the `CC[_]` idea is shown after this list.)
- `flatMap` takes a function `f: A => IterableOnce[B]` (instead of `f: A => GenTraversableOnce[B]`, previously), although the parallel collections hierarchy does not extend `IterableOnce` (yet?).
- `ParIterable` does not extend anything from the standard collections (not even `IterableOnce`).
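To make the first point more concrete, here is a rough, invented sketch of what driving the result type with a `CC[_]` type parameter on a template trait could look like (mirroring the shape of the new collections; this is not the actual code):

```scala
import scala.collection.parallel.{Combiner, ParIterable}

// Invented names, only meant to illustrate the shape of the CC[_] approach.
trait MyParIterableLike[+A, +CC[X] <: ParIterable[X]] {
  protected def newCombinerFor[B]: Combiner[B, CC[B]]  // hypothetical combiner factory
  def splitter: Iterator[A]                             // stand-in for the real Splitter

  // The result type is fixed by CC; no CanBuildFrom/CanCombineFrom implicit is needed.
  def map[B](f: A => B): CC[B] = {
    val c = newCombinerFor[B]
    // sequential stand-in for the real "split, run in parallel, merge combiners" logic
    splitter.foreach(a => c += f(a))
    c.result()
  }
}
```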
Remaining design decisions to make:

- The fact that `flatMap` takes a function producing an `IterableOnce` makes it currently impossible to use it with parallel collections. Relatedly: do we want to make it possible to `zip` together a parallel collection and a sequential collection? Should parallel collections extend `IterableOnce`? They could, they already have an `iterator` member (which is typed as `Splitter`, a specialization of `Iterator`). Or, should we completely decouple `Splitter` from `Iterator`? (Update: make `ParIterableLike` extend `IterableOnce`.)
- Depending on the first point, what operations should we have on parallel collections? I'm tempted to drop sequential operations (such as `foldLeft`) in favor of parallel-only operations (such as `fold`). But that will cause migration pain to users, and this is more work for us.

Remaining things to do:

- Make `ParIterableLike` extend `IterableOnce` (easy)
- `ParSeqLike` (easy), `mutable.ParIterable` (easy), `mutable.ParSeq` (easy)
- `ParFactory`. It used to extend `GenTraversableFactory` but that type doesn't exist anymore. Do we want to replace it with `IterableFactory`? (medium)
- `mutable.ParArray` (hard, because the `scala.collection.mutable.ArraySeq` API has changed a bit)
- `immutable.ParSeq` (easy), `immutable.ParRange` (easy)
- `ParSet`, `mutable.ParSet` and `mutable.ParHashSet` (hard, and requires changes in scala-library to tune the visibility of `mutable.FlatHashTable`)
- `immutable.ParSet` and `immutable.ParHashSet` (medium)
- `ParMap`, `mutable.ParMap` and `mutable.ParHashMap` (hard, and requires changes in scala-library to tune the visibility of `mutable.HashTable`)
- `immutable.ParMap` and `immutable.ParHashMap` (medium)
- `mutable.ParTrieMap` (medium?)
- The operations of `ParIterableLike` that I have commented out to simplify the migration (e.g. `to`) (hard)
- `CollectionConverters` (medium)
- Turn the `GenXxx` types into `ParXxx` types. The `GenXxx` types have no reason to exist since they are no longer a generalization of sequential and parallel collections (easy)
- `immutable.ParArray` (easy?)

This is a lot of work. Let's try to parallelize it. Everybody is welcome! Send me a message on https://gitter.im/julienrf to tell me which bullet point you want to work on and I will update the PR description accordingly.