Adapt the parallel-collections module to the new design #328

julienrf · 2017-12-22T12:43:04Z

I’m creating the issue here but it’s about the following module: https://github.com/scala/scala-parallel-collections

We might not have a GenIterable type in the standard-library, but for people willing to generically program against an arbitrary collection, whether it is parallel or sequential, we could add a GenIterable type in the scala-parallel-collections module, and create implicit conversions from the standard collections to this type.

The text was updated successfully, but these errors were encountered:

julienrf · 2018-01-17T15:01:36Z

Here is one possible path to migrate the parallel-collections:

Make ParIterable, ParSet, ParArray, etc. a type hierarchy that shares nothing with the standard (sequential) collections. They will have a seq operation that transforms a parallel collection into a sequential one, but they will have no common supertype with the sequential collections (except maybe for IterableOnce which could simplify interoperability with sequential collections -- e.g. that would allow users to write List.from(ParVector(1, 2, 3))) ;
- more precisely, ParIterable[A] will have a seq: Iterable[A] operation, whereas ParSet[A] will have a seq: Set[A] operation: the corresponding sequential type of a parallel collection will be specialized for each parallel collection (this is already the case today),
- in 2.12 we can write List(1, 2, 3).to[ParArray] and ParArray(1, 2, 3).to[List], because there is a CanCombineFrom type that extends CanBuildFrom (which is used by the to operation). I think we don’t have to support that level of interoperability and users could just write ParArray.from(List(1, 2, 3)) or List.from(ParArray(1, 2, 3)) (note that the latter requires that a parallel collection can at least be viewed as an IterableOnce),
  - that being said if this is really important we could easily make parallel collection companion objects be implicitly convertible to a strawman.collection.Factory type, which is used by the new to operation,
- the GenXxx types would be removed (this means that people using them will not be able to migrate),
- we could (optionally) provide a conversion from a parallel collection to a Java Spliterator,
As a second step, which can be performed after 2.13 is released, could be to re-introduce GenXxx traits, make parallel collections inherit them, and provide an implicit conversion from the sequential collections to the GenXxx traits so that people willing to program generically against collections, whether they are parallel or sequential, can do it (for some operations that totally makes sense). More importantly, this step will make it easier for people using these types to migrate existing code bases or to cross-compile between 2.12 and 2.13.
As a third step, since we will not anymore be tied to the sequential collections nor the Scala releases, we could redesign the parallel collections to give more control to users on execution context and evaluation (currently every transformation operation blocks until it is completed).

odersky · 2018-01-17T18:05:49Z

Looks good! Actually I don't think that IterableOnce would work as a common supertype since it also assumes that things are done sequentially. IterableInAnyOrder comes closer, that's actually the same as the previous GenIterable. Maybe something like this?

IterableInAnyOrder[A] {
  type Collection[A]
  // operation signatures defined in terms of `A` and `Collection[A]`
}

Ichoran · 2018-01-17T18:17:30Z

I don't think it should even be called an IterableWhatever; an Iterable produces an Iterator, which operates sequentially. Parallel collections invert control: you can pass them functions, but you can't say "okay, now give me another one", which is the key operation of Iterator.

I don't think you need such a supertrait, though. Handling both regular and parallel collections generically is an anti-pattern. There are too many ways that an assumption of sequential access can seep into the code. It's fine to have methods with the same signatures, but I'd rather that they were unrelated in the hierarchy. At the very most, maybe they could both have a typeclass that abstracted the common methods.

julienrf · 2018-01-17T20:58:06Z

Just to clarify: IterableOnce[A] has only one operation: iterator(): Iterator[A]. So, the only thing you can do with an IterableOnce is to get an Iterator from it. That iterator will always be sequential and will never be used by parallel operations. So the only purpose of making parallel collections extend IterableOnce is to improve interop: a parallel collection could then be passed as a parameter where an IterableOnce is expected (now you can argue that it’s better to ask the programmer to explicitly convert its parallel collection to a sequential one by using the seq method).
IterableOnce is not going to replace GenIterable: it is not going to be a way to generically program against both sequential and parallel collections. Speaking about that, I don’t think having a such a generic type of collection is that bad. It’s like having an apply(idx: Int): A method on List[A]. It can be convenient in some cases, if you know what you are doing. Typically, calling map without knowing if this operation is going to be performed sequentially or in parallel does not seem to be bad to me.

sjrd · 2018-01-17T21:07:08Z

Giving one data point: in the Scala.js linker we do use parallel collections for the parallel optimizer. However, we also support a sequential optimizer that does not depend on parallel collections, since our linker cross-compiles to JVM/JS, and on JS it cannot use parallel collections.

To that end, we kind of wanted to use the generic aspect of interfaces for sequential and parallel collections. In practice, though, we ended up needing an ad hoc abstract interface with abstract types for the operations that we care about, and implement it separately in the sequential optimizer and the parallel optimizer. Concretely, this means that even the high amount of sharing in the Scala 2.8-2.12 collections is not enough for a real use case, and requires ad hoc infrastructure anyway. Having less than that is only going to make it worse.

I would therefore recommend that sequential and parallel collections be completely separated in the new collections, without any common interface.

SethTisue · 2019-02-20T22:28:15Z

scala/scala-parallel-collections#42

julienrf added the blocker label Dec 22, 2017

julienrf mentioned this issue Jan 18, 2018

Various bug fixes, additional methods and deprecated forwarders #346

Merged

szeiger added this to the Scala 2.13.0-M4 milestone Apr 23, 2018

SethTisue modified the milestones: Scala 2.13.0-M4, Scala 2.13.0-M5 May 14, 2018

julienrf added the in progress label Jul 19, 2018

SethTisue modified the milestones: Scala 2.13.0-M5, Scala 2.13.0-RC1 Aug 15, 2018

SethTisue closed this as completed Feb 20, 2019

SethTisue mentioned this issue May 12, 2020

Scala 2.13 woes: Parallelism silently lost scala/scala-parallel-collections#101

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adapt the parallel-collections module to the new design #328

Adapt the parallel-collections module to the new design #328

julienrf commented Dec 22, 2017

julienrf commented Jan 17, 2018 •

edited

Loading

odersky commented Jan 17, 2018

Ichoran commented Jan 17, 2018

julienrf commented Jan 17, 2018

sjrd commented Jan 17, 2018

SethTisue commented Feb 20, 2019

Adapt the parallel-collections module to the new design #328

Adapt the parallel-collections module to the new design #328

Comments

julienrf commented Dec 22, 2017

julienrf commented Jan 17, 2018 • edited Loading

odersky commented Jan 17, 2018

Ichoran commented Jan 17, 2018

julienrf commented Jan 17, 2018

sjrd commented Jan 17, 2018

SethTisue commented Feb 20, 2019

julienrf commented Jan 17, 2018 •

edited

Loading