Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Adapt the parallel-collections module to the new design #328

Closed
julienrf opened this issue Dec 22, 2017 · 6 comments
Closed

Adapt the parallel-collections module to the new design #328

julienrf opened this issue Dec 22, 2017 · 6 comments

Comments

@julienrf
Copy link
Contributor

I’m creating the issue here but it’s about the following module: https://github.com/scala/scala-parallel-collections

We might not have a GenIterable type in the standard-library, but for people willing to generically program against an arbitrary collection, whether it is parallel or sequential, we could add a GenIterable type in the scala-parallel-collections module, and create implicit conversions from the standard collections to this type.

@julienrf
Copy link
Contributor Author

julienrf commented Jan 17, 2018

Here is one possible path to migrate the parallel-collections:

  • Make ParIterable, ParSet, ParArray, etc. a type hierarchy that shares nothing with the standard (sequential) collections. They will have a seq operation that transforms a parallel collection into a sequential one, but they will have no common supertype with the sequential collections (except maybe for IterableOnce which could simplify interoperability with sequential collections -- e.g. that would allow users to write List.from(ParVector(1, 2, 3))) ;
    • more precisely, ParIterable[A] will have a seq: Iterable[A] operation, whereas ParSet[A] will have a seq: Set[A] operation: the corresponding sequential type of a parallel collection will be specialized for each parallel collection (this is already the case today),
    • in 2.12 we can write List(1, 2, 3).to[ParArray] and ParArray(1, 2, 3).to[List], because there is a CanCombineFrom type that extends CanBuildFrom (which is used by the to operation). I think we don’t have to support that level of interoperability and users could just write ParArray.from(List(1, 2, 3)) or List.from(ParArray(1, 2, 3)) (note that the latter requires that a parallel collection can at least be viewed as an IterableOnce),
      • that being said if this is really important we could easily make parallel collection companion objects be implicitly convertible to a strawman.collection.Factory type, which is used by the new to operation,
    • the GenXxx types would be removed (this means that people using them will not be able to migrate),
    • we could (optionally) provide a conversion from a parallel collection to a Java Spliterator,
  • As a second step, which can be performed after 2.13 is released, could be to re-introduce GenXxx traits, make parallel collections inherit them, and provide an implicit conversion from the sequential collections to the GenXxx traits so that people willing to program generically against collections, whether they are parallel or sequential, can do it (for some operations that totally makes sense). More importantly, this step will make it easier for people using these types to migrate existing code bases or to cross-compile between 2.12 and 2.13.
  • As a third step, since we will not anymore be tied to the sequential collections nor the Scala releases, we could redesign the parallel collections to give more control to users on execution context and evaluation (currently every transformation operation blocks until it is completed).

@odersky
Copy link
Contributor

odersky commented Jan 17, 2018

Looks good! Actually I don't think that IterableOnce would work as a common supertype since it also assumes that things are done sequentially. IterableInAnyOrder comes closer, that's actually the same as the previous GenIterable. Maybe something like this?

IterableInAnyOrder[A] {
  type Collection[A]
  // operation signatures defined in terms of `A` and `Collection[A]`
}

@Ichoran
Copy link
Contributor

Ichoran commented Jan 17, 2018

I don't think it should even be called an IterableWhatever; an Iterable produces an Iterator, which operates sequentially. Parallel collections invert control: you can pass them functions, but you can't say "okay, now give me another one", which is the key operation of Iterator.

I don't think you need such a supertrait, though. Handling both regular and parallel collections generically is an anti-pattern. There are too many ways that an assumption of sequential access can seep into the code. It's fine to have methods with the same signatures, but I'd rather that they were unrelated in the hierarchy. At the very most, maybe they could both have a typeclass that abstracted the common methods.

@julienrf
Copy link
Contributor Author

Just to clarify: IterableOnce[A] has only one operation: iterator(): Iterator[A]. So, the only thing you can do with an IterableOnce is to get an Iterator from it. That iterator will always be sequential and will never be used by parallel operations. So the only purpose of making parallel collections extend IterableOnce is to improve interop: a parallel collection could then be passed as a parameter where an IterableOnce is expected (now you can argue that it’s better to ask the programmer to explicitly convert its parallel collection to a sequential one by using the seq method).
IterableOnce is not going to replace GenIterable: it is not going to be a way to generically program against both sequential and parallel collections. Speaking about that, I don’t think having a such a generic type of collection is that bad. It’s like having an apply(idx: Int): A method on List[A]. It can be convenient in some cases, if you know what you are doing. Typically, calling map without knowing if this operation is going to be performed sequentially or in parallel does not seem to be bad to me.

@sjrd
Copy link
Member

sjrd commented Jan 17, 2018

Giving one data point: in the Scala.js linker we do use parallel collections for the parallel optimizer. However, we also support a sequential optimizer that does not depend on parallel collections, since our linker cross-compiles to JVM/JS, and on JS it cannot use parallel collections.

To that end, we kind of wanted to use the generic aspect of interfaces for sequential and parallel collections. In practice, though, we ended up needing an ad hoc abstract interface with abstract types for the operations that we care about, and implement it separately in the sequential optimizer and the parallel optimizer. Concretely, this means that even the high amount of sharing in the Scala 2.8-2.12 collections is not enough for a real use case, and requires ad hoc infrastructure anyway. Having less than that is only going to make it worse.

I would therefore recommend that sequential and parallel collections be completely separated in the new collections, without any common interface.

@SethTisue
Copy link
Member

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants