document laziness of parallelize
Took me several hours to figure out this behavior. It would be good to highlight it in the documentation.

Author: Ariel Rabkin <[email protected]>

Closes apache#1070 from asrabkin/master and squashes the following commits:

29a076e [Ariel Rabkin] doc fix
Ariel Rabkin authored and rxin committed Jun 13, 2014
1 parent a6e0afd commit 0154587
1 changed file with 11 additions and 2 deletions: core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -434,12 +434,21 @@ class SparkContext(config: SparkConf) extends Logging {

// Methods for creating RDDs

-  /** Distribute a local Scala collection to form an RDD. */
+  /** Distribute a local Scala collection to form an RDD.
+   *
+   * @note Parallelize acts lazily. If `seq` is a mutable collection and is
+   * altered after the call to parallelize and before the first action on the
+   * RDD, the resultant RDD will reflect the modified collection. Pass a copy of
+   * the argument to avoid this.
+   */
   def parallelize[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T] = {
     new ParallelCollectionRDD[T](this, seq, numSlices, Map[Int, Seq[String]]())
   }

-  /** Distribute a local Scala collection to form an RDD. */
+  /** Distribute a local Scala collection to form an RDD.
+   *
+   * This method is identical to `parallelize`.
+   */
   def makeRDD[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T] = {
     parallelize(seq, numSlices)
   }
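The pitfall the new `@note` documents can be demonstrated without a Spark cluster: a Scala collection view defers evaluation much as parallelize defers reading `seq`, so the view serves as a stand-in for the lazily-consumed RDD input in this self-contained sketch (the object and variable names here are illustrative, not from the Spark source):

```scala
import scala.collection.mutable.ArrayBuffer

object LazyParallelizePitfall {
  def main(args: Array[String]): Unit = {
    val data = ArrayBuffer(1, 2, 3)

    // Taking a copy snapshots the collection at "call" time,
    // which is the workaround the new doc comment recommends.
    val snapshot = data.toList

    // Like parallelize, a view does not read its backing collection yet.
    val lazyDoubled = data.view.map(_ * 2)

    // Mutation after the lazy "transformation" but before it is forced.
    data(0) = 100

    // Forcing the view reflects the mutation: 210, not 12.
    println(lazyDoubled.sum)

    // The copy is unaffected by the later mutation: 12.
    println(snapshot.map(_ * 2).sum)
  }
}
```

The same sequence against a real `SparkContext` — mutate `seq` between `sc.parallelize(seq)` and the first action — is what surprised the commit author and motivated the doc fix.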
