problem82
You have been given the code snippet below, with intermediate output.
val z = sc.parallelize(List(1,2,3,4,5,6), 2)
// let's first print out the contents of the RDD with partition labels
def myfunc(index: Int, iter: Iterator[Int]): Iterator[String] = {
iter.toList.map(x => "[partID:" + index + ", val: " + x + "]").iterator
}
// In each run the output could differ; while solving the problem, assume only the output below.
z.mapPartitionsWithIndex(myfunc).collect
res28: Array[String] = Array([partID:0, val: 1], [partID:0, val: 2], [partID:0, val: 3], [partID:1, val: 4], [partID:1, val: 5], [partID:1, val: 6])
Now apply the aggregate method on RDD z with two reduce functions: the first selects the maximum value in each partition, and the second adds up the maximum values from all partitions.
Initialize the aggregate with the value 5; hence the expected output will be 16.
A. z.aggregate(5)(math.max(_, _), _ + _)
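To see why this evaluates to 16, here is a minimal trace (a sketch, assuming the partition layout shown above: partition 0 holds [1,2,3] and partition 1 holds [4,5,6]):

// Partition 0, seqOp seeded with the zero value 5: max(max(max(5, 1), 2), 3) = 5
// Partition 1, seqOp seeded with the zero value 5: max(max(max(5, 4), 5), 6) = 6
// combOp, also seeded with 5:                      5 + 5 + 6 = 16
z.aggregate(5)(math.max(_, _), _ + _)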
scala> val z = sc.parallelize(List(1,2,3,4,5,6), 2)
z: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:27
scala> def myfunc(index: Int, iter: Iterator[(Int)]): Iterator[String] = {iter.toList.map(x => "[partID:" + index + ", val: " + x + "]").iterator}
myfunc: (index: Int, iter: Iterator[Int])Iterator[String]
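As a quick sanity check (a sketch, not part of the original transcript), changing the zero value shows that it participates both in every partition's seqOp and once more in the final combOp:

z.aggregate(5)(math.max(_, _), _ + _)   // 5 + max(5,1,2,3) + max(5,4,5,6) = 5 + 5 + 6 = 16
z.aggregate(0)(math.max(_, _), _ + _)   // 0 + max(0,1,2,3) + max(0,4,5,6) = 0 + 3 + 6 = 9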