Spark and Fast Data Analytics SIG
- Increase the Spark and Fast Data Analytics interoperability between different vendors
- Spark vs. traditional MapReduce: which one is right for you?
- Why do we need Spark Streaming instead of other streaming services (Flume, Storm, Kafka)? How can we make it easier for customers to integrate Spark with these services?
- Provide guidelines and use cases for Spark and Fast Data Analytics
- Provide guidelines for the different deployment methods: Spark on YARN, on Mesos, or in standalone mode
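To make the three deployment methods concrete, here is a minimal sketch showing how the choice surfaces in a `spark-submit` invocation: it is mostly the `--master` URL. It assumes the default ports (7077 for the standalone master, 5050 for the Mesos master) and that YARN finds the ResourceManager through `HADOOP_CONF_DIR` or `YARN_CONF_DIR`. The helper names, host names and `my_app.py` are hypothetical, the helpers only build the argument list without running anything, and Mesos support has been deprecated since Spark 3.2.

```python
from typing import List, Optional

# Master URL templates for each deployment mode (default ports assumed).
# YARN needs no host: spark-submit reads it from the Hadoop/YARN config dir.
MASTER_URL_TEMPLATES = {
    "yarn": "yarn",
    "mesos": "mesos://{host}:5050",
    "standalone": "spark://{host}:7077",
}


def master_url(mode: str, host: Optional[str] = None) -> str:
    """Return the --master value for a deployment mode (hypothetical helper)."""
    if mode not in MASTER_URL_TEMPLATES:
        raise ValueError(f"unknown deployment mode: {mode!r}")
    template = MASTER_URL_TEMPLATES[mode]
    if "{host}" in template:
        if not host:
            raise ValueError(f"{mode} mode needs a master host")
        return template.format(host=host)
    return template


def spark_submit_command(mode: str, app: str, host: Optional[str] = None,
                         deploy_mode: str = "client") -> List[str]:
    """Build (but do not run) a spark-submit argument vector."""
    # Client deploy mode is used by default because it works for all three
    # cluster managers (e.g. standalone cluster mode rejects Python apps).
    return ["spark-submit", "--master", master_url(mode, host),
            "--deploy-mode", deploy_mode, app]


if __name__ == "__main__":
    for mode in ("yarn", "mesos", "standalone"):
        host = None if mode == "yarn" else "master.example.com"
        print(" ".join(spark_submit_command(mode, "my_app.py", host)))
```

Keeping the application and its resources identical while changing only the master URL is what makes side-by-side comparisons of deployment methods meaningful.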
As Hadoop and Spark have grown in popularity and usage, so has the notion that Spark will replace Hadoop. The honest answer is that it depends: Spark may completely replace Hadoop MapReduce for one particular use case, but not for all of them.
Technically, Spark is already part of the Hadoop ecosystem: it depends on several components of the Hadoop stack, such as ZooKeeper, Sentry and HDFS.
This SIG will help clear up this misconception by developing use cases and guidelines that help businesses choose between traditional MapReduce and Spark. We will also focus on answering the question “Is Spark mature enough for its APIs and configurations to be standardized?”
The SIG will provide customers with guidance and best practices for securing Spark clusters and applications. Similarly, we will address the gaps in how the different streaming services are used and provide guidance on fast data analytics.
The SIG will focus on creating heuristics for improving Spark application performance based on data type, cluster size, node size and type of analytics, so that applications use the optimal number of cores, executors, CPUs, etc.
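As an illustration of the kind of heuristic meant here, the sketch below encodes one widely cited rule of thumb for sizing executors on YARN; it is an assumption for illustration, not a SIG recommendation. It assumes about 5 cores per executor, reserves 1 core and 1 GB per node for the OS and Hadoop daemons, leaves one executor slot for the YARN ApplicationMaster, and accounts for Spark's default executor memory overhead (10% of executor memory, with a 384 MiB minimum). It models only cluster and node size, not data type or type of analytics, and the function and class names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExecutorPlan:
    """Hypothetical container for a YARN executor sizing result."""
    num_executors: int
    cores_per_executor: int
    executor_memory_gb: int


def plan_executors(nodes: int, cores_per_node: int, memory_per_node_gb: float,
                   cores_per_executor: int = 5,
                   reserved_cores: int = 1,
                   reserved_memory_gb: float = 1.0,
                   overhead_fraction: float = 0.10,
                   min_overhead_gb: float = 0.375) -> ExecutorPlan:
    """Rule-of-thumb executor sizing (an assumption, not an official formula)."""
    # Leave resources on each node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - reserved_cores
    usable_memory_gb = memory_per_node_gb - reserved_memory_gb

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("node too small for the requested cores per executor")

    # Reserve one executor slot for the YARN ApplicationMaster.
    num_executors = nodes * executors_per_node - 1
    if num_executors < 1:
        raise ValueError("cluster too small to leave room for the ApplicationMaster")

    # Each container holds the executor heap plus off-heap overhead,
    # which Spark defaults to max(10% of executor memory, 384 MiB).
    container_gb = usable_memory_gb / executors_per_node
    heap_gb = container_gb / (1 + overhead_fraction)
    if heap_gb * overhead_fraction < min_overhead_gb:
        heap_gb = container_gb - min_overhead_gb

    executor_memory_gb = math.floor(heap_gb)
    if executor_memory_gb < 1:
        raise ValueError("not enough memory per executor")
    return ExecutorPlan(num_executors, cores_per_executor, executor_memory_gb)


def to_spark_submit_args(plan: ExecutorPlan) -> List[str]:
    """Render the plan as spark-submit flags for a YARN deployment."""
    return [
        "--master", "yarn",
        "--num-executors", str(plan.num_executors),
        "--executor-cores", str(plan.cores_per_executor),
        "--executor-memory", f"{plan.executor_memory_gb}g",
    ]


if __name__ == "__main__":
    plan = plan_executors(nodes=6, cores_per_node=16, memory_per_node_gb=64)
    print(plan)
    print(" ".join(to_spark_submit_args(plan)))
```

For a cluster of 6 nodes with 16 cores and 64 GB each, this yields 17 executors with 5 cores and a 19 GB heap each, i.e. `--num-executors 17 --executor-cores 5 --executor-memory 19g`. Capping cores per executor trades some parallelism inside each JVM for better HDFS client throughput and smaller garbage-collection pauses; the defaults are parameters precisely so that a workload-specific heuristic can override them.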
The SIG will propose an optimal configuration for Spark on YARN, as well as improvements to logging and metrics in Spark itself.
- For all of the above work, the SIG will use the latest stable release of Spark.
- Pradeep Roy, IBM (SIG Champion)
- Roman Shaposhnik, Linux Foundation
- Raj Desai, IBM
- Luciano Resende, IBM
- Nitin Lamba, Ampool
- Tanping Wang, IBM
- Bikas Saha, Hortonworks