Spark and Fast Data Analytics SIG

Mission

Increase Spark and Fast Data Analytics interoperability between different vendors
Spark vs. traditional MapReduce – Which one is for you?
Why do we need Spark Streaming over other streaming services (Flume, Storm, Kafka)? How do we make integrating Spark with these services easier for customers?
Provide guidelines and use cases for Spark and Fast Data Analytics
Provide guidelines for the different deployment methods for Spark: on YARN, on Mesos, or Spark standalone

Introduction

As Hadoop and Spark grow in popularity and usage, the notion that Spark is replacing Hadoop is gaining popularity of its own. Whether it holds depends on the use case: Spark may entirely replace Hadoop MapReduce for one particular use case, but not for all.

Technically, Spark is already part of Hadoop: it depends on several components of the Hadoop stack, such as ZooKeeper, Sentry, and HDFS.

This SIG will help clear up the above myth. We will come up with use cases and guidelines that help businesses choose between traditional MapReduce and Spark. We will also focus on answering the question “Is Spark mature enough for standardization of its APIs and configurations?”

The SIG will provide guidance and best practices to customers on securing Spark clusters and applications. Similarly, we will close the gaps between the usage of different streaming services and provide fast data analytics.

The SIG will focus on creating heuristics for improving Spark application performance according to data type, cluster size, node size, and type of analytics, in order to choose the optimal number of cores, executors, CPUs, and so on.
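As an illustration of what such a heuristic could look like, here is a minimal sketch in Python. It is not a SIG recommendation. The function name `plan_executors`, the `ExecutorPlan` type, and every default (one core and 1 GB reserved per node, five cores per executor, one executor slot left for the YARN application master, 10% of executor memory set aside as overhead) are assumptions taken from commonly quoted rules of thumb, not from this page.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    executors: int
    cores_per_executor: int
    memory_gb_per_executor: int


def plan_executors(nodes, cores_per_node, memory_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Hypothetical sizing heuristic; all defaults are assumptions."""
    # Assumption: reserve one core and 1 GB per node for the OS and daemons.
    usable_cores = cores_per_node - 1
    usable_memory_gb = memory_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Assumption: leave one executor slot for the YARN application master.
    total_executors = nodes * executors_per_node - 1
    if total_executors < 1:
        raise ValueError("cluster too small for this heuristic")

    # Split node memory across executors, then set aside the overhead share.
    memory_per_executor = usable_memory_gb / executors_per_node
    heap_gb = int(memory_per_executor * (1 - overhead_fraction))

    return ExecutorPlan(total_executors, cores_per_executor, heap_gb)


if __name__ == "__main__":
    # Example cluster: 10 nodes, 16 cores and 64 GB of memory each.
    print(plan_executors(nodes=10, cores_per_node=16, memory_gb_per_node=64))
```

The resulting values would map onto the Spark properties `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory`. A real heuristic would also need to account for the data type and the type of analytics, which this sketch ignores.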

The SIG will propose an optimal configuration for Spark on YARN. It will also propose improvements to logging and metrics in Spark itself.

For all of the above work, the SIG will use the latest stable release of Spark.

SIG Membership

Pradeep Roy, IBM (SIG Champion)
Roman Shaposhnik, Linux Foundation
Raj Desai, IBM
Luciano Resende, IBM
Nitin Lamba, Ampool
Tanping Wang, IBM
Bikas Saha, Hortonworks
