dharmeshkakadia edited this page Oct 10, 2012 · 16 revisions

HadoopStack provides a solution for deploying Hadoop jobs across multiple cloud service providers.

Features

  • Auto Scaling - Based on the deadline, infrastructure and cost requirements, the scheduler will smartly scale the resources allocated to a job. The user is also provided an option to manually scale up/down.
  • Ability to run across multiple cloud providers - If you have multiple jobs and access to multiple cloud providers, HadoopStack provides you the ability to run different jobs on different clouds.
  • Job scheduling for minimizing cost and completion time - HadoopStack uses machine learning to smartly allocate jobs across multiple cloud providers aiming to reduce your cost and task-completion time.
  • Performance optimization with storage integration - With storage administration privileges, HadoopStack auto-replicates heavily utilized resources, thus improving performance.
  • Client Tools - A web interface and a simple command line interface for interacting with HadoopStack.
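To illustrate the auto-scaling feature above, here is a minimal sketch of a deadline-driven scaling heuristic. This is not HadoopStack's actual scheduler (which also weighs infrastructure and cost requirements and uses machine learning); the function name, parameters, and formula are illustrative assumptions only.

```python
# Illustrative sketch only -- NOT HadoopStack's real scheduling algorithm.
# Estimates how many worker nodes are needed to finish the remaining work
# before a deadline, capped by a cost-imposed maximum node count.
import math

def nodes_needed(remaining_tasks, tasks_per_node_per_hour,
                 hours_to_deadline, max_nodes):
    """Return a node count sufficient to finish before the deadline."""
    if hours_to_deadline <= 0:
        # Deadline passed or imminent: scale to the cost cap.
        return max_nodes
    required = math.ceil(
        remaining_tasks / (tasks_per_node_per_hour * hours_to_deadline))
    # Never drop below one node, never exceed the cost-imposed cap.
    return max(1, min(required, max_nodes))

# Example: 1000 tasks left, 50 tasks/node/hour, 4 hours to deadline,
# at most 20 nodes -> 5 nodes suffice.
print(nodes_needed(1000, 50, 4, 20))
```

A real scheduler would refresh `tasks_per_node_per_hour` from live monitoring data and re-evaluate periodically; the manual scale up/down option mentioned above would simply override the computed value.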

Use Cases

  • Running multiple jobs on a private cloud
  • Running multiple jobs on a public cloud
  • Running multiple jobs across private and public clouds

HadoopStack deployment diagram

Architecture

HadoopStack server architecture diagram

Roadmap

  • Improved instance monitoring for better scheduling.
  • Integration with other frameworks: R, Mahout, Hive, Pig, etc.