dharmeshkakadia edited this page Oct 10, 2012 · 16 revisions

HadoopStack provides a solution for deploying Hadoop jobs across multiple cloud service providers.

Features

  • Auto Scaling - Based on the deadline, infrastructure and cost requirements, the scheduler will smartly scale the resources allocated to a job. The user is also provided an option to manually scale up/down.
  • Ability to run across multiple cloud providers - If you have multiple jobs and access to multiple cloud providers, HadoopStack provides you the ability to run different jobs on different clouds.
  • Job scheduling for minimizing cost and completion time - HadoopStack uses machine learning to smartly allocate jobs across multiple cloud providers aiming to reduce your cost and task-completion time.
  • Performance optimization with storage integration - With storage administration privileges, HadoopStack auto-replicates heavily utilized resources, thus improving performance.
  • Client Tools - A web interface and a simple command line interface for interacting with HadoopStack.
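To illustrate the auto-scaling feature above, here is a minimal sketch of a deadline-driven scaling heuristic. This is not HadoopStack's actual scheduler (which also weighs infrastructure and cost requirements and uses machine learning); the function name, parameters, and formula are illustrative assumptions only.

```python
# Illustrative sketch only -- NOT HadoopStack's real scheduling algorithm.
# Estimates how many worker nodes are needed to finish the remaining work
# before a deadline, capped by a cost-imposed maximum node count.
import math

def nodes_needed(remaining_tasks, tasks_per_node_per_hour,
                 hours_to_deadline, max_nodes):
    """Return a node count sufficient to finish before the deadline."""
    if hours_to_deadline <= 0:
        # Deadline passed or imminent: scale to the cost cap.
        return max_nodes
    required = math.ceil(
        remaining_tasks / (tasks_per_node_per_hour * hours_to_deadline))
    # Never drop below one node, never exceed the cost-imposed cap.
    return max(1, min(required, max_nodes))

# Example: 1000 tasks left, 50 tasks/node/hour, 4 hours to deadline,
# at most 20 nodes -> 5 nodes suffice.
print(nodes_needed(1000, 50, 4, 20))
```

A real scheduler would refresh `tasks_per_node_per_hour` from live monitoring data and re-evaluate periodically; the manual scale up/down option mentioned above would simply override the computed value.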

Use Cases

  • Running multiple jobs on a private cloud
  • Running multiple jobs on a public cloud
  • Running multiple jobs across private and public clouds

HadoopStack deployment diagram

Architecture

HadoopStack server architecture diagram

Roadmap

  • Improved instance monitoring for better scheduling.
  • Integration with other frameworks: R, Mahout, Hive, Pig, etc.