-
Notifications
You must be signed in to change notification settings - Fork 33
Home
henryr edited this page Mar 19, 2015
·
11 revisions
Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.
Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:
- Best of breed performance and scalability.
- Support for data stored in HDFS, Apache HBase and Amazon S3.
- Wide analytic SQL support, including window functions and subqueries.
- On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
- Support for the most commonly-used Hadoop file formats, including the Apache Parquet (incubating) project.
- Apache-licensed, 100% open source.
To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.
This wiki is intended for developers interested in building Impala, learning about Impala's architecture and implementation or contributing - patches or otherwise - to the Impala project. For more information, check out the links in the sidebar, particularly Contributing to Impala and the developer mailing list.