DataFusion Examples

This crate includes end-to-end, heavily commented examples of how to use various DataFusion APIs to help you get started.

Prerequisites

Run git submodule update --init to download the test data files.

Running Examples

To run an example, use the cargo run command, such as:

git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run the `dataframe` example:
# ... use the equivalent for other examples
cargo run --example dataframe
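
The "equivalent for other examples" can be sketched as follows, assuming cargo's default convention that an example's name is its source file name without the .rs extension. The example_name helper is hypothetical and only illustrates the mapping; the loop prints the commands rather than running them.

```shell
# Hypothetical helper: derive the cargo example name from a source file
# name by dropping the .rs extension (assumes cargo's default naming).
example_name() {
  basename "$1" .rs
}

# Print, rather than run, the command for a few files from this directory.
for file in dataframe.rs sql_query.rs simple_udaf.rs; do
  echo "cargo run --example $(example_name "$file")"
done
```

Run any printed command from the datafusion-examples/examples directory, as in the steps above.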

Single Process

  • advanced_udaf.rs: Define and invoke a more complicated User Defined Aggregate Function (UDAF)
  • advanced_udf.rs: Define and invoke a more complicated User Defined Scalar Function (UDF)
  • advanced_udwf.rs: Define and invoke a more complicated User Defined Window Function (UDWF)
  • advanced_parquet_index.rs: Creates a detailed secondary index that covers the contents of several parquet files
  • analyzer_rule.rs: Use a custom AnalyzerRule to change a query's semantics (row level access control)
  • catalog.rs: Register the table into a custom catalog
  • composed_extension_codec.rs: Example of using multiple extension codecs for serialization / deserialization
  • csv_sql_streaming.rs: Build and run a streaming query plan from a SQL statement against a local CSV file
  • csv_json_opener.rs: Use low level FileOpener APIs to read CSV/JSON into Arrow RecordBatches
  • custom_datasource.rs: Run queries against a custom datasource (TableProvider)
  • custom_file_format.rs: Write data to a custom file format
  • dataframe-to-s3.rs: Run a query using a DataFrame against a parquet file from S3 and write the results back to S3
  • dataframe.rs: Run a query using a DataFrame API against parquet files, csv files, and in-memory data, including multiple subqueries. Also demonstrates the various methods to write out a DataFrame to a table, parquet file, csv file, and json file.
  • deserialize_to_struct.rs: Convert query results (Arrow ArrayRefs) into Rust structs
  • expr_api.rs: Create, execute, simplify, analyze and coerce Exprs
  • file_stream_provider.rs: Run a query on FileStreamProvider which implements StreamProvider for reading and writing to arbitrary stream sources / sinks.
  • flight_sql_server.rs: Run DataFusion as a standalone process and execute SQL queries from JDBC clients
  • function_factory.rs: Register CREATE FUNCTION handler to implement SQL macros
  • optimizer_rule.rs: Use a custom OptimizerRule to replace certain predicates
  • parquet_index.rs: Create a secondary index over several parquet files and use it to speed up queries
  • parquet_exec_visitor.rs: Extract statistics by visiting an ExecutionPlan after execution
  • parse_sql_expr.rs: Parse SQL text into DataFusion Expr.
  • plan_to_sql.rs: Generate SQL from DataFusion Expr and LogicalPlan
  • planner_api.rs: APIs to manipulate logical and physical plans
  • pruning.rs: Use pruning to rule out files based on statistics
  • query-aws-s3.rs: Configure object_store and run a query against files stored in AWS S3
  • query-http-csv.rs: Configure object_store and run a query against files via HTTP
  • regexp.rs: Examples of using regular expression functions
  • remote_catalog.rs: Examples of interfacing with a remote catalog (e.g. over a network)
  • simple_udaf.rs: Define and invoke a User Defined Aggregate Function (UDAF)
  • simple_udf.rs: Define and invoke a User Defined Scalar Function (UDF)
  • simple_udwf.rs: Define and invoke a User Defined Window Function (UDWF)
  • sql_analysis.rs: Analyse SQL queries with DataFusion structures
  • sql_frontend.rs: Create LogicalPlans (only) from sql strings
  • sql_dialect.rs: Example of implementing a custom SQL dialect on top of DFParser
  • sql_query.rs: Query data using SQL (in memory RecordBatches, local Parquet files)
  • date_time_function.rs: Examples of date-time related functions and queries.

Distributed