See the separate dataframe directory for information on the
DataFrame API and especially its query language.
| File | What's Illustrated |
| --- | --- |
| Basic | Simple SQL queries (a minimal sketch of the pattern follows this table). |
| Types | Overview of querying various data types. Note that dates and timestamps are treated separately below. |
| DateTime | Dealing with dates and timestamps. |
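For orientation, here is a minimal sketch of the pattern the Basic example follows: register a DataFrame as a temporary view and query it with SQL text. It assumes a Spark 2.x SparkSession, and the data and names are invented for illustration rather than taken from the repo's code.

```scala
// A minimal sketch of the "temporary view + SQL text" pattern, assuming Spark 2.x.
// The session setup and the data are invented for illustration.
import org.apache.spark.sql.SparkSession

object BasicQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BasicQuerySketch")
      .master("local[4]")
      .getOrCreate()
    import spark.implicits._

    // Register a small DataFrame as a temporary view so SQL can refer to it by name
    val people = Seq(("Alice", 34), ("Bob", 45), ("Carol", 29)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    // Query the view with ordinary SQL text
    spark.sql("SELECT name, age FROM people WHERE age > 30 ORDER BY age").show()

    spark.stop()
  }
}
```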
| File | What's Illustrated |
| --- | --- |
| PartitionedTable | Creating, updating and querying partitioned tables. |
| ComplexTypes | Dealing with arrays and nested records in SQL queries. |
| UDF | How to use Scala "user defined functions" (UDFs) in Spark SQL (see the sketch after this table). See this question on StackOverflow. |
| UDT.scala | User defined types from a SQL perspective; depends on understanding UDF.scala. |
| UDAF.scala | A simple User Defined Aggregation Function, as introduced in Spark 1.5.0. |
| UDAF2.scala | A User Defined Aggregation Function with two parameters. |
| UDAF_Multi.scala | A User Defined Aggregation Function that accumulates and returns multiple values. |
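As a rough illustration of the UDF example's central idea, the sketch below registers an ordinary Scala function so it can be called from SQL text. The session setup, data, and the `withTax` function are assumptions made for the sketch, not code from this directory.

```scala
// A minimal sketch of registering a Scala function as a UDF, assuming Spark 2.x.
// The function, data and names are invented for illustration.
import org.apache.spark.sql.SparkSession

object UDFSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UDFSketch")
      .master("local[4]")
      .getOrCreate()
    import spark.implicits._

    val sales = Seq(("store1", 100.0), ("store2", 250.0)).toDF("store", "amount")
    sales.createOrReplaceTempView("sales")

    // Register an ordinary Scala function under a name SQL can call
    spark.udf.register("withTax", (amount: Double) => amount * 1.08)

    // The registered UDF is now usable just like a built-in SQL function
    spark.sql("SELECT store, withTax(amount) AS gross FROM sales").show()

    spark.stop()
  }
}
```

The UDAF examples rely on the same registration mechanism, but (in the Spark 1.5 through 2.x API) implement `UserDefinedAggregateFunction` rather than registering a plain function.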
| File | What's Illustrated |
| --- | --- |
| JSON | Basic JSON data. |
| OutputJSON | Write the result of a query against JSON back out as JSON text. This functionality is built into Spark 1.2.0, but the example was written to answer this question on StackOverflow in the days of Spark 1.1.0. |
| JSONSchemaInference | Examples of how Spark SQL infers a schema for a file of JSON documents, including multiple cases of schema conflict (see the sketch after this table). |
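To make the schema-inference behavior concrete, here is a small sketch, assuming Spark 2.2+ where `spark.read.json` accepts a `Dataset[String]`. The documents, and the conflicting `id` field, are invented for illustration.

```scala
// A minimal sketch of JSON schema inference, assuming Spark 2.2+ where
// spark.read.json accepts a Dataset[String]. The documents are invented,
// and deliberately disagree on the type of "id".
import org.apache.spark.sql.SparkSession

object JSONInferenceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JSONInferenceSketch")
      .master("local[4]")
      .getOrCreate()
    import spark.implicits._

    val documents = Seq(
      """{"id": 1, "name": "Alice"}""",
      """{"id": "two", "name": "Bob", "extra": true}"""
    ).toDS()

    val df = spark.read.json(documents)
    df.printSchema()   // the conflicting "id" field is widened to a string
    df.show()

    spark.stop()
  }
}
```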
Note: The following examples use the original external data source API. For the new V2 API introduced in
Spark 2.3.0, see https://github.com/spirom/spark-data-sources,
which explores the new API in some detail.
| File | What's Illustrated |
| --- | --- |
| CustomRelationProvider | How to use the external data source provider for simple integration with an external database engine (a compact sketch follows this table). See the blog post External Data Sources in Spark 1.2.0. |
| RelationProviderFilterPushdown | More advanced integration using the external data source API, enabling filter and projection pushdown. See the blog post Filtering and Projection in Spark SQL External Data Sources. |
| ExternalNonRectangular | An illustration that the Spark SQL query compiler doesn't make much use of the above pushdown possibilities in the presence of a non-rectangular schema, like that inferred from JSON data. |
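To give a feel for the original (V1) API these examples build on, here is a compact sketch of a relation provider that fabricates its own rows. The class names and the `rows` option are invented for illustration; the real examples in this directory are more elaborate and actually talk to an external engine.

```scala
// A compact sketch of the original (V1) external data source API: a provider
// and relation that fabricate their own rows instead of talking to a real
// database. Class names and the "rows" option are invented for illustration.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Spark instantiates the provider named in the USING clause / format() call
class SketchProvider extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val rows = parameters.getOrElse("rows", "10").toInt
    new SketchRelation(rows)(sqlContext)
  }
}

// The relation declares its schema and, via TableScan, produces every row
class SketchRelation(rowCount: Int)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("id", IntegerType, nullable = false)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to rowCount).map(i => Row(i))
}
```

A relation like this is loaded by passing the provider's fully qualified class name to `read.format(...)`. Pushdown, as in RelationProviderFilterPushdown, comes from implementing `PrunedFilteredScan` instead of `TableScan`: its `buildScan(requiredColumns, filters)` receives the columns and filters Spark is willing to push down.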