
Spark Native Functions

Matthew Powers edited this page May 2, 2021 · 1 revision

Spark native functions offer ease-of-use, flexibility, and performance benefits well beyond those of Spark user-defined functions (UDFs). To learn more about Spark native functions, read this.

Once you have built some native functions, you have to decide whether to use them only from Scala or to also make them available via SparkSQL. Using them from Scala only requires a Column-oriented API for instantiating them, as done in Spark's functions. Using them from SparkSQL requires registration, which spark-alchemy makes easy.
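As a rough illustration of the Scala-only route, the sketch below wraps a Catalyst expression in a Column-returning method. It assumes Spark 3.x, where `Column` has a public constructor taking an `Expression` and exposes `expr`. The names `MyFunctions` and `my_upper` are hypothetical, and Spark's built-in `Upper` expression merely stands in for a native function of your own.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Upper

// Hypothetical helper object, mirroring the style of org.apache.spark.sql.functions.
object MyFunctions {
  // Unwrap the Column to its Catalyst expression, apply the native
  // expression (Upper is a stand-in here), and wrap the result in a Column again.
  def my_upper(col: Column): Column = new Column(Upper(col.expr))
}
```

With this in place, something like `df.select(MyFunctions.my_upper(df("name")))` would work from Scala without any registration step.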

To make the functions available to SparkSQL, create a registration object that extends NativeFunctionRegistration and implements expressions, the way HLLFunctionRegistration does. As the comment in NativeFunctionRegistration says, the registration code is pulled from FunctionRegistry in OSS Spark.

To use the functions from SparkSQL, you have to register them by calling the equivalent of HLLFunctionRegistration.registerFunctions(spark).
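The two registration steps might look roughly like the sketch below. It assumes Spark 3.x and that `NativeFunctionRegistration` lives in the `com.swoop.alchemy.spark.expressions` package and provides an `expression[T](name)` helper, as the FunctionRegistry code it is derived from does. Check both against the spark-alchemy version you use. `MyFunctionRegistration`, `RegistrationExample`, and `my_upper` are hypothetical names, and Spark's built-in `Upper` expression again stands in for your own native function.

```scala
import com.swoop.alchemy.spark.expressions.NativeFunctionRegistration // assumed package path
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Upper

// Hypothetical registration object: maps SQL function names to expressions.
object MyFunctionRegistration extends NativeFunctionRegistration {
  val expressions = Map(
    expression[Upper]("my_upper") // Upper is a stand-in for a native function
  )
}

object RegistrationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("native-function-registration")
      .getOrCreate()

    // Register every expression in the map with this session.
    MyFunctionRegistration.registerFunctions(spark)

    // The function can now be called by name from SparkSQL.
    spark.sql("SELECT my_upper('spark') AS shouted").show()

    spark.stop()
  }
}
```

Registration here is per session, so a new SparkSession would need its own `registerFunctions` call before the SQL name resolves.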

That's all there is to it.
