layout | title | nav_order |
---|---|---|
page | Configuration | 3 |
The following is the list of options that rapids-plugin-4-spark supports.
On startup use: --conf [conf key]=[conf value]. For example:
${SPARK_HOME}/bin/spark-shell --jars 'rapids-4-spark_2.12-0.2.0.jar,cudf-0.15-cuda10-1.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
At runtime use: spark.conf.set("[conf key]", [conf value]). For example:
scala> spark.conf.set("spark.rapids.sql.incompatibleOps.enabled", true)
All configs can be set on startup, but some configs, especially for shuffle, will not work if they are set at runtime.
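The startup form above can be assembled from a list of settings. The following is a minimal sketch, not part of Spark or the plugin: the helper name `conf_flags` and the variable `STARTUP_FLAGS` are hypothetical, and the launcher is assumed to be `spark-shell`. It prints the resulting command instead of running it.

```shell
# Hypothetical helper (not part of Spark or the plugin): print one
# "--conf key=value" line for each key=value argument.
conf_flags() {
  for kv in "$@"; do
    printf '%s %s\n' --conf "$kv"
  done
}

# Startup settings taken from the example above.
STARTUP_FLAGS="$(conf_flags \
  spark.plugins=com.nvidia.spark.SQLPlugin \
  spark.rapids.sql.incompatibleOps.enabled=true)"

# Print the command rather than running it, so the sketch works without Spark.
# The unquoted $STARTUP_FLAGS is split into words on purpose; this breaks for
# values that contain whitespace.
echo '${SPARK_HOME}/bin/spark-shell' $STARTUP_FLAGS
```

Keeping settings such as the shuffle configs in a startup list like this avoids setting them at runtime, where they may not take effect.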
The RAPIDS Accelerator for Apache Spark can be configured to enable or disable specific GPU-accelerated expressions. Enabled expressions are candidates for GPU execution. If an expression is configured as disabled, the plugin will not attempt to replace it, and it will run on the CPU.
Please leverage the spark.rapids.sql.explain setting to get feedback from the plugin as to why parts of a query may not be executing on the GPU.
NOTE: Setting spark.rapids.sql.incompatibleOps.enabled=true will enable all the settings in the table below which are not enabled by default due to incompatibilities.
Name | SQL Function(s) | Description | Default Value | Notes |
---|---|---|---|---|
spark.rapids.sql.expression.Abs | abs | Absolute value | true | None |
spark.rapids.sql.expression.Acos | acos | Inverse cosine | true | None |
spark.rapids.sql.expression.Acosh | acosh | Inverse hyperbolic cosine | true | None |
spark.rapids.sql.expression.Add | + | Addition | true | None |
spark.rapids.sql.expression.Alias | | Gives a column a name | true | None |
spark.rapids.sql.expression.And | and | Logical AND | true | None |
spark.rapids.sql.expression.AnsiCast | | Convert a column of one type of data into another type | true | None |
spark.rapids.sql.expression.Asin | asin | Inverse sine | true | None |
spark.rapids.sql.expression.Asinh | asinh | Inverse hyperbolic sine | true | None |
spark.rapids.sql.expression.AtLeastNNonNulls | | Checks if the number of non-null/non-NaN values is greater than a given value | true | None |
spark.rapids.sql.expression.Atan | atan | Inverse tangent | true | None |
spark.rapids.sql.expression.Atanh | atanh | Inverse hyperbolic tangent | true | None |
spark.rapids.sql.expression.AttributeReference | | References an input column | true | None |
spark.rapids.sql.expression.BitwiseAnd | & | Returns the bitwise AND of the operands | true | None |
spark.rapids.sql.expression.BitwiseNot | ~ | Returns the bitwise NOT of the operand | true | None |
spark.rapids.sql.expression.BitwiseOr | \| | Returns the bitwise OR of the operands | true | None |
spark.rapids.sql.expression.BitwiseXor | ^ | Returns the bitwise XOR of the operands | true | None |
spark.rapids.sql.expression.CaseWhen | when | CASE WHEN expression | true | None |
spark.rapids.sql.expression.Cast | timestamp, tinyint, binary, float, smallint, string, decimal, double, boolean, cast, date, int, bigint | Convert a column of one type of data into another type | true | None |
spark.rapids.sql.expression.Cbrt | cbrt | Cube root | true | None |
spark.rapids.sql.expression.Ceil | ceiling, ceil | Ceiling of a number | true | None |
spark.rapids.sql.expression.Coalesce | coalesce | Returns the first non-null argument if exists. Otherwise, null | true | None |
spark.rapids.sql.expression.Concat | concat | String concatenation with no separator | true | None |
spark.rapids.sql.expression.Contains | | Contains | true | None |
spark.rapids.sql.expression.Cos | cos | Cosine | true | None |
spark.rapids.sql.expression.Cosh | cosh | Hyperbolic cosine | true | None |
spark.rapids.sql.expression.Cot | cot | Cotangent | true | None |
spark.rapids.sql.expression.CurrentRow$ | | Special boundary for a window frame, indicating stopping at the current row | true | None |
spark.rapids.sql.expression.DateAdd | date_add | Returns the date that is num_days after start_date | true | None |
spark.rapids.sql.expression.DateDiff | datediff | Returns the number of days from startDate to endDate | true | None |
spark.rapids.sql.expression.DateSub | date_sub | Returns the date that is num_days before start_date | true | None |
spark.rapids.sql.expression.DayOfMonth | dayofmonth, day | Returns the day of the month from a date or timestamp | true | None |
spark.rapids.sql.expression.DayOfWeek | dayofweek | Returns the day of the week (1 = Sunday ... 7 = Saturday) | true | None |
spark.rapids.sql.expression.DayOfYear | dayofyear | Returns the day of the year from a date or timestamp | true | None |
spark.rapids.sql.expression.Divide | / | Division | true | None |
spark.rapids.sql.expression.EndsWith | | Ends with | true | None |
spark.rapids.sql.expression.EqualNullSafe | <=> | Check if the values are equal including nulls <=> | true | None |
spark.rapids.sql.expression.EqualTo | =, == | Check if the values are equal | true | None |
spark.rapids.sql.expression.Exp | exp | Euler's number e raised to a power | true | None |
spark.rapids.sql.expression.Expm1 | expm1 | Euler's number e raised to a power minus 1 | true | None |
spark.rapids.sql.expression.Floor | floor | Floor of a number | true | None |
spark.rapids.sql.expression.FromUnixTime | from_unixtime | Get the string from a unix timestamp | true | None |
spark.rapids.sql.expression.GetArrayItem | | Gets the field at ordinal in the Array | true | None |
spark.rapids.sql.expression.GreaterThan | > | > operator | true | None |
spark.rapids.sql.expression.GreaterThanOrEqual | >= | >= operator | true | None |
spark.rapids.sql.expression.Hour | hour | Returns the hour component of the string/timestamp | true | None |
spark.rapids.sql.expression.If | if | IF expression | true | None |
spark.rapids.sql.expression.In | in | IN operator | true | None |
spark.rapids.sql.expression.InSet | | INSET operator | true | None |
spark.rapids.sql.expression.InitCap | initcap | Returns str with the first letter of each word in uppercase. All other letters are in lowercase | false | This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see rapidsai/cudf#3132. Spark also only sees the space character as a word delimiter, whereas the GPU version uses more whitespace characters. |
spark.rapids.sql.expression.InputFileBlockLength | input_file_block_length | Returns the length of the block being read, or -1 if not available | true | None |
spark.rapids.sql.expression.InputFileBlockStart | input_file_block_start | Returns the start offset of the block being read, or -1 if not available | true | None |
spark.rapids.sql.expression.InputFileName | input_file_name | Returns the name of the file being read, or empty string if not available | true | None |
spark.rapids.sql.expression.IntegralDivide | div | Division with an integer result | true | None |
spark.rapids.sql.expression.IsNaN | isnan | Checks if a value is NaN | true | None |
spark.rapids.sql.expression.IsNotNull | isnotnull | Checks if a value is not null | true | None |
spark.rapids.sql.expression.IsNull | isnull | Checks if a value is null | true | None |
spark.rapids.sql.expression.KnownFloatingPointNormalized | | Tag to prevent redundant normalization | true | None |
spark.rapids.sql.expression.LastDay | last_day | Returns the last day of the month which the date belongs to | true | None |
spark.rapids.sql.expression.Length | length, character_length, char_length | String character length | true | None |
spark.rapids.sql.expression.LessThan | < | < operator | true | None |
spark.rapids.sql.expression.LessThanOrEqual | <= | <= operator | true | None |
spark.rapids.sql.expression.Like | like | Like | true | None |
spark.rapids.sql.expression.Literal | | Holds a static value from the query | true | None |
spark.rapids.sql.expression.Log | ln | Natural log | true | None |
spark.rapids.sql.expression.Log10 | log10 | Log base 10 | true | None |
spark.rapids.sql.expression.Log1p | log1p | Natural log of 1 + expr | true | None |
spark.rapids.sql.expression.Log2 | log2 | Log base 2 | true | None |
spark.rapids.sql.expression.Logarithm | log | Log variable base | true | None |
spark.rapids.sql.expression.Lower | lower, lcase | String lowercase operator | false | This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see rapidsai/cudf#3132 |
spark.rapids.sql.expression.Minute | minute | Returns the minute component of the string/timestamp | true | None |
spark.rapids.sql.expression.MonotonicallyIncreasingID | monotonically_increasing_id | Returns monotonically increasing 64-bit integers | true | None |
spark.rapids.sql.expression.Month | month | Returns the month from a date or timestamp | true | None |
spark.rapids.sql.expression.Multiply | * | Multiplication | true | None |
spark.rapids.sql.expression.NaNvl | nanvl | Evaluates to left iff left is not NaN, right otherwise | true | None |
spark.rapids.sql.expression.Not | !, not | Boolean not operator | true | None |
spark.rapids.sql.expression.Or | or | Logical OR | true | None |
spark.rapids.sql.expression.Pmod | pmod | Pmod | true | None |
spark.rapids.sql.expression.Pow | pow, power | lhs ^ rhs | true | None |
spark.rapids.sql.expression.PythonUDF | | UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated. | true | None |
spark.rapids.sql.expression.Quarter | quarter | Returns the quarter of the year for date, in the range 1 to 4 | true | None |
spark.rapids.sql.expression.Rand | random, rand | Generate a random column with i.i.d. uniformly distributed values in [0, 1) | true | None |
spark.rapids.sql.expression.RegExpReplace | regexp_replace | RegExpReplace support for string literal input patterns | true | None |
spark.rapids.sql.expression.Remainder | %, mod | Remainder or modulo | true | None |
spark.rapids.sql.expression.Rint | rint | Rounds a double value to the nearest double equal to an integer | true | None |
spark.rapids.sql.expression.RowNumber | row_number | Window function that returns the index for the row within the aggregation window | true | None |
spark.rapids.sql.expression.Second | second | Returns the second component of the string/timestamp | true | None |
spark.rapids.sql.expression.ShiftLeft | shiftleft | Bitwise shift left (<<) | true | None |
spark.rapids.sql.expression.ShiftRight | shiftright | Bitwise shift right (>>) | true | None |
spark.rapids.sql.expression.ShiftRightUnsigned | shiftrightunsigned | Bitwise unsigned shift right (>>>) | true | None |
spark.rapids.sql.expression.Signum | sign, signum | Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive | true | None |
spark.rapids.sql.expression.Sin | sin | Sine | true | None |
spark.rapids.sql.expression.Sinh | sinh | Hyperbolic sine | true | None |
spark.rapids.sql.expression.SortOrder | | Sort order | true | None |
spark.rapids.sql.expression.SparkPartitionID | spark_partition_id | Returns the current partition id | true | None |
spark.rapids.sql.expression.SpecifiedWindowFrame | | Specification of the width of the group (or "frame") of input rows around which a window function is evaluated | true | None |
spark.rapids.sql.expression.Sqrt | sqrt | Square root | true | None |
spark.rapids.sql.expression.StartsWith | | Starts with | true | None |
spark.rapids.sql.expression.StringLPad | lpad | Pad a string on the left | true | None |
spark.rapids.sql.expression.StringLocate | position, locate | Substring search operator | true | None |
spark.rapids.sql.expression.StringRPad | rpad | Pad a string on the right | true | None |
spark.rapids.sql.expression.StringReplace | replace | StringReplace operator | true | None |
spark.rapids.sql.expression.StringSplit | split | Splits str around occurrences that match regex | true | None |
spark.rapids.sql.expression.StringTrim | trim | StringTrim operator | true | None |
spark.rapids.sql.expression.StringTrimLeft | ltrim | StringTrimLeft operator | true | None |
spark.rapids.sql.expression.StringTrimRight | rtrim | StringTrimRight operator | true | None |
spark.rapids.sql.expression.Substring | substr, substring | Substring operator | true | None |
spark.rapids.sql.expression.SubstringIndex | substring_index | substring_index operator | true | None |
spark.rapids.sql.expression.Subtract | - | Subtraction | true | None |
spark.rapids.sql.expression.Tan | tan | Tangent | true | None |
spark.rapids.sql.expression.Tanh | tanh | Hyperbolic tangent | true | None |
spark.rapids.sql.expression.TimeAdd | | Adds interval to timestamp | true | None |
spark.rapids.sql.expression.TimeSub | | Subtracts interval from timestamp | true | None |
spark.rapids.sql.expression.ToDegrees | degrees | Converts radians to degrees | true | None |
spark.rapids.sql.expression.ToRadians | radians | Converts degrees to radians | true | None |
spark.rapids.sql.expression.ToUnixTimestamp | to_unix_timestamp | Returns the UNIX timestamp of the given time | false | This is not 100% compatible with the Spark version because incorrectly formatted strings and bogus dates produce garbage data instead of null |
spark.rapids.sql.expression.UnaryMinus | negative | Negate a numeric value | true | None |
spark.rapids.sql.expression.UnaryPositive | positive | A numeric value with a + in front of it | true | None |
spark.rapids.sql.expression.UnboundedFollowing$ | | Special boundary for a window frame, indicating all rows following the current row | true | None |
spark.rapids.sql.expression.UnboundedPreceding$ | | Special boundary for a window frame, indicating all rows preceding the current row | true | None |
spark.rapids.sql.expression.UnixTimestamp | unix_timestamp | Returns the UNIX timestamp of current or specified time | false | This is not 100% compatible with the Spark version because incorrectly formatted strings and bogus dates produce garbage data instead of null |
spark.rapids.sql.expression.Upper | upper, ucase | String uppercase operator | false | This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see rapidsai/cudf#3132 |
spark.rapids.sql.expression.WeekDay | weekday | Returns the day of the week (0 = Monday ... 6 = Sunday) | true | None |
spark.rapids.sql.expression.WindowExpression | | Calculates a return value for every input row of a table based on a group (or "window") of rows | true | None |
spark.rapids.sql.expression.WindowSpecDefinition | | Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window | true | None |
spark.rapids.sql.expression.Year | year | Returns the year from a date or timestamp | true | None |
spark.rapids.sql.expression.AggregateExpression | | Aggregate expression | true | None |
spark.rapids.sql.expression.Average | avg, mean | Average aggregate operator | true | None |
spark.rapids.sql.expression.Count | count | Count aggregate operator | true | None |
spark.rapids.sql.expression.First | first_value, first | first aggregate operator | true | None |
spark.rapids.sql.expression.Last | last, last_value | last aggregate operator | true | None |
spark.rapids.sql.expression.Max | max | Max aggregate operator | true | None |
spark.rapids.sql.expression.Min | min | Min aggregate operator | true | None |
spark.rapids.sql.expression.Sum | sum | Sum aggregate operator | true | None |
spark.rapids.sql.expression.NormalizeNaNAndZero | | Normalize NaN and zero | true | None |
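
Every name in the table above shares the prefix spark.rapids.sql.expression, so a single expression can be switched on or off without the blanket incompatibleOps setting. The following is a minimal sketch, assuming the settings are passed with --conf at startup; the helper name `expr_conf` and the variable `EXPR_PREFIX` are hypothetical, and the three expressions are only illustrative picks from the table.

```shell
# Prefix shared by every per-expression setting in the table.
EXPR_PREFIX="spark.rapids.sql.expression"

# Hypothetical helper: print "<prefix>.<Name>=<value>" for one expression.
expr_conf() {
  printf '%s.%s=%s\n' "$EXPR_PREFIX" "$1" "$2"
}

# Opt in to two expressions that are disabled by default because of
# incompatibilities, and keep one normally enabled expression on the CPU.
printf '%s %s\n' --conf "$(expr_conf Upper true)"
printf '%s %s\n' --conf "$(expr_conf Lower true)"
printf '%s %s\n' --conf "$(expr_conf Rand false)"
```

Each printed line is one flag pair that can be appended to the startup command.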
CUDF can compile GPU kernels at runtime using a just-in-time (JIT) compiler. The resulting kernels are cached on the filesystem. The default location for this cache is under the .cudf directory in the user's home directory. When running in an environment where the user's home directory is not writable, such as a container environment on a cluster, the JIT cache path must be specified explicitly with the LIBCUDF_KERNEL_CACHE_PATH environment variable.
The specified kernel cache path should be specific to the user to avoid conflicts with others running on the same host. For example, the following would specify the path to a user-specific location under /tmp:
--conf spark.executorEnv.LIBCUDF_KERNEL_CACHE_PATH="/tmp/cudf-$USER"
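
The flag above relies on $USER being set in the shell that launches the job, which is not always the case inside containers. The following is a minimal sketch of a fallback; the helper name `kernel_cache_path` is hypothetical, and the path is still computed on the submitting host, as in the example above.

```shell
# Hypothetical helper: print a per-user cache path under a base directory
# (default /tmp). Falls back to `id -un` when $USER is unset or empty.
# Note: POSIX sh has no local variables, so base and user are globals.
kernel_cache_path() {
  base="${1:-/tmp}"
  user="${USER:-$(id -un)}"
  printf '%s/cudf-%s\n' "$base" "$user"
}

# Print the flag to append to the startup command.
printf '%s %s\n' --conf \
  "spark.executorEnv.LIBCUDF_KERNEL_CACHE_PATH=$(kernel_cache_path)"
```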