support explode outer #2047

Closed · wants to merge 42 commits
42 commits
6653e3f support explode with GenerateOuter (sperlingxx, Mar 26, 2021)
7c3ee85 fix typo (sperlingxx, Mar 31, 2021)
0b964f8 update doc (sperlingxx, Apr 1, 2021)
c5cba83 init branch-0.6 [skip ci] (#2124) (pxLi, Apr 15, 2021)
6f792dd Merge pull request #2136 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
461dd01 Merge pull request #2137 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
2173c07 Merge pull request #2138 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
34d8c89 support truncate cuda version for image build [skip ci] (#2139) (pxLi, Apr 15, 2021)
de2573c Merge pull request #2141 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
b3a0874 Merge pull request #2143 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
7c9db82 Merge pull request #2147 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
426f229 Merge pull request #2148 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
75cfd17 Merge pull request #2150 from NVIDIA/branch-0.5 (nvauto, Apr 15, 2021)
543bf47 Restore CUDA classifier handling to Jenkins scripts (#2142) (jlowe, Apr 16, 2021)
ed3e702 Merge pull request #2158 from NVIDIA/branch-0.5 (nvauto, Apr 16, 2021)
116668d Merge pull request #2162 from NVIDIA/branch-0.5 (nvauto, Apr 16, 2021)
8849be5 Merge pull request #2164 from NVIDIA/branch-0.5 (nvauto, Apr 16, 2021)
3acbd23 Merge pull request #2165 from NVIDIA/branch-0.5 (nvauto, Apr 16, 2021)
76a55c6 Merge pull request #2170 from NVIDIA/branch-0.5 (nvauto, Apr 17, 2021)
52117e6 Merge pull request #2180 from NVIDIA/branch-0.5 (nvauto, Apr 19, 2021)
4dc2076 Merge pull request #2182 from NVIDIA/branch-0.5 (nvauto, Apr 19, 2021)
1bb3b8f Merge pull request #2184 from NVIDIA/branch-0.5 (nvauto, Apr 19, 2021)
954fffa Merge pull request #2186 from NVIDIA/branch-0.5 (nvauto, Apr 19, 2021)
d398e8b Merge pull request #2187 from NVIDIA/branch-0.5 (nvauto, Apr 19, 2021)
6a3c281 Merge pull request #2189 from NVIDIA/branch-0.5 (nvauto, Apr 19, 2021)
d3775a1 Branch 0.5 doc update (#2175) (sameerz, Apr 20, 2021)
9150a05 Merge pull request #2194 from pxLi/m-0.5-to-0.6 (pxLi, Apr 20, 2021)
dbc67cd fix merge conflict for 0.5 doc (pxLi, Apr 20, 2021)
72e7814 Merge pull request #2197 from pxLi/fix-merge-conflict-from-0.5 (pxLi, Apr 20, 2021)
9831c1e fix merge conflict for udf doc from 0.5 (pxLi, Apr 20, 2021)
e6fe915 Merge pull request #2199 from pxLi/fix-merge-conflict-2198 (pxLi, Apr 20, 2021)
08a5bc1 Init scripts to install cuda11 runtime [skip ci] (#2185) (NvTimLiu, Apr 20, 2021)
931e6c5 Merge pull request #2201 from NVIDIA/branch-0.5 (nvauto, Apr 20, 2021)
31dc4f6 Merge pull request #2202 from NVIDIA/branch-0.5 (nvauto, Apr 20, 2021)
a974994 Merge pull request #2205 from NVIDIA/branch-0.5 (nvauto, Apr 20, 2021)
c2eaa9e Use CPM to fetch libcudf dependency for native UDF example build (#2191) (jlowe, Apr 20, 2021)
575f645 Merge pull request #2211 from NVIDIA/branch-0.5 (nvauto, Apr 21, 2021)
1193806 Merge pull request #2212 from NVIDIA/branch-0.5 (nvauto, Apr 21, 2021)
19c43d8 Merge pull request #2213 from NVIDIA/branch-0.5 (nvauto, Apr 21, 2021)
5cc049b Merge remote-tracking branch 'origin/branch-0.6' into explode_outer (sperlingxx, Apr 21, 2021)
f389cbd sign (sperlingxx, Apr 21, 2021)
dbb3a64 append sign (sperlingxx, Apr 21, 2021)
integration_tests/src/main/python/generate_expr_test.py (42 additions, 0 deletions)
@@ -69,6 +69,27 @@ def test_explode_nested_array_data(spark_tmp_path, data_gen):
             'a', 'explode(b) as c').selectExpr('a', 'explode(c)'),
         conf=conf_to_enforce_split_input)
 
+#sort locally because of https://github.com/NVIDIA/spark-rapids/issues/84
+# After 3.1.0 is the min spark version we can drop this
+@ignore_order(local=True)
+@pytest.mark.parametrize('data_gen', all_gen, ids=idfn)
+def test_explode_outer_array_data(spark_tmp_path, data_gen):
+    data_gen = [int_gen, ArrayGen(data_gen)]
+    assert_gpu_and_cpu_are_equal_collect(
+        lambda spark: two_col_df(spark, *data_gen).selectExpr('a', 'explode_outer(b)'),
+        conf=conf_to_enforce_split_input)
+
+#sort locally because of https://github.com/NVIDIA/spark-rapids/issues/84
+# After 3.1.0 is the min spark version we can drop this
+@ignore_order(local=True)
+@pytest.mark.parametrize('data_gen', all_gen, ids=idfn)
+def test_explode_outer_nested_array_data(spark_tmp_path, data_gen):
+    data_gen = [int_gen, ArrayGen(ArrayGen(data_gen))]
+    assert_gpu_and_cpu_are_equal_collect(
+        lambda spark: two_col_df(spark, *data_gen).selectExpr(
+            'a', 'explode_outer(b) as c').selectExpr('a', 'explode_outer(c)'),
+        conf=conf_to_enforce_split_input)
+
 
 #sort locally because of https://github.com/NVIDIA/spark-rapids/issues/84
 # After 3.1.0 is the min spark version we can drop this
@@ -108,3 +129,24 @@ def test_posexplode_nested_array_data(spark_tmp_path, data_gen):
         lambda spark: two_col_df(spark, *data_gen).selectExpr(
             'a', 'posexplode(b) as (pos, c)').selectExpr('a', 'pos', 'posexplode(c)'),
         conf=conf_to_enforce_split_input)
+
+#sort locally because of https://github.com/NVIDIA/spark-rapids/issues/84
+# After 3.1.0 is the min spark version we can drop this
+@ignore_order(local=True)
+@pytest.mark.parametrize('data_gen', all_gen, ids=idfn)
+def test_posexplode_outer_array_data(spark_tmp_path, data_gen):
+    data_gen = [int_gen, ArrayGen(data_gen)]
+    assert_gpu_and_cpu_are_equal_collect(
+        lambda spark: two_col_df(spark, *data_gen).selectExpr('a', 'posexplode_outer(b)'),
+        conf=conf_to_enforce_split_input)
+
+#sort locally because of https://github.com/NVIDIA/spark-rapids/issues/84
+# After 3.1.0 is the min spark version we can drop this
+@ignore_order(local=True)
+@pytest.mark.parametrize('data_gen', all_gen, ids=idfn)
+def test_posexplode_nested_outer_array_data(spark_tmp_path, data_gen):
+    data_gen = [int_gen, ArrayGen(ArrayGen(data_gen))]
+    assert_gpu_and_cpu_are_equal_collect(
+        lambda spark: two_col_df(spark, *data_gen).selectExpr(
+            'a', 'posexplode_outer(b) as (pos, c)').selectExpr('a', 'pos', 'posexplode_outer(c)'),
+        conf=conf_to_enforce_split_input)
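The `posexplode_outer` tests exercise the same outer semantics plus a position column. In plain Spark SQL, the behavior being verified looks like this (illustrative spark-shell snippet, not part of the diff):

```scala
spark.sql("SELECT posexplode_outer(array(10, 20))").show()
// pos | col
//   0 |  10
//   1 |  20

// With a null (or empty) array, outer keeps the row and emits nulls:
spark.sql("SELECT posexplode_outer(CAST(NULL AS ARRAY<INT>))").show()
// pos  | col
// null | null
```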
@@ -342,8 +342,10 @@ case class GpuExplode(child: Expression) extends GpuExplodeBase {
     require(inputBatch.numCols() - 1 == generatorOffset,
       "Internal Error GpuExplode supports one and only one input attribute.")
     val schema = resultSchema(GpuColumnVector.extractTypes(inputBatch), generatorOffset)
+    val explodeFun = (t: Table) =>
+      if (outer) t.explodeOuter(generatorOffset) else t.explode(generatorOffset)
     withResource(GpuColumnVector.from(inputBatch)) { table =>
-      withResource(table.explode(generatorOffset)) { exploded =>
+      withResource(explodeFun(table)) { exploded =>
         GpuColumnVector.from(exploded, schema)
       }
     }
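`withResource` in the hunk above is the plugin's guard for cudf handles: a `Table` owns off-heap GPU memory and must be closed deterministically. A minimal sketch of the pattern, assuming only `java.lang.AutoCloseable` (the plugin's real helper also has variants for collections of resources):

```scala
// Minimal sketch: run `body` against the resource and close it even on
// failure, so the exploded Table's device memory is always released.
def withResource[T <: AutoCloseable, V](resource: T)(body: T => V): V = {
  try {
    body(resource)
  } finally {
    resource.close()
  }
}
```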
@@ -362,8 +364,10 @@ case class GpuPosExplode(child: Expression) extends GpuExplodeBase {
"Internal Error GpuPosExplode supports one and only one input attribute.")
val schema = resultSchema(
GpuColumnVector.extractTypes(inputBatch), generatorOffset, includePos = true)
val explodePosFun = (t: Table) =>
if (outer) t.explodeOuterPosition(generatorOffset) else t.explodePosition(generatorOffset)
withResource(GpuColumnVector.from(inputBatch)) { table =>
withResource(table.explodePosition(generatorOffset)) { exploded =>
withResource(explodePosFun(table)) { exploded =>
GpuColumnVector.from(exploded, schema)
}
}
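Between the two hunks, the `outer` flag selects among four cudf `Table` methods, all four of which appear verbatim in this diff. A compact sketch of the dispatch; the helper name itself is hypothetical:

```scala
import ai.rapids.cudf.Table

// Hypothetical helper summarizing the mapping; each call returns a new Table
// that the caller must close (hence the withResource wrapping above).
def cudfExplode(t: Table, offset: Int, outer: Boolean, withPos: Boolean): Table =
  (outer, withPos) match {
    case (false, false) => t.explode(offset)
    case (true,  false) => t.explodeOuter(offset)
    case (false, true)  => t.explodePosition(offset)
    case (true,  true)  => t.explodeOuterPosition(offset)
  }
```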
@@ -2393,8 +2393,7 @@
         GpuMakeDecimal(child, a.precision, a.scale, a.nullOnOverflow)
       }),
     expr[Explode](
-      "Given an input array produces a sequence of rows for each value in the array. "
-        + "Explode with outer Generate is not supported under GPU runtime." ,
+      "Given an input array produces a sequence of rows for each value in the array.",
       ExprChecks.unaryProject(
         // Here is a workaround representation, since multi-level nested type is not supported yet.
         // related issue: https://github.com/NVIDIA/spark-rapids/issues/1901
@@ -2405,11 +2404,11 @@
           TypeSig.commonCudfTypes + TypeSig.DECIMAL + TypeSig.NULL + TypeSig.ARRAY),
         (TypeSig.ARRAY + TypeSig.MAP).nested(TypeSig.all)),
       (a, conf, p, r) => new GeneratorExprMeta[Explode](a, conf, p, r) {
+        override val supportOuter: Boolean = true
         override def convertToGpu(): GpuExpression = GpuExplode(childExprs(0).convertToGpu())
       }),
     expr[PosExplode](
-      "Given an input array produces a sequence of rows for each value in the array. "
-        + "PosExplode with outer Generate is not supported under GPU runtime." ,
+      "Given an input array produces a sequence of rows for each value in the array.",
       ExprChecks.unaryProject(
         // Here is a workaround representation, since multi-level nested type is not supported yet.
         // related issue: https://github.com/NVIDIA/spark-rapids/issues/1901
@@ -2421,6 +2420,7 @@
         TypeSig.ARRAY.nested(
           TypeSig.commonCudfTypes + TypeSig.DECIMAL + TypeSig.NULL + TypeSig.ARRAY)),
       (a, conf, p, r) => new GeneratorExprMeta[PosExplode](a, conf, p, r) {
+        override val supportOuter: Boolean = true
         override def convertToGpu(): GpuExpression = GpuPosExplode(childExprs(0).convertToGpu())
       }),
     expr[CollectList](
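The `supportOuter` overrides are what flip the behavior: the plugin's Generate planning consults this flag before it will move an outer generate onto the GPU. A hedged sketch of that gate (all names except `supportOuter` are illustrative, not the plugin's exact code):

```scala
// Illustrative sketch: if Spark's Generate node is outer but the generator
// meta does not declare outer support, tag the plan so it stays on the CPU.
if (generate.outer && !generatorMeta.supportOuter) {
  willNotWorkOnGpu(s"outer is not supported by ${generate.generator.nodeName} on GPU")
}
```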