You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Following query needs to scan the table nation multiple times.
select n_regionkey , n_name, '0'as t, avg(n_nationkey) +count(n_name) as s from nation where n_nationkey % 3=0group by n_name, n_regionkey
union allselect n_regionkey , n_name, '1'as t, avg(n_nationkey) +count(n_name) as s from nation where n_nationkey % 2=1group by n_name, n_regionkey
Following is the physical plan
AdaptiveSparkPlan isFinalPlan=false
+- Union
:- HashAggregate(keys=[n_name#5, _groupingexpression#24L], functions=[avg(cast(n_nationkey#4L as double)), count(n_name#5)], output=[(n_regionkey + 1)#18L, n_name#5, t#0, s#1])
: +- Exchange hashpartitioning(n_name#5, _groupingexpression#24L, 5), ENSURE_REQUIREMENTS, [plan_id=38]
: +- HashAggregate(keys=[n_name#5, _groupingexpression#24L], functions=[partial_avg(cast(n_nationkey#4L as double)), partial_count(n_name#5)], output=[n_name#5, _groupingexpression#24L, sum#29, count#30L, count#31L])
: +- Project [n_nationkey#4L, n_name#5, (n_regionkey#6L + cast(1 as bigint)) AS _groupingexpression#24L]
: +- Filter (isnotnull(n_nationkey#4L) AND ((n_nationkey#4L % cast(2 as bigint)) = cast(0 as bigint)))
: +- FileScan parquet tpch_pq.nation[n_nationkey#4L,n_name#5,n_regionkey#6L]
+- HashAggregate(keys=[n_name#9, _groupingexpression#25L], functions=[avg(cast(n_nationkey#8L as double)), count(n_name#9)], output=[(n_regionkey + 1)#19L, n_name#9, t#2, s#3])
+- Exchange hashpartitioning(n_name#9, _groupingexpression#25L, 5), ENSURE_REQUIREMENTS, [plan_id=40]
+- HashAggregate(keys=[n_name#9, _groupingexpression#25L], functions=[partial_avg(cast(n_nationkey#8L as double)), partial_count(n_name#9)], output=[n_name#9, _groupingexpression#25L, sum#35, count#36L, count#37L])
+- Project [n_nationkey#8L, n_name#9, (n_regionkey#10L + cast(1 as bigint)) AS _groupingexpression#25L]
+- Filter (isnotnull(n_nationkey#8L) AND ((n_nationkey#8L % cast(2 as bigint)) = cast(1 as bigint)))
+- FileScan parquet tpch_pq.nation[n_nationkey#8L,n_name#9,n_regionkey#10L]
When the table is too large, the scan takes up most of the execution time.
We could rewrite the query as following, to make nation be scanned only once.
select n_regionkey, n_name, t, avg(n_nationkey) +count(n_name) from (
selectst.n_regionkeyas n_regionkey, st.n_nameas n_name, st.tas t, st.n_nationkeyas n_nationkey, st.gidas gid from (
select st from (
select explode(a) as st from (
select array(
if( n_nationkey % 3=0, named_struct('n_regionkey', n_regionkey, 'n_nationkey', n_nationkey, 'n_name', n_name, 't', '0', 'gid', 0), null),
if(n_nationkey % 2, named_struct('n_regionkey', n_regionkey, 'n_nationkey', n_nationkey, 'n_name', n_name, 't', '1', 'gid', 1), null)) as a from (
select*from nation where n_nationkey % 3=0or n_nationkey % 2
)
)
) where st is not null
)
) group by n_regionkey, n_nationkey, t, gid
The second query is more difficult to write then the first one.
The text was updated successfully, but these errors were encountered:
Description
Following query needs to scan the table
nation
multiple times.Following is the physical plan
When the table is too large, the
scan
takes up most of the execution time.We could rewrite the query as following, to make
nation
be scanned only once.The second query is more difficult to write then the first one.
The text was updated successfully, but these errors were encountered: