SPARK-1330 removed extra echo from comput_classpath.sh #241

tgravescs · 2014-03-26T20:38:46Z

Remove the extra echo, which prevents spark-class from working. Note that I did not update the comment above it, which is also wrong, because I'm not sure what it should do.

Should Hive only be included if explicitly built with sbt hive/assembly, or should sbt assembly build it?

tgravescs · 2014-03-26T20:40:22Z

@marmbrus do you know what should happen during the sbt assembly?

AmplabJenkins · 2014-03-26T21:12:49Z

Merged build triggered.

AmplabJenkins · 2014-03-26T21:12:49Z

Merged build started.

aarondav · 2014-03-26T21:30:34Z

The full patch for the assembly issue is here: #237

It covers this fix too (it sends the message to stderr rather than removing it), but I think we should go ahead and merge this one, since the other may need a couple more iterations before it is merged.

AmplabJenkins · 2014-03-26T22:12:33Z

Merged build finished.

AmplabJenkins · 2014-03-26T22:12:33Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13477/

sryza · 2014-03-26T22:56:42Z

bin/compute-classpath.sh

@@ -36,7 +36,6 @@ CLASSPATH="$SPARK_CLASSPATH:$FWDIR/conf"
 # Hopefully we will find a way to avoid uber-jars entirely and deploy only the needed packages in
 # the future.
 if [ -f "$FWDIR"/sql/hive/target/scala-$SCALA_VERSION/spark-hive-assembly-*.jar ]; then
-  echo "Hive assembly found, including hive support.  If this isn't desired run sbt hive/clean."


Would it make sense to send this to stderr?

#237 will do this anyway
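
The stderr suggestion in this thread can be illustrated with a minimal sketch. It assumes the caller captures the script's stdout as the classpath through command substitution; the function names and the `/opt/spark/...` paths below are hypothetical stand-ins, not the contents of `compute-classpath.sh`.

```shell
# Hypothetical stand-ins for a classpath script; names and paths are illustrative only.

# Writes the notice to stdout, so it is captured together with the classpath.
classpath_with_stdout_notice() {
  echo "Hive assembly found, including hive support."
  echo "/opt/spark/conf:/opt/spark/assembly.jar"
}

# Writes the same notice to stderr, so only the classpath is captured.
classpath_with_stderr_notice() {
  echo "Hive assembly found, including hive support." 1>&2
  echo "/opt/spark/conf:/opt/spark/assembly.jar"
}

# Command substitution captures stdout only; stderr still reaches the terminal.
BROKEN_CLASSPATH="$(classpath_with_stdout_notice)"
CLEAN_CLASSPATH="$(classpath_with_stderr_notice)"

printf 'broken: %s\n' "$BROKEN_CLASSPATH"
printf 'clean:  %s\n' "$CLEAN_CLASSPATH"
```

In this sketch the first variable carries the notice text in front of the path, which is the kind of pollution a stdout echo would cause for any caller that treats the output as a classpath. Removing the echo (this patch) and redirecting it to stderr (the approach attributed to #237 above) both keep the captured value clean.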

tgravescs · 2014-03-27T13:22:36Z

I agree, I think we should merge this in if #237 isn't going in today.

@aarondav If you are ok with this patch I'll merge it in.

aarondav · 2014-03-27T15:50:10Z

Yup, LGTM

tgravescs · 2014-03-27T16:56:57Z

I committed this.

Fix race condition in JobLoggerSuite [0.8 branch]

I found this when running the tests locally. It's similar to a race condition found when making the 0.8.0 release.

Update broken links and add HDP 2.0 version string

I ran a link checker on the UI and found several broken links.

(cherry picked from commit 1f4a4bc)
Signed-off-by: Patrick Wendell <[email protected]>

remove the extra echo which prevents spark-class from working. Note that I did not update the comment above it, which is also wrong because I'm not sure what it should do.

Should hive only be included if explicitly built with sbt hive/assembly or should sbt assembly build it?

Author: Thomas Graves <[email protected]>

Closes apache#241 from tgravescs/SPARK-1330 and squashes the following commits:

b10d708 [Thomas Graves] SPARK-1330 removed extra echo from comput_classpath.sh

## What changes were proposed in this pull request?

This is a backport of apache@31da755.

Currently, a custom SparkListener can be added through the `SparkContext#addListener` API, but there is no equivalent API to remove a registered one. In our scenario, SparkListeners are added repeatedly as the environment changes. Without the ability to remove listeners, many registered listeners can accumulate, which is unnecessary and may affect performance. This change proposes adding an API to remove a registered listener.

## How was this patch tested?

Added a unit test to verify it.

Author: jerryshao <[email protected]>

Closes apache#241 from ala/remove-spark-listener.

Would use 2.6.7 everywhere, but upstream released a 2.6.7.1 for jackson-databind with no corresponding release for the rest of Jackson, so those remain on 2.6.7. This requires splitting the version variable in /pom.xml.

…pache#241)

* working on new spam/ham example
* basic NB model builder
* messy, but have end-to-end model server working
* add jobs README, re-arrange spam classifier
* add documentation to scala jobs
* added secrets docs
* added secure TeraSort walkthrough
* wip, run ci, add keytabs to executors also
* fix executorenv property string
* add flag for sasl secret
* finished ml pipeline walkthrough
* don't test on the training data
* added code explanation
* addressed comments, updated test_rpc_auth test
* add link to walkthroughs
* addressed documentation comments
* addressed comments in SpamHam.scala example
* added unit test for sasl secret

Add install-k8s role

### What changes were proposed in this pull request?

Push down filter through expand. For the case below:

```
create table t1(pid int, uid int, sid int, dt date, suid int) using parquet;
create table t2(pid int, vs int, uid int, csid int) using parquet;

SELECT years,
       appversion,
       SUM(uusers) AS users
FROM   (SELECT Date_trunc('year', dt) AS years,
               CASE
                 WHEN h.pid = 3 THEN 'iOS'
                 WHEN h.pid = 4 THEN 'Android'
                 ELSE 'Other'
               END AS viewport,
               h.vs AS appversion,
               Count(DISTINCT u.uid) AS uusers,
               Count(DISTINCT u.suid) AS srcusers
        FROM   t1 u
               join t2 h
                 ON h.uid = u.uid
        GROUP  BY 1, 2, 3) AS a
WHERE  viewport = 'iOS'
GROUP  BY 1, 2
```

Plan before this PR:

```
== Physical Plan ==
*(5) HashAggregate(keys=[years#30, appversion#32], functions=[sum(uusers#33L)])
+- Exchange hashpartitioning(years#30, appversion#32, 200), true, [id=#251]
   +- *(4) HashAggregate(keys=[years#30, appversion#32], functions=[partial_sum(uusers#33L)])
      +- *(4) HashAggregate(keys=[date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12], functions=[count(if ((gid#44 = 1)) u.`uid`#47 else null)])
         +- Exchange hashpartitioning(date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12, 200), true, [id=#246]
            +- *(3) HashAggregate(keys=[date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12], functions=[partial_count(if ((gid#44 = 1)) u.`uid`#47 else null)])
               +- *(3) HashAggregate(keys=[date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12, u.`uid`#47, u.`suid`#48, gid#44], functions=[])
                  +- Exchange hashpartitioning(date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12, u.`uid`#47, u.`suid`#48, gid#44, 200), true, [id=#241]
                     +- *(2) HashAggregate(keys=[date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12, u.`uid`#47, u.`suid`#48, gid#44], functions=[])
                        +- *(2) Filter (CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46 = iOS)
                           +- *(2) Expand [ArrayBuffer(date_trunc(year, cast(dt#9 as timestamp), Some(Etc/GMT+7)), CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END, vs#12, uid#7, null, 1), ArrayBuffer(date_trunc(year, cast(dt#9 as timestamp), Some(Etc/GMT+7)), CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END, vs#12, null, suid#10, 2)], [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#45, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#46, vs#12, u.`uid`#47, u.`suid`#48, gid#44]
                              +- *(2) Project [uid#7, dt#9, suid#10, pid#11, vs#12]
                                 +- *(2) BroadcastHashJoin [uid#7], [uid#13], Inner, BuildRight
                                    :- *(2) Project [uid#7, dt#9, suid#10]
                                    :  +- *(2) Filter isnotnull(uid#7)
                                    :     +- *(2) ColumnarToRow
                                    :        +- FileScan parquet default.t1[uid#7,dt#9,suid#10] Batched: true, DataFilters: [isnotnull(uid#7)], Format: Parquet, Location: InMemoryFileIndex[file:/root/spark-3.0.0-bin-hadoop3.2/spark-warehouse/t1], PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct<uid:int,dt:date,suid:int>
                                    +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[2, int, true] as bigint))), [id=#233]
                                       +- *(1) Project [pid#11, vs#12, uid#13]
                                          +- *(1) Filter isnotnull(uid#13)
                                             +- *(1) ColumnarToRow
                                                +- FileScan parquet default.t2[pid#11,vs#12,uid#13] Batched: true, DataFilters: [isnotnull(uid#13)], Format: Parquet, Location: InMemoryFileIndex[file:/root/spark-3.0.0-bin-hadoop3.2/spark-warehouse/t2], PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct<pid:int,vs:int,uid:int>
```

Plan after this PR:

```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[years#0, appversion#2], functions=[sum(uusers#3L)], output=[years#0, appversion#2, users#5L])
   +- Exchange hashpartitioning(years#0, appversion#2, 5), true, [id=#71]
      +- HashAggregate(keys=[years#0, appversion#2], functions=[partial_sum(uusers#3L)], output=[years#0, appversion#2, sum#22L])
         +- HashAggregate(keys=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12], functions=[count(distinct uid#7)], output=[years#0, appversion#2, uusers#3L])
            +- Exchange hashpartitioning(date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, 5), true, [id=#67]
               +- HashAggregate(keys=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12], functions=[partial_count(distinct uid#7)], output=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, count#27L])
                  +- HashAggregate(keys=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, uid#7], functions=[], output=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, uid#7])
                     +- Exchange hashpartitioning(date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, uid#7, 5), true, [id=#63]
                        +- HashAggregate(keys=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles)) AS date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END AS CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, uid#7], functions=[], output=[date_trunc(year, cast(dt#9 as timestamp), Some(America/Los_Angeles))#23, CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END#24, vs#12, uid#7])
                           +- Project [uid#7, dt#9, pid#11, vs#12]
                              +- BroadcastHashJoin [uid#7], [uid#13], Inner, BuildRight, false
                                 :- Filter isnotnull(uid#7)
                                 :  +- FileScan parquet default.t1[uid#7,dt#9] Batched: true, DataFilters: [isnotnull(uid#7)], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/4l/7_c5c97s1_gb0d9_d6shygx00000gn/T/warehouse-c069d87..., PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct<uid:int,dt:date>
                                 +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[2, int, false] as bigint)),false), [id=#58]
                                    +- Filter ((CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END = iOS) AND isnotnull(uid#13))
                                       +- FileScan parquet default.t2[pid#11,vs#12,uid#13] Batched: true, DataFilters: [(CASE WHEN (pid#11 = 3) THEN iOS WHEN (pid#11 = 4) THEN Android ELSE Other END = iOS), isnotnull..., Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/4l/7_c5c97s1_gb0d9_d6shygx00000gn/T/warehouse-c069d87..., PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct<pid:int,vs:int,uid:int>
```

### Why are the changes needed?

Improve performance, filter more data.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added UT

Closes #30278 from AngersZhuuuu/SPARK-33302.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

SPARK-1330 removed extra echo from comput_classpath.sh

b10d708

sryza reviewed Mar 26, 2014

asfgit closed this in 426042a Mar 27, 2014

erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request May 8, 2017

Build with sbt and fix scalastyle checks. (apache#241)

ba151c0

erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017

Build with sbt and fix scalastyle checks. (apache#241)

f09bf4a

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Merge pull request apache#241 from mrhillsman/installk8s

a52f7af

Add install-k8s role

wangyum mentioned this pull request Aug 19, 2020

[SPARK-32444][SQL] Infer filters from DPP #29243

Closed
