[SPARK-6841] [SPARKR] add support for mean, median, stdev etc. #5446

Closed
wants to merge 2 commits into from

hqzizania
Contributor

Moving here from amplab-extras/SparkR-pkg#241
sum() has been implemented. (amplab-extras/SparkR-pkg#242)

Phase 1 is now done: mean, sd, and var have been implemented, but some things still need to be improved following the suggestions in https://issues.apache.org/jira/browse/SPARK-6841

@hqzizania hqzizania changed the title from "[SPARK-6841] Similar to stats.py in Python, add support for mean, median, stdev etc." to "[SPARK-6841] [SPARKR] add support for mean, median, stdev etc." Apr 10, 2015
@shivaram
Contributor

Jenkins, ok to test

@SparkQA

SparkQA commented Apr 10, 2015

Test build #30007 has finished for PR 5446 at commit f1a1455.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@hqzizania
Contributor Author

This update implements describe() as a DataFrame API, as the comments suggested (https://issues.apache.org/jira/browse/SPARK-6841?jql=text%20~%20%22sparkr%22). We could add it to the RDD API in the future if we find a need. As a result, functions such as "histogram", "sampleStdev", and "sampleVariance" don't need to be implemented now, since they don't yet exist in Spark's DataFrame API.
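
For readers unfamiliar with what a describe()-style summary computes, here is a rough sketch in plain Python. It is not the SparkR code from this PR: the helper name `describe`, the list-of-dicts row layout, and the choice of the sample (n - 1) standard deviation are all assumptions made for illustration. It only mirrors the five statistics that Spark's DataFrame describe() reports (count, mean, stddev, min, max).

```python
import math


def describe(rows, columns):
    """Hypothetical helper: per-column count, mean, stddev, min, max.

    `rows` is a list of dicts; missing or None values are skipped.
    """
    summary = {}
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        n = len(values)
        mean = sum(values) / n if n else None
        # Assumption: sample standard deviation (n - 1 denominator).
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            stddev = math.sqrt(variance)
        else:
            stddev = None
        summary[col] = {
            "count": n,
            "mean": mean,
            "stddev": stddev,
            "min": min(values) if n else None,
            "max": max(values) if n else None,
        }
    return summary


if __name__ == "__main__":
    people = [
        {"name": "Michael", "age": None},
        {"name": "Andy", "age": 30},
        {"name": "Justin", "age": 19},
    ]
    for column, stats in describe(people, ["age"]).items():
        print(column, stats)
```

A real implementation would delegate the aggregation to the DataFrame engine rather than collect values locally; the sketch only shows which quantities end up in the summary.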

@shivaram
Contributor

shivaram commented May 5, 2015

Thanks @hqzizania -- could you add a unit test for this?

cc @rxin

@shivaram
Contributor

shivaram commented May 5, 2015

Jenkins, ok to test

@SparkQA

SparkQA commented May 5, 2015

Test build #31906 has finished for PR 5446 at commit 7e73d89.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented May 5, 2015

@shaneknapp this seems to have hit the same error from maven

@shivaram
Contributor

shivaram commented May 5, 2015

@hqzizania we'll also need to bring this up to date with the latest master

@shaneknapp
Contributor

yep. i'll take worker 03 offline until i can get a chance to investigate
(i'm OOO this afternoon).

Thanks for the catch!

@hqzizania hqzizania reopened this May 6, 2015
@shivaram
Contributor

shivaram commented May 6, 2015

Jenkins, ok to test

@shivaram
Contributor

shivaram commented May 6, 2015

Jenkins, add to whitelist

@SparkQA

SparkQA commented May 6, 2015

Test build #31933 has finished for PR 5446 at commit f283572.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 6, 2015

Thanks. merging in master.

asfgit pushed a commit that referenced this pull request May 6, 2015
Moving here from amplab-extras/SparkR-pkg#241
sum() has been implemented. (amplab-extras/SparkR-pkg#242)

Now Phase 1: mean, sd, var have been implemented, but some things still need to be improved with the suggestions in https://issues.apache.org/jira/browse/SPARK-6841

Author: qhuang <[email protected]>

Closes #5446 from hqzizania/R and squashes the following commits:

f283572 [qhuang] add test unit for describe()
2e74d5a [qhuang] add describe() DataFrame API

(cherry picked from commit a466944)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in a466944 May 6, 2015
@hqzizania hqzizania deleted the R branch May 6, 2015 03:46
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015