[SPARK-6841] [SPARKR] add support for mean, median, stdev etc. #5446
stats.py
in Python, add support for mean, median, stdev etc.
Jenkins, ok to test
Test build #30007 has finished for PR 5446 at commit
This update implements describe() as a DataFrame API, as the comments suggested (https://issues.apache.org/jira/browse/SPARK-6841?jql=text%20~%20%22sparkr%22). We could add this to the RDD API in the future if we find a need. As a result, functions such as "histogram", "sampleStdev", and "sampleVariance" don't need to be implemented now, since they don't yet exist in Spark's DataFrame API.
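For readers unfamiliar with what a describe()-style summary reports, here is a minimal sketch in plain Python (not SparkR and not the Spark API). The function name `describe`, the list-of-dicts input, the numeric (rather than string) output values, and the use of the sample standard deviation are all assumptions made for illustration; this is not the implementation in this PR.

```python
import statistics


def describe(rows, columns):
    """Summarize numeric columns of a list-of-dicts table.

    Hypothetical stand-in for a DataFrame describe(); None values are skipped.
    Returns one row per statistic, keyed by a "summary" label.
    """
    # Collect the non-null values of each requested column.
    values = {c: [r[c] for r in rows if r.get(c) is not None] for c in columns}

    stats = {
        "count": len,
        "mean": statistics.mean,
        # Assumption: sample standard deviation (n - 1 denominator).
        "stddev": lambda v: statistics.stdev(v) if len(v) > 1 else float("nan"),
        "min": min,
        "max": max,
    }

    summary = []
    for name, fn in stats.items():
        row = {"summary": name}
        for c in columns:
            # An empty column still has a count of 0; other stats are undefined.
            row[c] = fn(values[c]) if values[c] or name == "count" else None
        summary.append(row)
    return summary


# Example usage with a small hypothetical table.
people = [
    {"age": 30, "height": 1.5},
    {"age": 40, "height": 1.7},
    {"age": None, "height": 1.6},
]
for line in describe(people, ["age", "height"]):
    print(line)
```

Returning one row per statistic with a label column keeps the result itself table-shaped, which is why a summary like this fits more naturally on a DataFrame than on an RDD.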
Thanks @hqzizania -- could you add a unit test for this? cc @rxin
Jenkins, ok to test
Test build #31906 has finished for PR 5446 at commit
@shaneknapp this seems to have hit the same error from maven
@hqzizania we'll also need to bring this up to date with the latest master
yep. i'll take worker 03 offline until i can get a chance to investigate. Thanks for the catch!
Jenkins, ok to test
Jenkins, add to whitelist
Test build #31933 has finished for PR 5446 at commit
Thanks. Merging into master.
Moving here from amplab-extras/SparkR-pkg#241
sum() has been implemented. (amplab-extras/SparkR-pkg#242)
Now Phase 1: mean, sd, var have been implemented, but some things still need to be improved with the suggestions in https://issues.apache.org/jira/browse/SPARK-6841

Author: qhuang <[email protected]>

Closes #5446 from hqzizania/R and squashes the following commits:

f283572 [qhuang] add test unit for describe()
2e74d5a [qhuang] add describe() DataFrame API

(cherry picked from commit a466944)
Signed-off-by: Reynold Xin <[email protected]>