[SPARK-7383][ML] Feature Parity in PySpark for ml.features #5991
Merged build triggered.
Merged build started.
Test build #32156 has started for PR 5991 at commit
Test build #32156 has finished for PR 5991 at commit
Merged build finished. Test PASSed.
Test PASSed.
Traceback (most recent call last):
...
TypeError: Method setParams forces keyword arguments.
>>> df = sc.parallelize([Row(values=0.5)]).toDF()
minor: I'm not sure which one is the recommended approach to create a DataFrame. @rxin
df = sc.parallelize([Row(values=0.5)]).toDF()
vs.
df = sqlContext.createDataFrame([(0.5,)], ["values"])
# don't need to import Row
I prefer the 2nd approach
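For context on the `TypeError: Method setParams forces keyword arguments.` line in the doctest excerpt above: a check like this can be implemented with a small decorator that rejects any positional argument other than `self`. The following is a minimal pure-Python sketch under that assumption, not necessarily the exact code in PySpark; `ToyTransformer` and its parameters are hypothetical names used only for illustration.

```python
import functools


def keyword_only(func):
    """Sketch of a decorator that allows only `self` as a positional argument."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # args[0] is `self`; anything beyond it was passed positionally.
        if len(args) > 1:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(*args, **kwargs)
    return wrapper


class ToyTransformer(object):
    """Hypothetical stand-in for an ML feature transformer (illustration only)."""

    def __init__(self):
        self.threshold = 0.0
        self.inputCol = None

    @keyword_only
    def setParams(self, threshold=0.0, inputCol=None):
        # Store the parameters and return self so calls can be chained.
        self.threshold = threshold
        self.inputCol = inputCol
        return self
```

With this sketch, `ToyTransformer().setParams(threshold=1.0, inputCol="values")` succeeds, while `ToyTransformer().setParams(1.0)` raises the `TypeError` shown in the doctest.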
@brkyvz Thanks for working on this! It looks good except the variable naming in the doctests. It seems that
Merged build triggered.
Merged build started.
Test build #32240 has started for PR 5991 at commit
LGTM pending Jenkins.
Test build #32240 has finished for PR 5991 at commit
Merged build finished. Test PASSed.
Test PASSed.
Merged into master and branch-1.4. Thanks!
Implemented python wrappers for Scala functions that don't exist in `ml.features`

Author: Burak Yavuz <[email protected]>

Closes #5991 from brkyvz/ml-feat-PR and squashes the following commits:

adcca55 [Burak Yavuz] add regex tokenizer to __all__
b91cb44 [Burak Yavuz] addressed comments
bd39fd2 [Burak Yavuz] remove addition
b82bd7c [Burak Yavuz] Parity in PySpark for ml.features

(cherry picked from commit f5ff4a8)
Signed-off-by: Xiangrui Meng <[email protected]>