Adding query 2, 4 and 5 #243
I don't see any changes to Q02, Q04, or Q05. Am I missing something?
I've added the changes now.
I have requested changes.
- We should use `sklearn`/`dask-ml` CPU models for the CPU backend.
- The check should be appropriate for both `cuDF` and `dask-cudf`.
- We should test for success on both the CPU and GPU backends.
@@ -84,7 +87,11 @@ def main(data_dir, client, c, config):
    result = result.persist()

    result = result.compute()
    result_df = cudf.DataFrame({"sum(pagecount)/count(*)": [result]})

    if isinstance(merged_df, cudf.DataFrame):
`merged_df` is a `dask_cudf.DataFrame`, so the check below is invalid.
I would suggest testing with both `backend=GPU` and `backend=CPU`
so that we don't run into the issues that were pointed out in PR #244.
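One way to make the check hold for both `cudf` and `dask_cudf` frames is to inspect the frame's metadata instead of its concrete class. This is only a sketch: `frame_backend` is a hypothetical helper that is not part of this PR, and it assumes that Dask collections expose their empty prototype frame through the `_meta` attribute.

```python
def frame_backend(df):
    """Return "GPU" for cudf-backed frames and "CPU" for pandas-backed ones.

    Hypothetical helper. Assumption: Dask collections carry an empty
    prototype frame in `_meta`; plain cudf/pandas frames have no such
    attribute and are inspected directly.
    """
    # For a Dask collection this is the underlying cudf/pandas prototype;
    # for a plain frame it is the frame itself.
    meta = getattr(df, "_meta", df)

    # The top-level package that defines the type tells us the backend.
    root_package = type(meta).__module__.split(".")[0]
    if root_package == "cudf":
        return "GPU"
    if root_package == "pandas":
        return "CPU"
    raise TypeError(f"Unsupported dataframe type: {type(df)!r}")
```

With a helper like this, a condition such as `if frame_backend(merged_df) == "GPU":` would take the same branch for a `cudf.DataFrame` and a `dask_cudf.DataFrame`, and the CPU path could be exercised with pandas-backed frames.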
@@ -94,11 +97,16 @@ def build_and_predict_model(ml_input_df):
    results_dict = {}
    y_pred = model.predict(X)

    results_dict["auc"] = roc_auc_score(y.values_host, y_pred.values_host)
We use `cuML` models for both CPU and GPU; we should use `sklearn` for CPU and `cuML` for GPU.
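A rough sketch of what that selection could look like follows. The names `MODEL_LOCATIONS`, `model_location`, and `load_model_class` are hypothetical, and `LogisticRegression` is an assumption made for illustration, since the model type is not visible in the hunk above. The import is deferred so that a CPU-only environment never needs `cuml` installed.

```python
import importlib

# Assumed mapping from backend name to (module, class). The choice of
# LogisticRegression is illustrative; substitute the model the query uses.
MODEL_LOCATIONS = {
    "CPU": ("sklearn.linear_model", "LogisticRegression"),
    "GPU": ("cuml.linear_model", "LogisticRegression"),
}


def model_location(backend):
    """Return the (module name, class name) pair for a backend string."""
    key = str(backend).upper()
    if key not in MODEL_LOCATIONS:
        raise ValueError(f"Unknown backend: {backend!r}")
    return MODEL_LOCATIONS[key]


def load_model_class(backend):
    """Import and return the model class for the backend.

    The library is imported only when this function is called, so the
    unused backend does not have to be installed.
    """
    module_name, class_name = model_location(backend)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

`build_and_predict_model` could then call something like `load_model_class(backend)()` instead of constructing a `cuML` model unconditionally. The `.values_host` accessors in the `roc_auc_score` call are cuDF-specific, so the CPU path would likely need its own conversion as well.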
No description provided.