integrations: sklearn #5
Comments
I think it's easy enough for users to add integrations as needed (or for the dvc team to add them in response to demand), so it's probably not worthwhile to spend time adding more now. How do we plan to handle dependencies for multiple frameworks? Each supported framework is pretty heavy, and I think it's already unreasonable to expect an XGBoost user to install TensorFlow just to use dvclive. Similar concerns would apply for dvcx. |
See #25 for more discussion of dependency management. |
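One common way to address the dependency concern raised above is to import each framework lazily, only inside the integration that needs it, and to point users at an optional install when it is missing. The sketch below illustrates that pattern only; the helper name `require_optional` and the `dvclive[...]` extras syntax in the error message are assumptions made for illustration, not something confirmed in this thread.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional framework only when its integration is used.

    `extra` is a hypothetical name for an optional-dependency group.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The install hint below is illustrative; adjust it to the real
        # packaging layout of the project.
        raise ImportError(
            f"'{module_name}' is required for this integration; "
            f"try: pip install 'dvclive[{extra}]'"
        ) from exc
```

With this approach the core package never imports a heavy framework at module import time, so an XGBoost user would not need TensorFlow installed.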
@dberenbaum As to installation, you are right, we do it already in |
On second thought here, is it worthwhile to add sklearn integration? Since this is such a large framework, integration may be more complex, and if you have an opinion about how to implement it, probably better to add the integration now than wait for contributions. Even if it means implementing one particular model or class of models, it may be a worthwhile template. Thoughts? |
Makes sense, I will get to that once I am done with supporting |
sklearn is largely not focused on deep learning, which has been the primary use case for dvclive. Should other algorithms be supported? If the primary purpose is to track model training progress, it seems only useful where models are trained iteratively. I only know of a couple of classes of algorithms where this is true:
|
@dberenbaum Yes, after digging through the documentation, it seems to me that in general, learning algorithms divide into those which utilize The only place I could probably see some integration is methods accepting |
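To make the "trained iteratively" case concrete, here is a minimal sketch of a per-step logging loop. It assumes a duck-typed model exposing `partial_fit`, which some scikit-learn estimators provide; whether that is the mechanism the comment above refers to is an assumption. The names `train_incrementally`, `evaluate`, and `on_step` are hypothetical, and no dvclive call is made: `on_step` stands in for whatever logger is used.

```python
def train_incrementally(model, batches, evaluate, on_step, **fit_kwargs):
    """Feed batches to a model one at a time and report a metric per step.

    model:    any object with a partial_fit(X, y, **kwargs) method
    batches:  iterable of (X, y) pairs
    evaluate: callable taking the model and returning a metric value
    on_step:  callable(step, value), e.g. a thin wrapper around a logger
    """
    history = []
    for step, (X, y) in enumerate(batches):
        # Real scikit-learn classifiers also expect classes=... on the
        # first partial_fit call; pass it through fit_kwargs if needed.
        model.partial_fit(X, y, **fit_kwargs)
        value = evaluate(model)
        on_step(step, value)
        history.append(value)
    return history
```

Estimators that only offer a single `fit` call have no natural "step" to hook into, which is the limitation discussed above.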
I am considering working on the integration with |
I added an integration with mmcv: |
@daavoo That's great news! Can we do something to help with that pull request? |
It has already been approved, so I think it will be merged soon, thanks! |
I think it might be a good idea to have separate issues for each integration in order to better track the progress and have specific discussions for each one (i.e. this issue got "populated" by specific I.e: #83 |
@daavoo That is right, in the beginning we intended it to be an umbrella issue, since singular implementations seemed like easy tasks. As For future reference: |
Reviving this as I think that Taking a quick look at our example repositories using sklearn (https://github.com/iterative/example-get-started), it looks like it would be low-hanging fruit to add some utility to go from ( Given that example repo, we would be removing quite a few lines for users:
# Given labels, predictions
precision, recall, prc_thresholds = metrics.precision_recall_curve(labels, predictions)
fpr, tpr, roc_thresholds = metrics.roc_curve(labels, predictions)
# ROC has a drop_intermediate arg that reduces the number of points.
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve.
# PRC lacks this arg, so we manually reduce to 1000 points as a rough estimate.
nth_point = math.ceil(len(prc_thresholds) / 1000)
prc_points = list(zip(precision, recall, prc_thresholds))[::nth_point]
with open(prc_file, "w") as fd:
    json.dump(
        {
            "prc": [
                {"precision": p, "recall": r, "threshold": t}
                for p, r, t in prc_points
            ]
        },
        fd,
        indent=4,
    )
with open(roc_file, "w") as fd:
    json.dump(
        {
            "roc": [
                {"fpr": fp, "tpr": tp, "threshold": t}
                for fp, tp, t in zip(fpr, tpr, roc_thresholds)
            ]
        },
        fd,
        indent=4,
    )
To:
from dvclive.sklearn import log_precision_recall_curve, log_roc_curve
log_precision_recall_curve(labels, predictions)
log_roc_curve(labels, predictions)
|
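For illustration, the two proposed helpers could be sketched roughly as follows. This is not dvclive's implementation: the function bodies, the `path` and `max_points` parameters, the default file names, and the helpers `thin` and `dump_curve` are all assumptions. Only `sklearn.metrics.precision_recall_curve` and `sklearn.metrics.roc_curve` are real library calls, and scikit-learn is imported lazily so the generic helpers work without it.

```python
import json
import math


def thin(rows, max_points=None):
    """Keep roughly max_points rows by taking every n-th one."""
    if max_points is None:
        return rows
    step = max(1, math.ceil(len(rows) / max_points))
    return rows[::step]


def dump_curve(path, name, columns, arrays, max_points=None):
    """Write parallel arrays as {name: [{column: value, ...}, ...]} JSON."""
    rows = [
        dict(zip(columns, (float(v) for v in values)))
        for values in zip(*arrays)  # zip stops at the shortest array
    ]
    rows = thin(rows, max_points)
    with open(path, "w") as fd:
        json.dump({name: rows}, fd, indent=4)
    return rows


def log_precision_recall_curve(labels, predictions, path="prc.json"):
    from sklearn import metrics  # lazy import: only needed for this helper

    precision, recall, thresholds = metrics.precision_recall_curve(
        labels, predictions
    )
    # PRC has no drop_intermediate option, so thin to about 1000 points.
    return dump_curve(
        path, "prc", ("precision", "recall", "threshold"),
        (precision, recall, thresholds), max_points=1000,
    )


def log_roc_curve(labels, predictions, path="roc.json"):
    from sklearn import metrics  # lazy import: only needed for this helper

    fpr, tpr, thresholds = metrics.roc_curve(labels, predictions)
    return dump_curve(
        path, "roc", ("fpr", "tpr", "threshold"), (fpr, tpr, thresholds)
    )
```

Unlike the snippet from the example repo, `thin` guards against a zero step when the curve is empty, which would otherwise raise a `ValueError` on slicing.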
Seems we should be supporting at least a few popular frameworks.
Considering their popularity, we should probably start with:
- sklearn

Worth considering:
- FastAi - integrations: fastai #136
- TF and PyTorch - it seems to me that using their pure form is done when users need highly custom models, and probably in those cases they will be able to handle dvclive by hand.

@dmpetrov did I miss some popular framework?

EDIT: crossing out FastAi as it has its own issue now