cases: list of ideas #2544
Comments
Cc @mnrozhkov I know you've worked quite a bit on this topic, so just pinging you here for visibility. P.S. Our docs use cases are not enterprise-level so far, rather high-level and short. If you'd be interested in drafting one around these topics using your existing material, please lmk!
Guys, I'm giving this priority again per our current roadmap (now that #2587 is basically finished). I think Experiment Management is the most needed topic now, along the lines of what @iesahin and I are working on (rel. #2548). But if anyone thinks another direction should have higher priority, please comment. And if we agree on Exp Mgmt., what should be the spin, i.e. the user-perspective problem/solution and key concepts? I discussed briefly with @shcheklein and we think it could be centered around running and managing rapid iterations in DS projects (without Git overhead), with key concepts being bookkeeping, hyperparameters, metrics, and visualization. What do you think? Cc @dberenbaum @flippedcoder @jendefig @casperdcl @tapadipti @dmpetrov @pmrowla
Bookkeeping + visualization seems the most relevant path to follow. Something along the lines of "push experiments to a central repository and see their comparative plots."
Some ideas for 3 (re: production environments/ MLOps)
From https://megagon.ai/blog/whatmlflowsolvesanddoesntforus/
Interesting diagram inspiration for 1.3 or 1.4
1. Data Management
2. Data Pipeline development
3. Experiment Management
Preliminary ideas:
Hyperspace exploration [Tuning/Optimization]? May be too low level. There's a blog about this now.
Bookkeeping/Tracking (with Git): Rapid iterations. UPDATE: cases: Data Science Experiment Tracking #2782
(exp + machine + CML?)
4. Production environments/ MLOps
4.1 DVC in Production
Training remotely
Deploying models (CLI or API)
Keep pipelines, artifacts in sync between environments
Batch scoring a.k.a. "DVC for ETL" - see #2512 (comment)
+ Distributed/parallel computing
4.2 ML Model Registry
Model lifecycle (training, shadow, active, inactive)
Automated/Continuous training (remotely)
Discovery and reusability
Deploying models
Batch scoring example
+ Real-time inference
4.3 Production Integrations
Databases (e.g. SQL dump versioning/preprocessing)
Spark (e.g. remote training)
Airflow (e.g. batch scoring)
Kafka (e.g. real-time predictions)
4.4 End-to-end scenario with a combination from above, e.g.:
Importing data from Spark
Training remotely
Model Registry Ops
Batch scoring (Airflow integration)