[SPIKE] cases: list of ideas (related to prod envs) #2490

Closed
jorgeorpinel opened this issue May 19, 2021 · 20 comments
Assignees
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: cases Content of /doc/use-cases p1-important Active priorities to deal within next sprints

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented May 19, 2021

Most of the existing ideas summarized here have something to do with ML models, I think.

Extracted from #820

UPDATE: Jump to #2490 (comment)

@jorgeorpinel jorgeorpinel added the A: docs Area: user documentation (gatsby-theme-iterative) label May 19, 2021
@jorgeorpinel jorgeorpinel added the ✨ epic Placeholder ticket for multi-sprint direction, use story, improvement label May 19, 2021
@jorgeorpinel jorgeorpinel changed the title cases: new directions cases: MLOps direction May 19, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented May 19, 2021

And a side question: is this direction a higher priority than Experiments-related use cases atm? (see #2270)

@shcheklein
Member

Some thoughts:

  • CI/CD is taken care of by @casperdcl. It takes time to iterate but we will get there. And I think we'll be fine, at least with this specific title.
  • Deploying models for real-time inference - yep, this feels too narrow; we need to find a better angle
  • Model zoo - this is too high-level a concept, I think (a model zoo is close to a product). I think we can start with a model registry?

Some ideas for this list:

  • Model Management and/or Model Lifecycle - explain DVC from the models angle - we capture all information that is relevant to models - data, weights, metrics, experiments - and allow people to navigate
  • Model Registry - discovery and reusability
  • Experiments tracking/management - here we should sell W&B, MlFlow, etc - rapid iterations, live metrics + other metrics + navigation


@jorgeorpinel jorgeorpinel self-assigned this May 19, 2021
@jorgeorpinel jorgeorpinel changed the title cases: MLOps direction [SPIKE] cases: MLOps direction May 19, 2021
@jorgeorpinel
Contributor Author

OK we're going to try to make this into a spike to come up with actionable items within 7 days or less hopefully. Please help if you can guys. I'll tag people via chat... ⌛

@dberenbaum
Contributor

It might help to start with thesis statements instead of topics. Thesis statements would be like single-sentence use cases arguing for the utility of the products in given scenarios. Use cases are more persuasive writing compared to the explanatory writing of other docs, so a topic may not clarify what we plan to say about it. This will probably take more time and debate, but hopefully we will have more clarity in deciding which use cases to pursue and in writing the use cases. What do you think?

@jorgeorpinel jorgeorpinel removed the ✨ epic Placeholder ticket for multi-sprint direction, use story, improvement label May 19, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented May 19, 2021

title is confusing cases: MLOps direction for me ... why not just - cases: list of use case to write ?

Because we also have cases: Experiments #2270. That seemed like a totally different direction from all the previous ideas summarized here (mainly from #820), which I think at least somewhat relate to MLOps? Happy to change the title but this is not a comprehensive list of use case ideas in all possible product directions.

@jorgeorpinel jorgeorpinel changed the title [SPIKE] cases: MLOps direction [SPIKE] cases: next scenario to write May 19, 2021
@jorgeorpinel jorgeorpinel changed the title [SPIKE] cases: next scenario to write [SPIKE] cases: next scenario to write (ML model related?) May 19, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented May 19, 2021

Model Management and/or Model Lifecycle - explain DVC from the models angle - we capture all information that is relevant to models - data, weights, metrics, experiments - and allow people to navigate
Model Registry - discovery and reusability

I have a feeling that model registries aren't different enough from data registries to write another full use case on that. But maybe it can be part of a Model Mgmt/Lifecycle use case. I like that idea! It could also cover or mention some of the topics above (training remotely, deployment, real-time predictions).

@shcheklein
Member

Happy to change the title but this is not a comprehensive list of use case ideas in all possible product directions.

The way I initially understood the title "cases: new directions", the point of this research was to consolidate all possible ideas, without this split into experiments and ML models, which is hard for me to understand tbh (e.g. why are experiments not about models?).

The title for that ticket you mention about experiments was about one specific use case to my mind.

I have a feeling that model registries aren't different enough from data registries to write another full use case on that.

It's a matter of what we are optimizing for here. I would not try to generalize at the cost of the initial goal: more people come and see a high-level title that resonates with them. It's fine if the use cases overlap internally.

In this specific case - I think model registry can be significantly different.

@jorgeorpinel jorgeorpinel changed the title [SPIKE] cases: next scenario to write (ML model related?) [SPIKE] cases: next direction (ML models related) May 20, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented May 20, 2021

why experiments are not about models

Sure, it all connects. But here I'm thinking mostly about solutions for deploying and using ml models via DVC/CML e.g. production environments, model deployment, etc. Sorry for the confusion...

So it looks like so far the better-defined scenarios are

  1. synchronizing between development and production ml models (cases: DVC in Production #862)
  2. ml model registry (construction? usage?)
  3. ml model lifecycle/management (see #2490 (comment))

@jorgeorpinel
Contributor Author

jorgeorpinel commented May 20, 2021

It might help to start with thesis statements instead of topics — single-sentence use cases arguing for the utility of the products in given scenarios.

@dberenbaum

  1. you can use DVC and CML to deploy ml models to production, and sync back results/status with the master repo
  2. you can package and ship (pre-trained) ml models to a central registry and make DVC projects downstream that use and depend on them.
  3. DVC helps you develop and manage ml models throughout their whole lifecycle (needs detailing)

Keep in mind a) this is not my area of expertise and b) this is based on preliminary understanding of the proposals, so my explanations above may be inexact.

@dberenbaum
Contributor

Thanks, @jorgeorpinel! I didn't mean to suggest that you should bear responsibility for developing each thesis statement, or that each one needs to be perfected.

1. you can use DVC and CML to deploy ml models to production, and sync back the model learning to your development env/team

We have a few use case ideas around "production" and/or "deployment," and it's not clear to me what they mean. There are different scenarios that I have seen described as production deployments:
a. Automated training: Run a scheduled, automated training pipeline to keep your model updated with the latest data (this seems to be #862). The retrained model might then be used for the scoring scenarios below.
b. Batch scoring: Run a scheduled, automated scoring pipeline to always have updated predictions.
c. Real-time scoring: Submit data as needed to an API that returns model scores (see #2431).

I'd probably vote to focus on b since a solution for c might not be fully developed yet. a could maybe be included as part of it if it's not too complex, but to me it's being covered by the CI/CD use case in development.

3. DVC helps you develop and manage ml models throughout their whole lifecycle (needs detailing)

As @shcheklein has mentioned, this can either be about a single model or many models, which might be different use cases.

For a single model, track, visualize, and analyze everything about your experiment, including code, parameters, metrics, plots, data, training DAG, and any other artifacts included in your repo.

For many models, try many different experiments and track them, enabling you to compare, select, reproduce, and iterate on any experiments.

@jorgeorpinel jorgeorpinel added the p1-important Active priorities to deal within next sprints label May 26, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented May 26, 2021

a. Automated training

This may or may not be considered related to "in production". Training somewhere seems more like a prerequisite. I think it has more to do with CI/CD (which can be part of a prod deployment workflow, so there's overlap). This can probably be covered initially in #2404 indeed. Cc @casperdcl

b. Batch scoring

Is this basically ETL where E=get chunk of data, T=run pre-trained model, L=store/upload scores ? That could be part of a use case but may still not be high-level enough.
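To make the E/T/L reading above concrete, here is a minimal sketch in plain Python. It assumes the pre-trained model is any callable that maps a row to a score; `mean_model`, the sample rows, and the in-memory `sink` are hypothetical stand-ins, not part of DVC or any proposal in this thread.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[float]


def extract(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """E: yield the input data in chunks of at most batch_size rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, chunk
        yield batch


def transform(batch: List[Row], model: Callable[[Row], float]) -> List[float]:
    """T: score each row with the pre-trained model."""
    return [model(row) for row in batch]


def load(scores: List[float], sink: List[float]) -> None:
    """L: store the scores; a real job would write to a file or database."""
    sink.extend(scores)


def run_batch_scoring(rows, model, sink, batch_size=2):
    """Run the three stages chunk by chunk."""
    for batch in extract(rows, batch_size):
        load(transform(batch, model), sink)


def mean_model(row: Row) -> float:
    """Hypothetical stand-in for a pre-trained model: mean of the features."""
    return sum(row) / len(row)


results: List[float] = []
run_batch_scoring([(1.0, 3.0), (2.0, 2.0), (0.0, 4.0)], mean_model, results)
print(results)
```

In a DVC project, each stage could plausibly become a pipeline stage with the scores as a tracked output, but that mapping is an assumption rather than something settled here.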

c. Real-time scoring

Not sure I get how DVC plays a part in this. Probably just in the way to deploy the model (e.g. via the DVC API which would be similar to this -- going back to the "model registry" idea). Still not high-level enough IMO but b and c def. seem related.
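As a rough illustration of deploying a model via the DVC API, the sketch below reads a model file from a DVC repository at a given revision using `dvc.api.open`. It assumes the model was serialized with `pickle`; the path, repository URL, and tag in the usage comment are hypothetical. The `opener` parameter is an addition for this sketch so the function can be exercised without DVC installed.

```python
import pickle


def load_model(path, repo, rev, opener=None):
    """Load a pickled model tracked by DVC at a given Git revision.

    `opener` defaults to dvc.api.open; it is injectable (an assumption of
    this sketch) so the function can be tried without a real repository.
    """
    if opener is None:
        import dvc.api  # imported lazily so the sketch loads without DVC

        opener = dvc.api.open
    # mode="rb" because the pickled model is binary data
    with opener(path, repo=repo, rev=rev, mode="rb") as f:
        return pickle.load(f)


# Hypothetical usage; the repo URL, path, and tag below are placeholders:
# model = load_model(
#     "models/model.pkl",
#     repo="https://github.com/example/model-registry",
#     rev="v1.0",
# )
```

Only unpickle files from repositories you trust, since `pickle.load` can execute arbitrary code.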

@jorgeorpinel
Contributor Author

jorgeorpinel commented May 26, 2021

ml model lifecycle/management

this can either be about a single model or many models

Hmmm... By many models do you mean actually different models with different goals (would relate to "model registry"), or multiple versions of the same model in development? I usually assume the typical ML pipeline/project ends up in a single model.

BTW can we clarify what we mean by "model lifecycle"? Maybe training, active, inactive (related to "in production") or planning, data eng, modeling (much broader topic). Cc @shcheklein

initial goal - more people come, see the high level title that resonates with them. It's fine that they will overlap

Going back to this (which is why titles are important too), I think "DVC in Production" is a really good umbrella concept to begin with, keeping in mind it would be the first use case in this direction. It can have a story (maybe sections) that cover several of the scenarios we've discussed above. Later on we could split into multiple use cases if that's better. WDYT?

UPDATE: See quick draft (idea) in #2506

@dberenbaum
Contributor

Is this basically ETL where E=get chunk of data, T=run pre-trained model, L=store/upload scores ? That could be part of a use case but may still not be high-level enough.

Yup, although T could include other things in your pipeline (feature engineering).

Not sure I get how DVC plays a part in this. Probably just in the way to deploy the model (e.g. via the DVC API which would be similar to this -- going back to the "model registry" idea). Still not high-level enough IMO but b and c def. seem related.

Right, other than the model registry idea, there's not much of a clear pattern here for how to use DVC.

Hmmm... By many models do you mean actually different models with different goals (would relate to "model registry"), or multiple versions of the same model in development? I usually assume the typical ML pipeline/project ends up in a single model.

Sorry, I meant many experiments from the same pipeline.

@jorgeorpinel
Contributor Author

jorgeorpinel commented May 29, 2021

More feedback (from https://iterativeai.slack.com/archives/C6YHPP2TB/p1621617453043300):

From @mnrozhkov

  • Batch Scoring project use case: it’s common for large companies like Telecoms, Banks & FinTech
  • for production running we could use Airflow

From @dmpetrov

☝️ From these comments I take 1) there's support for covering the "batch scoring" scenario, 2) there's interest in certain integrations, specifically Airflow (I need to play with it ⌛) -- maybe also MLFlow? and 3) an e2e case could be a meaningful way to present some of these topics.


Also, @shcheklein shared https://neptune.ai/blog/model-registry-makes-mlops-work with me (on the "model registry" idea). I think this answers the Q of how model registries relate to MLOps/ "in production". Summary:

collaborative hub where teams can work together at different stages of the ML lifecycle [from (after) experimentation to production]... allows to publish, test, monitor, govern and share [models]
all the key values (data, config, env, code, versions, and docs) are in one place

centralized tracking system that stores lineage, versioning, and related metadata for published ML models.
(1) provide a mechanism to store model metadata
(2) connect independent model training and inference processes by acting as a communication layer
[metadata:] identifier, name, desc?, version, date, performance, path to the serialized model, and stage of deployment (dev, shadow-mode, prod, etc.)
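The metadata fields listed in that summary could be captured in a record like the one sketched below, with a tiny in-memory registry that lets an inference process look up models by stage. This is only an illustration of the idea, not how DVC or any registry product implements it; the class names, the `churn` model, and all field values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ModelRecord:
    """Metadata for one published model version."""

    identifier: str
    name: str
    version: str
    created: date
    performance: Dict[str, float]
    path: str  # path to the serialized model
    stage: str  # e.g. "dev", "shadow-mode", "prod"
    description: Optional[str] = None


class ModelRegistry:
    """In-memory stand-in for a central store of model metadata."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ModelRecord] = {}

    def publish(self, record: ModelRecord) -> None:
        """Store a record, keyed by model name and version."""
        self._records[(record.name, record.version)] = record

    def find(self, name: str, stage: str) -> List[ModelRecord]:
        """Return all versions of a model in the given deployment stage."""
        return [
            r for r in self._records.values()
            if r.name == name and r.stage == stage
        ]


registry = ModelRegistry()
registry.publish(ModelRecord(
    "m-001", "churn", "1.0", date(2021, 5, 1),
    {"auc": 0.81}, "models/churn-1.0.pkl", "prod",
))
registry.publish(ModelRecord(
    "m-002", "churn", "1.1", date(2021, 5, 20),
    {"auc": 0.83}, "models/churn-1.1.pkl", "shadow-mode",
))

# An inference process asks the registry which version is in production.
prod_versions = [r.version for r in registry.find("churn", "prod")]
print(prod_versions)
```

The registry here acts as the communication layer between training (which publishes records) and inference (which queries by stage), matching point (2) of the summary.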

@jorgeorpinel jorgeorpinel changed the title [SPIKE] cases: next direction (ML models related) [SPIKE] cases: list of ideas (mostly related to production environments) May 29, 2021
@dberenbaum
Contributor

Nice, @jorgeorpinel! The comments on batch scoring and model registry use cases look good to me.

there's interest in certain integrations, specifically Airflow (I need to play with it ⌛) -- maybe also MLFlow?

Yes to Airflow since it is the default choice for pipeline orchestration, although it might be worth looking into some alternatives like Prefect (see https://neptune.ai/blog/best-workflow-and-pipeline-orchestration-tools).

MLFlow is probably better left for the experiment management use case since its focus is on tracking and comparing experiments rather than executing pipelines.

@jorgeorpinel jorgeorpinel changed the title [SPIKE] cases: list of ideas (mostly related to production environments) [SPIKE] cases: list of ideas (related to prod envs) Jun 3, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented Jun 3, 2021

Summary (again)

Here's a list proposal with 4 big ideas that group most of the concepts we've discussed (with overlaps):

  1. DVC in Production (rel. #2506) (intro to MLOps)
       • Training remotely
       • Deploying models (CLI or API)
       • Keep pipelines, artifacts in sync between environments
       • Batch scoring a.k.a. "DVC for ETL"
       • + Distributed computing
       • + Parallel exec?

  2. ML Model Registry
       • Model lifecycle (training, shadow, active, inactive)
       • Automated/Continuous training (remotely)
       • Discovery and reusability
       • Deploying models
       • Batch scoring example
       • + Real-time inference

  3. Production Integrations
       • Databases (e.g. SQL dump versioning/preprocessing)
       • Spark (e.g. remote training)
       • AirFlow (e.g. batch scoring)
       • Kafka (e.g. real-time predictions)

  4. End-to-end scenario with a combination from above e.g.:
       • Importing (versioning?) data from Spark
       • (Automated) Training remotely
       • MLOps via Model Registry
       • Batch scoring (AirFlow integration)

@shcheklein
Member

Thanks @jorgeorpinel! Sounds good. Where can we get the full list of use cases that we have written or are considering writing? (I assume that this ticket is still about "prod envs"?)

E.g. where should we put "Experiments tracking/management" / "ML bookkeeping" case, for example?

@jorgeorpinel
Contributor Author

jorgeorpinel commented Jun 6, 2021

All use case ideas we have in GH have been consolidated here (see original desc.) — we could even close some/all — except #2270 (an epic itself) and #2512 (new, discussing).

I should prob make an epic/story ticket to close this and maybe some of the other issues linked above ⌛

@jorgeorpinel jorgeorpinel mentioned this issue Jun 8, 2021
5 tasks
@jorgeorpinel
Contributor Author

Resulting list of ideas: #2544

Closing spike.

@iesahin iesahin added the C: cases Content of /doc/use-cases label Oct 21, 2021

4 participants