Model CI #9002
Conversation
Thanks to @Superjomn for considering a model testing solution. Is this a work in progress?
> # Model CI
>
> A simple Continuous Integration for Models, tracking the overall effect and performance.
What is the plan to run this CI? Are we going to bridge this CI with TeamCity, or set it up as a new configuration on Travis-CI?
In the beginning, this is just a bunch of scripts, with no relation to TeamCity or other CI platforms.
It might be integrated with TeamCity later, but for now the plan is a while-loop process that keeps testing the last merged code and tracking its performance and precision.
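A minimal sketch of that while-loop process, assuming a local clone of the repo; the poll interval and the `run_models.sh` script name are illustrative placeholders, not part of the actual design:

```python
# Hypothetical sketch of the while-loop CI described above; the repo path,
# poll interval, and run_models.sh script are illustrative placeholders.
import subprocess
import time


def needs_test(last_tested, current_commit):
    """Pure decision: re-run the model tests only for a newly merged commit."""
    return current_commit is not None and current_commit != last_tested


def latest_merged_commit(repo_dir):
    """Pull the tracked branch and return its HEAD commit hash."""
    subprocess.check_call(["git", "-C", repo_dir, "pull", "--ff-only"])
    out = subprocess.check_output(["git", "-C", repo_dir, "rev-parse", "HEAD"])
    return out.decode().strip()


def ci_loop(repo_dir, poll_seconds=600):
    """Keep testing the last merged code, collecting performance/precision logs."""
    last_tested = None
    while True:
        commit = latest_merged_commit(repo_dir)
        if needs_test(last_tested, commit):
            # Placeholder for running the model scripts and collecting logs.
            subprocess.check_call(["bash", "run_models.sh"], cwd=repo_dir)
            last_tested = commit
        time.sleep(poll_seconds)
```

Keeping the retest decision in a pure helper (`needs_test`) makes that part easy to unit-test without a real git checkout.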
I am working on it; it seems like just a few days of work. It will stay simple in the beginning, and more factors that need tracking can be added later.
But it needs some computation resources, such as a free GPU machine, to test several classical models in both CPU and GPU mode (single card).
@wangkuiyi
WIP, will reopen later.
In my mind, the model integration job needs to collect indicators in three aspects: the model evaluation (i.e. loss, accuracy), speed, and memory cost.
For our regression test, we only need to run several batches (say 100) once the training process is stable, then collect the speed and memory-cost data. We can also compare the first several batch losses to validate that the model evaluation is correct.
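A minimal sketch of that loss comparison, assuming a stored list of baseline batch losses and a relative tolerance (both hypothetical parameters, not part of the proposal):

```python
# Hypothetical baseline check: compare the first n_batches losses of a new
# run against stored baseline losses within a relative tolerance.
def losses_match_baseline(baseline, observed, n_batches=5, rel_tol=1e-3):
    """Return True if the first n_batches observed losses track the baseline."""
    pairs = list(zip(baseline, observed))[:n_batches]
    if not pairs:
        return False  # nothing to compare
    for base, obs in pairs:
        if abs(obs - base) > rel_tol * max(abs(base), 1e-8):
            return False
    return True
```

Speed and memory cost would need looser, hardware-specific tolerances for the reasons discussed below, so they are left out of this sketch.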
There are some problems that need to be discussed:

- **The GPU/CUDA version difference.** Should we consider different GPU/CUDA machines in the regression test? Given a model and a fixed dataset, some metrics stay the same when you change the training machine and some do not. For instance, the loss and accuracy regression curves are fixed, but the speed and memory cost will change if you use a different version of CUDA or a different GPU. Our numbers on Pascal-architecture GPUs make no sense on other GPU generations.
- **The mini-batch size difference.** For online learning jobs, or to save training resources, users need a small batch size when training models. But for training speed, they may need a big batch size. The convergence curves differ, and the training speed cannot be compared across batch sizes. Will we consider different batch sizes in the regression test?
- **The training/inference difference.** Currently, most users care more about inference performance, because online services need to guarantee it. Inference is totally different from the training phase; will we consider it in the regression test?
The factors to track can be extended. In my understanding, the initial implementation will just include some general factors such as `train cost`, `validate cost`, and `duration of each batch`; more factors can be added by more people later.
> the log format should be like this
>
> for `train_cost` and `valid_cost`, each line is
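The exact log format is elided in this excerpt; purely as an illustration, assuming each line carries a batch id and a cost separated by a tab, a reader of such a log might look like:

```python
# Illustrative parser for a train_cost/valid_cost log; the "<batch_id>\t<cost>"
# line layout is an assumption -- the real format is not shown in this thread.
def parse_cost_line(line):
    """Parse one log line into a (batch_id, cost) pair."""
    batch_id, cost = line.strip().split("\t")
    return int(batch_id), float(cost)


def parse_cost_log(text):
    """Parse a whole log into a list of (batch_id, cost) pairs."""
    return [parse_cost_line(l) for l in text.splitlines() if l.strip()]
```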
Can we use VisualDL to make a baseline, then have every regression test just compare the first several mini-batch results?
fix #8903