Optimized inference pipeline for Nano #4360
Talked with @zhentaocc. Keep updating according to latest comments.

Trainer.compile: We don't need this method for the inference API.

trainer.quantize / trainer.trace: Quantize/trace a model with a specific precision and accelerator, and return a new model that only handles inference with that specific acceleration.

# for bf16 or int8 low precision models
new_model = trainer.quantize(model,
precision="bf16"/"int8",
accelerator=None(pytorch)/"onnxruntime"/"openvino",
method="eager"/"fx"/"ipex"/"qlinear"...,
backend="inc"/"pot",
**kargs_inc,
**kargs_pot)
# for fp32 models backended on "onnxruntime" or "openvino"
new_model = trainer.trace(model,
accelerator="onnxruntime"/"openvino",
**kargs_accelerator)

A normal user should take care of:
An expert user should take care of:
model.eval: Users don't need to call this method anymore, but it does no harm to call it.

model.status: a @property on the model that returns a dict showing which precision and which accelerator our users are using.

>>> model.status
>>> {"precision": xxx, "accelerator": xxx} |
Keep updating according to latest comments.

model.train(): This function should not be called on an accelerated inference model.

model.inference() -> trainer.inference(): After our discussion, this method will be deleted since it is highly similar to another method.

trainer.save: trainer.save(model, dirname=…). This function will return a dictionary that indicates the saved paths, so that our users will understand which files they can take away for further deployment.

trainer.load: trainer.load(model, dirname=…), same as above. |
Discussed with @shane-huang earlier and we proposed another design to separate the quantized models/inference sessions from the original model. This design features a different behavior in trainer.quantize and trainer.trace. In short, these two methods (trainer methods) will return a new model whose forward runs the accelerated inference:

# for bf16 or int8 low precision models
new_model = trainer.quantize(model,
precision="bf16"/"int8",
accelerator=None(pytorch)/"onnxruntime"/"openvino",
method="eager"/"fx"/"ipex"/"qlinear"...,
backend="inc"/"pot",
**kargs_inc,
**kargs_pot)
# for fp32 models backended on "onnxruntime" or "openvino"
new_model = trainer.trace(model,
accelerator="onnxruntime"/"openvino",
**kargs_accelerator)
yhat = new_model(x)  # x is a torch tensor

Consequently, some other methods will be changed.

# .eval() is only used to change the accelerator's settings
new_model.eval(**kargs_accelerator)
# trainer.save will save the model's state and a metadata file to identify which precision and accelerator are used
trainer.save(new_model, dirname="...")
new_model = trainer.load(dirname="...") |
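One way to realize such a returned model (a sketch only; the class name AcceleratedModel, the ONNX Runtime backend, and the attribute layout are assumptions, not the agreed design):

# Sketch of an accelerated wrapper whose forward dispatches to an ONNX Runtime
# session; the names here are illustrative, not part of the proposal.
import numpy as np
import torch
import onnxruntime as ort

class AcceleratedModel(torch.nn.Module):
    def __init__(self, onnx_path, precision="fp32", accelerator="onnxruntime"):
        super().__init__()
        self.session = ort.InferenceSession(onnx_path)
        self._status = {"precision": precision, "accelerator": accelerator}

    @property
    def status(self):
        # Mirrors the model.status property discussed above.
        return self._status

    def forward(self, x):
        # Convert the torch tensor to numpy, run the session, convert back.
        inputs = {self.session.get_inputs()[0].name: x.detach().cpu().numpy()}
        outputs = self.session.run(None, inputs)
        return torch.from_numpy(np.asarray(outputs[0]))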
|
Yes
Yes
There are some cases where a user might want to call .eval(), especially when they want to change the accelerator's (e.g. openvino/onnxruntime) options. They may call new_model.eval(**kargs_accelerator).
Yes
>>> trainer.save(model, dirname=".")
>>> {"meta_data_path": "./model.meta",
"onnx_file_path": "./model.onnx"} |
Do we really need to support this? The user can always call |
Exactly, we (@zhentaocc) talked about this and we agree that users can always call |
Openvino new api will be implemented in #4381. |
Can we rename trainer.trace to Trainer.optimize? We are also planning to add the new ipex logic here, since in the new ipex version the usage changes to "model = ipex.optimize(model)", e.g.

model = Trainer.optimize(model, accelerator='ipex')

For accelerator ipex, the model can still be used for training; for accelerator onnx/openvino, the model can only be used for inference. |
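A rough sketch of how such a dispatch could work (the standalone optimize function below stands in for the proposed Trainer.optimize; the branching and keyword handling are assumptions):

# Hypothetical dispatch for an optimize() entry point (illustrative only).
import intel_extension_for_pytorch as ipex

def optimize(model, accelerator=None, **kwargs):
    if accelerator == "ipex":
        # Per the comment above, the ipex-optimized model can still be trained.
        return ipex.optimize(model, **kwargs)
    if accelerator in ("onnxruntime", "openvino"):
        # Would export/trace into an inference-only accelerated model,
        # as in the trainer.trace discussion earlier in this thread.
        raise NotImplementedError("inference-only path not sketched here")
    return model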
We plan to use |
Where do you suggest ipex should go? |
Current API for model saving and loading. It can do:
Do you think this is a bit confusing for users, since it accepts multiple types of models? @jason-dai
|
If it's for training, can we set it in trainer.fit()? |
In theory, we can override pytorch-lightning Trainer's fit method to add a "use_ipex" flag, but I am kind of afraid the usage would change too much from the original pytorch-lightning and become too complex. In original pytorch-lightning, the user sets all the parameters in the Trainer's constructor and only passes the model and data to trainer.fit:

trainer = Trainer(accelerator='a', training_type='b', trick_1=True, trick_2=True, ...)
trainer.fit(model, data)

What we have changed is:
So the problem I am afraid of is that if we change trainer.fit as well, parameters will be spread across too many places. For example, if the user wants to use some feature, there will be 4 possible places for him/her to look for a parameter. On the other hand, in the original pytorch-lightning case, he/she only has to look through the Trainer's constructor. |
I think we are adding new capabilities to PTL.
If a use case is already supported by PTL, we should follow its original API; so for training-specific optimization, which one is the preferred API to extend - |
For the IPEX plugin, I think we can use a callback. |
Who needs to write and specify the callback - we or the user? |
We do this. |
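For concreteness, a callback-based approach could look roughly like this (a sketch only; the callback name is made up, and it assumes ipex.optimize can optimize the module in place via its inplace flag):

# Sketch: a PyTorch Lightning callback that applies IPEX optimization at fit start.
import intel_extension_for_pytorch as ipex
from pytorch_lightning.callbacks import Callback

class IPEXCallback(Callback):
    def on_fit_start(self, trainer, pl_module):
        # Optimize the LightningModule in place so user code stays unchanged.
        ipex.optimize(pl_module, inplace=True)

The Trainer (rather than the user) would then register this callback, e.g. Trainer(callbacks=[IPEXCallback()]), which matches the "we do this" answer above.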
It seems to me that |
possibly you can bind the ipex model to |
It's unclear to me how the user would write their code to use IPEX. |
Current status
- Trainer.compile(…, onnx=T/F, quantize=T/F, openvino=T/F) - bind relevant methods/variables
- Trainer.quantize(…) - generate quantized model (PyTorch/ONNX)
- Model.eval(quantize=T/F) - forward using (quantized) PyTorch model
- Model.eval_onnx(quantize=T/F)/eval_openvino()/exit_onnx()/exit_openvino() - forward using (quantized) ONNX/OpenVINO model

Desired status
- Trainer.compile() – just bind all methods/variables?
- Trainer.quantize(precision=…, accelerator=…)
- model.eval(precision=…, accelerator=…)? – need to call quantize() first?
- Trainer.openvino.export(precision=…)? – how about onnx/quantized? need to be consistent
- model.load()/model.load_quantized_state_dict()??? - need to have consistent APIs
- model.eval_status()? – every model should maintain current/default mode, and report here?

@TheaperDeng @zhentaocc @yangw1234 @shane-huang
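To make the desired status concrete, a possible user-facing flow might look like the sketch below (every call here is an assumption drawn from the open questions above, reusing the trainer/model/x objects from earlier snippets; it is not a settled API):

# Hypothetical flow through the desired APIs (names/arguments are assumptions).
trainer.compile(model)                                   # just bind methods/variables
trainer.quantize(model, precision="int8", accelerator="onnxruntime")
model.eval(precision="int8", accelerator="onnxruntime")  # requires quantize() first?
print(model.eval_status())                               # report current/default mode
yhat = model(x)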