[Feature Request] Implement encoder_hidden_states as input in GPT2_BeamSearch Node #18050
Comments
The convert_generation.py script supports encoder-decoder models (we tested T5 and BART). See the comments in the script for example usage.
ORT also supports Whisper in beam search. See https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/whisper/README.md for details.
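For reference, a minimal sketch of running a model exported by convert_generation.py with onnxruntime. The model path and tokenizer choice are assumptions, and the input names follow the BeamSearch contrib op signature; exact names may differ by version:

```python
import numpy as np
import onnxruntime
from transformers import AutoTokenizer

# Hypothetical output path of convert_generation.py for t5-small.
session = onnxruntime.InferenceSession(
    "t5_small_beam_search.onnx", providers=["CPUExecutionProvider"])
tokenizer = AutoTokenizer.from_pretrained("t5-small")

input_ids = tokenizer(
    "translate English to French: hello world",
    return_tensors="np").input_ids.astype(np.int32)

# Search parameters are runtime inputs of the BeamSearch node,
# not conversion-time constants.
inputs = {
    "input_ids": input_ids,
    "max_length": np.array([32], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([4], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
}
# sequences has shape (batch, num_return_sequences, max_length).
sequences = session.run(None, inputs)[0]
print(tokenizer.decode(sequences[0][0], skip_special_tokens=True))
```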
Perhaps I was not quite clear, but I need the BERT/T5/GPT2 model to take encoder_hidden_states from a VisionEncoder (image embeddings, for a captioning implementation) as an input at these lines: onnxruntime/onnxruntime/python/tools/transformers/convert_generation.py, lines 702 to 704 at eb47008.
I guess you would need to add it in onnxruntime/onnxruntime/core/graph/contrib_ops/contrib_defs.cc, lines 1155 to 1170 at eb47008.
Describe the feature request
I am trying to use the convert_generation.py script to create a GPT2 code-generation model with beam search that takes encoder_hidden_states (TimeSformer output) as input (my base model is Neleac/timesformer-gpt2-video-captioning), but there is no such flag in the script and no such node input in the graph, so GPT2 is converted as a standalone model with no link to the TimeSformer output.
So I was wondering if there are any plans to implement this option. I have tried manually manipulating the graph and the script, to no avail.
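For illustration, a minimal sketch of the kind of graph manipulation described above, using onnx.helper. The file names, tensor name, and shape are assumptions; the point of the sketch is that exposing the input this way is not enough on its own, because the BeamSearch op schema and kernel would also have to accept the extra input, which is what this request asks for:

```python
import onnx
from onnx import TensorProto, helper

# Hypothetical file names; the tensor name and shape are assumptions.
model = onnx.load("gpt2_beam_search.onnx")
graph = model.graph

hidden = helper.make_tensor_value_info(
    "encoder_hidden_states", TensorProto.FLOAT,
    ["batch_size", "encoder_sequence_length", "hidden_size"])

# 1) Expose the tensor as an input of the top-level graph.
graph.input.append(hidden)

# 2) Append it to the BeamSearch node's inputs. This is where it breaks
#    today: the op schema in contrib_defs.cc does not declare this input,
#    so the model is rejected when the graph is resolved.
beam_search = next(n for n in graph.node if n.op_type == "BeamSearch")
beam_search.input.append("encoder_hidden_states")

# 3) Add it as an input of the decoder subgraph stored in the node's
#    "decoder" attribute, so the GPT2 cross-attention could consume it.
decoder_attr = next(a for a in beam_search.attribute if a.name == "decoder")
decoder_attr.g.input.append(hidden)

onnx.save(model, "gpt2_beam_search_with_encoder_states.onnx")
```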
Describe scenario use case
Usage of encoder-decoder models (such as SpeechEncoderDecoderModel or VisionEncoderDecoderModel from Hugging Face)
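As a concrete example of the scenario, a sketch of the Hugging Face model from this issue, where the GPT2 decoder's cross-attention consumes encoder_hidden_states produced by the vision encoder. The frame count and resolution (8 frames of 224x224) are assumptions about the TimeSformer checkpoint:

```python
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained(
    "Neleac/timesformer-gpt2-video-captioning")

# TimeSformer expects pixel values shaped
# (batch, num_frames, channels, height, width); dummy values here.
pixel_values = torch.randn(1, 8, 3, 224, 224)

# The vision encoder produces the hidden states that the converted
# GPT2 beam search graph would need to accept as an input.
encoder_outputs = model.encoder(pixel_values=pixel_values)
print(encoder_outputs.last_hidden_state.shape)

# In PyTorch, generate() wires those hidden states into the GPT2
# decoder's cross-attention automatically; the feature request is to
# make the same connection possible in the exported BeamSearch graph.
generated_ids = model.generate(pixel_values=pixel_values,
                               num_beams=4, max_length=20)
print(generated_ids.shape)
```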