Looking for complete conversion from pretrained huggingface model #611
Comments
@yuekaizhang Could you have a look at this issue?
Let me share my build script for trt-llm.
@lionsheep24 Check #597 (comment). You may need to align the prompt, beam_size, and other hyper-parameters to get the same outputs. There are several successful integrations of Whisper TRT-LLM you may refer to, e.g. https://github.com/Wordcab/wordcab-transcribe/tree/main/src/wordcab_transcribe/engines/tensorrt_llm. Your export steps also look good to me.
@yuekaizhang Do you mean the decoding results should be the same even with different audio features? Some values were -0.74171734 with the HF approach, while the corresponding values with the OpenAI approach were 0. I switched the compute_feature function to the HF WhisperFeatureExtractor, but the tokenizer throws an error. I reviewed the link you shared, but it seems similar to the current repo. I'm not sure how the transcription results can be the same when the extracted features are different.
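One possible explanation for the -0.74 versus 0 values, offered as an assumption rather than a confirmed diagnosis: if one pipeline pads the raw audio with silence before the Whisper-style log-mel normalization, while the other normalizes the speech frames first and zero-pads the features afterward, the padded frames will differ even though the speech frames agree. The sketch below uses toy power values and hypothetical helper names (`pad_then_normalize`, `normalize_then_pad`); it is not code from either library.

```python
import math

def whisper_log_mel_normalize(power):
    """Whisper-style normalization applied to a list of mel power values."""
    log_spec = [math.log10(max(p, 1e-10)) for p in power]
    floor = max(log_spec) - 8.0  # clamp to 8 below the loudest value
    return [(max(v, floor) + 4.0) / 4.0 for v in log_spec]

def pad_then_normalize(power, target_len):
    # Assumption: silence (zero power) is appended before normalization.
    padded = power + [0.0] * (target_len - len(power))
    return whisper_log_mel_normalize(padded)

def normalize_then_pad(power, target_len):
    # Assumption: features are computed first, then zero-padded.
    feats = whisper_log_mel_normalize(power)
    return feats + [0.0] * (target_len - len(feats))

speech_power = [10.0, 1.0, 0.1]  # toy values, not real audio
a = pad_then_normalize(speech_power, 5)
b = normalize_then_pad(speech_power, 5)

print("pad then normalize:", a)
print("normalize then pad:", b)
```

In this toy case the first three (speech) frames match in both outputs, and only the two padded frames differ: negative in the first variant, exactly 0 in the second. Whether this is what happens in the two real pipelines would need to be checked against their source.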
Hi all! Any updates here? I am curious why the audio features extracted from the same audio array differ between the Huggingface library and the method provided in this repository, and I want to confirm whether it is correct for the values to be different. In my opinion, even if the model is converted, the input audio features should be the same. When I fed the features extracted with the Huggingface library into the TensorRT-LLM engine, I received a -1 token (which differs from the Huggingface pipeline result), and this seems to have caused an error during decoding. Let me know if you need any additional information.
Theoretically, a minor difference in feature values should not affect the transcription results. We actually support Huggingface Distil-Whisper in TensorRT-LLM, which was trained with the Huggingface feature extractor, yet it works with our feature extractor at inference time. You may try replacing the feature extractor if you think that is the root cause.
Yeah, I calculated the differences between the features from Huggingface and from the TensorRT-LLM example, and the absolute difference was up to 0.74. I don't think that is a minor difference. I tried replacing the feature extractor with the Huggingface one and feeding its features to TensorRT-LLM, but I got a -1 token from the engine, as I mentioned earlier.
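To judge whether a maximum absolute difference of 0.74 matters, it can help to see where it occurs. The sketch below is a minimal diagnostic under stated assumptions: features are given as nested lists shaped [mel bins][frames], `n_speech_frames` is the number of frames covered by real audio, and the sample values are made up for illustration. The function name `max_abs_diff_by_region` is hypothetical.

```python
def max_abs_diff_by_region(feats_a, feats_b, n_speech_frames):
    """Return (max diff over speech frames, max diff over padded frames)."""
    speech = 0.0
    padding = 0.0
    for row_a, row_b in zip(feats_a, feats_b):
        for t, (x, y) in enumerate(zip(row_a, row_b)):
            d = abs(x - y)
            if t < n_speech_frames:
                speech = max(speech, d)
            else:
                padding = max(padding, d)
    return speech, padding

# Made-up features: 2 mel bins, 4 frames, the last 2 frames are padding.
hf_like = [[1.25, 1.0, -0.75, -0.75], [0.9, 0.8, -0.75, -0.75]]
repo_like = [[1.25, 1.0, 0.0, 0.0], [0.9, 0.8, 0.0, 0.0]]

speech_diff, padding_diff = max_abs_diff_by_region(hf_like, repo_like, 2)
print("speech region:", speech_diff, "padding region:", padding_diff)
```

If the large differences turn out to be confined to the padded region, the two extractors agree on the actual speech and differ only in how they represent silence; if they also appear in the speech region, the extractors themselves disagree.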
Hello,
I have pretrained a model with Huggingface and attempted to deploy it using the TRTLLM-Triton Server method as documented here. However, I've noticed that the transcription results differ significantly from the original model's output when using the Transformers pipeline.
Upon further investigation, I compared the mel spectrograms and the decoding results between the TRT-LLM implementation and the original pipeline. Both comparisons showed noticeable differences, leading to degraded transcription accuracy in the TRT-LLM implementation. In some cases, it even returned a blank string.
Let me share my pipeline implementation
The TRT-LLM implementation is the same as the one at the link I mentioned earlier, and the engine was built with the script below (trtllm version is 0.11.0.dev2024060400).
Client code for tensorrt-llm + tritonserver
Could anyone help me understand why these discrepancies are occurring and how to resolve them?
Thank you in advance for your assistance.