Sequence_Batching #533
Comments
Could you show the code related to
The code is the config file here: https://github.com/k2-fsa/sherpa/blob/master/triton/model_repo_streaming/feature_extractor/config.pbtxt.template
@yuekaizhang Could you have a look at this issue?
@rizwanishaq We didn't tune max_candidate_sequences here; the value is an arbitrary choice. Could you please explain why direct() would be much faster? We haven't tried direct() yet. It would be great if direct() could speed things up.
@yuekaizhang I have tried both direct and oldest, and for a streaming application direct is much better, since my streaming app sends a chunk every 10 ms. I have only one issue that I don't know how to solve: when the max_sequence_idle_microseconds: 5000000 timeout occurs, I have no way to handle it. How can I trigger something inside the model when this happens, or is there any other way?
@rizwanishaq Would you mind clarifying the question? max_sequence_idle_microseconds means that if the idle time of a sequence exceeds this value, the inference server frees the sequence slot allocated to that sequence by simply discarding the sequence. It would be great if direct() turns out to be better. If you have some spare time, I would appreciate it if you could attach some perf results comparing direct() and oldest(), similar to #306 (comment). That would be useful for us.
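As a rough illustration of what the model can observe about a sequence, the sketch below declares control inputs in the sequence batcher. The tensor names START, END, READY and CORRID are placeholders chosen for this example and are not taken from the sherpa template; the data types are assumptions as well. As far as I understand, Triton does not send the model a separate "timed out" signal. A request that arrives after the slot was freed would need to carry the sequence-start flag again, so the model would see START set for that correlation ID.

```
# Sketch only: tensor names and data types are assumptions for illustration.
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct { }
  control_input [
    {
      # Set to 1 on the first request of a sequence, 0 otherwise.
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          int32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      # Set to 1 on the last request of a sequence, 0 otherwise.
      name: "END"
      control [
        {
          kind: CONTROL_SEQUENCE_END
          int32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      # Set to 1 when the batch slot holds a live request, 0 otherwise.
      name: "READY"
      control [
        {
          kind: CONTROL_SEQUENCE_READY
          int32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      # Carries the correlation ID of the sequence in this slot.
      name: "CORRID"
      control [
        {
          kind: CONTROL_SEQUENCE_CORRID
          data_type: TYPE_UINT64
        }
      ]
    }
  ]
}
```

With inputs like these, the model could keep its own per-CORRID state and reset it whenever START is set, which would also cover a sequence that restarts after an idle timeout.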
I am looking at the sequence_batching settings:
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  oldest {
    max_candidate_sequences: 1024
    max_queue_delay_microseconds: 5000
  }
}
Why is max_candidate_sequences set to 1024? If we used direct() instead, wouldn't it be much faster?
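For comparison, a direct-strategy version of the same block might look like the sketch below. It assumes the idle timeout and queue delay are carried over unchanged from the oldest configuration. max_candidate_sequences is a field of the oldest strategy only, so it has no counterpart here. Under direct, each active sequence occupies a batch slot of a model instance, so I would expect the number of concurrent sequences to be bounded by the batch size and instance count instead. Whether this is actually faster depends on the workload and has not been measured in this thread.

```
# Sketch only: values copied from the oldest configuration above, not tuned.
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct {
    # How long the scheduler may wait to fill a batch, in microseconds.
    max_queue_delay_microseconds: 5000
  }
}
```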