- `examples/qwen/README.md`.
- `examples/phi/README.md`.
- `examples/gpt/README.md`.
- `numQueuedRequests` to the iteration stats log of the executor API.
- `concurrency` argument for `gptManagerBenchmark`.
- `attention_qk_half_accumulation` knob from `trtllm-build` command.
- `max_num_tokens` knob to the `ExecutorConfig` and `gptManagerBenchmark`.
- `max_seq_len` is read from the HuggingFace model config now.
- `examples/high-level-api/README.md`.
- `apps` examples using the `LLM` APIs, please refer to `examples/apps/README.md` for details.
- `ModelRunner`: "[ModelRunner] Fix stop and bad words list contiguous for offsets" (#1815), thanks to the contribution from @Marks101.
- `FAST_BUILD`, thanks to the support from @lkm2835 in "Add FAST_BUILD comment at #endif" (#1851).