Problem Description:

When the engine is built with the `--gather_all_token_logits` option, there seems to be an issue: the first generated token frequently comes out as garbled characters. When the engine is built without `--gather_all_token_logits`, with all other parameters kept identical, the first token is normal.
llama1 7B:

With `--gather_all_token_logits`, build command:

```shell
python3 build.py --model_dir=/path/to/llama-7b-hf/ \
    --dtype bfloat16 \
    --use_gpt_attention_plugin bfloat16 \
    --use_gemm_plugin bfloat16 \
    --output_dir /path/to/llama-7b-trt/0.6.1-cf-pe1-gatl-mb-bf16-8_gpu-8k-2k-bs4 \
    --world_size 8 \
    --tp_size 8 \
    --max_input_len 8192 \
    --max_output_len 2048 \
    --max_batch_size 4 \
    --remove_input_padding \
    --enable_context_fmha \
    --parallel_build \
    --multi_block_mode \
    --gather_all_token_logits \
    --use_parallel_embedding \
    --embedding_sharding_dim 1
```

Without `--gather_all_token_logits`, build command:

```shell
python3 build.py --model_dir=/path/to/llama-7b-hf/ \
    --dtype bfloat16 \
    --use_gpt_attention_plugin bfloat16 \
    --use_gemm_plugin bfloat16 \
    --output_dir /path/to/llama-7b-trt/0.6.1-cf-pe1-mb-bf16-8_gpu-8k-2k-bs4 \
    --world_size 8 \
    --tp_size 8 \
    --max_input_len 8192 \
    --max_output_len 2048 \
    --max_batch_size 4 \
    --remove_input_padding \
    --enable_context_fmha \
    --parallel_build \
    --multi_block_mode \
    --use_parallel_embedding \
    --embedding_sharding_dim 1
```
It is worth noting that this issue reproduces in previous versions as well as in version 0.6.1.
It should be a bug in 0.6.1, and it should be fixed in the latest main branch. Please give it a try.
I tested with the new version and everything works fine. Thank you.
@StarrickLiu, I'm wondering if you successfully got the logits. Are the logits only for the generated tokens, or over the full vocabulary for every token?
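For context, here is a minimal NumPy sketch of the shape difference the question is about. The shapes and variable names are illustrative assumptions based on how decoder logits are commonly laid out (`[batch, sequence, vocab]`), not confirmed TensorRT-LLM internals:

```python
import numpy as np

# Illustrative sizes only; vocab=32000 matches llama-7b's tokenizer,
# the batch and sequence lengths are arbitrary.
batch, seq_len, vocab = 2, 5, 32000

# Hypothetical tensor as gathered with --gather_all_token_logits:
# one full vocabulary distribution per token position.
all_token_logits = np.random.rand(batch, seq_len, vocab).astype(np.float32)

# Without the flag, a runtime would typically expose only the logits
# of the newest position, i.e. one vocabulary distribution per sequence.
last_token_logits = all_token_logits[:, -1, :]

print(all_token_logits.shape)   # (2, 5, 32000)
print(last_token_logits.shape)  # (2, 32000)
```

So in either case the last axis spans the vocabulary; the flag determines whether logits are kept for every token position or only the latest one.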