I was running the bfloat16 baichuan-7b model for a benchmark and ran into a segmentation fault. After digging into the code, I found that this is because the GPT decoder is not created successfully when the dtype is bfloat16:

TensorRT-LLM/cpp/include/tensorrt_llm/runtime/gptDecoder.h
Lines 85 to 90 in 71a5b97

switch (dtype)
{
case nvinfer1::DataType::kFLOAT: return std::make_unique<GptDecoder<float>>(vocabSize, vocabSizePadded, stream);
case nvinfer1::DataType::kHALF: return std::make_unique<GptDecoder<half>>(vocabSize, vocabSizePadded, stream);
default: return nullptr;
}
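For reference, a hedged sketch of the branch that appears to be missing; the GptDecoder<__nv_bfloat16> instantiation is hypothetical and does not exist in this release, which is why bfloat16 currently falls through to the default case and returns nullptr:

// Hypothetical sketch only: a bfloat16 branch would look roughly like this,
// but it would also require GptDecoder<__nv_bfloat16> to be instantiated
// (it currently is not). Today bfloat16 hits "default: return nullptr;",
// and using the null decoder afterwards would be consistent with the
// reported segmentation fault.
case nvinfer1::DataType::kBF16: return std::make_unique<GptDecoder<__nv_bfloat16>>(vocabSize, vocabSizePadded, stream);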
Also, I noticed that several components (decoder, dynamic decoder, top-k, etc.) have no bfloat16 template instantiation, for example:

TensorRT-LLM/cpp/tensorrt_llm/runtime/gptDecoder.cpp
Lines 264 to 268 in 71a5b97

Is this left out for some reason, or did I miss something? Thanks.
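To illustrate the second point: the referenced gptDecoder.cpp lines appear to hold the explicit template instantiations, and they cover only float and half. A bfloat16 build would hypothetically also need an explicit instantiation along these lines (not present in the repo):

// Hypothetical sketch, not in the repo: the explicit instantiation a bfloat16
// decoder would additionally require, alongside the existing float and half ones.
#include <cuda_bf16.h>  // provides __nv_bfloat16
template class tensorrt_llm::runtime::GptDecoder<__nv_bfloat16>;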
Hi @elinx, thank you for reporting the issue. Indeed, we do not support the decoder with bfloat16. Usually, we force the lm_head and logits dtype to be either float32 or float16 and use the corresponding decoder type; however, this cast is missing in the Baichuan model definition. We'll include a proper fix in the next release/update of the repo. Meanwhile, you can work around the issue by changing this line: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/baichuan/model.py#L271
to lm_logits.mark_output('logits', 'float32'), i.e. forcing your logits and decoder to be float32. I hope this helps.
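For clarity, a minimal sketch of the suggested workaround; only the mark_output call is taken from the comment above, and the surrounding context in model.py is paraphrased:

# tensorrt_llm/models/baichuan/model.py, at the linked line:
# mark the logits output as float32 instead of the model dtype (bfloat16 here),
# so that the runtime picks the float32 decoder, which is instantiated.
lm_logits.mark_output('logits', 'float32')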