Hi, we're trying out building and running inference for BERT models (specifically, `BertForSequenceClassification`). Our request batches contain samples whose `seq_len` can vary widely, and according to the documentation, the `--remove_input_padding` option, which packs samples into a 1D tensor without padding, should be beneficial for performance.
However, we couldn't find this parameter in `examples/bert/build.py`, and the engine built with that script seems to have `"remove_input_padding": False` in its `config.json`. We also didn't see any implementation of it in `tensorrt_llm/models/bert/model.py`, whereas the enc-dec model does have one. Is there a plan to support this feature for BERT models, or are we missing something?
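For context, here is a minimal plain-PyTorch sketch (not TensorRT-LLM code; the tensor shapes are hypothetical) of what "packing samples into a 1D tensor without padding" means:

```python
import torch

# Three samples of very different lengths.
seqs = [torch.arange(2), torch.arange(7), torch.arange(3)]

# Padded layout: [batch, max_seq_len] -- 21 slots for only 12 real tokens,
# so compute on the 9 pad positions is wasted.
padded = torch.nn.utils.rnn.pad_sequence(seqs, batch_first=True)  # shape [3, 7]

# Packed layout (the idea behind remove_input_padding): one 1D tensor of the
# real tokens, plus a per-sample length tensor to recover the boundaries.
packed = torch.cat(seqs)                              # shape [12]
input_lengths = torch.tensor([len(s) for s in seqs])  # shape [3]
```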
We made our own implementation by referring to the enc_dec models. For anyone interested, the things you need to modify are basically: (1) the model inputs, following the enc_dec implementation; (2) `plugin_config`, to set `remove_input_padding` to true; (3) the forward of the final pooler layer in `BertForSequenceClassification`, so that it selects the first token of each sample according to `input_lengths` (a minimal sketch of this indexing follows below). Closing this.
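To illustrate step (3): with padding removed, the hidden states are a packed `[total_tokens, hidden_size]` tensor, so the pooler can no longer take `hidden_states[:, 0]`. A minimal PyTorch sketch of the indexing idea (the actual change in `tensorrt_llm/models/bert/model.py` would have to be expressed with TensorRT-LLM network ops instead):

```python
import torch

def select_first_tokens(hidden_states: torch.Tensor,
                        input_lengths: torch.Tensor) -> torch.Tensor:
    """Pick the first ([CLS]) token of each sample from a packed tensor.

    hidden_states: [total_tokens, hidden_size], samples concatenated
                   back-to-back (remove_input_padding layout).
    input_lengths: [batch], length of each sample in the packed tensor.
    """
    # Sample i starts at sum(input_lengths[:i]), i.e. an exclusive prefix sum.
    starts = torch.cumsum(input_lengths, dim=0) - input_lengths
    return hidden_states[starts]  # [batch, hidden_size]
```

For example, with `input_lengths = [2, 7, 3]` the start offsets are `[0, 2, 9]`, and the result feeds into the pooler's dense + activation as before.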
@QiJune hi, I made a PR here: #1834. Note that this is only implemented and tested for `BertForSequenceClassification` models; maybe the official team can take it further :)