Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem this feature solves
There are projects such as Bertin whose focus is to train and evaluate BERT-based models for the Spanish language. Models such as RoBERTa, GPT-2, and GPT-3 require a BPE tokenizer. However, the Morpheus inference pipeline currently uses the cuDF BERT tokenizer.
Describe your ideal solution
Develop a new CPU tokenizer/NLP preprocessing stage for Morpheus. This would be an adaptation of Morpheus's current phishing training.
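To illustrate what such a stage would need to compute, here is a minimal pure-Python sketch of BPE encoding on the CPU. It is not Morpheus or cuDF code: `bpe_encode`, `tokenize_batch`, and `TOY_MERGES` are hypothetical names, and the merge table is a toy made up for the example. A real stage would presumably load the merges and vocabulary shipped with the target model.

```python
from typing import Dict, List, Tuple

# Hypothetical toy merge table: (left, right) -> rank. Lower rank = learned earlier.
# A real model ships its own, much larger, table.
TOY_MERGES: Dict[Tuple[str, str], int] = {
    ("c", "o"): 0,
    ("co", "n"): 1,
    ("t", "r"): 2,
    ("a", "s"): 3,
}


def bpe_encode(word: str, merge_ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Split a word into characters, then repeatedly merge the best-ranked adjacent pair."""
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs missing from the table get infinite rank.
        ranked = [
            (merge_ranks.get((left, right), float("inf")), i)
            for i, (left, right) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(ranked)
        if best_rank == float("inf"):
            break  # no known merge applies any more
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def tokenize_batch(texts: List[str], merge_ranks: Dict[Tuple[str, str], int]) -> List[List[str]]:
    """Lowercase, split on whitespace, and BPE-encode each word of each text."""
    return [
        [token for word in text.lower().split() for token in bpe_encode(word, merge_ranks)]
        for text in texts
    ]
```

The sketch works on plain Python strings, so a stage built this way would have to copy the text column from GPU to host memory before tokenizing. That copy is the main trade-off against a GPU-accelerated tokenizer.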
Describe any alternatives you have considered
The cuDF project has an open feature request for GPU-accelerated BPE tokenizer support, but without any known roadmap commitment.
Additional context
Useful for phishing detection in non-English content