Hi,
I want to fine-tune the OPUS-MT ar-en model on my own dataset, but I'm not sure what format my training data should be in. In the Hugging Face Marian tutorial (https://huggingface.co/docs/transformers/model_doc/marian) they just pass in lists of sentences, but I also read somewhere that I'm supposed to preprocess the data with SentencePiece first. Or is SentencePiece "built in" to the Marian tokenizer? All help is much appreciated.
theamato changed the title from "Fine tuning opus nt ar-en using my own dataset" to "Fine tuning opus nmt ar-en using my own dataset" on Feb 20, 2023.
I do fine-tuning directly with MarianNMT. Maybe you could ask at the transformers Git repository how to do fine-tuning with their library. If you use OPUS-MT models with marian-nmt, then you need to apply the subword tokenisation to the fine-tuning data as well.
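To answer the original question for the transformers route: SentencePiece is bundled inside MarianTokenizer, so you can pass plain, untokenized sentence pairs and the tokenizer applies the model's own SentencePiece model internally. Below is a minimal, untested sketch assuming the Helsinki-NLP/opus-mt-ar-en checkpoint and recent transformers/datasets installs; the toy sentence pairs, output directory, and hyperparameters are placeholders you would replace with your own.

```python
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MarianMTModel,
    MarianTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "Helsinki-NLP/opus-mt-ar-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Toy parallel data; in practice, load your own ar/en sentence pairs here.
pairs = {
    "ar": ["مرحبا بالعالم", "كيف حالك؟"],
    "en": ["Hello world", "How are you?"],
}
dataset = Dataset.from_dict(pairs)

def preprocess(batch):
    # Raw text in, subword IDs out: the tokenizer runs the checkpoint's
    # SentencePiece model internally, so no separate preprocessing step.
    model_inputs = tokenizer(batch["ar"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["en"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["ar", "en"])

args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-ar-en-finetuned",  # placeholder path
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # The collator pads inputs and labels per batch (labels padded with -100).
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

For the marian-nmt route, by contrast, you would run the checkpoint's SentencePiece model over both sides of the fine-tuning data yourself (e.g. with spm_encode) before training, since Marian itself expects pre-tokenised input.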