how to use the IWSLT2016 dataset #72
Comments
You can find some simple usages in the toy example. Basically:
using Transformers
using Transformers.Datasets # utilities for dataset
using Transformers.Datasets: IWSLT # IWSLT datasets
# available languages for iwslt2016: :en, :cs, :ar, :fr, :de
src_lang = :en
dst_lang = :de
iwslt2016 = IWSLT.IWSLT2016(src_lang, dst_lang) # Create dataset
# get vocabulary from training data
vocab = get_vocab(iwslt2016)
# create dataset object
# each one is a 2-tuple of channels containing src sentence and dst sentence
training_set = dataset(Train, iwslt2016)
dev_set = dataset(Dev, iwslt2016)
test_set = dataset(Test, iwslt2016) # usually test set won't contain ground truth, but iwslt2016 somehow does
# get data
batch_size = 1
src_sent, dst_sent = get_batch(training_set, batch_size) # each one is a vector of sentences
Once you run through all the data,
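For illustration, here is a minimal sketch of draining a pair of sentence channels in batches, using only Base Julia so it runs without downloading anything. `take_batch`, `make_channel`, and `collect_batches` are hypothetical helpers, not part of Transformers.jl, and returning `nothing` once the channels are exhausted is an assumption of this sketch rather than documented `get_batch` behaviour.

```julia
# Hypothetical helper (not from Transformers.jl): read up to `n` sentence
# pairs from two channels. Returns `nothing` when no pair is left, which is
# an assumption of this sketch. Assumes both channels hold equally many items.
function take_batch(src::Channel, dst::Channel, n::Integer)
    src_batch, dst_batch = String[], String[]
    while length(src_batch) < n
        s = iterate(src)   # `nothing` once the channel is closed and empty
        d = iterate(dst)
        (s === nothing || d === nothing) && break
        push!(src_batch, s[1])
        push!(dst_batch, d[1])
    end
    return isempty(src_batch) ? nothing : (src_batch, dst_batch)
end

# Stand-in for a dataset channel: a producer task fills it, then it closes.
make_channel(sents) = Channel{String}(8) do ch
    foreach(s -> put!(ch, s), sents)
end

# Keep requesting batches until the helper signals that the data ran out.
function collect_batches(src::Channel, dst::Channel, n::Integer)
    batches = Tuple{Vector{String},Vector{String}}[]
    while true
        batch = take_batch(src, dst, n)
        batch === nothing && break
        push!(batches, batch)
    end
    return batches
end

src = make_channel(["a b", "c d", "e f"])
dst = make_channel(["A B", "C D", "E F"])
batches = collect_batches(src, dst, 2)
```

Because the channels are consumed as they are read, a second pass over the data would need freshly created channels.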
The above example fails with the following error message:
Looking at the website of IWSLT, it seems that the datasets have moved to Google Drive instead.
Looks like they no longer provide file links for specific translation pairs, so we would need to rewrite the datadeps based on that.
I thought I could fix this quickly by changing the download link and adapting the post_fetch_method to search for the translation pairs in the right subfolder, but it seems like DataDeps.jl doesn't support downloading from Google Drive (or maybe I did something wrong).
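As a rough sketch of the "search for the translation pairs in the right subfolder" step, the function below walks an already extracted directory tree and returns the folder for a given language pair. `find_pair_dir` is a hypothetical helper, and the `<src>-<dst>` folder naming and the directory names in the demo are assumptions for illustration, not the confirmed layout of the IWSLT archive. It does not address the Google Drive download itself.

```julia
# Hypothetical helper: locate the subfolder for a translation pair inside an
# extracted archive. The "<src>-<dst>" naming is an assumption of this sketch.
function find_pair_dir(root::AbstractString, src::Symbol, dst::Symbol)
    target = "$(src)-$(dst)"
    for (dir, subdirs, _) in walkdir(root)
        # walkdir yields (directory, subdirectory names, file names)
        target in subdirs && return joinpath(dir, target)
    end
    return nothing  # pair not found anywhere under `root`
end

# Demo on a throwaway directory tree with placeholder folder names.
root = mktempdir()
mkpath(joinpath(root, "extracted", "texts", "en-de"))
mkpath(joinpath(root, "extracted", "texts", "en-fr"))

found = find_pair_dir(root, :en, :de)
missing_pair = find_pair_dir(root, :en, :cs)
```

A function like this could be called from a DataDeps.jl `post_fetch_method` after unpacking, assuming the archive has been fetched by some other means.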
Hi - I want to play around with some language translation tasks and saw that you've got Transformers.Datasets.IWSLT.IWSLT2016. How do I interact with this to get data that I can train a model on? I couldn't find anything in the documentation to help me out.