How to use BERT or BART or RoBERTa or GPT for translation #1599

Closed

allhelllooz opened this issue Jan 7, 2020 · 13 comments

@allhelllooz

I am running the various architectures mentioned here via the --arch option to benchmark them,
and I am using a wordpiece tokenizer externally before the preprocessing step.

I was able to run the following transformer-based architectures with the command below, and to run inference with them as well:
transformer, transformer_iwslt_de_en, transformer_wmt_en_de, transformer_vaswani_wmt_en_de_big, transformer_vaswani_wmt_en_fr_big, transformer_wmt_en_de_big, transformer_wmt_en_de_big_t2t

The command I use ->
CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train /home/translation_task/mr2en_token_data --arch transformer_wmt_en_de --share-decoder-input-output-embed --optimizer adam --adam-betas '(0.9,0.98)' --clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 10000 --dropout 0.3 --weight-decay 0.0001 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 4096 --update-freq 2 --max-source-positions 512 --max-target-positions 512 --skip-invalid-size-inputs-valid-test

Now I am trying to run the following and am not able to. Can someone suggest what to change?

  • bert_base, roberta_base, xlm_base (after removing --share-decoder-input-output-embed)
Traceback (most recent call last):
  File "/home/dh/anaconda3/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 354, in cli_main
    nprocs=args.distributed_world_size,
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 321, in distributed_main
    main(args, init_distributed=True)
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 51, in main
    model = task.build_model(args)
  File "/home/dh/swapnil/fairseq/fairseq/tasks/fairseq_task.py", line 185, in build_model
    return models.build_model(args, self)
  File "/home/dh/swapnil/fairseq/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/home/dh/swapnil/fairseq/fairseq/models/masked_lm.py", line 116, in build_model
    args.max_positions = args.tokens_per_sample
AttributeError: 'Namespace' object has no attribute 'tokens_per_sample'
  • bart_large, transformer_lm_gpt2_small
Traceback (most recent call last):                                                                                                        
  File "/home/dh/anaconda3/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 354, in cli_main
    nprocs=args.distributed_world_size,
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 321, in distributed_main
    main(args, init_distributed=True)
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 89, in main
    train(args, trainer, task, epoch_itr)
  File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 152, in train
    log_output = trainer.train_step(samples)
  File "/home/dh/swapnil/fairseq/fairseq/trainer.py", line 327, in train_step
    sample, self.model, self.criterion, self.optimizer, ignore_grad
  File "/home/dh/swapnil/fairseq/fairseq/tasks/fairseq_task.py", line 251, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dh/swapnil/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 57, in forward
    net_output = model(**sample['net_input'])
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dh/swapnil/fairseq/fairseq/models/fairseq_model.py", line 385, in forward
    return self.decoder(src_tokens, **kwargs)
  File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got multiple values for argument 'prev_output_tokens'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/dh/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/dh/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated

Environment

  • fairseq Version: 0.9.0
  • PyTorch Version: 1.3.1
  • OS (e.g., Linux): Ubuntu 16.04
  • How you installed fairseq (pip, source): pip
  • Python version: 3.7 (Anaconda)
  • CUDA/cuDNN version: 10.0
  • GPU models and configuration: V100 x 4 (32 GB)
@huihuifan
Contributor

You have preprocessed data with source and target sides to run the machine translation architectures; however, RoBERTa requires different data preprocessing. You can check the examples/ folder for tutorials on how to preprocess data for RoBERTa.

@allhelllooz
Author

allhelllooz commented Jan 8, 2020

I only want to run translation, and I am wondering how to use RoBERTa or BART for that.

So once the preprocessing is done, would there be changes to the above command for RoBERTa?
Are you referring to this -> https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md

Also, what would the preprocessing steps for BART be?
I couldn't find anything here -> https://github.com/pytorch/fairseq/tree/master/examples/bart

@ButteredGroove

ButteredGroove commented Jan 14, 2020

I only want to run translation, and I am wondering how to use RoBERTa or BART for that.

I'm interested in this too. Are there tutorials for using RoBERTa (as embeddings?) or BART pre-trained models along with train, test, and eval files to train and evaluate a model to do seq2seq translations?

@mukhal
Contributor

mukhal commented Jan 14, 2020

I am running into the same issue trying to use pre-trained XLM-R for translation. I think the main problem is that RoBERTa and XLM-R are encoder-only architectures, so the solution is probably to use XLM-R or RoBERTa as an encoder for feature extraction, together with a newly initialized decoder.
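
For reference, a minimal sketch of the feature-extraction side only, assuming a downloaded roberta.base checkpoint; the paths are placeholders, and wiring the features into a decoder is left out:

# Extract RoBERTa features that a separately built decoder could attend over.
import torch
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta.base', checkpoint_file='model.pt')
roberta.eval()  # disable dropout for deterministic features

with torch.no_grad():
    tokens = roberta.encode('Hello world!')      # BPE-encode and add special tokens
    features = roberta.extract_features(tokens)  # shape: (1, seq_len, 768)
    all_layers = roberta.extract_features(tokens, return_all_hiddens=True)  # one tensor per layer

print(features.shape)

The same from_pretrained/extract_features pattern should apply to an XLM-R checkpoint.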

@allhelllooz
Author

BART is supposed to support translation according to its paper. For RoBERTa, there seems to be little chance of using it directly for translation, since it is not seq2seq.
@huihuifan I was able to run RoBERTa on its own, but not for translation. What would you suggest for BART?

@ButteredGroove

What about using RoBERTa to generate embeddings for words to train seq2seq models?

@tuhinjubcse

What was your GPU configuration for BART?
https://github.com/pytorch/fairseq/blob/master/examples/bart/README.cnn.md
Did you guys follow this configuration?

@ButteredGroove

What was your GPU configuration for BART?
https://github.com/pytorch/fairseq/blob/master/examples/bart/README.cnn.md
Did you guys follow this configuration?

I tried that demo, but step 2) tries to process train.source, val.source, train.target, and val.target. No files with those names exist in the CNN-DailyMail files from step 1). However, the CNN-DailyMail subdirectory finished_files does contain train.bin and val.bin. It looks like the demo is missing a step?

@yuchenlin

yuchenlin commented Feb 5, 2020

Same problem here. The example of using BART is quite unclear; it would be much better if we could see some toy examples of using BART with a simple input/output format for seq2seq tasks.
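
As a toy starting point, here is a minimal sketch of loading a pretrained BART checkpoint and generating output from it. It assumes the bart.large.cnn checkpoint and its binarized cnn_dm data directory have been downloaded; the paths and sampling hyperparameters are only illustrative, and for translation you would first need to fine-tune BART on parallel data.

# Load a pretrained BART checkpoint and generate for a seq2seq task (here: summarization).
import torch
from fairseq.models.bart import BARTModel

bart = BARTModel.from_pretrained(
    '/path/to/bart.large.cnn',
    checkpoint_file='model.pt',
    data_name_or_path='/path/to/cnn_dm-bin',
)
bart.eval()  # disable dropout
if torch.cuda.is_available():
    bart.cuda()

source_lines = ['Paste the source document or sentence to be transformed here.']
with torch.no_grad():
    hypotheses = bart.sample(source_lines, beam=4, lenpen=2.0,
                             max_len_b=140, min_len=55, no_repeat_ngram_size=3)
print(hypotheses[0])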

@loganlebanoff

To do the BART preprocessing, you have to look at #1391, specifically zhaoguangxiang's comment.

facebook-github-bot pushed a commit that referenced this issue Feb 3, 2021

Summary:
…hod` to avoid unbounded local error.

Adding initialization for `num_pipelines_per_node` in `infer_init_method` in `distributed/utils.py`.

Pull Request resolved: fairinternal/fairseq-py#1599
Reviewed By: myleott
Differential Revision: D26208044
Pulled By: girifb
fbshipit-source-id: 98d3c0b70b59a5e0abb027850baa3bc44d9c3c78

harkash pushed a commit to harkash/fairseq that referenced this issue Feb 23, 2021

sshleifer pushed a commit that referenced this issue Apr 7, 2021

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

@stale stale bot added the stale label Apr 17, 2022
@stale

stale bot commented Apr 28, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

@stale stale bot closed this as completed Apr 28, 2022
@karin0018

Same problem here. The example of using BART is quite unclear; it would be much better if we could see some toy examples of using BART with a simple input/output format for seq2seq tasks.

couldn't agree more...
